Python 3 File Paths

If you’ve used Python for a while you will probably be familiar with the os module for working with the names of files and directories, often called pathnames by Linux users. In moving to Python 3 you may continue to use the same os and os.path functions from Python 2.7; however, the pathlib module (added in Python 3.4) provides an alternative object-oriented (OO) approach.

In this post we examine common file-handling situations, comparing the OO approach of pathlib against the procedural approach of the os functions.

Current Directory

To obtain the current working directory with os functions we use getcwd():

import os
cwd = os.getcwd()
print(cwd)

The cwd variable is a standard Python string object containing the directory path:

/home/feabhas

On Windows this would be, for example:

C:\Users\Feabhas

In contrast, the OO approach provided by pathlib is to create a Path object with no arguments:

from pathlib import Path
cwd = Path()
print(cwd)

Path objects encapsulate the concept of a path (the name of a file or directory on disk) but do not imply that the path exists. A Path object can be queried and used to manipulate pathnames. Converting a Path to a string for printing returns the simplest form of the path, which for the current directory is always:

.

To get the full, or absolute, pathname use:

print(cwd.absolute())

which will display the same string value as returned by os.getcwd().
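As a minimal sketch of the points above, the following compares the relative and absolute forms of the current directory. It also uses Path.cwd(), a class method that returns the absolute current directory directly. The name 'no-such-dir' is invented purely to show that building a Path does not touch the disk.

```python
import os
from pathlib import Path

relative_cwd = Path()        # relative form of the current directory
absolute_cwd = Path.cwd()    # class method returning an absolute Path

print(relative_cwd)          # the simplest form of the path
print(absolute_cwd)          # the full pathname

# Both approaches name the same directory.
same_directory = absolute_cwd == Path(os.getcwd())
print(same_directory)

# A Path is only a name: creating one does not create or check anything on disk.
# 'no-such-dir' is a made-up name used for illustration.
missing = Path('no-such-dir') / 'missing.txt'
print(missing.exists())
```

Note that absolute() only prefixes the current directory; it does not follow symbolic links or collapse '..' components. The resolve() method does both.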

Directory Listing

The names in the current directory are returned by os.listdir() as a list of strings:

names = os.listdir()

Normally you’d want to examine these files. In this example, we’ll simply get the file size and modification date. To start with we need to convert each name into a pathname using os.path.join():

for name in os.listdir(cwd):
    path = os.path.join(cwd, name)

We can now use this path variable to access the file on disk. If you’ve read the wrong tutorial you may, in the past, have written:

path = cwd + '/' + name

This works but isn’t very Pythonic. If you’re a Windows user and write:

path = cwd + '\\' + name

then you may not know that Python on Windows accepts both forward and backward slash characters as directory separators, so the forward slash form works on both Linux and Windows. The reverse does not hold: on Linux a backslash is an ordinary filename character, not a separator. Of course, you may have got around the forward/backward slash issue by using the host separator character from the os module:

path = cwd + os.sep + name

None of these concatenation solutions is as clean or maintainable as the os.path.join() function.
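One concrete case where concatenation goes wrong is a directory that already ends with a separator, such as the filesystem root. The sketch below uses os.sep as a stand-in for the root directory and an invented file name to compare the two approaches.

```python
import os

name = 'example.txt'     # hypothetical file name
root = os.sep            # stand-in for a directory that ends with a separator

joined = os.path.join(root, name)        # join() does not add a second separator
concatenated = root + os.sep + name      # manual concatenation doubles it

print(joined)
print(concatenated)

# For an ordinary directory name, join() inserts exactly one separator.
in_conf = os.path.join('conf', name)
print(in_conf)
```

The doubled separator is usually tolerated by the operating system, but it makes paths harder to compare and to display cleanly.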

Returning to the directory listing example code, we can read the file size and modification timestamp using two separate os.path function calls, e.g.:

import os
from datetime import datetime
for name in os.listdir(cwd):
    path = os.path.join(cwd, name)
    size = os.path.getsize(path)
    mtime = datetime.fromtimestamp(os.path.getmtime(path))
    print(f'{name} {size} bytes, modified {mtime}')

For the pathlib.Path example we can use iterdir(), which returns an iterator and so fits nicely with our for loop approach. As the iterator yields Path objects there is no need to join the file name with the parent directory:

for path in cwd.iterdir():

Accessing file information from a Path object is a single method call. This returns all the statistics in a single object, which is likely to be more efficient than the os.path functions because getsize() and getmtime() each make their own os.stat() call:

stats = path.stat()

The revised OO code looks thus:

from datetime import datetime
from pathlib import Path

cwd = Path()
for path in cwd.iterdir():
    stats = path.stat()    # one call returns size and timestamps together
    size = stats.st_size
    mtime = datetime.fromtimestamp(stats.st_mtime)
    print(f'{path} {size} bytes, modified {mtime}')
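The loop above can be wrapped in a small function for reuse. The sketch below is one possible arrangement: list_directory() is a hypothetical helper, not part of pathlib, and it runs against a temporary directory with invented file names so that the result is repeatable.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def list_directory(directory):
    """Return sorted (name, size, modified) tuples for the regular files in directory."""
    entries = []
    for path in Path(directory).iterdir():
        if path.is_file():               # skip sub-directories and other non-files
            stats = path.stat()          # one stat() call supplies both values
            modified = datetime.fromtimestamp(stats.st_mtime)
            entries.append((path.name, stats.st_size, modified))
    return sorted(entries)


# Demonstrate on a throwaway directory; the file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'a.txt').write_text('hello')
    (Path(tmp) / 'b.txt').write_text('')
    (Path(tmp) / 'sub').mkdir()
    listing = list_directory(tmp)

for name, size, modified in listing:
    print(f'{name} {size} bytes, modified {modified}')
```

Using path.name rather than the whole path keeps the output the same whichever directory is listed.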

Building Pathnames

We’ve already looked at how os.path.join() is used to build filenames. With Path objects there is a similar joinpath() method used to access files from a directory path.

Consider where we want to access a file called ‘example.settings’ in a ‘conf’ sub-directory. Using os.path.join() we would write:

settings = os.path.join(cwd, 'conf', 'example.settings')

Using a Path object we now write:

settings = cwd.joinpath('conf').joinpath('example.settings')

Alternatively, we could use the / operator on a Path object and a string (or another Path object):

settings = cwd / 'conf' / 'example.settings'
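To confirm that these forms are interchangeable, here is a short sketch that builds the same path three ways. It also shows that joinpath() accepts several components in one call, and that the resulting Path can be taken apart again through its name, suffix and parent attributes.

```python
from pathlib import Path

cwd = Path()

chained = cwd.joinpath('conf').joinpath('example.settings')
single_call = cwd.joinpath('conf', 'example.settings')   # several parts at once
with_operator = cwd / 'conf' / 'example.settings'

# All three expressions produce equal Path objects.
print(chained == single_call == with_operator)

# The result can be split up again without any string handling.
print(with_operator.name)      # final component
print(with_operator.suffix)    # file extension, including the dot
print(with_operator.parent)    # containing directory
```

The / operator tends to read best for fixed paths, while joinpath() is convenient when the components are already held in a list and can be unpacked with *.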

Path objects also support wildcard filename expansion, e.g.:

py_files = list(cwd.glob('*.py'))

The glob() method returns a generator, which in this case we’ve used to initialise a list of Path objects.

The traditional approach for expanding wildcards is to use the glob module to get back a list of strings:

import glob
py_files = glob.glob('*.py')

Both the Path.glob() method and the glob.glob() function also support recursive wildcards using ‘**’ (but the glob.glob() function requires a recursive=True argument to enable this feature):

py_files = list(cwd.glob('**/*.py'))
py_files = glob.glob('**/*.py', recursive=True)

This capability means that you no longer need to use the cumbersome os.walk() function if all you want to do is find files that match a wildcard pattern.
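For comparison, the sketch below finds the same files with a recursive wildcard and with os.walk(). It uses Path.rglob(), which behaves like glob() with ‘**/’ added in front of the pattern, and builds a small temporary tree with invented file names.

```python
import fnmatch
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'pkg').mkdir()
    for relative in ('main.py', 'notes.txt', 'pkg/util.py'):   # made-up names
        (root / relative).write_text('')

    # pathlib: a single call walks the tree and applies the pattern.
    with_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob('*.py'))

    # os.walk(): visit every directory and filter the names by hand.
    with_walk = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in fnmatch.filter(filenames, '*.py'):
            full = Path(dirpath) / filename
            with_walk.append(full.relative_to(root).as_posix())
    with_walk.sort()

print(with_rglob)
print(with_walk)
```

os.walk() remains useful when you need to prune directories during the traversal, which a wildcard pattern cannot express.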

Manipulating Files

The os module has many functions for manipulating files. The following example checks for a file called example.log in the current directory and then deletes it.

Using the os module:

logfile = './example.log'
if os.path.exists(logfile):
    os.remove(logfile)

The same solution using the pathlib module:

logfile = Path('./example.log')
if logfile.exists():
    logfile.unlink()
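Checking for the file and then deleting it leaves a small window in which another process could remove the file first. A common alternative, sketched here against a temporary directory, is to attempt the removal and tolerate a missing file. The second option relies on the missing_ok argument, which is only available from Python 3.8.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    logfile = Path(tmp) / 'example.log'
    logfile.write_text('old log entries\n')

    # Option 1: attempt the removal and ignore a missing file.
    try:
        logfile.unlink()
    except FileNotFoundError:
        pass
    removed = not logfile.exists()

    # Option 2 (Python 3.8+): let unlink() ignore a missing file itself.
    logfile.unlink(missing_ok=True)    # file is already gone; no error is raised
    still_missing = not logfile.exists()

print(removed, still_missing)
```

The same try/except pattern works with os.remove(), which also raises FileNotFoundError when the file does not exist.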

Directories can be created using the os.mkdir() function or Path.mkdir() method and empty directories removed with the rmdir() function or method.
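A brief sketch of both styles follows, run inside a temporary directory with invented directory names. It also shows the parents and exist_ok arguments of Path.mkdir(), which create any missing intermediate directories and suppress the error raised when the directory already exists.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Procedural: os.mkdir() creates one level, os.rmdir() removes it if empty.
    os_dir = os.path.join(tmp, 'os_dir')
    os.mkdir(os_dir)
    os_created = os.path.isdir(os_dir)
    os.rmdir(os_dir)
    os_removed = not os.path.exists(os_dir)

    # Object-oriented: parents=True also creates 'outer';
    # exist_ok=True makes a repeated call harmless.
    nested = Path(tmp) / 'outer' / 'inner'
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)
    path_created = nested.is_dir()
    nested.rmdir()                       # removes 'inner' only; it must be empty
    path_removed = not nested.exists()
    parent_remains = nested.parent.is_dir()

print(os_created, os_removed, path_created, path_removed, parent_remains)
```

The procedural counterpart of mkdir(parents=True) is os.makedirs(), which likewise accepts an exist_ok argument.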

Opening Files

Finally, Python 3.6 upgraded the os and os.path functions, as well as the built-in open() function, to work with both strings and Path objects; this is formalised as the path-like object protocol (os.PathLike).

The procedural approach for opening all Python files in the current directory looks like:

for path in glob.iglob('*.py'):
    with open(path) as fp:
        pass

The object oriented approach looks like:

for path in Path().glob('*.py'):
    with open(path) as fp:
        pass
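Path objects also have their own open() method, along with convenience methods that read a whole file in one call. The sketch below, which uses a temporary file with an invented name, shows three equivalent ways of reading the same file, and uses os.fspath() to obtain the string form that path-like objects expose.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / 'hello.py'          # made-up file name
    script.write_text("print('hello')\n")

    # The built-in open() accepts the Path directly (Python 3.6+).
    with open(script) as fp:
        via_builtin = fp.read()

    # Path.open() takes the same mode arguments as the built-in.
    with script.open() as fp:
        via_method = fp.read()

    # read_text() opens, reads and closes the file in one call.
    via_read_text = script.read_text()

    # os.fspath() returns the string form of a path-like object.
    as_string = os.fspath(script)

print(via_builtin == via_method == via_read_text)
print(isinstance(as_string, str))
```

For short files, read_text() and write_text() are often all that is needed; open() remains the better choice when a file is processed line by line.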

The Path class has a lot more capabilities in addition to those shown here. It is a more or less complete replacement for all of the os, os.path and glob functions and is described in the Python 3 pathlib documentation.

Summary

If you like OO languages such as C++ or Java you’ll probably prefer the Python 3 object-oriented approach using Path objects, whereas if you’re used to the procedural approach of C you may be more comfortable using the os, os.path and glob functions.

As long as you’re aware that both approaches are available with Python 3 you can make an informed decision over which one to use, rather than continue to use the procedural approach without being aware that there is an alternative.

Martin Bond

An independent IT trainer, Martin has over 40 years of academic and commercial experience in open systems software engineering. He has worked with a range of technologies, from real-time process controllers, through compilers, to large-scale parallel processing systems, and across multiple sectors including industrial systems, semiconductor manufacturing, telecoms, banking, MoD, and government.
