Python 3 File Paths

If you’ve used Python for a while you will probably be familiar with the os module for working with the names of files and directories, often called pathnames by Linux users. In moving to Python 3 you may continue to use the same os and os.path functions from Python 2.7; however, a new pathlib module provides an alternative object-oriented (OO) approach.

In this posting, we examine common file handling situations, comparing the OO approach of pathlib against the procedural approach of the os functions.

Current Directory

To obtain the current working directory with os functions we use getcwd():

import os
cwd = os.getcwd()
print(cwd)

The cwd variable is a standard Python string object containing the directory path:

/home/feabhas

On Windows this would be, for example:

C:\Users\Feabhas

In contrast, the OO approach provided by pathlib is to create a Path object with no arguments:

from pathlib import Path
cwd = Path()
print(cwd)

Path objects encapsulate the concept of a path (the name of a file or directory on disk) but do not necessarily imply that the path exists. A Path object can be queried and used to manipulate pathnames. Converting a Path to a string for printing returns the simplest form of the path which for the current directory is always:

.

To get the full, or absolute, pathname use:

print(cwd.absolute())

which will display the same string value as returned by os.getcwd().
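As a minimal sketch of this equivalence (the variable names are ours, chosen for illustration), the Path.cwd() class method returns the absolute current directory in one step:

```python
import os
from pathlib import Path

relative_cwd = Path()        # relative form, prints as '.'
absolute_cwd = Path.cwd()    # absolute form of the current directory

print(relative_cwd)
print(absolute_cwd)          # same text as os.getcwd()
print(relative_cwd.absolute() == absolute_cwd)
```

Which form to use is a matter of taste: Path() keeps printed names short, while Path.cwd() is unambiguous if the working directory changes later.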

Directory Listing

The names in the current directory are returned by os.listdir() as a list of strings:

names = os.listdir()

Normally you’d want to examine these files. In this example, we’ll simply get the file size and modification date. To start with we need to convert each name into a pathname using os.path.join():

for name in os.listdir(cwd):
    path = os.path.join(cwd, name)

We can now use this path variable to access the file on disk. If you’ve read the wrong tutorial you may, in the past, have written:

path = cwd + '/' + name

Which works but isn’t very Pythonic. If you’re a Windows user and write:

path = cwd + '\\' + name

then you have missed the part that says Windows accepts both forward and backward slash characters as directory separators, so the forward slash works on both Linux and Windows (on Linux a backslash is just an ordinary filename character). In other words, you can use the forward slash everywhere. Of course, you may have got around the forward/backward slash issue by using the host separator character from the os module:

path = cwd + os.sep + name

None of these concatenation solutions is as clean or maintainable as the os.path.join() function.
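One concrete difference can be sketched as follows. The example assumes a made-up directory and file name, and uses the standard posixpath and ntpath modules (the Linux and Windows implementations behind os.path) so that the result is the same whichever host runs it:

```python
import ntpath
import posixpath

directory = '/home/feabhas/'     # hypothetical path with a trailing separator
name = 'notes.txt'               # hypothetical file name

naive = directory + '/' + name               # concatenation doubles the slash
joined = posixpath.join(directory, name)     # join() adds one only if needed

windows = ntpath.join('C:\\Users\\Feabhas', name)

print(naive)      # /home/feabhas//notes.txt
print(joined)     # /home/feabhas/notes.txt
print(windows)    # C:\Users\Feabhas\notes.txt
```

In normal code you would simply call os.path.join() and let Python pick the right implementation for the host.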

Returning to the directory listing example code, we can read the file size and modification timestamp using two separate os.path function calls, e.g.:

import os
from datetime import datetime
for name in os.listdir(cwd):
    path = os.path.join(cwd, name)
    size = os.path.getsize(path)
    mtime = datetime.fromtimestamp(os.path.getmtime(path))
    print(f'{name} {size} bytes, modified {mtime}')

For the pathlib.Path code example we can use iterdir(), which returns an iterator and so fits nicely with our for loop approach. As the iterator yields Path objects there is no need to join the file name with the parent directory:

for path in cwd.iterdir():

Accessing file information from a Path object is a single method call. This returns all statistics in a single object which is likely to be more efficient than the multiple os.path function calls:

stats = path.stat()

The revised OO code looks thus:

from datetime import datetime
from pathlib import Path
cwd = Path()
for path in cwd.iterdir():
    stats = path.stat()
    size = stats.st_size
    mtime = datetime.fromtimestamp(stats.st_mtime)
    print(f'{path} {size} bytes, modified {mtime}')
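To check that the two approaches agree, here is a self-contained sketch that runs both against a temporary directory. The helper function names and the file names are our own inventions for the purpose of the comparison:

```python
import os
import tempfile
from pathlib import Path


def sizes_with_os(directory):
    """Return {name: size} using os.listdir() and os.path functions."""
    sizes = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        sizes[name] = os.path.getsize(path)
    return sizes


def sizes_with_pathlib(directory):
    """Return {name: size} using Path.iterdir() and one stat() call per file."""
    sizes = {}
    for path in Path(directory).iterdir():
        sizes[path.name] = path.stat().st_size
    return sizes


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'a.txt').write_bytes(b'hello')   # 5 bytes
    Path(tmp, 'b.txt').write_bytes(b'')        # empty file
    os_sizes = sizes_with_os(tmp)
    pathlib_sizes = sizes_with_pathlib(tmp)

print(os_sizes == pathlib_sizes)
```

Note the use of path.name to recover just the file name from each Path object.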

Building Pathnames

We’ve already looked at how os.path.join() is used to build filenames. With Path objects there is a similar joinpath() method used to access files from a directory path.

Consider where we want to access a file called ‘example.settings’ in a ‘conf’ sub-directory. Using os.path.join() we would write:

settings = os.path.join(cwd, 'conf', 'example.settings')

Using a Path object we now write:

settings = cwd.joinpath('conf').joinpath('example.settings')

Alternatively, we could use the / operator on a Path object and a string (or another Path object):

settings = cwd / 'conf' / 'example.settings'
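A short sketch shows that the method and operator forms build the same path. It assumes a hypothetical base directory and uses PurePosixPath, which manipulates names without touching the disk, so the output is identical on any platform:

```python
from pathlib import PurePosixPath

base = PurePosixPath('/home/feabhas')    # hypothetical directory

via_method = base.joinpath('conf', 'example.settings')
via_operator = base / 'conf' / 'example.settings'

print(via_method == via_operator)   # True
print(via_operator)                 # /home/feabhas/conf/example.settings
print(via_operator.name)            # example.settings
print(via_operator.suffix)          # .settings
print(via_operator.parent)          # /home/feabhas/conf
```

The name, suffix and parent attributes replace calls such as os.path.basename(), os.path.splitext() and os.path.dirname().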

Path objects also support wildcard filename expansion, e.g.:

py_files = list(cwd.glob('*.py'))

The glob() method returns an iterator, which in this case we’ve used to initialise a list containing Path objects.

The traditional approach for expanding wildcards is to use the glob module to get back a list of strings:

import glob
py_files = glob.glob('*.py')

Both versions of glob() also support recursive wildcards using ‘**’ (but the glob.glob() function requires a recursive=True argument to enable this feature):

py_files = list(cwd.glob('**/*.py'))
py_files = glob.glob('**/*.py', recursive=True)

This capability means that you no longer need to use the cumbersome os.walk() function if all you want to do is find files that match a wildcard pattern.
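The following sketch builds a small, made-up directory tree in a temporary directory and confirms that the recursive forms find the same files. It also uses Path.rglob(), a shorthand for a glob() pattern that starts with ‘**/’:

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'pkg' / 'sub').mkdir(parents=True)
    # Hypothetical files: three Python sources and one text file
    for relative in ('top.py', 'pkg/mid.py', 'pkg/sub/deep.py', 'pkg/notes.txt'):
        (root / relative).touch()

    from_glob_method = sorted(
        p.relative_to(root).as_posix() for p in root.glob('**/*.py'))
    from_rglob = sorted(
        p.relative_to(root).as_posix() for p in root.rglob('*.py'))
    from_glob_module = sorted(
        Path(p).relative_to(root).as_posix()
        for p in glob.glob(os.path.join(tmp, '**', '*.py'), recursive=True))

print(from_glob_method)   # ['pkg/mid.py', 'pkg/sub/deep.py', 'top.py']
```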

Manipulating Files

The os module has many functions for manipulating files. The following example checks for a file called example.log in the current directory and then deletes it.

Using the os module:

logfile = './example.log'
if os.path.exists(logfile):
    os.remove(logfile)

The same solution using the pathlib module:

logfile = Path('./example.log')
if logfile.exists():
    logfile.unlink()

Directories can be created using the os.mkdir() function or Path.mkdir() method and empty directories removed with the rmdir() function or method.
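As a brief sketch of the Path methods (the ‘logs’ directory name is purely illustrative, and everything happens inside a temporary directory), note that rmdir() refuses to remove a directory that still contains files:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    logs = Path(tmp) / 'logs'       # hypothetical directory name
    logs.mkdir()                    # like os.mkdir(): the parent must exist
    created = logs.is_dir()

    logfile = logs / 'example.log'
    logfile.touch()                 # create an empty file
    try:
        logs.rmdir()                # raises OSError: directory is not empty
        removed_while_full = True
    except OSError:
        removed_while_full = False

    logfile.unlink()                # delete the file first
    logs.rmdir()                    # now the empty directory can be removed
    still_exists = logs.exists()

print(created, removed_while_full, still_exists)   # True False False
```

For nested directories, Path.mkdir(parents=True) creates any missing parents, much as os.makedirs() does.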

Opening Files

Finally, Python 3.6 upgraded the os and os.path functions, as well as the built-in open(), to accept both strings and Path objects; this is formalised as the path-like object protocol (os.PathLike).

The procedural approach for opening all Python files in the current directory looks like:

for path in glob.iglob('*.py'):
    with open(path) as fp:
        pass

The object-oriented approach looks like:

for path in Path().glob('*.py'):
    with open(path) as fp:
        pass
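To illustrate how a path-like object behaves, the sketch below writes a made-up file in a temporary directory and reads it back in three ways. os.fspath() returns the string form that open() and the os functions work with internally:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / 'hello.py'          # hypothetical file name
    script.write_text('print("hello")\n')

    as_string = os.fspath(script)            # string form of the path
    with open(script) as fp:                 # built-in open() accepts a Path
        via_open = fp.read()
    with script.open() as fp:                # Path.open() is the method form
        via_method = fp.read()
    via_read_text = script.read_text()       # open, read and close in one call

print(via_open == via_method == via_read_text)   # True
```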

The Path class has a lot more capabilities in addition to those shown here. It is a more or less complete replacement for all of the os, os.path and glob functions and is described in the Python 3 pathlib documentation.

Summary

If you like OO languages such as C++ or Java you’ll probably prefer the Python 3 object-oriented approach using Path objects, whereas if you’re used to the procedural approach of C you may be more comfortable using the os, os.path and glob functions.

As long as you’re aware that both approaches are available with Python 3 you can make an informed decision over which one to use, rather than continue to use the procedural approach without being aware that there is an alternative.

Posted in Python, Python3

Brace initialization of user-defined types

Uniform initialization syntax is one of my favourite features of Modern C++.  I think it’s important, in good quality code, to clearly distinguish between initialization and assignment.

When it comes to user-defined types – structures and classes – brace initialization can throw up a few unexpected issues, and some counter-intuitive results (and errors!).

In this article, I want to have a look at some of the issues with brace initialization of user-defined types – specifically, brace elision and initializer_lists.

Read on for more…

Posted in C/C++ Programming, General

Thanks for the memory (allocator)

One of the design goals of Modern C++ is to find new ways – better, more effective – of doing things we could already do in C++.  Some might argue this is one of the more frustrating aspects of Modern C++ – if it works, don’t fix it (alternatively: why use lightbulbs when we have perfectly good candles?!)

This time we’ll look at a new aspect of Modern C++: the Allocator model for dynamic containers. This is currently experimental, but has been accepted into C++17.

The Allocator model allows programmers to provide their own memory management strategy in place of their library’s default implementation.  Although it is not specified by the C++ standard, many implementations use malloc/free.

Understanding this feature is important if you work on a high-integrity, or safety-critical, project where your project standards say ‘no’ to malloc.

Posted in C/C++ Programming, General

Python 3 Unicode and Byte Strings

A notable difference between Python 2 and Python 3 is that character data is stored using Unicode instead of bytes. It is quite likely that when migrating existing code and writing new code you may be unaware of this change as most string algorithms will work with either type of representation; but you cannot intermix the two.

If you are working with web service libraries such as urllib (formerly urllib2) and requests, network sockets, binary files, or serial I/O with pySerial  you will find that data is now stored as byte strings.
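A minimal sketch of the distinction, using an arbitrary example word: the two types have different lengths for the same text, refuse to be combined, and must be converted explicitly with encode() and decode():

```python
text = 'caf\u00e9'                   # str: four Unicode characters (café)
data = text.encode('utf-8')          # bytes: the UTF-8 encoded form

print(len(text))                     # 4
print(len(data))                     # 5, the accented letter needs two bytes

try:
    combined = text + data           # mixing str and bytes is an error
except TypeError as err:
    combined = None
    print(f'TypeError: {err}')

round_trip = data.decode('utf-8')    # convert back explicitly
print(round_trip == text)            # True
```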

Posted in Python, Python3

Python 3 Type Hints

The expected end of support for Python 2.7 is 1st January 2020, at least according to Guido van Rossum’s blog post. Starting now, you should consider developing all new Python applications in Python 3, and migrating existing code to Python 3 as and when time and workload permit.

Moving to Python 3

If you are unaware of the changes introduced in Python 3 that broke backward compatibility with Python 2 then there is a good summary on this What’s New In Python 3.0 web page.

The biggest difference you will notice moving to Python 3 is that the print statement is now a print function. But there are plenty of other changes that you should be aware of. This and subsequent blogs will look at aspects of Python that have been added or improved in Python 3.
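As a quick sketch of what the print change means in practice (the values printed are arbitrary), formatting and redirection are now controlled with ordinary keyword arguments:

```python
import io

buffer = io.StringIO()

# Python 2 statement form, which is a syntax error in Python 3:
#     print "size:", 42
print('size:', 42)                          # function call with parentheses
print('a', 'b', 'c', sep='-', end='!\n')    # prints a-b-c!
print('captured', file=buffer)              # replaces the old 'print >>f' syntax

captured = buffer.getvalue()
```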

Posted in Python, Python3, Testing

Peripheral register access using C Struct’s – part 1

When working with peripherals, we need to be able to read and write to the device’s internal registers. How we achieve this in C depends on whether we’re working with memory-mapped IO or port-mapped IO. Port-mapped IO typically requires compiler/language extensions, whereas memory-mapped IO can be accommodated with the standard C syntax.

Embedded “Hello, World!”

We all know the embedded equivalent of the “Hello, world!” program is flashing the LED, so true to form I’m going to use that as an example.

The examples are based on a STM32F407 chip using the GNU Arm Embedded Toolchain .

The STM32F4 uses a port-based GPIO (General Purpose Input Output) model, where each port can manage 16 physical pins. The LEDs are mapped to external pins 55-58, which map internally onto GPIO Port D pins 8-11.

Flashing the LEDs

Flashing the LEDs is fairly straightforward; at the port level there are only two registers we are interested in.

  • Mode Register – this defines, on a pin-by-pin basis what its function is, e.g. we want this pin to behave as an output pin.
  • Output Data Register – Writing a ‘1‘ to the appropriate pin will generate voltage and writing a ‘0‘ will ground the pin.

Mode Register (MODER)

Each port pin has four modes of operation, thus requiring two configuration bits per pin (pin 0 is configured using mode bits 0-1, pin 1 uses mode bits 2-3, and so on):

  • 00 Input
  • 01 Output
  • 10 Alternative function (details configured via other registers)
  • 11 Analogue

So, for example, to configure pin 8 for output, we must write the value 01 into bits 16 and 17 in the MODER register (that is, bit 16 => 1, bit 17 => 0).
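The bit arithmetic can be modelled in a few lines. This is only a sketch of the calculation in Python, not hardware access; the function and constant names are our own:

```python
# Two-bit mode encodings from the list above
MODE_INPUT, MODE_OUTPUT, MODE_ALTERNATE, MODE_ANALOGUE = 0b00, 0b01, 0b10, 0b11


def set_pin_mode(moder, pin, mode):
    """Return a 32-bit MODER value with the two mode bits for `pin` replaced."""
    shift = pin * 2                        # two configuration bits per pin
    cleared = moder & ~(0b11 << shift)     # clear the pin's existing mode bits
    return (cleared | (mode << shift)) & 0xFFFFFFFF


moder = set_pin_mode(0x00000000, 8, MODE_OUTPUT)
print(f'{moder:#010x}')                    # 0x00010000: bit 16 set, bit 17 clear
```

Clearing the two bits before setting them matters when the register does not start at zero, since a pin previously set to 11 (analogue) would otherwise stay at 11.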

Output Data Register (ODR)

In the Output Data Register (ODR) each bit represents an I/O pin on the port. The bit number matches the pin number.

If a pin is set to output (in the MODER register) then writing a 1 into the appropriate bit will drive the I/O pin high. Writing 0 into the appropriate bit will drive the I/O pin low.

There are 16 IO pins, but the register is 32 bits wide. Reserved bits are read as ‘0’.
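Again as a Python model of the arithmetic only (the function name is hypothetical), setting and clearing output bits for the four LED pins looks like this:

```python
def drive_pin(odr, pin, high):
    """Return an ODR value with the bit for `pin` set (high) or cleared (low)."""
    mask = 1 << pin                        # the bit number matches the pin number
    value = (odr | mask) if high else (odr & ~mask)
    return value & 0x0000FFFF              # the upper 16 bits are reserved


odr = 0
for pin in range(8, 12):                   # Port D pins 8-11 drive the LEDs
    odr = drive_pin(odr, pin, True)
print(f'{odr:#06x}')                       # 0x0f00: all four LEDs on

odr = drive_pin(odr, 8, False)
print(f'{odr:#06x}')                       # 0x0e00: pin 8 driven low again
```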

Port D Addresses

The absolute addresses for the MODER and ODR of Port D are:

  • MODER – 0x40020C00
  • ODR – 0x40020C14

Pointer access to registers

Typically when we access registers in C based on memory-mapped IO we use a pointer notation to ‘trick’ the compiler into generating the correct load/store operations at the absolute address needed.

Posted in ARM, C/C++ Programming, CMSIS, Cortex

A brief introduction to Concepts – Part 2

In part 1 of this article we looked at adding requirements to parameters in template code to improve the diagnostic ability of the compiler.  (I’d recommend reading this article first, if you haven’t already)

Previously, we looked at a simple example of adding a small number of requirements on a template parameter to introduce the syntax and semantics.  In reality, the constraints imposed on a template parameter could consist of any combination of

  • Type traits
  • Required type aliases
  • Required member attributes
  • Required member functions

Explicitly listing all of these requirements for each template parameter, on every template function or class, gets onerous very quickly.

To simplify the specification of these constraints we have Concepts.

Posted in C/C++ Programming

A brief introduction to Concepts – Part 1

Templates are an extremely powerful – and terrifying – element of C++ programs.  I say “terrifying” – not because templates are particularly hard to use (normally), or even particularly complex to write (normally) – but because when things go wrong the compiler’s output is a tsunami of techno-word-salad that can overwhelm even the experienced programmer.

The problem with generic code is that it isn’t completely generic.  That is, generic code cannot be expected to work on every possible type we could substitute.  The generic code typically places constraints on the substituted type, which may be in the form of type characteristics, type semantics or behaviours.  Unfortunately, there is no way to find out what those constraints are until you fail to meet them; and that usually happens at instantiation time, far away from your code and deep inside someone else’s hard-to-decipher library code.

The idea of Concepts has been around for many years, and arguably they trace their roots right back to the very earliest days of C++. Now in C++20 we are able to use and exploit their power in code.

Concepts allow us to express constraints on template types with the goals of making generic code

  • Easier to use
  • Easier to debug
  • Easier to write

In this pair of articles we’ll look at the basics of Concepts, their syntax and usage.  To be open up-front:  this article is designed to get you started, not to make you an expert on Concepts or generic code.

Posted in C/C++ Programming

Register for our webinar – ‘Introduction to Docker’

Dec 5, 2018 at 10am BST & 4pm BST

The introduction to Docker series is proving popular with our Blog readers, so we have decided to make it the subject for our next webinar.

Docker is a relatively new technology, only appearing just over five years ago. It has become integral to modern continuous integration (CI) and continuous delivery in an Agile world.

In this 45-minute webinar, Niall Cooling will introduce Docker and show how it can be used in an embedded development workflow. There will also be time for questions.

If you’d like to submit an advance Docker-related question for Niall to include in the webinar, please let us know. You can submit your question when you register or by emailing us at info@feabhas.com. We hope you can join us.

Click here to register and reserve a free place for the 10am BST webinar

Click here to register and reserve a free place for the 4pm BST webinar

Posted in Agile, training, webinar

An Introduction to Docker for Embedded Developers – Part 5 Multi-Stage Builds

Following on from the previous post, where we spent time reducing the docker image size, in this post I’d like to cover a couple of useful practices to further improve our docker image:

  1. Copying local files rather than pulling from the web
  2. Simplifying builds using a multi-stage build

Copying in Local Files

So far, when installing the GCC-Arm compiler, we have pulled it from the web using wget. This technique can suffer from two issues:

  1. Web links are notoriously fragile
  2. https adds complexity to the packages required with smaller base images such as Alpine-linux

An alternative approach, especially if you are managing your Dockerfiles in a git repository, is to pull the required file (e.g. gcc-arm-none-eabi-6-2017-q2-update-linux.tar.bz2) to your local file system and then copy this file into the docker image during the build process.

First we need to download to our local filesystem the version of GCC-Arm we want to use. The latest version can be found at: https://developer.arm.com/open-source/gnu-toolchain/gnu-rm/downloads

As of today, the latest version is 7-2018-q2-update.

I happen to be working on a Mac, but as our image is Linux based, I want to download the Linux 64-bit image gcc-arm-none-eabi-7-2018-q2-update-linux.tar.bz2.

Once downloaded, the local (build) directory contains two files:

.
├── Dockerfile
└── gcc-arm-none-eabi-7-2018-q2-update-linux.tar.bz2

We now modify the Dockerfile to copy from the local file system into our base image using the following command:

COPY <local file> <destination>

So the command (the trailing ‘.’ refers to the container’s current working directory):

COPY gcc-arm-none-eabi-7-2018-q2-update-linux.tar.bz2 .

will copy the compressed tar file from our local file system into the container. We can now go ahead and un-tar it and configure it as before.

Posted in Agile, ARM, C/C++ Programming, Testing