Disassembling a Cortex-M raw binary file with Ghidra

BlackHat Europe 2022

During the first week of December, I had the pleasure of attending a training course at BlackHat Europe 2022 titled Assessing and Exploiting Control Systems and IIoT run by Justin Searle.

Part of the course involved Assessing and Exploiting Embedded Firmware by reading on-chip Flash using OpenOCD. Unfortunately, we ran out of time to finish the last labs during the training (we ran 9 am-6 pm each day). So I decided to follow along with the very comprehensive notes and finish the last lab.

Note that this is not meant as a criticism of an excellent course. Ideally, I would have liked an extra day (it was already four days long). As an instructor, I know how easily you run out of time because of student questions and lab work.

Once you’ve got a raw binary file, the challenge is to disassemble it. Many toolchains supply Binutils tools, such as the GNU Arm toolchain’s arm-none-eabi-objdump. However, in my experience, this tends to have limited success with raw binary files (I’m sure people far more skilled than myself have greater success).

The most widely referenced tool for reverse engineering code is IDA Pro. IDA Pro is a powerful commercial tool, and I can see why it’s the tool of choice for many professionals. However, the free version doesn’t support Arm (Intel only), and the full version is out of the price range for the casual experimenter.

National Security Agency – Ghidra Software Reverse Engineering Framework

Yes, you’ve read that correctly, the NSA. At the 2019 RSA Conference, the NSA published a press release announcing the release of a tool for reverse engineering. They now have their very own GitHub account.

Taken from the Ghidra GitHub account

Ghidra is a software reverse engineering (SRE) framework created and maintained by the National Security Agency Research Directorate. This framework includes a suite of full-featured, high-end software analysis tools that enable users to analyze compiled code on various platforms, including Windows, macOS, and Linux.

So, let’s give it a go…

Ghidra

Installation

I’m not going to repeat the instructions here, but as I’m running this on a Mac, as per the instructions, I had to install JDK 17. I’ve also successfully installed it on both Win11 and Ubuntu.

Importing the binary

Create the project

Create a new (non-shared) project, choose a directory to store it in and give it a name. Ghidra then creates a project file (.gpr) and a project directory (.rep) in that location.

Import the binary

Next, choose File->Import File… and select the binary file. Note that Ghidra can import many different file formats (.elf, for example).

File Language

As this is a raw binary file, the tool cannot detect the underlying Application Binary Interface (ABI). If, for example, we had imported an .elf file, it would automatically populate the Language dialogue.

Clicking on the three dots ... to the right of the dialogue presents a list of available ABIs. Knowing our binary is from a Cortex-M4 (something you’d most likely know), we can filter the list and select the little-endian entry.

Once imported, the Import Results Summary dialogue is presented:

After clicking OK, you are informed that the file has not been analyzed and given the option to run the analyzer.

After selecting Yes, a set of Analysis Options is shown. The extra one worth checking, not checked by default, is the ARM Aggressive Instruction Finder.

Finally, the initial listing is presented. As this is a Cortex-M binary, Ghidra has already tagged the word at address 0x0000 0000 as the initial Main Stack Pointer value (0x2002 0000) and the word at address 0x0000 0004 as the Reset vector address (0x0800 029d).
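The two values Ghidra tags can also be checked by hand, since they are simply the first two little-endian 32-bit words of the image. The following is a minimal sketch; the function name read_vector_head is illustrative, and the sample bytes are built from the values quoted above rather than read from the real firmware file.

```python
import struct


def read_vector_head(image: bytes) -> tuple:
    """Return (initial MSP, reset vector) from the first two words of a Cortex-M image."""
    if len(image) < 8:
        raise ValueError("image too short to hold the start of a vector table")
    # "<II" = two little-endian unsigned 32-bit words
    return struct.unpack_from("<II", image, 0)


# Sample bytes built from the values in the listing (an assumption, not the real file).
sample = struct.pack("<II", 0x20020000, 0x0800029D)
msp, reset = read_vector_head(sample)
print(f"MSP=0x{msp:08x} reset=0x{reset:08x}")
```

With a real image, you would pass the contents of the file (for example, the result of open(path, "rb").read()) instead of the sample bytes.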

Reset is invoked on power up or a warm reset. The exception model treats reset as a special form of exception. When reset is asserted, the operation of the processor stops, potentially at any point in an instruction. When reset is deasserted, execution restarts from the address provided by the reset entry in the vector table. Execution restarts as privileged execution in Thread mode.

Ghidra creates default label names based on their address. The screenshot shows it has made a label: LAB_0800029c+1. If you’re unfamiliar with the Arm Thumb ISA (Instruction Set Architecture), the +1 is there because any pointer to a function that runs in Thumb state must have its least significant bit set. The convention comes from Arm and Thumb interworking; Cortex-M cores only execute Thumb code, so every entry in the vector table has this bit set.

Double-clicking on a label (e.g. LAB_0800029c+1) jumps to that address, e.g. 0x0800029c.
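The relationship between the vector value and the address of the first instruction is just a mask of bit 0. A small sketch (split_thumb_pointer is a made-up helper name, not part of any tool):

```python
def split_thumb_pointer(pointer: int):
    """Split a code pointer into (instruction address, thumb flag)."""
    # Bit 0 selects Thumb state; the instruction itself lives at the even address.
    return pointer & ~1, bool(pointer & 1)


address, is_thumb = split_thumb_pointer(0x0800029D)
print(f"0x{address:08x} thumb={is_thumb}")
```

Applied to the Reset vector value 0x0800 029d, this gives the 0x0800029c that appears in the label.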

Unfortunately, Ghidra hasn’t decoded this correctly!

Flash Aliasing

I quickly realized what the problem was. Ghidra was loading the binary image at address 0x0000'0000 in the memory map. However, this code was compiled for the STM32F407. In the default model for the ARMv7-M family, as previously mentioned, the initial MSP is loaded from 0x0000'0000 and the Reset vector address from 0x0000'0004.

However, for the STM32F40x/STM32F41x family, the base address of Flash memory is 0x0800'0000, as can be seen from the following table:

Like many manufacturers, ST remaps the Flash memory to 0x0000'0000 at boot time. The following table lists the various options ST allows for remapping.

When we flash the real target, the binary image is loaded to address 0x0800'0000, not address 0x0000'0000, and is then aliased at boot time.

So somehow, when loading the binary image, I need to specify that Ghidra should offset the image.
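To put the mismatch in numbers, it helps to ask which byte of the file a given address corresponds to for a particular load base. The sketch below is only an illustration: file_offset is a hypothetical helper, and the 16 KB image size is an assumed figure, not the size of the real binary.

```python
from typing import Optional


def file_offset(address: int, load_base: int, image_size: int) -> Optional[int]:
    """Return the offset into the raw image for an address, or None if it is not covered."""
    offset = address - load_base
    return offset if 0 <= offset < image_size else None


IMAGE_SIZE = 0x4000          # assumed 16 KB image, for illustration only
RESET_HANDLER = 0x0800029C   # reset vector with the Thumb bit cleared

# Loaded at 0x0000'0000, the handler address falls outside the image.
print(file_offset(RESET_HANDLER, 0x00000000, IMAGE_SIZE))
# Loaded at 0x0800'0000, it lands 0x29c bytes into the file.
print(file_offset(RESET_HANDLER, 0x08000000, IMAGE_SIZE))
```

With the image based at zero there are simply no bytes at 0x0800029c to disassemble, which matches what Ghidra showed.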

Loading an image at an offset

The default loading at address 0x0000'0000 stumped me for a time (lots of googling, plenty of dead ends) until I happened upon this most excellent YouTube video, Bare-metal ARM firmware reverse engineering with Ghidra and SVD-Loader.

When loading the initial image

Select Options...

The default load is at Base Address 00000000 (with a strangely calculated length). Name this block, change the base address to 0x0800'0000 and set the length to the size of the available Flash, e.g.

The image is now loaded correctly:

However, it is worth setting up a couple of other memory blocks before analyzing the codebase, specifically the aliased Flash and SRAM. Creating multiple memory blocks aids Ghidra in creating better analysis mappings. So when presented with the option to analyze the project, choose No.

After being presented with the default screen:

Choose Window->Memory Map, which will display the initial Flash mapping:

We can now add two additional memory blocks. Here, the Flash Alias is being added. Note that for this block, you also want to specify that the application binary image populates it. This then gives correct mappings from the aliased region to the actual Flash region.

For the SRAM region, specify it as Uninitialized. We now have three memory block regions, e.g.
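The effect of the three blocks can be modelled in a few lines. This is a rough sketch of the idea, not of how Ghidra implements its memory map: the Block class and resolve function are invented for illustration, and the sizes (1 MB of Flash, 128 KB of SRAM) are assumptions for an STM32F407 part. The SRAM figure is at least consistent with the initial MSP of 0x2002 0000 seen earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    name: str
    start: int
    length: int
    mapped_to: Optional[int] = None  # start of the region this block mirrors, if any

    def contains(self, address: int) -> bool:
        return self.start <= address < self.start + self.length


FLASH_SIZE = 0x100000  # assumed 1 MB of Flash
SRAM_SIZE = 0x20000    # assumed 128 KB of SRAM

BLOCKS = [
    Block("flash", 0x08000000, FLASH_SIZE),
    Block("flash_alias", 0x00000000, FLASH_SIZE, mapped_to=0x08000000),
    Block("sram", 0x20000000, SRAM_SIZE),
]


def resolve(address: int):
    """Return (block name, underlying address), following the alias if there is one."""
    for block in BLOCKS:
        if block.contains(address):
            if block.mapped_to is not None:
                return block.name, block.mapped_to + (address - block.start)
            return block.name, address
    return None


name, real = resolve(0x00000004)
print(name, f"0x{real:08x}")
```

A read of the Reset vector through the alias at 0x0000'0004 resolves to 0x0800'0004 in the real Flash block. Note that the initial MSP value itself sits one word past the end of SRAM, which is normal for a full-descending stack: the first push decrements the pointer before storing.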

Run the Analyzer as before (remember to add the ARM Aggressive Instruction Finder). Now when we double-click on the label LAB_0800029c+1 from the vector table, we see the initial disassembled code:

For reference, the actual code for the reset handler is:

void __attribute__ ((section(".after_vectors"),noreturn))
Reset_Handler (void)
{
 _start ();
}

Double-clicking the label FUN_08000188 takes you to the code for the start function, etc.

We can now start investigating the code by following the call chain and decoding memory access.

Summary

Ghidra is a handy open-source tool for analyzing firmware. Defensively, we must appreciate the sophisticated tools available to a skilled hacker. Even with blown fuses, firmware can still be extracted using glitching techniques.

Learning and understanding the techniques and tools used for exploitation, I believe, ultimately make you a better embedded engineer.

Finally, if you’re planning to try out Ghidra with Arm Cortex-M based code, then install the useful script explained in this blog post: SVD-Loader for Ghidra: Simplifying bare-metal ARM reverse engineering.

Niall Cooling

Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecoms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.
