Creating a flash image
In the embedded space, the primary purpose of an ihex file is to program or reprogram a target system. Various file formats exist for this, with the Intel Hex (ihex) format being among the most widely used.
The linker stage of a build process typically outputs a .elf file (Executable and Linkable Format). Many debuggers and programmers can work directly with the ELF file format. However, in many cases the availability of the significantly smaller ihex file can improve reprogramming times (especially for Over-The-Air updates).
The ihex files are generated from the .elf file by arm-none-eabi-objcopy, e.g.
$ arm-none-eabi-objcopy -O ihex <filename>.elf <filename>.hex
One other advantage of the ihex file format is that it is plain text. The details of the ihex file format are covered, as expected, on Wikipedia.
ihex raw file
For our explanation, we are going to dissect an ihex file generated by the Arm GCC toolchain. Given a raw hex file, such as
:020000040800F2
:10000000000002208D0100089301000897010008FC
:10001000090200080D020008550200080000000057
:100020000000000000000000000000009D02000829
...
:100180008901000889010008FEE7FFFF08B500F0BB
:1001900093F800BEFEE71EF0040F0CBFEFF30880DB
:1001A000EFF309807146414A10472DE9F04182B0D2
:1001B00004468846124B5E6B9F6B9D6A1B6B13F067
...
:10769400000041534349490000000000000000007D
:1076A40000000000000000000000000000000000D6
:0876B40000000000325476983A
:04000005080002B934
:00000001FF
we can start to deconstruct it.
Extended Linear Address Records (ELAR)
The first line is an extended linear address record. This record contains the upper 16 bits of the 32-bit Cortex-M address. The ELAR always has two data bytes and is broken down into the following fields:
:02 0000 04 0800 F2
02 - indicates there are two bytes of data.
0000 - the address field. For the ELAR this field is always 0000 and can be ignored.
04 - the record type 04 (extended linear address record).
0800 - the upper 16 bits of the address.
F2 - the checksum of the record.
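The field layout above is the same for every record type, so it can be sketched as a small parser. This is a minimal illustration rather than a complete reader: the names `HexRecord` and `parse_record` are hypothetical, and no checksum validation is attempted here.

```python
from typing import NamedTuple


class HexRecord(NamedTuple):
    """Fields of one Intel Hex record (hypothetical helper type)."""
    byte_count: int
    address: int       # 16-bit address/offset field
    record_type: int
    data: bytes
    checksum: int


def parse_record(line: str) -> HexRecord:
    """Split one Intel Hex line into its fields (the checksum is not validated)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    byte_count = raw[0]
    # count (1) + address (2) + type (1) + data (byte_count) + checksum (1)
    if len(raw) != byte_count + 5:
        raise ValueError("record length does not match its byte count")
    return HexRecord(
        byte_count=byte_count,
        address=int.from_bytes(raw[1:3], "big"),
        record_type=raw[3],
        data=raw[4:4 + byte_count],
        checksum=raw[-1],
    )


elar = parse_record(":020000040800F2")
upper16 = int.from_bytes(elar.data, "big")
print(f"type={elar.record_type:02X} upper16=0x{upper16:04X} checksum=0x{elar.checksum:02X}")
```

Note that the multi-byte fields of the record itself (address and ELAR data) are big-endian, unlike the little-endian program data carried inside data records.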
This particular file is built to run on an STMicro STM32F407VG Cortex-M4 microcontroller. The Flash start address is defined in the linker script as:
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 1024K
so we can see that the upper 16 bits of this address are 0x0800, giving a starting address of 0x08000000. All other records in the hex file are defined relative to this address.
The checksum is calculated by taking the 8-bit sum of the record bytes and then the two's complement of that sum (invert, then add one), e.g.:
(uint8_t)(~(0x02 + 0x00 + 0x00 + 0x04 + 0x08 + 0x00)+1)
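The same calculation can be sketched in Python and used to validate a whole record. The function names `ihex_checksum` and `record_is_valid` are ours, chosen for illustration; the validity test relies on the fact that a correct record, checksum included, sums to zero modulo 256.

```python
def ihex_checksum(record_bytes: bytes) -> int:
    """Two's complement of the 8-bit sum of all bytes preceding the checksum."""
    return (-sum(record_bytes)) & 0xFF


def record_is_valid(line: str) -> bool:
    """True when every byte of the record, checksum included, sums to 0 mod 256."""
    raw = bytes.fromhex(line.strip()[1:])
    return (sum(raw) & 0xFF) == 0


# The ELAR without its trailing checksum byte.
elar_body = bytes.fromhex("020000040800")
print(f"0x{ihex_checksum(elar_body):02X}")  # 0xF2

# The first data record from the file above.
print(record_is_valid(":10000000000002208D0100089301000897010008FC"))  # True
```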
For a very large image, it is possible to have multiple ELARs, one each time the upper 16 bits of the address change, but this is less likely on a smaller microcontroller image.
Data Records
Our next record is a data record containing the raw byte values to be stored in program memory. The first data line is:
:10000000000002208D0100089301000897010008FC
We can break this down into a series of fields:
: 10 0000 00 [00 00 02 20] [8D 01 00 08] [93 01 00 08] [97 01 00 08] FC
With the following format:
10 - number of data bytes (in hex).
0000 - address offset.
00 - the record type 00 (a data record).
a further 16 (0x10) data bytes.
FC - the checksum field.
The address for this record is calculated by bitwise-ORing the address offset with the address from the extended linear address record (shifted into the upper 16 bits), e.g. 0x08000000 | 0x0000 => 0x08000000.
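As a small sketch of that address calculation, the helper below (the name `absolute_address` is hypothetical) places the ELAR value in bits 31..16 and the record's offset in bits 15..0. The offsets used are those of data records shown in the file above.

```python
def absolute_address(upper16: int, offset16: int) -> int:
    """Combine the ELAR value (bits 31..16) with a record's offset (bits 15..0)."""
    return ((upper16 & 0xFFFF) << 16) | (offset16 & 0xFFFF)


# 0x0800 comes from the ELAR (:020000040800F2); the offsets from data records.
for offset in (0x0000, 0x0010, 0x0180):
    print(f"offset 0x{offset:04X} -> 0x{absolute_address(0x0800, offset):08X}")
```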
The data bytes
As this image targets the STM32, the first data record is likely to be the start of the Interrupt Vector Table. These four words, therefore, give a series of 32-bit addresses:
[00 00 02 20] [8D 01 00 08] [93 01 00 08] [97 01 00 08]
As the values are stored as little-endian, the 4-byte groups can be viewed as the following table entries:
20020000
0800018D
08000193
08000197
As the Cortex-M4 is programmed using the Thumb-2 ISA, all function addresses have bit 0 set (a +1 offset) to indicate Thumb state. By cross-referencing with the .map file, the IVT entries become apparent:
20020000 - __stack = (ORIGIN (RAM) + LENGTH (RAM))
0800018C - Reset_Handler
08000192 - NMI_Handler
08000196 - HardFault_Handler
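The little-endian decoding and the removal of the Thumb bit can be sketched with Python's `struct` module. The handler names are simply those from the .map listing above; the variable names are ours.

```python
import struct

# The 16 data bytes of the first data record (:10000000...FC).
data = bytes.fromhex("000002208D0100089301000897010008")

# "<4I" unpacks four little-endian unsigned 32-bit words.
words = struct.unpack("<4I", data)

initial_sp, *handlers = words
print(f"{'initial SP':<18}0x{initial_sp:08X}")
for name, word in zip(("Reset_Handler", "NMI_Handler", "HardFault_Handler"), handlers):
    # Bit 0 is the Thumb bit; clearing it gives the address listed in the .map file.
    print(f"{name:<18}0x{word & ~1:08X} (vector entry 0x{word:08X})")
```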
The next data record
Given the following record:
:10001000090200080D020008550200080000000057
We can deduce the address offset as 0010, giving a load address of 0x08000010.
Investigating code
The earlier records indicated that the Reset_Handler is located at the address 0x0800018C. If we move further down the hex file to the record beginning :10018000, it contains the 16 bytes destined to be loaded at addresses 0x08000180 through 0x0800018F.
The actual record contents are:
:100180008901000889010008FEE7FFFF08B500F0BB
Breaking this down into 16-bit values (the smallest opcode size for Thumb-2), shown in the byte order stored in the file, we get the following record contents:
:10 0180 00 [8901 0008][8901 0008][FEE7 FFFF][08B5 00F0] BB
The 16-bit value at the address 0x0800018C is stored as the bytes 08 B5 which, being little-endian, is 0xB508. This maps onto a 16-bit Thumb-2 instruction:
800018c: b508 push {r3, lr}
0b1011'0101'0000'1000
0b1011010 -> PUSH
0b1 -> M-bit set, so push lr
0b0000'1000 -> push r3 (bit 3 set)
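The bit-level breakdown above can be sketched as a tiny decoder. It first locates the half-word for 0x0800018C inside the record at offset 0x0180, then decodes it; `decode_push_t1` is a hypothetical helper that handles only the 16-bit PUSH encoding and nothing else.

```python
def decode_push_t1(halfword: int) -> list:
    """Decode a 16-bit Thumb PUSH: bits 15..9 = 1011010, bit 8 = M, bits 7..0 = r7..r0."""
    if (halfword >> 9) != 0b1011010:
        raise ValueError("not a 16-bit PUSH")
    regs = [f"r{i}" for i in range(8) if halfword & (1 << i)]
    if halfword & (1 << 8):  # M bit: also push the link register
        regs.append("lr")
    return regs


record = bytes.fromhex("100180008901000889010008FEE7FFFF08B500F0BB")
offset = int.from_bytes(record[1:3], "big")   # 0x0180
data = record[4:4 + record[0]]                # the 16 data bytes
index = 0x018C - offset                       # position of Reset_Handler in the record
halfword = int.from_bytes(data[index:index + 2], "little")
print(f"0x{halfword:04X}", decode_push_t1(halfword))
```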
The next half-word is 0xF000 which, as its top five bits are 0b11110, is the first half of a 32-bit Thumb-2 opcode. Combining it with the first half-word of the next data record:
:10 0190 00 [93F8 00BE][FEE7 1EF0][040F 0CBF][EFF3 0880] DB
gives the opcode:
800018e: f000f893 bl 80002b8
This address is a C function called _start.
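To check the branch target by hand, the two half-words can be decoded following the Thumb-2 BL immediate layout (S, imm10, J1, J2, imm11). This is a sketch under the assumption that the instruction really is a BL; `decode_bl_target` is a name of our own, not a toolchain function.

```python
def decode_bl_target(hw1: int, hw2: int, insn_addr: int) -> int:
    """Compute the target address of a 32-bit Thumb-2 BL instruction."""
    # hw1 = 11110 S imm10, hw2 = 11 J1 1 J2 imm11
    if (hw1 & 0xF800) != 0xF000 or (hw2 & 0xD000) != 0xD000:
        raise ValueError("not a BL instruction")
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    i1 = 1 - (j1 ^ s)  # I1 = NOT(J1 XOR S)
    i2 = 1 - (j2 ^ s)  # I2 = NOT(J2 XOR S)
    offset = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    if s:
        offset -= 1 << 25  # sign-extend the 25-bit offset
    # In Thumb state the PC reads as the instruction address plus 4.
    return (insn_addr + 4 + offset) & 0xFFFFFFFF


first = bytes.fromhex("8901000889010008FEE7FFFF08B500F0")   # data of the record at 0x0180
second = bytes.fromhex("93F800BEFEE71EF0040F0CBFEFF30880")  # data of the record at 0x0190
hw1 = int.from_bytes(first[14:16], "little")   # half-word at 0x0800018E
hw2 = int.from_bytes(second[0:2], "little")    # half-word at 0x08000190
target = decode_bl_target(hw1, hw2, 0x0800018E)
print(f"bl 0x{target:08X}")
```

The instruction straddles two hex records, which is why a disassembler needs the records flattened into a contiguous image first.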
And continuing with:
08000192 <NMI_Handler>:
8000192: be00 bkpt 0x0000
8000194: e7fe b.n 8000194 <NMI_Handler+0x2>
08000196 <HardFault_Handler>:
8000196: f01e 0f04 tst.w lr, #4
800019a: bf0c ite eq
800019c: f3ef 8008 mrseq r0, MSP
The mixed assembler-hex output can be generated from the .elf file using the command:
$ arm-none-eabi-objdump -d -S <filename>.elf
End of file records
The final few lines are:
:1076A40000000000000000000000000000000000D6
:0876B40000000000325476983A
:04000005080002B934
:00000001FF
Start Linear Address Record
The penultimate line of the file is a Start Linear Address record:
:04000005080002B934
where
04 - byte count, always 04.
0000 - address field, not used (always 0000).
05 - the record type 05 (a start linear address record).
080002B9 - the 4-byte linear start address of the application.
34 - the checksum.
This record indicates the start address of the application. The entry point is informative for debuggers and simulators. The record may also be used by a bootloader to vector into the application rather than going through the Interrupt Vector Table entry. Notice in our example that the indicated address 0x080002B9 (0x080002B8 with the Thumb bit set) differs from the IVT entry of 0x0800018D. This start address is set in the linker configuration script with the command:
ENTRY(_start)
In the .map file the entry for this address is:
0x080002b8 _start
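As a sketch of how a bootloader or tool might read this record, the snippet below extracts the 32-bit entry point (stored big-endian within the record) and separates the Thumb bit from the address that appears in the .map file. The variable names are illustrative only.

```python
raw = bytes.fromhex("04000005080002B934")  # the record without its leading ':'

byte_count, record_type = raw[0], raw[3]
if record_type != 0x05 or byte_count != 4:
    raise ValueError("not a start linear address record")

entry = int.from_bytes(raw[4:8], "big")  # value as stored in the record
thumb_bit = entry & 1                    # 1 means the target is Thumb code
symbol_address = entry & ~1              # address as listed in the .map file

print(f"entry=0x{entry:08X} thumb={thumb_bit} symbol=0x{symbol_address:08X}")
```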
End Of File record
This record must occur exactly once per file, as the last line of the file. The data field is empty (thus, byte count is 00), and the address field can be ignored (and is typically 0000), e.g.
:00000001FF
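Putting the four record types together, a minimal loader might look like the sketch below. It assumes only record types 00, 01, 04 and 05 appear (as in our file) and rejects anything else; `load_ihex` is a hypothetical name, and the input is a shortened excerpt of the file above rather than the whole image.

```python
def load_ihex(lines):
    """Return (memory, entry): memory maps absolute address -> byte value."""
    memory = {}
    upper16 = 0
    entry = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        raw = bytes.fromhex(line[1:])
        if sum(raw) & 0xFF:
            raise ValueError(f"bad checksum in {line}")
        count, offset, rtype = raw[0], int.from_bytes(raw[1:3], "big"), raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:    # data record
            base = (upper16 << 16) | offset
            for i, value in enumerate(data):
                memory[base + i] = value
        elif rtype == 0x01:  # end of file
            break
        elif rtype == 0x04:  # extended linear address
            upper16 = int.from_bytes(data, "big")
        elif rtype == 0x05:  # start linear address
            entry = int.from_bytes(data, "big")
        else:
            raise ValueError(f"unsupported record type {rtype:02X}")
    return memory, entry


excerpt = """
:020000040800F2
:10000000000002208D0100089301000897010008FC
:10001000090200080D020008550200080000000057
:04000005080002B934
:00000001FF
""".splitlines()

memory, entry = load_ihex(excerpt)
print(f"{len(memory)} bytes loaded, entry=0x{entry:08X}")
```

A dictionary keyed by address is used purely for clarity; a real bootloader would more likely write each record straight to flash as it arrives.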
Summary
In the unlikely case that you ever need to dissect an Intel Hex file, hopefully this post will help you navigate the format. It may also prove useful if you are embarking on writing a bootloader or have to reverse-engineer a raw hex file.
Addendum
Joseph Yiu of Arm kindly pointed out to me that having ENTRY(_start) could be problematic in some cases. The debugger could start running the code from _start after reset and skip the reset handler (which calls SystemInit()). It means the application could fail when running from a debugger because SystemInit() is not called.
As a result, if the project has a reset handler called Reset_Handler, it might be better to use ENTRY(Reset_Handler).
Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.