Understanding Arm Cortex-M Intel-Hex (ihex) files

Creating a flash image

In the embedded space, the primary purpose of an ihex file is to program or reprogram a target system. Various file formats exist for this, with the Intel Hex (ihex) format being among the most widely used.

The linker stage of a build process typically generates a .elf file (Executable and Linkable Format). Many debuggers and programmers can work directly with ELF files. However, in many cases, using the significantly smaller ihex file can improve reprogramming times (especially for Over-The-Air updates).

An ihex file is generated from the .elf file using arm-none-eabi-objcopy, e.g.

$ arm-none-eabi-objcopy -O ihex <filename>.elf <filename>.hex

Another advantage of the ihex format is that it is plain text, so it can be inspected with any text editor. As you might expect, the details of the format are covered on Wikipedia.

ihex raw file

For our explanation, we are going to dissect an ihex file generated by the Arm GCC toolchain. Given a raw hex file, such as

:020000040800F2
:10000000000002208D0100089301000897010008FC
:10001000090200080D020008550200080000000057
:100020000000000000000000000000009D02000829
...
:100180008901000889010008FEE7FFFF08B500F0BB
:1001900093F800BEFEE71EF0040F0CBFEFF30880DB
:1001A000EFF309807146414A10472DE9F04182B0D2
:1001B00004468846124B5E6B9F6B9D6A1B6B13F067
...
:10769400000041534349490000000000000000007D
:1076A40000000000000000000000000000000000D6
:0876B40000000000325476983A
:04000005080002B934
:00000001FF

we can start to deconstruct it.

Extended Linear Address Records (ELAR)

This first line is an extended linear address record (ELAR). This record contains the upper 16 bits of a 32-bit Cortex-M address. An ELAR always has two data bytes and is broken down into the following fields:

:02 0000 04 0800 F2
  • 02 – indicates there are two bytes of data.
  • 0000 – the address field. For the ELAR this field is always 0000 and can be ignored.
  • 04 – the record type 04 (extended linear address record).
  • 0800 – the upper 16 bits of the address.
  • F2 – the checksum of the record.
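To make the field layout concrete, here is a minimal C sketch that splits one record line into its fields. The type and function names (ihex_record, ihex_parse_line) are hypothetical, and for brevity the checksum is extracted but not verified.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical representation of one decoded Intel Hex record. */
typedef struct {
    uint8_t  count;      /* byte count: number of data bytes */
    uint16_t address;    /* 16-bit address field (offset) */
    uint8_t  type;       /* record type: 00 data, 01 EOF, 04 ELAR, 05 SLAR */
    uint8_t  data[255];  /* data bytes (at most 255) */
    uint8_t  checksum;   /* checksum byte as stored in the file */
} ihex_record;

/* Value of one ASCII hex digit, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode the two hex digits at s into one byte, or -1 on error. */
static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]);
    int lo = hex_nibble(s[1]);
    return (hi < 0 || lo < 0) ? -1 : (hi << 4) | lo;
}

/* Parse a line such as ":020000040800F2" into rec.
   Returns 0 on success, -1 on a malformed line.
   Layout: ':' count(2) address(4) type(2) data(2*count) checksum(2). */
int ihex_parse_line(const char *line, ihex_record *rec)
{
    size_t len = strlen(line);
    if (len < 11 || line[0] != ':') return -1;

    int count = hex_byte(line + 1);
    if (count < 0 || len < 11 + 2 * (size_t)count) return -1;

    int a_hi = hex_byte(line + 3);
    int a_lo = hex_byte(line + 5);
    int type = hex_byte(line + 7);
    if (a_hi < 0 || a_lo < 0 || type < 0) return -1;

    rec->count = (uint8_t)count;
    rec->address = (uint16_t)((a_hi << 8) | a_lo);
    rec->type = (uint8_t)type;

    for (int i = 0; i < count; i++) {
        int b = hex_byte(line + 9 + 2 * i);
        if (b < 0) return -1;
        rec->data[i] = (uint8_t)b;
    }

    int cs = hex_byte(line + 9 + 2 * count);
    if (cs < 0) return -1;
    rec->checksum = (uint8_t)cs;
    return 0;
}
```

Applied to the ELAR above, this yields count 2, address 0x0000, type 0x04, data {0x08, 0x00} and checksum 0xF2.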

This particular file is built to run on an STMicroelectronics STM32F407VG Cortex-M4 microcontroller. The flash start address is defined in the linker script as:

  FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 1024K

so we can see that the upper 16 bits of this address are 0x0800, giving a base address of 0x08000000. All subsequent data records are located relative to this base address until another ELAR is encountered.

The checksum is calculated by summing all the preceding bytes of the record (byte count, address, record type and data), keeping only the low 8 bits, and then taking the two's complement of that sum, e.g.:

(uint8_t)(~(0x02 + 0x00 + 0x00 + 0x04 + 0x08 + 0x00)+1)
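The same calculation generalises to any record. The sketch below (function names are hypothetical) computes the checksum over already-decoded record bytes, and also shows the equivalent validity check a loader might use: summing every byte of a record, including the checksum, gives 0 modulo 256.

```c
#include <stddef.h>
#include <stdint.h>

/* Intel Hex checksum: two's complement of the low byte of the sum of
   all record bytes before the checksum (count, address, type, data). */
uint8_t ihex_checksum(const uint8_t *bytes, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + bytes[i]);   /* 8-bit wrap-around sum */
    }
    return (uint8_t)(~sum + 1u);           /* two's complement */
}

/* A record is valid when all its bytes, including the checksum,
   sum to 0 modulo 256. Returns 1 if valid, 0 otherwise. */
int ihex_record_ok(const uint8_t *bytes_with_checksum, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + bytes_with_checksum[i]);
    }
    return sum == 0;
}
```

For the ELAR bytes {0x02, 0x00, 0x00, 0x04, 0x08, 0x00} the sum is 0x0E, so the checksum is 0x100 - 0x0E = 0xF2.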

A new ELAR is required whenever the image crosses a 64 KB boundary, so a large image may contain multiple ELARs; this is less likely for a smaller microcontroller image such as this one.

Data Records

Our next record is a data record containing the raw byte values to be stored in program memory. The first data line is:

:10000000000002208D0100089301000897010008FC

We can break this down into a series of fields:

: 10 0000 00 [00 00 02 20] [8D 01 00 08] [93 01 00 08] [97 01 00 08] FC

With the following format:

  • 10 – the byte count (0x10 = 16 data bytes).
  • 0000 – the address offset.
  • 00 – the record type 00 (data record).
  • the 16 (0x10) data bytes.
  • FC – the checksum of the record.

The absolute address for this record is formed by shifting the ELAR value left by 16 bits and bit-ORing it with the record's address offset, e.g. (0x0800 << 16) | 0x0000 => 0x08000000.
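As a small sketch (the function name is hypothetical), the address calculation is a shift and an OR. Because the low 16 bits of the ELAR base are always zero and the offset is at most 0xFFFF, the OR gives the same result as an addition.

```c
#include <stdint.h>

/* Combine the upper 16 bits from the most recent ELAR (type 04) with the
   16-bit address offset of a data record (type 00). */
uint32_t ihex_absolute_address(uint16_t elar_upper, uint16_t offset)
{
    return ((uint32_t)elar_upper << 16) | offset;
}
```

For example, ihex_absolute_address(0x0800, 0x0010) gives 0x08000010, the load address of the second data record in the file.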

The data bytes

As this image targets a Cortex-M based STM32, the first data record (at the very start of flash) holds the start of the Interrupt Vector Table (IVT). These four words, therefore, give a series of 32-bit addresses:

[00 00 02 20] [8D 01 00 08] [93 01 00 08] [97 01 00 08]

As the values are stored as little-endian, the 4-byte groups can be viewed as the following table entries:

20020000
0800018D 
08000193
08000197
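The byte reordering above can be sketched in C as follows. The helper names le32 and decode_words are hypothetical, and the sketch assumes the record's data bytes have already been decoded from ASCII hex into binary.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit word from four bytes stored least-significant first. */
uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decode n consecutive little-endian words from a data record's payload. */
void decode_words(const uint8_t *data, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = le32(data + 4 * i);
    }
}
```

Feeding it the 16 data bytes of the first data record produces the four table entries listed above.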

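As the Cortex-M4 executes the Thumb-2 instruction set, every function address in the table has bit 0 set (a +1 offset) to indicate Thumb state. The first entry is not a function address but the initial stack pointer value. By cross-referencing with the .map file, the IVT entries become apparent: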

20020000 - __stack = (ORIGIN (RAM) + LENGTH (RAM))
0800018C - Reset_Handler 
08000192 - NMI_Handler
08000196 - HardFault_Handler
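A tool walking the IVT needs to strip the Thumb bit before looking up a handler. A minimal sketch, with hypothetical function names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a Thumb function pointer is the Thumb bit, not part of the address. */
bool is_thumb_entry(uint32_t vector)
{
    return (vector & UINT32_C(1)) != 0;
}

/* Clear the Thumb bit to get the address of the handler's first instruction. */
uint32_t handler_address(uint32_t vector)
{
    return vector & ~UINT32_C(1);
}
```

Note that the initial stack pointer entry (0x20020000) is even, as it is a data address rather than code, so it should not be passed through handler_address.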

The next data record

Given the following record:

:10001000090200080D020008550200080000000057

We can deduce the address offset as 0010, giving a load address of 0x08000010.

Investigating code

The earlier records indicated that the Reset_Handler is located at address 0x0800018C. Moving further down the hex file to the record beginning :10018000, we find the bytes destined for addresses 0x08000180 to 0x0800018F.

The actual record contents are:

:100180008901000889010008FEE7FFFF08B500F0BB

Breaking this down into 16-bit values (the smallest instruction size for Thumb-2), we get the following record contents:

:10 0180 00 [8901 0008][8901 0008][FEE7 FFFF][08B5 00F0] BB

The 16-bit value at address 0x0800018C is 0xB508 (the bytes 08 B5, stored little-endian). This decodes as the 16-bit Thumb instruction:

 800018c:   b508        push    {r3, lr}

0b1011'0101'0000'1000

  • 0b1011010 -> Push
  • 0b1 -> M-bit set, so push lr
  • 0b0000'1000 -> push r3 (bit 3 set)
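The bit fields above correspond to the 16-bit PUSH encoding (T1 in the Armv7-M architecture). As a sketch, with hypothetical function names, a decoder could recognise the opcode and build a register mask where bit n stands for Rn:

```c
#include <stdbool.h>
#include <stdint.h>

/* 16-bit Thumb PUSH (encoding T1): 1011 010 M rrrrrrrr
   Bits 15..9 = 0b1011010, bit 8 = M (push LR), bits 7..0 = R0-R7 list. */
bool is_push_t1(uint16_t op)
{
    return (op & 0xFE00u) == 0xB400u;
}

/* Return a register mask with bit n set for Rn; LR is R14. */
uint16_t push_t1_registers(uint16_t op)
{
    uint16_t mask = (uint16_t)(op & 0x00FFu);  /* R0-R7 */
    if (op & 0x0100u) {
        mask |= (uint16_t)(1u << 14);          /* M bit set: LR */
    }
    return mask;
}
```

For 0xB508 the mask is 0x4008: bit 3 (r3) and bit 14 (lr), matching push {r3, lr}.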

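The next halfword is 0xF000. Because its top five bits are 0b11110, it is the first half of a 32-bit Thumb-2 instruction (a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 always starts a 32-bit instruction). Its second halfword comes from the next data record: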

:10 0190 00 [93F8 00BE][FEE7 1EF0][040F 0CBF][EFF3 0880] DB

This gives the opcode:

 800018e:   f000f893    bl  80002b8 

The branch target, 0x080002B8, is a C function called _start.
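The decode of that 32-bit opcode can be sketched in C. This follows the Armv7-M BL (encoding T1) layout; the function names are hypothetical and only BL is handled, not other 32-bit branches.

```c
#include <stdbool.h>
#include <stdint.h>

/* A first halfword whose top five bits are 0b11101, 0b11110 or 0b11111
   starts a 32-bit Thumb-2 instruction; anything else is 16-bit. */
bool is_thumb32_prefix(uint16_t hw1)
{
    unsigned top5 = (unsigned)hw1 >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* BL (encoding T1): hw1 = 11110 S imm10, hw2 = 11 J1 1 J2 imm11. */
bool is_bl_t1(uint16_t hw1, uint16_t hw2)
{
    return (hw1 & 0xF800u) == 0xF000u && (hw2 & 0xD000u) == 0xD000u;
}

/* Compute the BL target; addr is the address of the first halfword. */
uint32_t bl_target(uint32_t addr, uint16_t hw1, uint16_t hw2)
{
    uint32_t s     = ((uint32_t)hw1 >> 10) & 1u;
    uint32_t imm10 = (uint32_t)hw1 & 0x3FFu;
    uint32_t j1    = ((uint32_t)hw2 >> 13) & 1u;
    uint32_t j2    = ((uint32_t)hw2 >> 11) & 1u;
    uint32_t imm11 = (uint32_t)hw2 & 0x7FFu;
    uint32_t i1    = ~(j1 ^ s) & 1u;   /* I1 = NOT(J1 EOR S) */
    uint32_t i2    = ~(j2 ^ s) & 1u;   /* I2 = NOT(J2 EOR S) */

    /* imm32 = SignExtend(S:I1:I2:imm10:imm11:'0'), a 25-bit value */
    uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22)
                 | (imm10 << 12) | (imm11 << 1);
    if (s) {
        imm |= 0xFE000000u;            /* sign-extend from bit 24 */
    }
    /* In Thumb state, PC reads as the instruction address + 4. */
    return addr + 4u + imm;
}
```

For f000 f893 at 0x0800018E, S = 0 and imm11 = 0x093, so the offset is 0x126 and the target is 0x08000192 + 0x126 = 0x080002B8.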

And continuing with:

08000192 <NMI_Handler>:
 8000192:   be00        bkpt    0x0000
 8000194:   e7fe        b.n 8000194 <NMI_Handler+0x2>

08000196 <HardFault_Handler>:
 8000196:   f01e 0f04   tst.w   lr, #4
 800019a:   bf0c        ite eq
 800019c:   f3ef 8008   mrseq   r0, MSP

The mixed source and disassembly listing can be generated from the .elf file using the command:

$ arm-none-eabi-objdump -d -S <filename>.elf

End of file records

The final few lines are:

:1076A40000000000000000000000000000000000D6
:0876B40000000000325476983A
:04000005080002B934
:00000001FF

Start Linear Address Record

The penultimate line of the file is a Start Linear Address record:

:04000005080002B934

where

  • 04 – Byte count, always 04.
  • 0000 – not used (always 0000).
  • 05 – is the record type 05 (a start linear address record).
  • 080002B9 – is the 4-byte linear start address of the application.
  • 34 – the checksum.

This record indicates the start address (entry point) of the application. The entry point is informative for debuggers and simulators, and a bootloader may also use it to jump into the application rather than going through the Interrupt Vector Table reset entry. Notice in our example that the indicated address, 0x080002B9 (0x080002B8 with the Thumb bit set), differs from the IVT reset entry of 0x0800018D (Reset_Handler). This start address is set in the linker script with the command:

ENTRY(_start) 

In the .map file the entry for this address is:

0x080002b8                _start
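A loader that wants to honour this record only needs its four data bytes. The sketch below uses a hypothetical function name and assumes the record has already been split into type, data and count. Note that the address is written most-significant byte first, unlike the little-endian words that data records place into memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Extract the entry point from a Start Linear Address record (type 05).
   Returns false if the record is not a well-formed type 05 record. */
bool ihex_start_address(uint8_t type, const uint8_t *data, size_t count,
                        uint32_t *entry)
{
    if (type != 0x05u || count != 4u) {
        return false;
    }
    /* Big-endian: the digits read left to right as in the file. */
    *entry = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16)
           | ((uint32_t)data[2] << 8) | (uint32_t)data[3];
    return true;
}
```

For the data bytes 08 00 02 B9 this yields 0x080002B9; clearing the Thumb bit gives 0x080002B8, the _start address in the .map file.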

End Of File record

This record must occur exactly once per file, as the last line of the file. The data field is empty (thus the byte count is 00), and the address field can be ignored (it is typically 0000), e.g.

:00000001FF
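Pulling the record types together, the following is a minimal sketch of the loop a bootloader or flashing tool might run over the file's lines. Everything here (ihex_load, the result codes, holding the whole image in a RAM buffer) is a hypothetical design for illustration; record types other than 00, 01, 04 and 05 (e.g. 02 and 03, used for segmented addressing) are simply ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum {
    IHEX_OK = 0,
    IHEX_BAD_LINE = -1,
    IHEX_BAD_CHECKSUM = -2,
    IHEX_OUT_OF_RANGE = -3,
    IHEX_NO_EOF = -4
};

/* Value of one ASCII hex digit, or -1 if it is not a hex digit. */
static int nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode ":LLAAAATT[DD...]CC" into raw bytes and verify the checksum. */
static int decode_record(const char *line, uint8_t raw[260], size_t *n)
{
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == '\r' || line[len - 1] == '\n')) {
        len--;                                   /* tolerate CR/LF endings */
    }
    if (len < 11 || line[0] != ':' || (len - 1) % 2 != 0) return IHEX_BAD_LINE;

    *n = (len - 1) / 2;
    if (*n > 260) return IHEX_BAD_LINE;          /* 255 data bytes + 5 */
    for (size_t i = 0; i < *n; i++) {
        int hi = nibble(line[1 + 2 * i]);
        int lo = nibble(line[2 + 2 * i]);
        if (hi < 0 || lo < 0) return IHEX_BAD_LINE;
        raw[i] = (uint8_t)((hi << 4) | lo);
    }
    if (*n != (size_t)raw[0] + 5) return IHEX_BAD_LINE;  /* count must match */

    uint8_t sum = 0;                             /* all bytes sum to 0 mod 256 */
    for (size_t i = 0; i < *n; i++) sum = (uint8_t)(sum + raw[i]);
    return sum == 0 ? IHEX_OK : IHEX_BAD_CHECKSUM;
}

/* Copy data records into image[], which mirrors memory from base to
   base + size - 1. Processing stops at the EOF record (type 01).
   If entry is non-NULL it receives the Start Linear Address (type 05). */
int ihex_load(const char *const *lines, size_t nlines, uint8_t *image,
              uint32_t base, size_t size, uint32_t *entry)
{
    uint32_t upper = 0;                          /* from the latest ELAR */
    uint8_t raw[260];
    size_t n = 0;

    for (size_t k = 0; k < nlines; k++) {
        int rc = decode_record(lines[k], raw, &n);
        if (rc != IHEX_OK) return rc;

        uint8_t count = raw[0];
        uint32_t offset = ((uint32_t)raw[1] << 8) | raw[2];
        const uint8_t *data = &raw[4];

        switch (raw[3]) {
        case 0x00: {                             /* data record */
            uint32_t addr = upper | offset;
            if (addr < base || (size_t)(addr - base) + count > size) {
                return IHEX_OUT_OF_RANGE;
            }
            memcpy(&image[addr - base], data, count);
            break;
        }
        case 0x01:                               /* end of file */
            return IHEX_OK;
        case 0x04:                               /* extended linear address */
            if (count != 2) return IHEX_BAD_LINE;
            upper = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16);
            break;
        case 0x05:                               /* start linear address */
            if (count != 4) return IHEX_BAD_LINE;
            if (entry != NULL) {
                *entry = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16)
                       | ((uint32_t)data[2] << 8) | (uint32_t)data[3];
            }
            break;
        default:                                 /* e.g. types 02/03: ignored */
            break;
        }
    }
    return IHEX_NO_EOF;                          /* ran out of lines before 01 */
}
```

Returning an error when the lines run out before an End Of File record reflects the rule that this record must terminate the file; a real bootloader would also need to erase and program flash rather than fill a RAM buffer.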

Summary

In the unlikely event that you ever need to dissect an Intel Hex file, hopefully this post will help you navigate the format. It may also prove useful if you are embarking on writing a bootloader or need to reverse-engineer a raw hex file.

Addendum

Joseph Yiu of Arm kindly pointed out to me that having ENTRY(_start) could be problematic in some cases. The debugger could start running the code from _start after reset and skip the reset handler (which calls SystemInit()). It means the application could fail when running from a debugger because SystemInit() is not called.

As a result, if the project has a reset handler called Reset_Handler, it might be better to use ENTRY(Reset_Handler).

Niall Cooling

Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecomms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.

This entry was posted in ARM, Build-systems, C/C++ Programming, Cortex, Toolchain.
