You are currently browsing the archives for the Cortex category.

ARM TechCon 2013

August 15th, 2013

ARM’s Technical Conference called TechCon™ is running between October 29th and 31st at the Santa Clara Convention Center in California.

Logo

This year I shall be making the trip over to present three classes:

For those of you who are regular readers of this blog you’ll recognise the Generic Hard-Fault Handler from a previous post.

The class “Can Existing Embedded Applications Benefit from Multicore Technology?” came about as it seemed that not a day would goe by without an announcement of a major development in multicore technology. With so much press about multicore, I started to wonder whether I should consider using multicore technology in my typical embedded applications? 

From a software developer’s perspective, however, all the code examples seem to demonstrate the (same) massive performance improvements to rendering fractals or ray-tracing programs. The examples always refer to Amdahl’s Law, showing gains when using, say, 16 or 128 cores. This is all very interesting, but not what I, and hopefully most embedded developers, might consider embedded. This class discusses multicore from a more traditional embedded viewpoint.

Many Embedded-C programmers still believe that C++ leads to slow, bloated programs. Though this viewpoint may have had limited foundation over a decade ago, it is misplaced for the core aspects of C++ (classes, inheritance, and dynamic polymorphism). With a modern ARM C++ cross-compiler, it is also misplaced for the more advanced features (templates and exception handling). In the class “Virtual Functions in C++ on the ARM Architecture”, I will be focusing on the performance and memory of the C++ virtual functions, type info and look at the use of multiple inheritance in an ARM embedded environment.

If you are planning to attend TechCon this year then please look me up. I will be making the presentations available via the blog after the event.

Niall.

Test Driven Development (TDD) with the mbed

May 24th, 2013

One of the most useful fallout’s from the acceptance of Agile techniques is the use of Test-Driven-Development (TDD) and the growth of associated test frameworks, such as GoogleTest and CppUTest, etc.

I won’t get into the details of TDD here as they are well covered elsewhere (I recommend James Grenning’s book “Test Driven Development for Embedded C” for a good coverage of the subject area), but the principle is

  1. Write a test
  2. Develop enough code to compile and build (but will fail the test)
  3. Write the application code to pass the test
  4. repeat until done

Obviously that is massively simplifying the process, but that’s the gist. The key to it all is automation, in that you want to write a new test and then the build-deploy-test-report (BDTR) cycle is automated.

To build a TDD environment with the mbed I needed to solve the following obstacles:

  1. Build – Using a TDD framework and building the project
  2. Deploy –  Download to the mbed
  3. Test – auto-executing the test code on the mbed
  4. Report –  Getting test reports back from the mbed to the host

Read more »

Rehosting ARMCC for the mbed with CMSIS-DAP

April 26th, 2013

In this posting I will look at porting the C standard library output (e.g. puts / printf ) to use a UART rather than the default ARM/Keil semihosting.

A-simples-life-001

In my last post, I looked at getting basic user I/O out from a native-mbed via UART0 to a terminal emulator (e.g. Tera Term). This was driven by the fact that, currently, neither printf (via semihosting) or ITM_SendChar do not function on the mbed. Unfortunately, my solution uses a propriety API, such as init_serial0 and putchar0, etc., rather than puts, printf, etc.

ITM_SendChar uses the ITM (Instrumented Trace Macrocell) on the Cortex-M3 core, which in turn uses a Trace Port (either SWO or 4-pin) to send messages. To get output from a Trace Port you need a debug unit with trace capabilities, e.g. a ULINK or J-Link device. The current implementation of CMSIS-DAP does not support any trace capabilities; however I am led to believe that ARM are planning to add some trace capabilities in future versions or variants of CMSIS-DAP (no timeframes).

To reference the ARM website:

Semihosting is implemented by a set of defined software instructions, for example, SVCs, that generate exceptions from program control. The application invokes the appropriate semihosting call and the debug agent then handles the exception. The debug agent provides the required communication with the host.

On a Cortex-M3 (ARMv7–M) you’d typically see “BKPT 0xAB” opcode instead of SVC’s. For the same reason as the ITM_SendChar, currently semihosting is not supported on the mbed.

So, ideally, it would be nice to still be able to use puts/printf (the greatest debug tool of all) but redirect the output to our UART; i.e. rehosting.

Rehosting in the Keil environment is very easy, once you know how! It is easy to go down a couple of dead ends, which hopefully I’ll help you avoid.

First, in our main, where we’re using printf, we need to include the following pre-processor directive in main.c:

#pragma import(__use_no_semihosting_swi)

Read more »

User I/O from mbed with CMSIS-DAP

April 19th, 2013

Following on from my last posting regarding using native C/C++ on the mbed I have found that I currently cannot get output via the standard CMSIS ITM_SendChar function as used in the Cortex-M hard fault handler (I am currently in dialog with the guys at ARM trying to resolve this).

MdebHello

In the standard mbed environment, the mbed can communicate with a host PC through a “USB Virtual Serial Port” over the same USB cable that is used for programming using printf(), e.g.

#include “mbed.h”
int main()
{
printf(“Hello World!\n”);
}

To achieve the same output, an mbed SerialPC object can be defined, e.g.

#include “mbed.h”
Serial pc(USBTX, USBRX); // tx, rx
int main()
{
pc.printf(“Hello World!\n”);
}

Currently, with a native minimal project, the semi-hosting of printf is not supported. This can be overcome by “re-targeting the project”, so I’ll cover that in the future, but for now the is a simple way of getting basic user I/O.

User I/O via UART0

Luckily for us, if we push characters out over the UART0 serial interface they are transmitted via the same channel that the mbed SerialPC uses. To test this out I quickly (using the best agile techniques of course) put together a very basic UART driver. Read more »

Native C/C++ Application development for the mbed using CMSIS-DAP

April 12th, 2013

If you have been following the Feabhas blog for some time, you may remember that in April of last year I posted about my experiences of using the MQTT protocol. The demonstration code was ran the ARM Cortex-M3 based mbed platform.mbed-microcontroller-angled

For those that are not familiar with the mbed, it is an “Arduino-like” development platform for small microcontroller embedded systems. The variant I’m using is built using an NXP LPC1768 Cortex-M3 device, which offers a plethora of connection options, ranging from simple GPIO, through I2C and SPI, right up to CAN, USB and Ethernet. With a similar conceptual model to Arduino’s, the drivers for all these drivers are supplied in a well-tested (C++) library. The mbed is connect to a PC via a USB cable (which also powers it), so allows the mbed to act as a great rapid prototyping platform. [I have never been a big fan of the 8-bit Arduino (personal choice no need to flame me  ) and have never used the newer ARM Cortex-M based Arduino's, such as the Due.]

However, in its early guise, there were two limitations when targeting an mbed (say compared to the Arduino).

First was the development environment; initially all software development was done through a web-based IDE. This is great for cross-platform support; especially for me being an Apple fanboy. Personally I never had a problem using the online IDE, especially as I am used to using offline environments such as Keil’s uVision, IAR’s Embedded Workbench and Eclipse. Over the years the mbed IDE has evolved and makes life very easy for importing other mbed developers libraries, creating your own libraries and even have an integrated distributed version control feature. But the need Internet connection inhibit the ability to develop code on a long flight or train journey for example.

Second, the output from the build process is a “.bin” (binary) file, which you save on to the mbed (the PC sees the mbed as a USB storage device). You then press the reset button on the mbed to execute your program. I guessing you’re well ahead of me here, but of course that means there is no on-target debug capabilities (breakpoints, single-step, variable and memory viewing, etc.). Now of course one could argue, as we have a well-defined set of driver libraries and if we followed aTest-Driven-Development (TDD) process that we don’t need target debugging (there is support for printf style debugging via the USB support serial mode); but that is a discussion/debate for another session! I would hazard a guess most embedded developers would prefer at least the option of target based source code debugging? Read more »

Setting up the Cortex-M3/4 (ARMv7-M) Memory Protection Unit (MPU)

February 25th, 2013

An optional part of the ARMv7-M architecture is the support of a Memory Protection Unit (MPU). This is a fairly simplistic device (compared to a fully blow Memory Management Unit (MMU) as found on the Cortex-A family), but if available can be programmed to help capture illegal or dangerous memory accesses.
When first looking at programming the MPU it may seem rather daunting, but in reality it is very straightforward. The added benefit of the ARMv7-M family is the well-defined memory map.
All example code is based around an NXP LPC1768 and Keil uVision v4.70 development environment. However as all examples are built using CMSIS, then they should work on an Cortex-M3/4 supporting the MPU.
First, let’s take four types of memory access we may want to capture or inhibit:

  1. Tying to read at an address that is reserved in the memory map (i.e. no physical memory of any type there)
  2. Trying to write to Flash/ROM
  3. Stopping areas of memory being accessible
  4. Disable running code located in SRAM (eliminating potential exploit)

Before we start we need to understand the microcontrollers memory map, so here we can look at the memory map of the NXP LPC1768 as defined in chapter 2 of the LPC17xx User Manual (UM10360).

  • 512kB FLASH @ 0×0000 0000 – 0×0007 FFFF
  • 32kB on-chip SRAM @ 0×1000 0000 – 0×1000 7FFFF
  • 8kB boot ROM @ 0x1FFF 0000 – 0x1FFF 1FFF
  • 32kB on-chip SRAM @ 0×2007 C000 [AHB SRAM]
  • GPIO @ 0x2009C000 – 0×2009 FFFF
  • APB Peripherals  @ 0×4000 0000 – 0x400F FFFF
  • AHB Peripheral @ 0×5000 0000 – 0x501F FFFF
  • Private Peripheral Bus @ 0xE000 0000 – 0xE00F FFFF

Based on the above map we can set up four tests:

  1. Read from location 0×0008 0000 – this is beyond Flash in a reserved area of memory
  2. Write to location 0×0000 4000 – some random loaction in the flash region
  3. Read the boot ROM at 0x1FFF 0000
  4. Construct a function in SRAM and execute it

The first three tests are pretty easy to set up using pointer indirection, e.g.:

int* test1 = (int*)0x000004000;   // reserved location
x= *test1;                        // try to read from reserved location
int* test2 = (int*)0x000004000;   // flash location
*test2 = x;                       // try to write to flash
int* test3 = (int*)0x1fff0000 ;   // Boot ROM location
x = *test3 ;                      // try to read from boot ROM

The fourth takes a little more effort, e.g.

// int func(int r0)
// {
//    return r0+1;
// }
uint16_t func[] = { 0x4601, 0x1c48, 0x4770 };
int main(void)
{
   funcPtr test4= (funcPtr)(((uint32_t)func)+1);  // setup RAM function (+1 for thumb)
   x = test4(x);                                  // call ram function
   while(1);
}

Default Behavior

Without the MPU setup the following will happen (output from the previous Fault Handler project):

  • test1 will generate a precise bus error

f1

  • test2 will generate an imprecise bus error

f2

Test3 and test4 will run without any fault being generated.

Setting up the MPU

There are a lot of options when setting up the MPU, but 90% of the time a core set are sufficient. The ARMv7-M MPU supports up to 8 different regions (an address range) that can be individually configured. For each region the core choices are:

  • the start address (e.g. 0×10000000)
  •  the size (e.g. 32kB)
  •  Access permissions (e.g. Read/Write access)
  • Memory type (here we’ll limit to either Normal for Flash/SRAM, Device for NXP peripherals, and Strongly Ordered for the private peripherals)
  • Executable or not (refereed to a Execute Never [XN] in MPU speak)

Both access permissions and memory types have many more options than those covered here, but for the majority of cases these will suffice. Here I’m not intending to cover privileged/non-privileged options (don’t worry if that doesn’t make sense, I shall cover it in a later posting).
Based on our previous LPC1768 memory map we could define as region map thus:

No.  Memory             Address       Type      Access Permissions  Size
0    Flash              0x00000000    Normal    Full access, RO    512KB
1    SRAM               0x10000000    Normal    Full access, RW     32KB
2    SRAM               0x2007C000    Normal    Full access, RW     32KB
3    GPIO               0x2009C000    Device    Full access, RW     16KB
4    APB Peripherals    0x40000000    Device    Full access, RW    512KB
5    AHB Peripherals    0x50000000    Device    Full access, RW      2MB
6    PPB                0xE0000000    SO        Full access, RW      1MB

Not that the boot ROM has not been explicitly mapped. This means any access to that region once the MPU has been initialized will get caught as a memory access violation.
To program a region, we need to write to two registers in order:

  • MPU Region Base Address Register (CMSIS: SCB->RBAR)
  • MPU Region Attribute and Size Register (CMSIS: SCB->RASR)

MPU Region Base Address Register

Bits 0..3 specify the region number
Bit 4 needs to be set to make the region valid
bits 5..31 have the base address of the region (note the bottom 5 bits are ignored – base address must also be on a natural boundary, i.e. for a 32kB region the base address must be a multiple of 32kB).

So if we want to program region 1 we would write:

#define VALID 0x10
SCB->RBAR = 0x10000000 | VALID | 1;  // base addr | valid | region no

MPU Region Attribute and Size Register

This is slightly more complex, but the key bits are:

bit 0 – Enable the region
bits 1..5 – region size; where size is used as 2**(size+1)
bits 16..21 – Memory type (this is actually divided into 4 separate groups)
bits 24..26 – Access Privilege
bit 28 – XN

So given the following defines:

#define REGION_Enabled  (0x01)
#define REGION_32K      (14 << 1)      // 2**15 == 32k
#define NORMAL          (8 << 16)      // TEX:0b001 S:0b0 C:0b0 B:0b0
#define FULL_ACCESS     (0x03 << 24)   // Privileged Read Write, Unprivileged Read Write
#define NOT_EXEC        (0x01 << 28)   // All Instruction fetches abort

We can configure region 0 thus:

SCB->RASR = (REGION_Enabled | NOT_EXEC | NORMAL | REGION_32K | FULL_ACCESS);

We can now repeat this for each region, thus:

void lpc1768_mpu_config(void)
{
   /* Disable MPU */
   MPU->CTRL = 0;
   /* Configure region 0 to cover 512KB Flash (Normal, Non-Shared, Executable, Read-only) */
   MPU->RBAR = 0x00000000 | REGION_Valid | 0;
   MPU->RASR = REGION_Enabled | NORMAL | REGION_512K | RO;
   /* Configure region 1 to cover CPU 32KB SRAM (Normal, Non-Shared, Executable, Full Access) */
   MPU->RBAR = 0x10000000 | REGION_Valid | 1;
   MPU->RASR = REGION_Enabled | NOT_EXEC | NORMAL | REGION_32K | FULL_ACCESS;
   /* Configure region 2 to cover AHB 32KB SRAM (Normal, Non-Shared, Executable, Full Access) */
   MPU->RBAR = 0x2007C000 | REGION_Valid | 2;
   MPU->RASR = REGION_Enabled | NOT_EXEC | NORMAL | REGION_32K | FULL_ACCESS;
   /* Configure region 3 to cover 16KB GPIO (Device, Non-Shared, Full Access Device, Full Access) */
   MPU->RBAR = 0x2009C000 | REGION_Valid | 3;
   MPU->RASR = REGION_Enabled |DEVICE_NON_SHAREABLE | REGION_16K | FULL_ACCESS;
   /* Configure region 4 to cover 512KB APB Peripherials (Device, Non-Shared, Full Access Device, Full Access) */
   MPU->RBAR = 0x40000000 | REGION_Valid | 4;
   MPU->RASR = REGION_Enabled | DEVICE_NON_SHAREABLE | REGION_512K | FULL_ACCESS;
   /* Configure region 5 to cover 2MB AHB Peripherials (Device, Non-Shared, Full Access Device, Full Access) */
   MPU->RBAR = 0x50000000 | REGION_Valid | 5;
   MPU->RASR = REGION_Enabled | DEVICE_NON_SHAREABLE | REGION_2M | FULL_ACCESS;
   /* Configure region 6 to cover the 1MB PPB (Privileged, XN, Read-Write) */
   MPU->RBAR = 0xE0000000 | REGION_Valid | 6;
   MPU->RASR = REGION_Enabled |STRONGLY_ORDERED_SHAREABLE | REGION_1M | FULL_ACCESS;
   /* Enable MPU */
   MPU->CTRL = 1;
   __ISB();
   __DSB();
}

After the MPU has been enabled, ISB and DSB barrier calls have been added to ensure that the pipeline is flushed and no further operations are executed until the memory access that enables the MPU completes.

Using the Keil environment, we can examine the MPU configuration:

f3

Rerunning the tests with MPU enabled

To get useful output we can develop a memory fault handler, building on the Hard Fault handler, e.g.

void printMemoryManagementErrorMsg(uint32_t CFSRValue)
{
   printErrorMsg("Memory Management fault: ");
   CFSRValue &= 0x000000FF; // mask just mem faults
   if((CFSRValue & (1<<5)) != 0) {
      printErrorMsg("A MemManage fault occurred during FP lazy state preservation\n");
   }
   if((CFSRValue & (1<<4)) != 0) {
      printErrorMsg("A derived MemManage fault occurred on exception entry\n");
   }
   if((CFSRValue & (1<<3)) != 0) {
      printErrorMsg("A derived MemManage fault occurred on exception return.\n");
   }
   if((CFSRValue & (1<<1)) != 0) {
      printErrorMsg("Data access violation.\n");
   }
   if((CFSRValue & (1<<0)) != 0) {
      printErrorMsg("MPU or Execute Never (XN) default memory map access violation\n");
   }
   if((CFSRValue & (1<<7)) != 0) {
      static char msg[80];
      sprintf(msg, "SCB->MMFAR = 0x%08x\n", SCB->MMFAR );
      printErrorMsg(msg);
   }
}

Test 1 – Reading undefined region

Rerunning test one with the MPU enabled gives the following output:

f4

The SCB->MMFAR contains the address of the memory that caused the access violation, and the PC guides us towards the offending instruction

f5

Test 2 – Writing to RO defined region

f6

f7

Test 3 – Reading Undefined Region (where memory exists)

f8

f9

Test 4 – Executing code in XN marked Region

f10

The PC gives us the location of the code (in SRAM) that tried to be executed

f11

The LR indicates the code where the branch was executed

f12

So, we can see with a small amount of programming we can (a) simplify debugging by quickly being able to establish the offending opcode/memory access, and (b) better defend our code against accidental/malicious access.

Optimizing the MPU programming.

Once useful feature of the Cortex-M3/4 MPU is that the Region Base Address Register and Region Attribute and Size Register are aliased three further times. This means up to 4 regions can be programmed at once using a memcpy. So instead of the repeated writes to RBAR and RASR, we can create configuration tables and initialize the MPU using a simple memcpy, thus:

uint32_t table1[] = {
/* Configure region 0 to cover 512KB Flash (Normal, Non-Shared, Executable, Read-only) */
(0x00000000 | REGION_Valid | 0),
(REGION_Enabled | NORMAL_OUTER_INNER_NON_CACHEABLE_NON_SHAREABLE | REGION_512K | RO),
/* Configure region 1 to cover CPU 32KB SRAM (Normal, Non-Shared, Executable, Full Access) */
(0x10000000 | REGION_Valid | 1),
(REGION_Enabled | NOT_EXEC | NORMAL | REGION_32K | FULL_ACCESS),
/* Configure region 2 to cover AHB 32KB SRAM (Normal, Non-Shared, Executable, Full Access) */
(0x2007C000 | REGION_Valid | 2),
(REGION_Enabled | NOT_EXEC | NORMAL_OUTER_INNER_NON_CACHEABLE_NON_SHAREABLE | REGION_32K | FULL_ACCESS),
/* Configure region 3 to cover 16KB GPIO (Device, Non-Shared, Full Access Device, Full Access) */
(0x2009C000 | REGION_Valid | 3),
(REGION_Enabled | DEVICE_NON_SHAREABLE | REGION_16K | FULL_ACCESS)
};

uint32_t table2[] = {
/* Configure region 4 to cover 512KB APB Peripherials (Device, Non-Shared, Full Access Device, Full Access) */
(0x40000000 | REGION_Valid | 4),
(REGION_Enabled | DEVICE_NON_SHAREABLE | REGION_512K | FULL_ACCESS),
/* Configure region 5 to cover 2MB AHB Peripherials (Device, Non-Shared, Full Access Device, Full Access) */
(0x50000000 | REGION_Valid | 5),
(REGION_Enabled | DEVICE_NON_SHAREABLE | REGION_2M | FULL_ACCESS),
/* Configure region 6 to cover the 1MB PPB (Privileged, XN, Read-Write) */
(0xE0000000 | REGION_Valid | 6),
(REGION_Enabled | NOT_EXEC | DEVICE_NON_SHAREABLE | REGION_1M | P_RW_U_NA),
};

void lpc1768_mpu_config_tbl(void)
{
   /* Disable MPU */
   MPU->CTRL = 0;
   memcpy((void*)&( MPU->RBAR), table1, sizeof(table1));
   memcpy((void*)&( MPU->RBAR), table2, sizeof(table2));
   /* Enable MPU */
   MPU->CTRL = 1;
   __ISB();
   __DSB();
}

I hope this is enough to get you started with your ARMv7-M MPU.

Developing a Generic Hard Fault handler for ARM Cortex-M3/Cortex-M4

February 1st, 2013

This posting assumes you that you have a working ARM Cortex-M3 base project in Keil uVision. If not, please see the “howto” video: Creating ARM Cortex-M3 CMSIS Base Project in uVision

Divide by zero error

Given the following C function

int div(int lho, int rho)
{
    return lho/rho;
}

called from main with these arguments

int main(void)
{
   int a = 10;
   int b = 0;
   int c;
   c = div(a, b);
   // other code
}

You would expect a hardware “divide-by-zero” (div0) error. Possibly surprisingly, by default the Cortex-M3 will not report the error but return zero.

Configuration and Control Register (CCR)

To enable hardware reporting of div0 errors we need to configure the CCR. The CCR is part of the Cortex-M’s System Control Block (SCB) and controls entry trapping of divide by zero and unaligned accesses among other things. The CCR bit assignment for div0 is:

[4] DIV_0_TRP Enables faulting or halting when the processor executes an SDIV or UDIV instruction with a divisor of 0:0 = do not trap divide by 01 = trap divide by 0. When this bit is set to 0, a divide by zero returns a quotient of 0.

So to enable DIV_0_TRP, we can use CMSIS definitions for the SCB and CCR, as in:

SCB->CCR |= 0x10;

If we now build and run the project you will need to stop execution as it will appear to run forever. When execution is stopped you’ll find debugger stopped in the file startup_ARMCM3.s in the CMSIS default Hard Fault exception handler:

CMSIS_handler

Override The Default Hard Fault_Handler

As all the exceptions handlers are build with “Weak” linkage in CMSIS, it is very easy to create your own Hard Fault handler. Simply define a function with the name “HardFault_Handler”, as in:

void HardFault_Handler(void)
{ 
   while(1); 
}


If we now build, run and then stop the project, we’ll find the debugger will be looping in our new handler rather than the CMSIS default one (alternatively we could put a breakpoint at the while(1) line in the debugger).

Empty_handler

Rather than having to enter breakpoints via your IDE, I like to force the processor to enter debug state automatically if a certain instruction is reached (a sort of debug based assert). Inserting the BKPT (breakpoint) ARM instruction in our code will cause the processor to enter debug state. The immediate following the opcode normally doesn’t matter (but always check) except it shouldn’t be 0xAB (which is used for semihosting).

#include "ARMCM3.h" 
void HardFault_Handler(void)
{
  __ASM volatile("BKPT #01"); 
  while(1); 
}

If we now build and run, the program execution should break automatically at the BKPT instruction.

BKPT

Error Message Output

The next step in developing the fault handler is the ability to report the fault. One option is, of course, to use stdio (stderr) and semihosting. However, as the support for semihosting can vary from compiler to compiler, I prefer to use Instrumented Trace Macrocell (ITM) utilizing the CMSIS wrapper function ITM_SendChar, e.g.

void printErrorMsg(const char * errMsg)
{
   while(*errMsg != ''){
      ITM_SendChar(*errMsg);
      ++errMsg;
   }
}

InitalErrMsg

Printf1

Fault Reporting

Now that we have a framework for the Hard Fault handler, we can start reporting on the actual fault details. Within the Cortex-M3′s System Control Block (SCB) is the HardFault Status Register (SCB->HFSR). Luckily again for use, CMSIS has defined symbols allowing us to access these register:

void HardFault_Handler(void)
{
   static char msg[80];
   printErrorMsg("In Hard Fault Handler\n");
   sprintf(msg, "SCB->HFSR = 0x%08x\n", SCB->HFSR);
   printErrorMsg(msg);
   __ASM volatile("BKPT #01");
   while(1);
}

Building and running the application should now result in the following output:

HFSR

By examining the HFSR bit configuration, we can see that the FORCED bit is set.

HFSR2

When this bit is set to 1, the HardFault handler must read the other fault status registers to find the cause of the fault.

Configurable Fault Status Register (SCB->CFSR)

A forced hard fault may be caused by a bus fault, a memory fault, or as in our case, a usage fault. For brevity, here I am only going to focus on the Usage Fault and cover the Bus and Memory faults in a later posting (as these have additional registers to access for details).

CFSR

So given what we know to date, our basic Fault Handler:

  • checks if the FORCED bit is set ( if ((SCB->HFSR & (1 << 30)) != 0) )
  • prints out the contents of the CFSR
void HardFault_Handler(void)
{
   static char msg[80];
   printErrorMsg("In Hard Fault Handler\n");
   sprintf(msg, "SCB->HFSR = 0x%08x\n", SCB->HFSR);
   printErrorMsg(msg);
   if ((SCB->HFSR & (1 << 30)) != 0) {
       printErrorMsg("Forced Hard Fault\n");
       sprintf(msg, "SCB->CFSR = 0x%08x\n", SCB->CFSR );
       printErrorMsg(msg);
   }
   __ASM volatile("BKPT #01");
   while(1);
}

Running the application should now result in the following output:

CFSR

The output indicated that bit 25 of the UsageFault Status Register (UFSR) part of the CFSR is set.

UsageFault Status Register

The bit configuration of the UFSR is shown below, and unsurprisingly the outpur shows that bit 9 (DIVBYZERO) is set.

UFSR

We can now extend the HardFault handler to mask the top half of the CFSR, and if not zero then further report on those flags, as in:

void HardFault_Handler(void)
{
   static char msg[80];
   printErrorMsg("In Hard Fault Handler\n");
   sprintf(msg, "SCB->HFSR = 0x%08x\n", SCB->HFSR);
   printErrorMsg(msg);
   if ((SCB->HFSR & (1 << 30)) != 0) {
       printErrorMsg("Forced Hard Fault\n");
       sprintf(msg, "SCB->CFSR = 0x%08x\n", SCB->CFSR );
       printErrorMsg(msg);
       if((SCB->CFSR & 0xFFFF0000) != 0) {
         printUsageErrorMsg(SCB->CFSR);
      }
   }
   __ASM volatile("BKPT #01");
   while(1);
}
void printUsageErrorMsg(uint32_t CFSRValue)
{
   printErrorMsg("Usage fault: ");
   CFSRValue >>= 16;                  // right shift to lsb
   if((CFSRValue & (1 << 9)) != 0) {
      printErrorMsg("Divide by zero\n");
   }
}

A run should now result in the following output:

Div0msg

Register dump

One final thing we can do as part of any fault handler is to dump out known register contents as they were at the time of the exception. One really useful feature of the Cortex-M architecture is that a core set of registers are automatically stacked (by the hardware) as part of the exception handling mechanism. The set of stacked registers is shown below:

IntEntryStack

Using this knowledge in conjunction with AAPCS we can get access to these register values. First we modify our original HardFault handler by:

  • modifying it’s name to “Hard_Fault_Handler”
  • adding a parameter declared as an array of 32–bit unsigned integers.
void Hard_Fault_Handler(uint32_t stack[])

Based on AAPCS rules, we know that the parameter label (stack) will map onto register r0. We now implement the actual HardFault_Handler. This function simply copies the current Main Stack Pointer (MSP) into r0 and then branches to our Hard_Fault_Handler (this is based on ARM/Keil syntax):

__asm void HardFault_Handler(void) 
{
  MRS r0, MSP
  B __cpp(Hard_Fault_Handler) 
}

Finally we implement a function to dump the stack values based on their relative offset, e.g.

enum { r0, r1, r2, r3, r12, lr, pc, psr};

void stackDump(uint32_t stack[])
{
   static char msg[80];
   sprintf(msg, "r0  = 0x%08x\n", stack[r0]);  printErrorMsg(msg);
   sprintf(msg, "r1  = 0x%08x\n", stack[r1]);  printErrorMsg(msg);
   sprintf(msg, "r2  = 0x%08x\n", stack[r2]);  printErrorMsg(msg);
   sprintf(msg, "r3  = 0x%08x\n", stack[r3]);  printErrorMsg(msg);
   sprintf(msg, "r12 = 0x%08x\n", stack[r12]); printErrorMsg(msg);
   sprintf(msg, "lr  = 0x%08x\n", stack[lr]);  printErrorMsg(msg);
   sprintf(msg, "pc  = 0x%08x\n", stack[pc]);  printErrorMsg(msg);
   sprintf(msg, "psr = 0x%08x\n", stack[psr]); printErrorMsg(msg);
}

This function can then be called from the Fault handler, passing through the stack argument. Running the program should result is the following output:

StackDump

Examining the output, we can see that the program counter (pc) is reported as being the value 0×00000272, giving us the opcode generating the fault. If we disassemble the image using the command:

fromelf -c CM3_Fault_Handler.axf —output listing.txt

By trawling through the listing (listing.txt) we can see the SDIV instruction at offending line (note also r2 contains 10 and r1 the offending 0).

.text
div
0x00000270: 4602          MOV r2,r0
0x00000272: fb92f0f1      SDIV r0,r2,r1
0x00000276: 4770          BX lr
.text

Finally, if you’re going to use the privilege/non–privilege model, you’ll need to modify the HardFault_Handler to detect whether the exception happened in Thread mode or Handler mode. This can be achieved by checking bit 3 of the HardFault_Handler’s Link Register (lr) value. Bit 3 determines whether on return from the exception, the Main Stack Pointer (MSP) or Process Stack Pointer (PSP) is used.

__asm void HardFault_Handler(void)
{
  TST lr, #4     // Test for MSP or PSP
  ITE EQ
  MRSEQ r0, MSP
  MRSNE r0, PSP
  B __cpp(Hard_Fault_Handler)
}

In a later post I shall develop specific handler for all three faults.

Notes:

  1. The initial model for fault handling can be found in Joseph Yiu’s excellent book “The Definitive Guide to the ARM Cortex-M3
  2. The code shown was built using the ARM/Keil MDK-ARM Version 4.60 development environment (a 32Kb size limited evaluation is available from the Keil website)
  3. The code, deliberately, has not been refactored to remove the hard-coded (magic) values.
  4. Code available on GitHub at git://github.com/feabhas/CM3_Fault_Handler.git

Embedded System Conference – India

July 13th, 2012

 

This year I have honour of being invited to present at the Embedded Systems Conference in Bengaluru (Bangalore), India. Based on previous visits these classes are very well attended and always generate a lot of post-class discussions.

This year I’ve extended my previous 1/2 day class to a full day titled “Programming in C for the ARM Cortex-M Microcontroller”. Having a full day allows me to delve in too much greater detail. The class is broken down in to four subsections:

  • Cortex-M Architecture
  • C Programming and the Cortex-M
  • CMSIS (including CMSIS-RTOS)
  • Debug (including CoreSight)

An overview is covered here.

The other class is one I have presented at ESC in the US before but not in India; Understanding Mutual Exclusion: The Semaphores vs. the Mutex. This presentation is based around much of the material in some of my previous postings (see RTOS related blog postings ). I still find this class really interesting. In general, whenever you bring up the mutex/semaphore discussion most people jump to a per-conceived idea that they know the differences; by the end of this class most understand that their mental model was incorrect.

If anyone is attending then please come and say hello.

Can existing embedded applications benefit from Multicore Technology?

June 15th, 2012

It feels that not a day goes by without a new announcement regarding a major development in multicore technology. With so much press surrounding multicore, you have to ask the question “Is it for me?” i.e. can I utilise multicore technology in my embedded application?

However, from a software developer’s perspective, all the code examples seem to demonstrate the (same) massive performance improvements to “rendering fractals” or “ray tracing programs”. The examples always refer to Amdahl’s Law, showing gains when using, say, 16- or 128-cores. This is all very interesting, but not what I would imagine most embedded developers would consider “embedded”.  These types of programs are sometimes referred to as “embarrassingly parallel” as it is so obvious they would benefit from parallel processing. In addition the examples use proprietary solutions, such as TBB from Intel, or language extensions with limited platform support, e.g. OpenMP. In addition, this area of parallelisation is being addressed more and more by using multicore General Purpose Graphics Processing Units (GPGPU), such as PowerVR from Imagination Technologies and Mali from ARM, using OpenCL; however this is getting off-topic.

So taking “fractals”, OpenMP and GPGPUs out of the equation, is multicore really useful for embedded systems? Read more »

CMSIS-RTOS Presentation

May 11th, 2012

I have finally finished and sent off my presentation for next weeks Hitex one-day ARM User Conferences titled “ARM – the new standard across the board?” at the National Motorcycle Museum in Solihull.

Back in February, at the embeddedworld exhibition and conference in Nuremberg, Germany, ARM announced the latest version (version 3) of the Cortex(tm) Microcontroller Software Interface Standard (CMSIS). The major addition is the introduction of an abstraction layer for Real-Time Operating Systems (RTOS).

CMSIS_Logo_Final

The presentation I’m giving explains; what the abstraction layer offers, how it maps on to an underlying RTOSs API (e.g. ARM/Keil’s RTX), and what is required to re-target another RTOS.

If you can’t make the event (I would recommend it if you can, the last one had a lot of very useful information), then I plan to make the presentation available as a slide deck and a narrated video after the event, so watch this space.

%d bloggers like this: