
Task Synchronisation

October 15th, 2009
Synchronisation is an everyday event, both in the real world and in computer programs. For example, meeting a friend for a coffee requires synchronisation, in that both parties need to arrive within a given timeframe to make the event worthwhile (sometimes referred to as a rendezvous, although that term tends to have more romantic implications). Alternatively, receiving a purchase order (PO) via fax is a form of synchronisation: the company waiting on the PO will not, or cannot, start work on the project until this event occurs. Finally, in an automated robot manufacturing system, the movement and work done by each robot must be synchronised with the other robots and with the production line itself.
In the field of embedded systems software there are also many requirements for synchronisation within a program. In a multi-tasking system using an RTOS, examples include:
  • An asynchronous device driver dealing with slow devices, where we don’t necessarily want tasks blocked waiting on the device.
  • At system start-up, many RTOSs start tasks as active (ready to run). We may have an ordering dependency for execution (e.g. initialisation of global resources) where all tasks must wait for a given condition (the concept of a barrier, which can be very important in multi-processor systems).
  • Having a managed task abort notification, rather than deleting tasks (which can lead to resource issues). This is similar in concept to the UNIX/Linux kill signal, and is also used to manage task pools.
The definition of synchronisation found on dictionary.com is:
synchronisation
noun
  1. the relation that exists when things occur at the same time; “the drug produces an increased synchrony of the brain waves” [syn: synchronism] [ant: asynchronism] 
  2. an adjustment that causes something to occur or recur in unison [syn: synchronization]
  3. coordinating by causing to indicate the same time; “the synchronization of their watches was an important preliminary” [syn: synchronization]
So synchronisation is about making sure events happen at the same time (as with a common clock in communications systems), as opposed to asynchronous, which means not happening at the same time. Unfortunately, as we shall see, most texts describing RTOSs misuse or misunderstand this term.
It should be noted that synchronisation and mutual exclusion often get lumped together and confused. Mutual exclusion is about making sure things don’t happen at the same time, whereas synchronisation is about making sure they do.

In regard to task synchronisation there are four classes of interaction we need to address:

  1. one-to-one – only two tasks synchronising
  2. one-to-many
  3. many-to-one
  4. many-to-many

Initially we address the condition where only two tasks are synchronising.

ONE-TO-ONE : SINGLE CONDITION
In the simplest case of synchronisation, we have two tasks (Task1 and Task2) that need to synchronise their execution.
  • Task2 runs until it reaches the synchronisation point as defined by an RTOS synchronisation object (SO), at which point it waits for Task1
  • Task1 now runs and, on reaching the synchronisation point, signals Task2 via the SO. Both tasks are now ready to run.
  • The higher priority task now continues execution and the lower priority task is made ready (if the tasks are of equal priority, typically Task1 will continue, as this avoids a context switch).
We can now say that Task1 and Task2 have synchronised their threads of execution.

However, what should the behaviour be if Task1 arrives first? In terms of the dictionary definition of synchronisation, Task1 should wait for Task2. Unfortunately, with most RTOSs this is not the case and Task1 will continue execution without blocking. This means we need to further refine our definition of synchronisation to include the concepts of:
  • Symmetric or Bilateral Synchronisation
  • Asymmetric or Unilateral Synchronisation
BILATERAL SYNCHRONISATION
Bilateral synchronisation is the condition where whichever task arrives at the synchronisation point first waits for the other. This is often called the rendezvous (as supported by the Ada programming language). Surprisingly, it is rarely supported by RTOSs. One limitation of bilateral synchronisation is that it cannot be used for ISR-to-task synchronisation (as an ISR must never block).
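For illustration, a rendezvous can be built from a pair of one-way signals: each task announces its own arrival and then waits for the other’s. The sketch below uses POSIX threads and semaphores as a stand-in for a generic RTOS (the task names and structure are illustrative only, not taken from any particular kernel):

#include <pthread.h>
#include <semaphore.h>

static sem_t task1_arrived;                /* both initialised to 0 */
static sem_t task2_arrived;

static void *task1(void *arg)
{
    /* ... work before the synchronisation point ... */
    sem_post(&task1_arrived);              /* announce own arrival      */
    sem_wait(&task2_arrived);              /* block until Task2 arrives */
    /* ... both tasks have now rendezvoused ... */
    return NULL;
}

static void *task2(void *arg)
{
    /* ... work before the synchronisation point ... */
    sem_post(&task2_arrived);              /* announce own arrival      */
    sem_wait(&task1_arrived);              /* block until Task1 arrives */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    sem_init(&task1_arrived, 0, 0);
    sem_init(&task2_arrived, 0, 0);
    pthread_create(&t1, NULL, task1, NULL);
    pthread_create(&t2, NULL, task2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Because each task posts before it waits, whichever task arrives first simply blocks until the other catches up, and neither ordering can deadlock.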
UNILATERAL SYNCHRONISATION
Unilateral synchronisation, which feels like a paradox, is the case where:
  • if Task2 arrives at the SO first, it will wait for Task1 and then synchronise with Task1 when it arrives
  • if Task1 arrives at the SO first, it will not wait for Task2; hence unilateral synchronisation (a minimal sketch follows this list).
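A semaphore initialised to zero gives exactly this behaviour: the waiting task may block, but the signalling task never does. A minimal POSIX sketch (the wrapper function names are invented purely for illustration):

#include <semaphore.h>

static sem_t so;                  /* the synchronisation object */

static void so_init(void)
{
    sem_init(&so, 0, 0);          /* initial count of zero      */
}

static void task2_sync_point(void)
{
    sem_wait(&so);                /* blocks only if Task1 has not yet signalled */
}

static void task1_sync_point(void)
{
    sem_post(&so);                /* never blocks, whatever state Task2 is in   */
}

Note that a semaphore used this way is also persistent: if Task1 signals first, that fact is remembered until Task2 arrives, which is exactly the distinction examined below.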

When reading literature associated with RTOSs and synchronisation, unilateral synchronisation is the most commonly described model (e.g. Li, 2003).
This, however, brings us yet another dilemma: what happens when Task2 now reaches the synchronisation point, assuming Task1 has already passed it? Does Task2 block and wait for Task1, or does it see that Task1 has already visited the SO and continue? Or, to put it another way, is the notification of synchronisation given to the synchronisation object by Task1 persistent or non-persistent?
NON-PERSISTENT UNILATERAL SYNCHRONISATION OBJECT
In a non-persistent model, the fact that Task1 has already passed the synchronisation point is not remembered, therefore Task2 blocks until Task1 signals again. Due to how most RTOSs actually support unilateral synchronisation (discussed later), this, like bilateral synchronisation, is an uncommon model. Interestingly, Win32 supports this model through a concept called a PulseEvent: if no tasks are waiting when PulseEvent is called, the event is discarded.
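A bare-bones Win32 sketch of the idea (the wrapper function names are invented; the essential calls are CreateEvent, WaitForSingleObject and PulseEvent):

#include <windows.h>

static HANDLE sync_event;

static void so_create(void)
{
    /* auto-reset event, initially non-signalled */
    sync_event = CreateEvent(NULL, FALSE, FALSE, NULL);
}

static void task2_sync_point(void)
{
    WaitForSingleObject(sync_event, INFINITE);   /* block at the sync point */
}

static void task1_sync_point(void)
{
    PulseEvent(sync_event);   /* releases a waiting task; if none is waiting, */
                              /* the pulse is simply discarded                */
}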
PERSISTENT UNILATERAL SYNCHRONISATION OBJECT
The key to this model is that the fact that Task1 has already signalled the SO (passed the synchronisation point) is remembered. When Task2 arrives at the synchronisation point it doesn’t block, but continues (this particular use case is really asynchronous event notification).
However, we have yet another dichotomy: does Task2 consume the signal Task1 has set (auto-reset), or does Task1 clear the signal (manual-reset) at some later time?

In the consuming model, if Task2 now arrives back at the SO and waits, it will block until signalled again by Task1. In the manual-reset model, Task2 will continue to pass the synchronisation point until Task1 (or indeed Task2) explicitly clears the signal (normally with a separate API call).
Finally, in the consuming model, what happens if Task1 signals the SO more than once before Task2 arrives at the synchronisation point, so that the original signal has not yet been consumed? In one model there is no effect: the signal remains set and is consumed when Task2 arrives (the binary model). The alternative is that a count is kept of the number of signals set by Task1. Each time Task2 waits on the SO this count is decremented, and Task2 will only block if the count is zero.
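The counting variant maps directly onto the classic counting semaphore. The following POSIX sketch is deliberately single-threaded just to show the count behaviour (in a real system Task1 and Task2 would of course be separate tasks):

#include <stdio.h>
#include <semaphore.h>

int main(void)
{
    sem_t so;
    int pending;

    sem_init(&so, 0, 0);          /* count starts at zero */

    /* Task1 signals three times before Task2 reaches the sync point */
    sem_post(&so);
    sem_post(&so);
    sem_post(&so);

    sem_getvalue(&so, &pending);
    printf("signals pending: %d\n", pending);   /* prints 3 */

    /* Task2 now passes the synchronisation point three times without blocking */
    sem_wait(&so);
    sem_wait(&so);
    sem_wait(&so);

    /* a fourth sem_wait() would block until Task1 posted again */
    sem_destroy(&so);
    return 0;
}

A binary object, by contrast, would remember only that at least one signal had been set, and Task2 would block on its second wait.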

So we can classify RTOS synchronisation objects along the dimensions discussed above: bilateral versus unilateral, persistent versus non-persistent, and, for persistent objects, consuming (auto-reset) versus manual-reset, with consuming objects being either binary or counting.

In the next posting I shall be looking at synchronisation involving more than two tasks, and following that by examining some actual RTOSs and their support for synchronisation.

Mutex vs. Semaphores – Part 3 (final part): Mutual Exclusion Problems

October 5th, 2009

As hopefully you can see from the previous posting, the mutex is a significantly safer mechanism to use for implementing mutual exclusion around shared resources. Nevertheless, there are still a couple of problems that use of the mutex (in preference to the semaphore) will not solve. These are:

  • Circular deadlock
  • Non-cooperation

Circular Deadlock
Circular deadlock, often referred to as the “deadly embrace” problem, is a condition where two or more tasks develop a circular dependency of mutual exclusion. Simply put, one task is blocked waiting on a mutex owned by another task, and that other task is also blocked waiting on a mutex held by the first task.

So how can this happen? Take as an example a small control system. The system is made up of three tasks, a low priority Control task, a medium priority System Identification (SI) task and a high priority Alarm task. There is an analogue input shared by the Control and the SI tasks, which is protected by a mutex. There is also an analogue output protected by a different mutex.

The Control task waits for mutexes ADC and DAC:

mutex_lock (ADC);      /* first take the analogue input  */
mutex_lock (DAC);      /* then the analogue output       */
/* critical section */
mutex_unlock (ADC);
mutex_unlock (DAC);

The SI Task waits for mutexes DAC and ADC:

mutex_lock (DAC);      /* note: the opposite locking order to the Control task */
mutex_lock (ADC);
/* critical section */
mutex_unlock (DAC);
mutex_unlock (ADC);

Unfortunately, under certain timing conditions, this can lead to deadlock. In this example the Control task has locked the ADC but, before it can lock the DAC, it is pre-empted by the higher priority SI task. The SI task then locks the DAC and tries to lock the ADC. The SI task is now blocked, as the ADC is already owned by the Control task. The Control task now runs and tries to lock the DAC. It is blocked, as the DAC is held by the SI task. Neither task can continue until the other releases its mutex, and neither mutex can be released until its owning task runs: classic deadlock.

For circular deadlock to occur the following conditions must all be true:

  • A thread has exclusive use of resources (Mutual exclusion)
  • A thread can hold on to a resource(s) whilst waiting for another resource (Hold and wait)
  • A circular dependency of thread and resources is set up (Circular waiting)
  • A thread never releases a resource until it is completely finished with it (No resource preemption)

These conditions can be addressed in a number of ways. For example, a design policy may stipulate that if a task needs to lock more than one mutex it must either lock all or none.
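One way of approximating an all-or-nothing policy at run time is to try-lock the second mutex and back off completely if it is unavailable, so the “hold and wait” condition never arises. A minimal POSIX sketch (the mutex names simply mirror the example above; in practice a fixed design-time locking order is usually the simpler cure):

#include <pthread.h>
#include <sched.h>

static pthread_mutex_t adc_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dac_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Acquire both mutexes or neither. */
static void lock_both(void)
{
    for (;;) {
        pthread_mutex_lock(&adc_mutex);
        if (pthread_mutex_trylock(&dac_mutex) == 0)
            return;                             /* both held                  */
        pthread_mutex_unlock(&adc_mutex);       /* back off completely        */
        sched_yield();                          /* let the other task finish  */
    }
}

static void unlock_both(void)
{
    pthread_mutex_unlock(&dac_mutex);
    pthread_mutex_unlock(&adc_mutex);
}

Note that naive back-off like this can livelock under unlucky timing, which is another reason a fixed locking order is normally preferred.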

Priority Ceiling Protocol

With the Priority Ceiling Protocol (PCP) method each mutex has a defined priority ceiling, set to that of the highest priority task which uses the mutex. Any task using a mutex executes at its own priority – until a second task attempts to acquire the mutex.  At this point it has its priority raised to the ceiling value, preventing suspension and thus eliminating the “hold and wait” condition.

In the deadlock example shown before, the significant point is when the SI task tries to lock the DAC. Previously that succeeded and led to cyclic deadlock. With a PCP mutex, however, both the ADC and DAC mutexes have a ceiling priority equal to the SI task’s priority. When the SI task tries to lock the DAC, the run-time system detects that the SI task’s priority is not higher than the ceiling priority of the already-locked ADC mutex. The run-time system therefore suspends the SI task without locking the DAC mutex. The Control task now inherits the priority of the SI task and resumes execution.
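For reference, POSIX exposes a closely related mechanism, the immediate priority ceiling (“priority protect”) protocol, in which the owner is raised to the ceiling as soon as it locks the mutex rather than when a second task arrives. A hedged sketch, assuming a platform that supports the option and an assumed ceiling value of 20 standing in for the SI task’s priority:

#include <pthread.h>

static pthread_mutex_t adc_mutex;

static int init_ceiling_mutex(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* raise the owner to the ceiling priority while the mutex is held */
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_PROTECT);
    /* ceiling = priority of the highest priority task using the mutex
       (assumed here to be 20, i.e. the SI task's priority) */
    pthread_mutexattr_setprioceiling(&attr, 20);

    return pthread_mutex_init(&adc_mutex, &attr);
}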

Non-cooperation

The last, but most important, aspect of mutual exclusion covered in these ramblings rests on one founding principle: we have to rely on all tasks accessing critical regions through the mutual exclusion primitives. Unfortunately this depends on the design of the software and cannot be enforced by the run-time system. This final problem was addressed by Tony Hoare with a construct called the monitor.

The Monitor

The monitor is a mechanism not typically supplied by the RTOS, but something the programmer tends to build (a notable exception is Ada95’s protected object mechanism). A monitor simply encapsulates the shared resource and the locking mechanism into a single construct (e.g. a C++ object that encapsulates the mutex mechanism). Access to the shared resource, then, is through a controlled interface which cannot be bypassed (i.e. the application never explicitly calls the mutex, but instead calls the access functions).
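A bare-bones monitor-style sketch in C (the shared counter is invented purely for illustration): the mutex and the data are private to one translation unit and only the access functions are visible, so callers can never reach the data without taking the lock.

/* monitor.c: the mutex and shared data are private to this file */
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_count;                 /* the encapsulated shared resource */

/* The only way to reach shared_count is through these functions. */
void monitor_increment(void)
{
    pthread_mutex_lock(&lock);
    shared_count++;
    pthread_mutex_unlock(&lock);
}

int monitor_read(void)
{
    int value;

    pthread_mutex_lock(&lock);
    value = shared_count;
    pthread_mutex_unlock(&lock);
    return value;
}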

Finishing Off…
The goal of these initial postings is to demonstrate that common terms used in the real-time programming community are open to ambiguity and interpretation. Hopefully you should now be clear about the core differences between the Binary Semaphore, the General (counting) Semaphore and the Mutex.

The underlying difference between the Semaphores and the Mutex is the Principle of Ownership. Given the principle of ownership, a particular implementation of a mutex may support Recursion, Priority inheritance and Death Detection.

ENDNOTE
An aspect of the mutex I haven’t covered here is that many operating systems support the concept of a condition variable. A condition variable allows a task to wait on a synchronisation primitive within a critical region. The whole area of synchronisation patterns (e.g. the semaphore as a signal) within the context of RTOSs will be the subject of my next posting.
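As a taster, a minimal POSIX sketch of the idea (the data_ready flag and the function names are invented for illustration): the waiting task atomically releases the mutex while it sleeps and re-acquires it before returning, so it can wait inside the critical region without locking everyone else out.

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool data_ready;

void consumer(void)
{
    pthread_mutex_lock(&lock);
    while (!data_ready)                  /* guard against spurious wake-ups */
        pthread_cond_wait(&cond, &lock); /* releases lock while waiting     */
    /* ... use the shared data, still inside the critical region ... */
    data_ready = false;
    pthread_mutex_unlock(&lock);
}

void producer(void)
{
    pthread_mutex_lock(&lock);
    data_ready = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}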
