Using a watchdog in a multi-task (RTOS) environment

Clementine, a joint spacecraft of NASA and the U.S. Ballistic Missile Defense Organization built to test sensors and spacecraft components under extended exposure to the space environment, was launched on 25 January 1994. For the lack of a few lines of watchdog code, her mission was lost on 7 May 1994.

Clementine had performed lunar mapping for approximately two consecutive months when she left lunar orbit and headed for her next target, the near-Earth asteroid Geographos. Soon, however, one of Clementine’s on-board computers malfunctioned, effectively cutting NASA off from the spacecraft and causing one of its thrusters to fire uncontrollably.
NASA spent 20 minutes trying to bring the system back to life, but to no avail. A hardware reset command finally brought Clementine back online, but it was too late: she had already used up all of her fuel, and the mission’s continuation had to be canceled.
Afterwards, once it became evident that the software timeouts they had implemented were insufficient, the development team responsible for Clementine’s software wished they had used the hardware’s watchdog timer.

How could a watchdog have helped?

A watchdog is a piece of hardware that is either integrated directly into a microcontroller or attached to it externally. Its main purpose is to trigger error handling (usually a hardware reset) when it can safely assume that the system has hung or is otherwise executing improperly.
A watchdog’s main component is a counter that is initially configured with a certain value and subsequently counts down to zero. The software must frequently reset this counter to its initial value to ensure that it never reaches zero. Otherwise, a malfunction is assumed and, usually, the CPU is reset. This makes the watchdog a last resort, an option taken only when everything else has failed, as might have been the case with Clementine.
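The countdown behavior described above can be modeled in a few lines of C. This is only a software sketch for illustration: real watchdogs are hardware peripherals, and the type and function names used here (sim_wdt_t, sim_wdt_init, sim_wdt_feed, sim_wdt_tick) are hypothetical and do not belong to any vendor API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of a countdown watchdog. All names are hypothetical. */
typedef struct {
    uint32_t reload;   /* initial counter value, configured once (must be non-zero) */
    uint32_t counter;  /* current countdown value */
    bool     expired;  /* latched once the counter has reached zero */
} sim_wdt_t;

static void sim_wdt_init(sim_wdt_t *wdt, uint32_t reload)
{
    wdt->reload  = reload;
    wdt->counter = reload;
    wdt->expired = false;
}

/* "Feeding": restart the countdown from the configured initial value.
 * Once the watchdog has expired, feeding no longer has any effect. */
static void sim_wdt_feed(sim_wdt_t *wdt)
{
    if (!wdt->expired) {
        wdt->counter = wdt->reload;
    }
}

/* Advance the model by one timer cycle. Returns true once the counter
 * has reached zero; real hardware would typically reset the CPU here. */
static bool sim_wdt_tick(sim_wdt_t *wdt)
{
    if (!wdt->expired && wdt->counter > 0u) {
        wdt->counter--;
        if (wdt->counter == 0u) {
            wdt->expired = true;
        }
    }
    return wdt->expired;
}
```

With a reload value of 3, the model survives indefinitely as long as sim_wdt_feed() is called at least once within every three ticks, and it latches the expired state as soon as three ticks pass without a feed.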

How to feed the watchdog

Properly using a watchdog timer, however, involves more than simply restarting the counter (a process often referred to as “feeding” or “kicking” the watchdog). With a watchdog timer running in their system, developers must carefully choose the watchdog’s timeout period so that the watchdog can intervene before a malfunctioning system causes any irreversible damage.

In simple applications, specifically those without an RTOS, developers usually feed the watchdog from the main loop. This approach merely requires configuring an appropriate initial counter value, which can be as simple as choosing any value that exceeds the worst-case execution time of the entire main loop by at least one timer cycle. This is often a fairly robust approach: while some systems require immediate recovery, others merely need to ensure they do not hang indefinitely, and for those it gets the job done.
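As a rough illustration of that rule of thumb, the following sketch computes the smallest initial counter value that exceeds a given worst-case loop time by at least one timer cycle. The function name and its microsecond-based parameters are assumptions made for this example; how the resulting value is written to the watchdog depends entirely on the hardware in use.

```c
#include <stdint.h>

/* Hypothetical helper: smallest reload value (in watchdog timer cycles)
 * that exceeds the main loop's worst-case execution time by at least
 * one full cycle. cycle_us must be non-zero. */
static uint32_t wdt_reload_for_loop(uint32_t wcet_us, uint32_t cycle_us)
{
    /* Round the worst-case time up to whole timer cycles ... */
    uint64_t cycles = ((uint64_t)wcet_us + cycle_us - 1u) / cycle_us;

    /* ... then add one cycle of margin. */
    return (uint32_t)(cycles + 1u);
}
```

For a main loop with a worst-case execution time of 4500 µs and a watchdog cycle of 1000 µs, this yields 6 cycles: 5 to cover the loop, plus 1 of margin. In practice, a more generous margin may be preferable if the worst-case time was measured rather than proven.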

In a multi-task (RTOS) environment

In more complex systems, however, and in multi-tasking systems in particular, various threads could hang on various occasions and for various reasons. For some threads, such as one waiting for potential network communication, it is acceptable not to run for long periods. Finding a clean method to feed the watchdog periodically, while still ensuring that each distinct task is in good health, became a major challenge for developers of these systems, who need to determine, for example:

  • Whether the OS is executing properly
  • Whether high-priority tasks are exhausting the CPU, preventing low-priority tasks from running at all
  • Whether a deadlock has occurred that inhibits the execution of one or several tasks
  • Whether a task routine is executing properly and entirely

Developers also need to ensure that any modification to their source code, whether it is a dedicated watchdog task or specific changes to the monitored tasks, is small and optimized for efficiency in order to keep intrusiveness to a minimum.

Utilize the watchdog support of your RTOS

For this reason, state-of-the-art RTOSes like SEGGER’s embOS offer comprehensive watchdog solutions to their customers in order to simplify watchdog handling and thereby reduce the time spent on development.

The general principles applied in these solutions may vary between RTOSes. At SEGGER, however, versatility and ease of use are deemed of capital importance, while the required footprint is still kept to a minimum in both memory usage and execution time. To the embedded experts, it was therefore evident that a comprehensive set of API functions was required that allows for both

  • the individual registration of tasks, timers, and even ISRs with the underlying embOS watchdog module, as well as
  • the possibility to test the intended watchdog conditions flexibly from any desired context.

The final implementation now consists of a mere five API functions, yet is powerful enough to serve any intended purpose.
Using these API functions, a task simply registers itself with the embOS watchdog module and, at the same time, configures its individual timeout period. The task then signals its proper execution periodically by calling one simple embOS API function. Another single embOS API call subsequently checks whether all monitored tasks have signaled their proper execution within their specified timeout periods. This call may be performed from within a dedicated watchdog task, from within OS_Idle(), or even from within the periodic OS timer interrupt service routine or any other ISR.
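The register, signal, and check pattern described above can be sketched in plain C as follows. Note that this is not the embOS API: the names wd_register, wd_trigger, and wd_check, the fixed-size client table, and the explicit tick-count parameter are all assumptions made for this illustration. Please refer to the embOS manual for the actual function names and signatures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WD_MAX_CLIENTS 8u  /* hypothetical limit for this sketch */

typedef struct {
    uint32_t timeout;       /* ticks allowed between two alive signals */
    uint32_t last_trigger;  /* tick count of the most recent alive signal */
    bool     in_use;
} wd_client_t;

static wd_client_t wd_clients[WD_MAX_CLIENTS];

/* Register a client (task, timer, or ISR) with its own timeout.
 * Returns a handle, or -1 if the table is full. */
static int wd_register(uint32_t timeout, uint32_t now)
{
    for (size_t i = 0; i < WD_MAX_CLIENTS; i++) {
        if (!wd_clients[i].in_use) {
            wd_clients[i] = (wd_client_t){ .timeout = timeout,
                                           .last_trigger = now,
                                           .in_use = true };
            return (int)i;
        }
    }
    return -1;
}

/* Called periodically by a client to signal that it is executing properly. */
static void wd_trigger(int handle, uint32_t now)
{
    if (handle >= 0 && (size_t)handle < WD_MAX_CLIENTS &&
        wd_clients[handle].in_use) {
        wd_clients[handle].last_trigger = now;
    }
}

/* Returns true only if every registered client has signaled within its
 * timeout. Unsigned subtraction keeps the comparison valid across a
 * wrap-around of the tick counter. */
static bool wd_check(uint32_t now)
{
    for (size_t i = 0; i < WD_MAX_CLIENTS; i++) {
        if (wd_clients[i].in_use &&
            (uint32_t)(now - wd_clients[i].last_trigger) > wd_clients[i].timeout) {
            return false;
        }
    }
    return true;
}
```

In this model, a network task might register with a generous timeout of 100 ticks, while a control task registers with 10 ticks. The hardware watchdog would then be fed only while wd_check() returns true, so a single stalled task is enough to let the watchdog expire. A real implementation would also need to protect the table against concurrent access from tasks and ISRs, which this sketch omits.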

Users merely need to provide and register two functions: the first performs the hardware-dependent feeding of the watchdog, while the second specifies further actions in case the watchdog counter reaches zero. For example, this allows a log file containing further information on the system status to be stored in non-volatile memory before a hardware reset is performed or any other action is taken.
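A minimal sketch of such a two-callback arrangement is shown below. All names (wd_config, wd_service, board_feed_watchdog, app_on_watchdog_expired) are hypothetical and are not taken from embOS. The sketch also assumes that the second callback runs as soon as a health check fails, that is, before the hardware counter itself reaches zero, so that there is still time to save diagnostic data.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*wd_callback_t)(void);

static wd_callback_t wd_feed_cb;     /* hardware-dependent feed routine */
static wd_callback_t wd_expired_cb;  /* action taken when a check fails */

/* Register the two user-supplied functions. */
static void wd_config(wd_callback_t feed, wd_callback_t expired)
{
    wd_feed_cb    = feed;
    wd_expired_cb = expired;
}

/* Called periodically with the result of the health check:
 * feed the hardware while all is well, otherwise run the user's handler. */
static void wd_service(bool all_clients_alive)
{
    if (all_clients_alive) {
        if (wd_feed_cb != NULL) {
            wd_feed_cb();
        }
    } else if (wd_expired_cb != NULL) {
        wd_expired_cb();
    }
}

/* Example callbacks. The counters stand in for real side effects. */
static unsigned feed_count;
static unsigned expired_count;

static void board_feed_watchdog(void)
{
    feed_count++;     /* real code: write the reload key to the watchdog register */
}

static void app_on_watchdog_expired(void)
{
    expired_count++;  /* real code: store a status log in non-volatile memory, then reset */
}
```

Keeping the hardware access in a user-supplied callback is what lets the monitoring logic stay portable: only board_feed_watchdog() would need to change when moving to a different microcontroller.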


When starting to design and develop an application with a watchdog, make sure you decide early on how you intend to use it, and consider the available tools that will help you get there more swiftly. After all, you wouldn’t want to get stranded in space, would you?
