The SEGGER Linker – Replacing the GNU linker

At SEGGER, we pretty much use our own tools and products to develop our products. That includes using our middleware, such as embOS, embOS/IP, emUSB, emFile, web and FTP servers and so on, as part of the firmware of our J-Link, J-Trace and Flasher products. It also works the other way round: we use the same hardware products, most of all the J-Link, to develop, test and constantly improve our middleware. Using our own products in house helps us to check usability and improve them. I think we have come a long way and have great products in pretty much every area.

For an IDE, we have completely switched to Embedded Studio. It is a great piece of software that can be used free of charge for non-commercial purposes! Ready for development out of the box, with both GCC and Clang/LLVM compilers.

Embedded Studio comes with our own runtime library, which I believe is second to none. Optimized to the bone, it leaves not only the GNU runtime, but also its commercial counterparts, in the dust.
When comparing Embedded Studio to other commercial products, we realized that one weak point is the GNU linker.

Old-school GNU linker blues

The GNU linker has evolved from the Unix world, where megabytes of linear virtual address space are commonplace, disk space is unbounded, and processing power is plentiful. This is as far from a low-end embedded system as you could imagine.

Small embedded systems, usually microcontrollers with built-in memories, are complex. Typically they have separate memory areas for flash and RAM. But, to enhance performance, RAM is usually divided into distinct regions so they can be accessed simultaneously by the CPU and peripherals, or even other CPUs in the same device.

The GNU linker has a number of deficiencies in this alien world:

  • It’s not flexible enough to deal with typical “keep-out” areas common to embedded firmware environments, e.g. calibration data, flash protection bytes, fixed-address jump tables for ROM or bootloader APIs and so on.
  • With multiple RAM regions, it cannot automatically split data over those RAM regions, requiring the user to choose the placement of data manually across the regions.
  • Linkage speed is acceptable, but not fast. When linking large megabyte-order firmware images, with even larger multi-megabyte debug data, time can seep away when linking over and over again.
  • It does not automatically handle initialization of read-write sections, delegating that to the loader in Unix systems.  In embedded systems the user is responsible for copying the initialized image from flash to RAM and zeroing “bss” sections before entering main(). And the GNU linker cannot compress the initialization image to reduce flash use or automatically compute a CRC to support image integrity checks.
  • The map file is almost incomprehensible. Because memory allocation, function and data sizes, and what goes into a microcontroller are highly important, not having an accurate, easy-to-read map file is unforgivable.
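
To make the multi-region point concrete, here is a minimal sketch of what automatic placement over several RAM regions could look like. It is a plain first-fit-decreasing allocator written for illustration only, not the algorithm of any particular linker; the region names, capacities and section names are hypothetical.

```python
def place_sections(sections, regions):
    """First-fit-decreasing placement of sections into memory regions.

    sections: dict of section name -> size in bytes (hypothetical data)
    regions:  dict of region name  -> capacity in bytes (hypothetical data)
    Returns a dict of section name -> region name.
    """
    free = dict(regions)  # remaining bytes per region
    placement = {}
    # Place the largest sections first so they are not starved of space.
    by_size = sorted(sections.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        for region, remaining in free.items():
            if size <= remaining:
                placement[name] = region
                free[region] = remaining - size
                break
        else:
            raise MemoryError(f"section {name} ({size} bytes) does not fit")
    return placement


# Hypothetical device with two RAM banks and three data sections.
regions = {"RAM1": 16 * 1024, "RAM2": 8 * 1024}
sections = {"net_buffers": 12 * 1024, "heap": 6 * 1024, "stacks": 4 * 1024}
placement = place_sections(sections, regions)
print(placement)
```

With the GNU linker, the equivalent of this `placement` mapping has to be worked out and maintained by hand.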

So…we decided to write our own linker!

Yes, we would write the SEGGER Linker, from scratch and without any legacy code or legacy thinking. The linker’s design brief is simply to avoid the disadvantages of the GNU linker, make linking simple, and solve linking problems for the embedded developer.

A new, zero-legacy SEGGER linker

The design goals of the SEGGER linker are easily stated:

  • High linkage speed, even for large applications
  • Modular linkage: only link in what is required, automatically
  • Straightforward, easy-to-read map file
  • Option to compute how much code / data is pulled in because of one or multiple particular symbols. This is important for measuring code size for middleware options, such as “How much Flash (RO) memory does a particular cipher (for emSSL) need?”
  • Compression options to minimize flash-copy space for initialized data and code in RAM
  • Compatibility options for other popular linkers such as IAR and ARM
  • Tail optimization for code that calls another function as last operation, so that the tail can be merged with the called function
  • Different ways of sorting input fragments (functions and data): alphabetically, by call distance to improve locality, by alignment to improve packing, by size — these are but a few
  • Automatic inlining of small functions
  • Optionally eliminating functions that have identical bodies at the instruction level
  • …and more…
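
As an illustration of why the ordering of input fragments matters, the following sketch counts the padding bytes needed when fragments are laid out in their given order versus sorted by descending alignment. The fragment sizes and alignments are invented for the example, and this is only a model of the general idea, not the SEGGER linker's implementation.

```python
def layout(fragments, base=0):
    """Lay out (size, alignment) fragments in order.

    Returns (end address, total padding bytes inserted for alignment).
    """
    addr = base
    padding = 0
    for size, align in fragments:
        aligned = (addr + align - 1) // align * align  # round up to alignment
        padding += aligned - addr
        addr = aligned + size
    return addr, padding


# Hypothetical fragments as (size, alignment) pairs, in "source order".
fragments = [(1, 1), (8, 8), (2, 2), (4, 4), (1, 1), (8, 8)]

unsorted_end, unsorted_pad = layout(fragments)
by_alignment = sorted(fragments, key=lambda f: f[1], reverse=True)
sorted_end, sorted_pad = layout(by_alignment)

print(f"source order:      end={unsorted_end}, padding={unsorted_pad}")  # end=40, padding=16
print(f"sorted by align:   end={sorted_end}, padding={sorted_pad}")      # end=24, padding=0
```

In this toy case, sorting by alignment removes all 16 padding bytes from a 24-byte payload.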

First results

Here it is… Although the linker is not yet ready for release, the first results are very encouraging.

We have used a 350 kB application (debug build) of embOS with our TCP/IP stack embOS/IP and an emSSL test program, with multiple cipher suites, public key algorithms, and so on, so a representative application.

Where the GNU linker needs about 1 second to link (not so bad), our new linker needs only 60 ms to link the application, whilst also providing better section placement to minimize application size. Its built-in timing analysis reports:

Copyright (c) 2017 SEGGER Microcontroller GmbH & Co. KG    www.segger.com
SEGGER Linker v1.00 compiled Sep 20 2017 22:03:06

Performance:
  File I/O
    677 ELF modules from:
      32 ELF files
      15 archive files
    Data in: 49543 KB
  Processing
    Ingest                  44.95 ms
    Linker symbols           0.00 ms
    Find sections            4.79 ms
    Parse script             0.69 ms
    Map image                0.11 ms
    Rewrite headers          0.02 ms
    Relocate image #0        1.99 ms
    Create inittab           0.71 ms
    Relocate image #1        0.73 ms
    Write image              6.62 ms
    Print map                0.00 ms
    Link total:             60.66 ms

It’s interesting to observe that 50 MB of ELF input produces just 350 KB of application code and read-only data, and that file I/O dominates linking: the Ingest and Write image phases together account for 51.6 ms of the total 60.7 ms link time.
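
The share of file I/O can be checked with a few lines of arithmetic. The timings below are copied from the linker report above; treating Ingest plus Write image as "file I/O" follows the sentence above.

```python
# Phase timings in milliseconds, copied from the linker report.
phases = {
    "Ingest": 44.95,
    "Linker symbols": 0.00,
    "Find sections": 4.79,
    "Parse script": 0.69,
    "Map image": 0.11,
    "Rewrite headers": 0.02,
    "Relocate image #0": 1.99,
    "Create inittab": 0.71,
    "Relocate image #1": 0.73,
    "Write image": 6.62,
    "Print map": 0.00,
}
link_total = 60.66  # "Link total" as reported by the linker

io_ms = phases["Ingest"] + phases["Write image"]
io_share = io_ms / link_total
print(f"File I/O: {io_ms:.2f} ms of {link_total:.2f} ms ({io_share:.0%})")
```

So roughly 85% of the link time is spent reading and writing files. The listed phases add up to 60.61 ms, slightly less than the reported total.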

This is one fast linker!

The application obviously runs without any problem, and loading the resulting file into the debugger is also faster since the debug information in the generated ELF file is neat and only contains information related to what is actually linked in. Here is the output of the sample program:

0:000 MainTask - INIT: embOS/IP init started. Version 3.23b
 0:000 MainTask - *********************************************************************
 0:000 MainTask - *                      embOS/IP Configuration                       *
 0:000 MainTask - *********************************************************************
 0:000 MainTask - * IP_DEBUG: 2
 0:000 MainTask - * Memory added: 24576 bytes
 0:000 MainTask - * Buffer configuration:
 0:000 MainTask - *   12 buffers of 256 bytes
 0:000 MainTask - *   6 buffers of 1516 bytes
 0:001 MainTask - * TCP Tx/Rx window size per socket: 4380/4380 bytes
 0:001 MainTask - * Number of interfaces added: 1
 0:001 MainTask - * Interface #0 configuration:
 0:001 MainTask - *   Type: ETH
 0:001 MainTask - *   MTU: 1500
 0:001 MainTask - *   HW addr.: 00:22:C7:AB:FF:22
 0:001 MainTask - *********************************************************************
 0:018 MainTask - INIT: Link is down
 0:018 MainTask - DRIVER: Found PHY with Id 0x181 at addr 0x0
 0:018 MainTask -
 0:022 MainTask - 3:000 IP_Task - LINK: Link state changed: Full duplex, 100MHz
 4:000 IP_Task - DHCPc: Sending discover!
 4:000 IP_Task - DHCPc: IFace 0: Offer: IP: 10.0.0.183, Mask: 255.255.255.0, GW: 10.0.0.3.
 5:000 IP_Task - DHCPc: IP addr. checked, no conflicts
 5:000 IP_Task - DHCPc: Sending Request.
 5:002 IP_Task - DHCPc: IFace 0: Using IP: 10.0.0.183, Mask: 255.255.255.0, GW: 10.0.0.3.
 Scanning cipher suites on http://www.google.com:443
 C009  TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA          TLS 1.2  1637 ms,    26 ms socket,  1611 ms connect
 C02B  TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256       TLS 1.2  1662 ms,    25 ms socket,  1637 ms connect
 C00A  TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA          TLS 1.2  1648 ms,    26 ms socket,  1622 ms connect
...

SEGGER vs GNU linker

We benchmarked the SEGGER linker against the GNU linker using identical application object code and libraries: the same ELF files and archives were provided, in the same order, to both linkers.

Because the SEGGER linker automatically creates initialization code for the application (to initialize data before entering main), a small section of one startup file (thumb_crt0.s) required modification, but otherwise its contents were unchanged. In fact, the startup file for the SEGGER linker is far shorter and simpler than the GNU equivalent. There is no need for explicit user-written initialization code that is always included, even for seldom-used sections. You don’t need to write that code (or forget to write it), as the linker does it for you so that “it simply works!”
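
The idea of linker-generated initialization can be pictured as a small table of copy and zero operations that is processed before main(). The sketch below models this in Python purely for illustration; real startup code would be C or assembly, and the table format, addresses and field names are hypothetical, not the SEGGER linker's actual format.

```python
def run_init_table(memory, init_table):
    """Apply copy/zero entries to a bytearray that models the address space."""
    for entry in init_table:
        kind, dst, length = entry["kind"], entry["dst"], entry["len"]
        if kind == "copy":
            src = entry["src"]
            memory[dst:dst + length] = memory[src:src + length]  # flash -> RAM
        elif kind == "zero":
            memory[dst:dst + length] = bytes(length)             # clear "bss"
        else:
            raise ValueError(f"unknown init entry kind: {kind}")


# Hypothetical 128-byte model: "flash" starts at 0x00, "RAM" at 0x40.
memory = bytearray(b"\xff" * 128)
memory[0x10:0x14] = b"\x01\x02\x03\x04"  # initial values of .data, held in flash

init_table = [
    {"kind": "copy", "src": 0x10, "dst": 0x40, "len": 4},  # initialized data
    {"kind": "zero", "dst": 0x44, "len": 8},               # zero-initialized data
]
run_init_table(memory, init_table)
print(memory[0x40:0x4C].hex())
```

With a table like this, the startup file only needs a generic loop; which sections are copied or zeroed is decided at link time.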

Here is the outcome:

         Flash (bytes)  RAM (bytes)
 GNU         354,936       80,160
 SEGGER      348,824       80,132

The new SEGGER linker is more than 10 times as fast and reduces flash use by 6,112 bytes, or about 2%!

Best of all, this does not even use compression for the initialized segments and does not put any code in RAM. So the GNU linker loses about 2% efficiency right from the starting line. We will investigate further and keep you posted on the progress and findings.
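
For reference, the headline numbers can be derived from the figures quoted above. The sizes are assumed to be in bytes, and the GNU link time is only known as "about 1 second", so the speedup is approximate.

```python
# Figures quoted above; the GNU link time is approximate.
gnu_link_ms, segger_link_ms = 1000.0, 60.66
gnu_flash, segger_flash = 354_936, 348_824   # assumed to be bytes
gnu_ram, segger_ram = 80_160, 80_132         # assumed to be bytes

speedup = gnu_link_ms / segger_link_ms
flash_saved = gnu_flash - segger_flash
flash_saved_pct = 100.0 * flash_saved / gnu_flash
ram_saved = gnu_ram - segger_ram

print(f"speedup: about {speedup:.0f}x")
print(f"flash saved: {flash_saved} bytes ({flash_saved_pct:.1f}%)")
print(f"RAM saved: {ram_saved} bytes")
```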

An exciting project … And lots of fun. Obviously, the SEGGER Linker will be free for non-commercial use just like all of Embedded Studio.
