Every byte counts - Smallest "Hello world"

When searching online for small C programs, there seems to be a lot of confusion about what is doable and what is not.

There are a lot of posts asking why even minimal programs such as a “Hello world” application are so big, but not many explanations or fixes.
I will show how to make a very small “Hello world” application using Embedded Studio.

“Hello world” application

Below is the classic “Hello world” application that we will use for discussion.

#include <stdio.h>

int main(void) {
  printf("Hello world!");
  return 0;
}

Windows

On Windows, Microsoft’s Visual Studio is the de facto standard. Microsoft has managed not only to make executables big, but now also requires a separate “redistributable package” to be installed on the target machine.
https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads

It is hard to see why they would do this. They supply the operating system and the toolchain, and this was not necessary with older versions of Visual Studio.
It is still possible to generate really small Windows executables these days,
but that might be the focus of another article.

Linux

The situation is better under Linux: there is no such thing as a redistributable package.

$ gcc ./hello.c
$ ./a.out

This prints the following text in the terminal window:

Hello world!

The size of a.out is 16608 bytes (on my 64-bit system).
Using strip,

$ strip ./a.out

we can remove the symbol information, which is not needed to run the program, and bring the size down to 14408 bytes.
We can let GCC produce an assembly listing of main.c (gcc -S) and see that main() is translated efficiently, so most of the size comes from startup and library code.
The actual application program only uses a few instructions:

	.file "main.c"
	.text
	.section .rodata
.LC0:
	.string "Hello world!"
	.text
	.globl main
	.type main, @function
main:
.LFB0:
	.cfi_startproc
	pushq %rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq %rsp, %rbp
	.cfi_def_cfa_register 6
	leaq .LC0(%rip), %rdi
	movl $0, %eax
	call printf@PLT
	movl $0, %eax
	popq %rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size main, .-main
	.ident "GCC: (Debian 8.3.0-6) 8.3.0"
	.section .note.GNU-stack,"",@progbits

Embedded Studio

For Embedded Systems, small code is a lot more important. Yet most toolchains are not much better at producing small code.
Let’s take a look at how Embedded Studio does things and how big the output is.
As a starting point, I use my blinky project.
The next step is to change the application program to the “Hello world” application shown above.
Rebuilding gives me a 1.7 KB program.

While this is not bad, it is nowhere near as small as the program can be. As the screenshot above shows, only 29 + 32 = 61 bytes actually come from the application and startup code. Where does all the rest come from?

Easy: it comes from RTT, SEGGER’s Real-Time Transfer technology, and from the printf formatter. The formatter is the piece of software that replaces placeholders such as %d and %s with the given parameters (which are not even needed here, since “Hello world!” does not contain any placeholders).
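To give an idea of what a formatter does, and why it costs code, here is a deliberately tiny sketch. The function mini_format() is hypothetical: it supports only %d, %s and %%, and it is not the formatter that comes with Embedded Studio. A complete printf() formatter also handles field widths, precision, flags, long and floating-point types, which is where most of its size goes.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical minimal formatter: supports only %d, %s and %%.
 * Writes at most size-1 characters to buf and NUL-terminates it (if size > 0).
 * Returns the number of characters written, excluding the NUL. */
size_t mini_format(char *buf, size_t size, const char *fmt, ...) {
  va_list ap;
  size_t n = 0;

  va_start(ap, fmt);
  for (; *fmt != '\0'; fmt++) {
    if (*fmt != '%') {                  /* Ordinary character: copy as-is. */
      if (n + 1 < size) buf[n++] = *fmt;
      continue;
    }
    fmt++;                              /* Skip '%', look at the conversion. */
    if (*fmt == 'd') {                  /* Signed decimal integer. */
      int v = va_arg(ap, int);
      char tmp[3 * sizeof(int) + 1];    /* Enough digits plus a sign. */
      unsigned u = (v < 0) ? 0u - (unsigned)v : (unsigned)v;
      int len = 0;
      do {                              /* Digits come out in reverse order. */
        tmp[len++] = (char)('0' + u % 10u);
        u /= 10u;
      } while (u != 0u);
      if (v < 0) tmp[len++] = '-';
      while (len > 0 && n + 1 < size) buf[n++] = tmp[--len];
    } else if (*fmt == 's') {           /* NUL-terminated string. */
      const char *s = va_arg(ap, const char *);
      while (*s != '\0' && n + 1 < size) buf[n++] = *s++;
    } else if (*fmt == '%') {           /* Literal percent sign. */
      if (n + 1 < size) buf[n++] = '%';
    } else {
      break;                            /* Unsupported conversion or trailing '%'. */
    }
  }
  va_end(ap);
  if (size > 0) buf[n] = '\0';
  return n;
}
```

For example, mini_format(buf, sizeof buf, "x=%d %s %%", -42, "ok") leaves "x=-42 ok %" in buf. Even this stripped-down version needs a loop, a number-to-text conversion and several branches; none of that is needed for a string without placeholders.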
The RTT code stores output data in a RAM buffer, which is continuously monitored and drained by the debugger. Because writing to RTT is basically just a memcpy, it is very fast: the CPU does not need to be stopped and real-time behavior is hardly affected. RTT is minimally intrusive; the cost of copying the string is well below 0.5 µs on a typical Cortex-M microcontroller, allowing this type of communication to be used even in very time-critical environments with hard real-time requirements.
This is why we make RTT the default printf() implementation.
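The principle behind RTT’s speed can be sketched as a ring buffer in target RAM: the target copies data in and advances a write offset, and the debugger reads it out and advances a read offset. The type up_buffer and the function up_buffer_write() are hypothetical simplifications, not the SEGGER RTT control block or API; in particular, a real implementation needs a way for the debugger to locate the buffer, plus proper memory barriers. If the buffer is full, this sketch drops the excess data instead of waiting, so the target never blocks on the host.

```c
#include <string.h>

/* Hypothetical, simplified "up" buffer (target -> host) in target RAM.
 * The target writes data and advances wr_off; the debugger reads data and
 * advances rd_off. One byte is always left unused, so wr_off == rd_off
 * unambiguously means "empty". */
typedef struct {
  char              *data;
  unsigned           size;
  volatile unsigned  wr_off;  /* Changed only by the target.   */
  volatile unsigned  rd_off;  /* Changed only by the debugger. */
} up_buffer;

/* Non-blocking write: stores as much of src[0..len-1] as fits and drops
 * the rest. Returns the number of bytes actually stored. */
unsigned up_buffer_write(up_buffer *b, const void *src, unsigned len) {
  const char *p = (const char *)src;
  unsigned wr = b->wr_off;
  unsigned rd = b->rd_off;
  unsigned avail = (rd > wr) ? rd - wr - 1u : b->size - (wr - rd) - 1u;
  unsigned n = (len < avail) ? len : avail;
  unsigned to_end = b->size - wr;              /* Space before the buffer wraps. */

  if (n <= to_end) {
    memcpy(b->data + wr, p, n);                /* One contiguous copy.          */
  } else {
    memcpy(b->data + wr, p, to_end);           /* Fill up to the end ...        */
    memcpy(b->data, p + to_end, n - to_end);   /* ... then wrap to the start.   */
  }
  wr += n;
  if (wr >= b->size) {
    wr -= b->size;
  }
  /* Publish the new write offset last. Real code should use a barrier here
   * so the data is guaranteed to be in RAM before the offset is updated. */
  b->wr_off = wr;
  return n;
}
```

The whole write path is an offset calculation and at most two memcpy() calls, which is why the cost stays so low.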

Host-side formatting for printf

As an option available for all processors,
breakpoint-based implementations can be selected from the list of options:

This is a good option for most programs. Both work seamlessly on real hardware using J-Link as well as in the simulator.
The interesting point to notice here is that the formatting is done on the host, that is, by the debugger. None of the formatting code needs to be included in the generated program anymore, which makes it really small, as both the Project Explorer and the Output window report:

Nice!
Running the program, we get the expected result:

Pressing F5 again, we see the expected output in the Debug Terminal:

“Hello world” in only 125 bytes in Flash!

But, making every byte count, we can go one step further.
Embedded Studio comes with two sets of system libraries, one optimized for size, and the other one optimized for speed.
The default for a new project is speed. Let’s change that to size:

Rebuilding now brings us down even further.

117 bytes. Hard to beat!
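What “optimized for size” versus “optimized for speed” can mean for a library routine is illustrated below with two hypothetical memory-copy functions. They are not the Embedded Studio library sources; they only show the typical trade-off: copy_small() is a single byte loop, while copy_fast() unrolls the loop by four, executing fewer loop-control instructions per byte at the cost of more code.

```c
#include <stddef.h>

/* Size-optimized: a single byte loop, very little code. */
void *copy_small(void *dst, const void *src, size_t n) {
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n-- != 0) {
    *d++ = *s++;
  }
  return dst;
}

/* Speed-optimized (illustrative): the main loop is unrolled by four, so
 * fewer loop-control instructions execute per byte, at the cost of more code. */
void *copy_fast(void *dst, const void *src, size_t n) {
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n >= 4) {
    d[0] = s[0];
    d[1] = s[1];
    d[2] = s[2];
    d[3] = s[3];
    d += 4;
    s += 4;
    n -= 4;
  }
  while (n-- != 0) {          /* Copy the remaining 0..3 bytes. */
    *d++ = *s++;
  }
  return dst;
}
```

Both produce identical results; the difference is only in code size versus execution time, which is exactly the choice the library setting makes for the whole runtime library.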
I could dig into how exactly host-side formatting works, but that would lead too far. Basically, it is simple: the printf() call runs into a breakpoint, and the debugger does all the rest.
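For the curious, the principle can be simulated in plain C on a single machine. Everything here is hypothetical: the mailbox layout, the names target_printf() and debugger_on_breakpoint(), and the explicit argument count are assumptions for illustration, not SEGGER’s actual mechanism, and only int arguments (%d) are supported. The point is that the target side merely records the format string pointer and the raw argument values; the formatting itself, here done with the standard snprintf(), happens on the “host” side.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical mailbox the target fills before "hitting the breakpoint". */
typedef struct {
  const char *fmt;      /* Format string, left unformatted in target memory. */
  int         args[4];  /* Up to four int arguments, stored as raw values.   */
} host_msg;

host_msg mailbox;
char     terminal[80];  /* Stands in for the debugger's Debug Terminal. */

/* "Debugger" side: reads the mailbox and does all the formatting with a
 * full-featured formatter (here the C library's snprintf()). Arguments not
 * consumed by the format string are ignored, as the C standard allows. */
void debugger_on_breakpoint(void) {
  snprintf(terminal, sizeof terminal, mailbox.fmt,
           mailbox.args[0], mailbox.args[1], mailbox.args[2], mailbox.args[3]);
}

/* Target side: no formatting code, it only records the format pointer and
 * up to four int values. On real hardware, a breakpoint would stop the CPU
 * here and the debugger would read the mailbox; this simulation calls the
 * handler directly. The explicit nargs parameter is a simplification. */
void target_printf(const char *fmt, int nargs, ...) {
  va_list ap;
  va_start(ap, nargs);
  mailbox.fmt = fmt;
  for (int i = 0; i < 4; i++) {
    mailbox.args[i] = (i < nargs) ? va_arg(ap, int) : 0;
  }
  va_end(ap);
  debugger_on_breakpoint();   /* Simulated breakpoint. */
}
```

For example, target_printf("%d+%d=%d", 3, 29, 32, 61) leaves "29+32=61" in terminal, while the target-side function itself contains no formatting logic at all.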
You can easily try this yourself: Embedded Studio can be downloaded and installed on any supported platform, and no license is required for evaluation or non-commercial use.

Size of the elf file

For Embedded Systems, the number of bytes in Flash memory is what matters, not so much the size of the ELF file. However, we can also look at the ELF file, which is 7470 bytes.
This includes all symbolic information, which is not required. We can strip the symbolic information (which normally does not make a lot of sense; we do it here only for an apples-to-apples comparison)
by telling the linker to do so:

This brings the size of the ELF file down to 1076 bytes, roughly 13 times smaller than the stripped Linux executable (14408 bytes).
However, the important number is the 117 bytes of ROM (typically Flash) memory used.

At SEGGER, we make every byte work.
Try the same with other toolchains for Embedded Systems.
I’d be very surprised if you can even come close to this number.

SEGGER Blog
