Floating-point face-off, part 3: How we do it

This post continues our exploration of floating-point performance and of how microcontrollers can efficiently execute basic floating-point operations.

So, how much do you know?

When you are using a high-level language, such as C, it’s hard to know for sure what the overhead of a particular language feature will be in terms of code size or execution speed. The machine instructions that implement language features are mostly hidden from you as a programmer, buried in a library or constructed by a compiler. And for many programmers, being blind to such detail is simply not a problem.

Microcontroller applications have a space and speed budget that you, as an engineer, must manage, constrained by the devices available on the market and the price you’re willing to pay. When you want things to go faster, you usually have to pay with additional code (in flash) or additional working storage (in RAM). When you want things to be smaller, you usually pay in additional processor cycles with less-efficient, but more compact, algorithms.

As an expert embedded engineer, can you estimate how many instructions it would take, on average, to add two floating-point values? What about the maximum or minimum number of cycles? Did you take into account exceptional conditions such as infinite and not-a-number inputs, division by zero, overflow, and underflow? What about the particular instruction set of the target, whether it has a barrel shifter, a multiplier, or even a divider?

When implementing floating-point support functions, each of these considerations must be taken into account. The SEGGER Runtime Library takes all of them into account and is customizable, so you can balance execution speed against code size.

Add, Subtract, Multiply, Divide

From early schooldays, we are taught how to add and subtract first, then multiply, and then how to divide. To be sure, you would never be taught how to divide before being taught how to add, subtract, and multiply, right? For long division you need to understand how to subtract and multiply, and to multiply you need to know how to add.

It’s clear that division must be the most complex operation for a computer, correct? Well, you would be dead wrong. Floating-point addition and subtraction are far trickier to code than floating-point division and, should you ever try to implement these fundamental operations, you would understand why.

Floating-point division is actually very easy to implement. Here’s the kernel of a 64-bit floating divide:

uint64_t a, b, c;  // a: quotient, b: dividend significand, c: divisor significand
int      i;
//
// The boilerplate before this point loads the extracted, aligned
// significands into b and c; a collects the quotient bits.
//
a = 0;
for (i = 0; i < 55; ++i) {
  a <<= 1;           // make room for the next quotient bit
  if (b >= c) {
    b -= c;          // the divisor fits: subtract it and...
    a |= 1;          // ...record a 1 in the quotient
  }
  b <<= 1;           // move on to the next bit position
}

With some boilerplate code before to extract the operands and after to pack the result, that’s all there is to a floating divide. Remember those trade-offs? This algorithm is small and straightforward, but also slow: it needs 55 iterations to produce a quotient and rounding bit. A variant of it is included in the SEGGER Runtime Library so that a highly compact runtime is possible.
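As a rough idea of what that boilerplate looks like, here is a minimal sketch of the extract step for an IEEE double. It assumes a normal, nonzero input, and the type and function names are invented for illustration; a real library also deals with zeros, subnormals, infinities, and NaNs.

#include <stdint.h>
#include <string.h>

typedef struct {
  uint64_t Significand;  // 53-bit significand with the implicit 1 restored
  int      Exponent;     // unbiased exponent
  unsigned Sign;         // 0 = positive, 1 = negative
} UNPACKED_DOUBLE;

static void UnpackDouble(double x, UNPACKED_DOUBLE *p) {
  uint64_t Bits;
  //
  // Reinterpret the double as raw bits, then split off the fields.
  //
  memcpy(&Bits, &x, sizeof(Bits));
  p->Sign        = (unsigned)(Bits >> 63);
  p->Exponent    = (int)((Bits >> 52) & 0x7FF) - 1023;
  p->Significand = (Bits & 0x000FFFFFFFFFFFFFull) | (1ull << 52);
}

Packing is the reverse: round, renormalize if necessary, and reassemble the sign, exponent, and significand into the result.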

Dividing faster

Let’s consider a processor that has a division capability. To implement a single-precision division, it’s possible to use a division instruction when calculating the significand part of the floating-point quotient, something like this:

  //
  // Divide in extended precision.
  //
  q = (((uint64_t)b) << 32) / c;

This uses a 64-by-32 division, which is usually very fast, to calculate the floating quotient. Again, the SEGGER Runtime Library uses a variation of this when it is beneficial on the target architecture.
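To show how the pieces fit together, here is a minimal sketch of a single-precision divide built around that 64-by-32 division. It is not the SEGGER implementation: the function name is invented, it assumes both inputs are positive, normal values, and it truncates the result rather than applying IEEE round-to-nearest-even. A production routine also handles signs, zeros, subnormals, infinities, NaNs, overflow, and underflow.

#include <stdint.h>

// Divide two single-precision values given as raw bit patterns and
// return the (truncated) quotient as a bit pattern.
static uint32_t DivFloatBits(uint32_t aBits, uint32_t bBits) {
  uint32_t SigA = (aBits & 0x7FFFFFu) | 0x800000u;          // 24-bit significands
  uint32_t SigB = (bBits & 0x7FFFFFu) | 0x800000u;          //   with implicit 1 restored
  int      Exp  = (int)(aBits >> 23) - (int)(bBits >> 23);  // exponent biases cancel
  //
  // Divide in extended precision: the quotient gets 32 fraction bits.
  //
  uint64_t q = ((uint64_t)SigA << 32) / SigB;
  if (q < (1ull << 32)) {   // quotient below 1.0?
    q <<= 1;                // normalize into [1.0, 2.0)
    Exp -= 1;
  }
  uint32_t Sig = (uint32_t)(q >> 9) & 0x7FFFFFu;            // keep 23 fraction bits
  return ((uint32_t)(Exp + 127) << 23) | Sig;               // repack exponent and fraction
}

For 6.0f divided by 3.0f this produces the bit pattern of exactly 2.0f; for inexact quotients, the bits shifted out and the division remainder would feed the rounding step that a full implementation needs.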

But there are even faster ways to divide b by c. If you can quickly compute the reciprocal of c and multiply it by b, then you have an almost-correct quotient. Why only almost-correct? Well, consider dividing 6 by 3 using this method. The fraction 1/3 is a recurring decimal (0.33333…) and also a recurring binary fraction (0.010101…), so it cannot be represented exactly. When this inexact approximation is multiplied by 6, the calculated quotient is not exactly two but slightly less. But for every problem there is usually a solution, and we can use some simple mathematics to provide a correction that is applied to the quotient and always returns an IEEE-correct answer.
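As a rough illustration of that correction, here is a minimal sketch; it is not the SEGGER code. It assumes integer significands and a quotient estimate q, obtained by multiplying b by an approximate reciprocal of c, that is known to be either exact or one unit too small, a bound a real implementation guarantees by controlling the error of the reciprocal.

#include <stdint.h>

// b and c are integer significands; q is an estimate of b / c that is
// assumed to be either exact or one too small.
static uint64_t CorrectQuotient(uint64_t b, uint64_t c, uint64_t q) {
  uint64_t Rem;
  //
  // Work out how much of b the estimated quotient fails to cover.
  //
  Rem = b - q * c;
  if (Rem >= c) {   // the estimate was one too low
    q   += 1;
    Rem -= c;
  }
  //
  // Rem == 0 means the quotient is exact, as when dividing 6 by 3;
  // otherwise Rem supplies the sticky information for IEEE rounding.
  //
  return q;
}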

So the problem now is how to calculate a reciprocal quickly. This is a problem with a wealth of supporting literature but, in essence, it reduces to looking up an estimate of the reciprocal of c in a table, indexed by the leading bits of c’s significand, and then refining that estimate using the remaining bits. Clearly, more entries in the table improve the accuracy of the estimate: each additional bit taken from c doubles the size of the table but potentially reduces the time needed to correct the quotient.
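Here is a minimal sketch of that idea, again only illustrative and not the SEGGER implementation. It estimates 1/c for a Q1.31 fixed-point value c in [1, 2), using a 64-entry table indexed by the leading six fraction bits of c and a single Newton-Raphson refinement step. The table is filled at run time here purely for brevity; a real library would ship it as a precomputed constant.

#include <stdint.h>

#define TABLE_BITS 6

static uint32_t RecipTable[1u << TABLE_BITS];

static void InitRecipTable(void) {
  unsigned i;
  for (i = 0; i < (1u << TABLE_BITS); ++i) {
    double x = 1.0 + (i + 0.5) / (1 << TABLE_BITS);   // midpoint of the input interval
    RecipTable[i] = (uint32_t)(2147483648.0 / x);     // 1/x in Q1.31
  }
}

static uint32_t RecipApprox(uint32_t c) {
  uint32_t r;
  uint64_t Prod, Corr;
  //
  // Look up an estimate of 1/c using the leading fraction bits of c.
  //
  r = RecipTable[(c >> (31 - TABLE_BITS)) & ((1u << TABLE_BITS) - 1)];
  //
  // One Newton-Raphson step, r' = r*(2 - c*r), roughly doubles the
  // number of correct bits, refining the estimate with the bits the
  // table could not see.
  //
  Prod = ((uint64_t)c * r) >> 31;   // c*r in Q1.31, close to 1.0
  Corr = (2ull << 31) - Prod;       // 2 - c*r in Q1.31
  return (uint32_t)(((uint64_t)r * Corr) >> 31);
}

With these parameters the estimate is good to roughly 14 bits; a second refinement step or a larger table provides the accuracy a full significand needs.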

So it’s a trade-off again, between table size and computation. And SEGGER has more than one implementation of floating-point division using such an algorithm.

The SEGGER Runtime Library provides a number of ways of calculating quotients, coded in C and in Arm and RISC-V assembly language, to suit different code-size and execution-speed requirements.

Some closing remarks

It’s interesting to compare the size of the source code for the floating-point library. For Arm, it’s 11,500 source lines, with multiple variations for each of the architectures and architectural features. The equivalent for RISC-V RV32 is just 5,000 lines, as there are far fewer architectural variations.
