emVDSP vs CMSIS-DSP

Artificial intelligence (AI) and machine learning (ML) have recently become hot topics, enabling useful applications such as assisted and autonomous driving. Intelligent accessories in the home are now mainstream, employing adaptive audio and acoustic beamforming. All of these rely on fast, efficient signal processing.

This series of articles introduces what’s on the bench at SEGGER Labs…and coming soon.

SEGGER emVDSP

SEGGER’s emVDSP product is a signal processing and vector library that targets multiple architectures. emVDSP presents a regular API across all data types on all targets. Where an algorithm can be accelerated, emVDSP takes advantage of the underlying hardware features. That’s the “V” in the name: emVDSP uses vector instructions to run multiple operations in parallel, delivering blistering performance.

emVDSP currently supports the following architectures:

  • Cortex-M with DSP and SIMD instructions (v7EM)
  • Cortex-A with NEON (Advanced SIMD) instructions (v7A, v8A)
  • Cortex-M with Helium instructions (v8.1M+MVE)
  • Older Arm cores with the DSP E extension (v5TE)
  • RISC-V with the Packed SIMD P extension (RV32P, RV64P)
  • RISC-V with the Vector extension (RV32V, RV64V)
  • Intel IA32/AMD64 with MMX and Advanced Vector Extensions (AVX, AVX2, and AVX-512)
  • Portable C code for use on any processor

The library contains a range of general-purpose algorithms that are well tuned for both typical digital signal processors and conventional processors.

Why construct emVDSP?

The answer is simple: to provide a quality DSP and vector library with a regular API that doesn’t lock you in. Regular means that if an algorithm is available for one data type, it is generally available for all supported data types (where that makes sense, of course!). This contrasts with other DSP libraries, which offer algorithms only for the operations and data types that the underlying hardware supports. Because emVDSP also runs on conventional processors, every emVDSP function is offered on every architecture. Sure, functions might run a little slower without hardware-level support, but not so much slower as to be unusable.

As there is no standardized API for DSP work, changing architectures can mean porting your software to a different signal processing API. A vendor-neutral API such as emVDSP provides agility and independence: you can switch processors without rewriting existing software.

Configuring emVDSP

Configuration of the library is controlled by a single file that parameterizes the C-level algorithms to use particular features of an architecture for best performance.

Retargeting the library to a new architecture starts with the portable C code and a minimal configuration file to deliver working code on the intended target. This is known as driving a spike through the library: the library works, but may not be efficient. Development continues by widening the spike, tailoring the configuration file to extract the best from the architecture.

At each stage it’s possible to run the emVDSP test suite and benchmarks to ensure correct operation and measure performance gains.

Preliminary results

Although emVDSP is not yet released, preliminary results comparing it against CMSIS-DSP and the Intel Integrated Performance Primitives (IPP) are good.

Below is a benchmark comparing a selection of emVDSP functions with the corresponding CMSIS-DSP functions on a Cortex-A9. For each library, “Cycles” is the measured cycle count and “Rel.SD%” is the relative standard deviation as a percentage, a measure of how repeatable the benchmark timing is. Every Rel.SD% value is below 1%, so the cycle counts are highly repeatable. “Rel.Perf” is the CMSIS-DSP cycle count divided by the emVDSP cycle count, so values above 1.00x favor emVDSP.

As the table shows, the standard emVDSP distribution, without any tuning, matches or outperforms CMSIS-DSP on every function benchmarked, ranging from parity on Abs, Q31 (1.00x) to more than 22x faster on MinReduce, Q7 and MaxReduce, Q7. Moreover, emVDSP lets you tune each function individually, whereas CMSIS-DSP offers only coarse-grained optimization: loop unrolling as a project-wide option.

 

SEGGER Vector-DSP Library Benchmark
Copyright (c) 2019-2021 SEGGER Microcontroller GmbH

Target:   Cortex-A
Compiler: SEGGER cc 11.4.4
Config:   VDSP_DEFAULT_UNROLL   = 2
Config:   VDSP_DEFAULT_PIPELINE = 2

                         SEGGER VDSP              CMSIS-DSP      
                     ------------------  ----------------------------
Function               Cycles   Rel.SD%    Cycles   Rel.SD%  Rel.Perf
-------------------  ------------------  ------------------  --------
Abs, Q7                  2334      0.14     32112      0.01    13.75x
Abs, Q15                 2333      0.09      8232      0.01     3.53x
Abs, Q31                 2336      0.17      2333      0.14     1.00x
Abs, F32                 2593      0.14      2844      0.08     1.10x
-------------------  ------------------  ------------------  --------
Neg, Q7                  2335      0.15     37930      0.00    16.24x
Neg, Q15                 2334      0.12     36393      0.01    15.59x
Neg, Q31                 2334      0.13      2745      0.16     1.18x
Neg, F32                 2590      0.14      5151      0.03     1.99x
-------------------  ------------------  ------------------  --------
MinReduce, Q7            1008      0.31     22839      0.02    22.65x
MinReduce, Q15            984      0.41     10809      0.03    10.98x
MinReduce, Q31            972      0.37      3482      0.69     3.58x
MinReduce, F32           1149      0.20      5433      0.36     4.73x
-------------------  ------------------  ------------------  --------
MaxReduce, Q7            1008      0.33     22842      0.01    22.66x
MaxReduce, Q15            980      0.31     10807      0.03    11.02x
MaxReduce, Q31            971      0.34      3454      0.10     3.56x
MaxReduce, F32           1143      0.95      5436      0.30     4.76x
-------------------  ------------------  ------------------  --------
Add, Q7                  3230      0.13     53292      0.01    16.50x
Add, Q15                 3231      0.11     53805      0.00    16.65x
Add, Q31                 3230      0.08      3624      0.07     1.12x
Add, F32                 3296      0.13      3605      0.06     1.09x
-------------------  ------------------  ------------------  --------
Add, Scalar, Q7          2532      0.10     36909      0.01    14.57x
Add, Scalar, Q15         2527      0.16     36394      0.00    14.40x
Add, Scalar, Q31         2527      0.17      3107      0.09     1.23x
Add, Scalar, F32         2783      0.13      7191      0.03     2.58x
-------------------  ------------------  ------------------  --------
Sub, Q7                  3424      0.11     53294      0.01    15.56x
Sub, Q15                 3422      0.07     53807      0.01    15.72x
Sub, Q31                 3429      0.08      3623      0.06     1.06x
-------------------  ------------------  ------------------  --------
Mul, Q7                  6420      0.07     42033      0.01     6.55x
Mul, Q15                 3358      0.12     55341      0.00    16.48x
Mul, Q31                 3741      0.10      6960      0.03     1.86x
Mul, F32                 3488      0.12      3606      0.08     1.03x
-------------------  ------------------  ------------------  --------
Mul, Scalar, Q7          4759      0.07     38965      0.01     8.19x
Mul, Scalar, Q15         3100      0.14     37425      0.01    12.07x
Mul, Scalar, Q31         2848      0.11     11575      0.02     4.06x
Mul, Scalar, F32         2909      0.15      4385      0.06     1.51x
-------------------  ------------------  ------------------  --------
Mean, Q7                 3671      0.09     22594      0.01     6.15x
Mean, Q15                1658      0.23     19050      0.02    11.49x
Mean, Q31                2116      0.17      3618      0.67     1.71x
Mean, F32                1178      0.54      5149      0.12     4.37x
-------------------  ------------------  ------------------  --------

STOP

 

Conclusion

emVDSP delivers excellent performance on Cortex devices, and comparisons against other libraries on Intel x86 (both 32-bit and 64-bit) and RISC-V (both Packed SIMD and Vector extensions) show equally good results.

Stay tuned for more articles on emVDSP’s features, its portable API, and the tools we use to tune it.

Interested?

If you’re interested in learning more about emVDSP, you can contact us at info@segger.com.