Edge Impulse
Shawn Hymel * Colby Banbury * Daniel Situnayake Alex Elium Carl Ward Mat Kelcey Mathijs Baaijens
Mateusz Majchrzycki Jenny Plunkett David Tischler Alessandro Grande Louis Moreau Dmitry Maslov
Artie Beavis Jan Jongboom Vijay Janapa Reddi
arXiv:2212.03332v3 [cs.DC] 28 Apr 2023
ABSTRACT
Edge Impulse is a cloud-based machine learning operations (MLOps) platform for developing embedded and edge
ML (TinyML) systems that can be deployed to a wide range of hardware targets. Current TinyML workflows are
plagued by fragmented software stacks and heterogeneous deployment hardware, making ML model optimizations
difficult and unportable. We present Edge Impulse, a practical MLOps platform for developing TinyML systems
at scale. Edge Impulse addresses these challenges and streamlines the TinyML design cycle by supporting
various software and hardware optimizations to create an extensible and portable software stack for a multitude of
embedded systems. As of Oct. 2022, Edge Impulse hosts 118,185 projects from 50,953 developers.
Figure 1. The challenges associated with the ML Workflow and features of Edge Impulse that solve those challenges.
specific optimization strategies. In addition, the absence of automated machine learning (AutoML) tools to assist non-domain experts in developing model architectures for embedded systems restricts accessibility.

5. Monitoring challenge. In contrast to traditional cloud-based machine learning systems that rely on mature software and hardware stacks, there is no unified MLOps framework for programmatically updating datasets, training models, and deploying them to embedded devices. Additionally, few benchmarking tools exist to quantify model performance on diverse embedded architectures that are highly heterogeneous.

Edge Impulse, an online platform designed to simplify the process of collecting data, training deep learning models, and deploying them to embedded and edge computing devices, allows us to address these aforementioned issues. Edge Impulse targets customers in the business sector who want to develop edge machine learning (ML) solutions for a variety of problems. However, the Edge Impulse platform also facilitates a research- and classroom-friendly environment.

Figure 1 illustrates the end-to-end ML workflow of Edge Impulse. Edge Impulse simplifies the process of data collection and curation for users and streamlines the training and evaluation of models. Users can interact with the training and deployment process via a combination of a web-based graphical user interface (GUI) and an API. Edge Impulse also provides an extensible and portable C/C++ library that encapsulates the preprocessing code and trained model to make inferencing simple across a wide range of target devices, as well as a number of target-specific optimizations to reduce inference time and model memory consumption.

Edge Impulse offers several key technical contributions that are unique. The first contribution is a data collection system that helps users collect and store training and test data alongside their model and deployment code. Rather than relying on prebuilt datasets or requiring users to construct their own data gathering technology, Edge Impulse offers a variety of methods to gather data in real-world environments. The second contribution is pairing preprocessing feature extraction with deep learning, which allows users to explore a range of possible solutions to their individual problem or task. The Edge Optimized Neural (EON) Tuner assists in this task by automatically exploring a user-defined search space of both preprocessors and ML models. The third contribution is an extensible and portable inferencing library that can be deployed across a wide range of edge and embedded systems. The EON Compiler removes the overhead required by the TFLM interpreter, thereby reducing usage of the limited RAM and flash space.

In this paper, we outline the challenges of developing and deploying ML models to embedded devices from an industry practice perspective. Next, we describe the architecture and use cases for a platform designed specifically to address these obstacles. We provide several examples to illustrate how this platform has been utilized successfully in industry, academia, and research institutions to develop novel machine learning-based solutions. Finally, we provide an evaluation of the performance and portability of the platform-generated inference code.

2 EMBEDDED ECOSYSTEM CHALLENGES

In Section 1, we highlighted the challenges faced by an embedded ML developer. In this section, we highlight the challenges posed by the embedded ecosystem that make platform and framework development difficult.
Edge Impulse: An MLOps Platform for Tiny Machine Learning
2.1 Device Resource Constraints

TinyML systems often have very limited computational capabilities, due to their small size, cost, and energy budget. Microcontrollers, which are the most common and general-purpose processors in the TinyML space, often have much lower clock speeds and fewer architectural features (Banbury et al., 2021b) than their mobile or server-class counterparts. This becomes a challenge when trying to keep pace with the flow of data from a sensor or to meet latency constraints. ML workloads often require gigabytes of working memory and storage to store activations and model weights, but TinyML systems are often equipped with only a few hundred kilobytes of SRAM and a few megabytes of eFlash. This enforces a strict constraint on the models. TinyML systems often have very flat memory hierarchies, due to small or non-existent caches and often no off-chip memory (Banbury et al., 2021b). This means the typical data access patterns that neural networks have been designed around no longer apply, which has forced the design of new model architectures (Banbury et al., 2021b; Lin et al., 2020b).

Finally, many TinyML applications operate on battery power, and the battery life of the system directly impacts the usefulness of the application. Due to the small size and cost of TinyML systems, these batteries are often small and low capacity (e.g. a coin cell). Due to the limited energy budget, any wireless transmission can quickly deplete the battery (Siekkinen et al., 2012). Since data is often only transmitted once a specific prediction is made (e.g. “OK Google”, “Alexa”, “Hey Siri”, etc.), false positives contribute to battery drain with no benefit. Therefore, the accuracy of a model can directly impact the energy consumption of the system. These device constraints force TinyML application developers to leverage every compression and optimization technique at their disposal, which, as described in the next sections, poses its own challenges.

2.2 Hardware Heterogeneity

Despite resource limitations being fairly constant across TinyML hardware, the embedded computing systems themselves are quite diverse. TinyML devices range from microcontrollers (Eggimann et al., 2019) and digital signal processors (Gruenstein et al., 2017), to application-specific accelerators (Prakash et al., 2022; ARM, 2022) and neuromorphic processors (Qiao et al., 2015). The STM32 32-bit Arm Cortex MCU family alone, for example, includes 17 series of microcontrollers. Each STM32 microcontroller series is based on an ARM processor core that is either Cortex-M7, Cortex-M4, Cortex-M33, Cortex-M3, Cortex-M0+, or Cortex-M0 (STM). Their capabilities can also vary at the instruction set architecture level. The same is true of other vendors. Each hardware platform supports different deployment processes, model types, numerical formats, and memory access patterns, which often makes TinyML applications difficult to port across devices. This complexity is exacerbated when creating an application at scale, which must be deployed to a wide variety of devices, each with its own libraries and deployment method.

2.3 Software Fragmentation

Due to TinyML’s infancy, the software stack has not yet reached a state of stability concerning particular formats and best practices. Occasionally, TinyML applications are deployed with a full operating system (OS) like Linux, a real-time OS like Zephyr (Zep), an inference framework like TFLM (David et al., 2021), or even a bare-metal implementation as a C++ library with no external dependencies. This diversity restricts the interoperability of new optimizations and tools. A standard TinyML training pipeline incorporates tools and techniques from multiple sources, resulting in a tangled web of software versions and ports that can hinder collaboration, portability, robustness, and reproducibility.

2.4 Co-Optimization and Cross-Stack Collaboration

Each software and hardware optimization depends on the other layers of the development and deployment stack to be effective. This creates a complex optimization problem with many interconnected knobs to tune for optimal performance, as TinyML applications have stringent constraints. Even without the addition of model hyperparameters and optimizations such as quantization, applying consistent preprocessing across projects is a complicated jumble of hyperparameters that often requires deep, domain-specific insights into a signal. Due to this inherent complexity, TinyML development is a time-consuming process that requires a wide range of specific technical expertise that is often not readily available in the industry. In addition, the cross-product of options and versions at each layer complicates collaboration and the reproducibility of ML applications.

Additionally, data consistency poses its own challenges, especially when using an internal or collected dataset. Significant operational challenges are posed by maintaining train, validation, and test splits, adding or removing individual samples, and preserving metadata. To facilitate large-scale collaborative projects and aid in the resolution of the ML reproducibility crisis (Hutson, 2018), one must version control the data, preprocessing, model, and deployment code while tracking a complex web of external dependencies.

3 OVERVIEW AND DESIGN OBJECTIVES

Edge Impulse is a combination of software-as-a-service, developer tooling, embedded software, and documentation to help embedded development teams create software that makes use of embedded machine learning at scale. At the
time of writing, Edge Impulse is being used by 50,953 developers in 118,185 projects, of which 3,219 have been made public; it is in use at over 5500 enterprise organizations, excluding universities and other educational institutions.

Edge Impulse is designed and engineered according to the following seven guiding principles, based on the developer challenges as well as the embedded ecosystem challenges that we described in Section 1 and Section 2, respectively.

1. Accessible. Edge Impulse’s primary objective is to make embedded machine learning simpler and more accessible while focusing on producing TinyML solutions for resource-constrained devices (Section 2.1). This effectively broadens the pool of potential embedded ML developers by helping the embedded engineers with ML and the ML engineers with embedded systems (Section 2.4).

2. End-to-end. Edge Impulse provides users the ability to easily experiment with the end-to-end ML workflow holistically (Figure 1). Using Edge Impulse, one (or a team) could collect a dataset, train an efficient, optimized model, evaluate its performance, and deploy embedded firmware.

3. Data-centric. Edge Impulse prioritizes a data-centric approach because data collection and analysis have historically been slow steps in the ML pipeline. Given the scarcity of sensor datasets in the embedded ecosystem (Challenge #1, Section 1), Edge Impulse enables users to ingest data from various sources. Therefore, Edge Impulse encourages a data-centric approach to ML development, rather than (over) emphasizing a model-centric approach.

4. Iterative. Since cross-stack optimization is critical (Section 2.4), Edge Impulse promotes short developer feedback loops that allow developers to quickly experiment and iterate over different design space optimizations. To this end, Edge Impulse strives to provide a rich set of AutoML tools. Short design cycles and AutoML tools remove some of the burden of expertise (Section 2.4).

5. Easy integration and extensible. Edge Impulse prioritizes integration and extensibility to address the challenge of software fragmentation (Section 2.3) and cross-stack collaboration (Section 2.4). Experts should be able to connect the technology with their preferred downstream stacks, ideally using open standards where possible, and deploy to a wide variety of embedded and edge platforms (Section 2.2).

6. Team Oriented. To scale well (Challenge #5, Section 1), Edge Impulse facilitates the teamwork and communication required for many embedded machine learning projects by supporting multiple users on projects, versioning of projects, and sharing of projects. In addition, it is well-documented with accessible content that serves every user type.

7. Community supported. Finally, the technology should promote a strong commitment to the community and exist within an ecosystem of community users, tools, and content.

When a user creates a project, they are guided through the process of gathering data, analyzing that data, creating a DSP preprocessing block, training a machine learning model, evaluating that model, and ultimately deploying it to a hardware platform of their choice. These steps are shown in the ML workflow in Figure 1. Figure 2 shows a user’s view inside an Edge Impulse Studio project with the ML workflow steps shown on the left side of the page. Projects in Edge Impulse are divided into a series of blocks that represent the dataflow. For this keyword spotting example, data arrives (i.e. from a microphone) in the left block labeled “Time series data,” is preprocessed into Mel-frequency cepstral coefficients (MFCCs) in the middle block, and is then sent to a NN for inference in the block labeled “Classification (Keras).” In the rest of the project design, users can modify block parameters to adjust functionality or create their own blocks to transform the data.

With the design objectives established, a few things are beyond the scope of Edge Impulse. Edge Impulse is not intended to eliminate the need for a design process informed by domain expertise, stakeholder consultation, and machine learning workflow insight. For instance, a team of engineers utilizing Edge Impulse must still assess the suitability of machine learning as a solution to the problem they are attempting to solve. The team must have the necessary domain expertise to comprehend the problem and develop a responsible solution. They must still comprehend the general nature of the machine learning (ML) workflow, including iterative development, and adopt appropriate evaluation metrics.

4 IMPLEMENTATION

In this section, we describe all the different aspects of the end-to-end flow that Edge Impulse supports, as illustrated in Figure 1. For each stage, we present the rationale for the stage and describe its implementation specifics.

4.1 Data Collection and Analysis

Since every ML project begins with data that is often hard to gather, Edge Impulse provides a number of features designed to help users collect data, manage their dataset, and perform feature extraction through digital signal processing (DSP). Edge Impulse projects can accept data stored in several file formats: CSV, CBOR, JSON, WAV, JPG, or PNG. The platform also offers several methods to help users gather data for their project, including command line interface (CLI) tools that interface with device firmware to ingest data in real time and a web-based API to upload data directly or from an existing cloud-based store (e.g. AWS S3 bucket). A GUI allows users to visualize training and test set split as well as class allocation grouped into buckets.
Figure 2. Screenshot showing the user’s view inside an Edge Impulse project where the blocks are connected depicting the dataflow.
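
The dataflow in Figure 2 can be pictured as a chain of functions: an input block that slices the signal into windows, a processing block that extracts features, and a learning block that maps features to a class. The sketch below is only an illustration in plain Python under simplifying assumptions: magnitudes from a naive DFT stand in for MFCCs, a peak-frequency rule stands in for the neural network, and all function names and parameters are hypothetical.

```python
import cmath
import math


def input_block(signal, window_size, stride):
    """Slice a 1-D time series into fixed-size windows."""
    return [signal[i:i + window_size]
            for i in range(0, len(signal) - window_size + 1, stride)]


def processing_block(window):
    """Return DFT magnitudes for bins 0..n/2 (naive DFT, a stand-in for MFCCs)."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window)))
            for k in range(n // 2 + 1)]


def learning_block(features, split_bin):
    """Toy classifier: label a window by where its strongest frequency bin falls."""
    peak = max(range(len(features)), key=features.__getitem__)
    return "high" if peak >= split_bin else "low"


def run_impulse(signal, window_size=32, stride=32, split_bin=8):
    """Chain the three blocks and return one label per window."""
    return [learning_block(processing_block(w), split_bin)
            for w in input_block(signal, window_size, stride)]


# Synthetic signal: 64 samples of a slow tone followed by 64 samples of a fast tone.
low_tone = [math.sin(2 * math.pi * 2 * t / 32) for t in range(64)]
high_tone = [math.sin(2 * math.pi * 12 * t / 32) for t in range(64)]
labels = run_impulse(low_tone + high_tone)
print(labels)  # prints ['low', 'low', 'high', 'high']
```

Changing `window_size` or `stride` here plays the same role as editing block parameters in the project view: the blocks stay connected, and only the data passed between them changes.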
Users can also examine raw data in each sample through time series plots or images, depending on the data type.

4.2 DSP Pipeline

Many embedded machine learning applications and use cases rely on sensor data preprocessing. Edge Impulse offers the ability to perform various preprocessing of raw signal data automatically prior to use in training or inference. This preprocessing step is known as digital signal processing (DSP) in the Edge Impulse workflow. By preprocessing raw data, model size can often be reduced, and preprocessing can incorporate more efficient algorithms than can be realized with typical neural net architectures. For example, an FFT is an O(n ⋅ log(n)) algorithm for extracting frequency information, whereas using 1D convolutional layers to accomplish the same thing would require O(n²) operations. Edge Impulse simplifies the preprocessing workflow by providing a rich array of continuum blocks that trade off model size and complexity (edg), a visual explorer for tuning DSP block hyperparameters (frame length, stride, window size, number of coefficients, etc.), and estimates of memory and latency requirements for a given choice of hyperparameters.

Edge Impulse offers sensible defaults for a variety of tasks to ensure minimal knowledge is required by users, though domain experts can choose preprocessing steps and hyperparameters that reduce the amount of training noise. Additionally, users can automatically select these hyperparameters via the DSP autotune feature, or optimize them via the EON Tuner (Sec. 4.7).

4.3 ML Design and Training

Traditionally, a user would need to write code to train a neural network on their data. But to make machine learning more accessible to everyone, Edge Impulse employs a visual editor that allows a user to train on their data without entering any code. There are preset neural network architectures that are suggested based on the type of data coming into the machine learning block. However, the layers or the network can be customized by the user. Advanced users can download the containerized code for the block to train locally or use expert mode to customize further using the Keras framework (Chollet, 2015) and Python.

Arbitrary combinations of building blocks (as shown in the Figure 2 screenshot) allow for rich flexibility in model architecture, but it is important that the model is trainable. Edge Impulse provides a number of subtle, but important, optimizations to ensure stable training including, but not limited to, learning rate finding, classifier bias initialization, and best model checkpoint restoration. Such building blocks and optimizations make the training process accessible to machine learning novices, while the extensibility of the expert mode allows domain experts to develop more complex ML models on Edge Impulse.

Edge Impulse provides a transfer learning (Pan & Yang, 2009) block for audio keyword detection. This allows users to quickly develop a robust keyword spotting application, even when working with a relatively small dataset.

Edge Impulse maintains partnerships with silicon vendors who have developed specific neural network accelerator
hardware, such as the Syntiant NDP101. In addition to generating models for general-purpose processors, Edge Impulse supports a variety of architecture-specific devices and optimizations, such as CMSIS-NN (Lai et al., 2018) to maximize the performance and minimize the memory footprint of neural networks on Cortex-M processor cores, thus alleviating many issues found with hardware heterogeneity.

Edge Impulse also supports several unsupervised learning algorithms to tackle anomaly detection problems. At the moment, Edge Impulse uses K-means clustering and will support Gaussian mixture models (GMM) in the near future.

4.4 Estimation and Evaluation

Since embedded systems are resource-constrained, developers can benefit from having estimations of model inference latency, RAM usage, and flash memory usage during the early-stage design space exploration. Edge Impulse uses Renode (Hołenko, 2017) and device-specific benchmarking to produce estimates of preprocessing and model inference times. Models are also compiled with varying options (non-quantized vs. quantized, TFLM vs. EON Compiler) to produce initial insights into RAM and flash memory usage.

Edge Impulse offers a number of tools to assist in evaluating the effectiveness of model performance. A confusion matrix can be generated from the holdout set to provide overall or per-class accuracy and F1 scores. For supported hardware, new data can be collected to perform live inference. Such evaluation options assist users in identifying trade-offs between model performance and model size and latency.

In addition to model evaluation, Edge Impulse enables post-processing evaluation and tuning using a tool known as performance calibration (Situnayake, 2022) for projects that identify events in streaming data. The tool accepts an input of user-supplied raw data or synthetically generated data along with the trained model. Using a genetic algorithm, it suggests a number of optimal post-processing configurations that trade off false acceptance rate (FAR) and false rejection rate (FRR). Suggesting optimal post-processing methods significantly reduces the engineering risk associated with a project and increases the quality of its performance.

4.5 Compression and Optimization

Many different optimization types can be used to improve the performance of ML and DSP algorithms when deployed to edge devices. Several types of optimization are supported by Edge Impulse, either out-of-the-box or via extensibility. The optimization areas are model compression and optimization, code optimization, and device-specific optimization.

Model compression and optimization techniques are applied either during or after training and result in models with a reduced size or computational burden when deployed to edge devices. Compression techniques available out-of-the-box in Edge Impulse include full int8 weight and activation quantization (Jacob et al., 2017) and operator fusion (goo, 2022). Quantization-aware training is supported when converting a model to Brainchip’s neuromorphic format (bus, 2022).

Code optimization involves optimizations to run an algorithm or set of algorithms on a given target. This includes model-specific code generation (via EON Compiler (Jongboom, 2022)), ML kernels optimized for particular processor architectures (e.g. ARM CMSIS-NN (Lai et al., 2018)), and quantization-optimized DSP algorithms. The software development kit (SDK) is designed to make use of available optimizations depending on the compiler flags that are set.

Device-specific optimizations are those that apply to specific targets due to the requirement of hardware support. Examples include the training and conversion of spiking neural networks (supported for specific targets via integration) and sparse neural networks. Additional targets and optimizations can be added using the platform’s extensibility via custom processing, learning, and deployment blocks.

Edge Impulse’s EON Compiler (Jongboom, 2022) compiles TFLM neural networks to C++ source code. The EON Compiler eliminates the need for the TFLM interpreter by generating code that directly calls the underlying kernels and enables the linker to eliminate unused instructions. This reduces the RAM and ROM usage for neural network implementations, as we show in Section 5.3.

4.6 Conversion and Compilation

Edge Impulse offers several possibilities for DSP and model deployment to target embedded and edge devices, such as a standalone C++ library, an Arduino library, a process runner for Linux, a WebAssembly library (Web), and precompiled binaries for a variety of supported boards. A deployed project includes both the DSP preprocessing and the trained machine learning model, optimized for a given architecture.

Edge Impulse provides a firmware SDK for collecting data directly on a device that will be used at inference time. This SDK can be built by a user or is available in binary format for a variety of popular microcontrollers, such as Arduino, Raspberry Pi Pico, etc. As a library, the SDK contains several public-facing functions for performing inference (Hymel, 2022). The precompiled binary presents a simple set of AT commands for usage over a serial port.

The SDK provides a pathway to out-of-the-box operation that uses a combination of code generation, macros, and runtime checks to include more efficient algorithms and optimizations where possible, but it falls back on pure C++ where needed to run on a wide range of processor architectures. Porting to a new processor requires an allocator for
4.7 AutoML

The accuracy of any deep learning system depends critically on identifying a proper choice of hyperparameters. The inherent resource constraints on embedded targets also limit the selection of hyperparameters. For example, a trade-off has to be made between allocating resources for the DSP and deep learning algorithms. For a novice user, the relationship between hyperparameters can often be difficult to grasp. Therefore, Edge Impulse provides a suite of automated machine learning (AutoML) techniques to assist non-experts in creating usable models and tuning hyperparameters.

To ensure low burden on the user, Edge Impulse’s EON Tuner (Jenny Plunkett, 2021) assists in the hyperparameter selection process while taking into account available RAM, ROM, and CPU clock speed of the target device. The EON Tuner helps select a number of hyperparameter configurations, including DSP preprocessing settings. It then trains the associated models to determine their accuracy. From these results, the user can select a preferred configuration (e.g. based on accuracy/F1 score or resource usage) and update the associated project to this configuration.

Figure 3 shows the user’s perspective after EON Tuner has been successfully run. Note the stacked bar plots showing the estimated latency, RAM, and flash usage (Sec. 4.4) for each combination of preprocessing (DSP) and model blocks based on the selected target (e.g. Arduino Nano 33 BLE Sense). Model details, including the specific DSP and NN configuration, are shown, which allows the user to select the best combination of blocks to meet their accuracy requirements within the desired hardware constraints.

To select hyperparameter configurations, the EON Tuner combines a random search algorithm (Bergstra et al., 2011) with a heuristic to quickly estimate the performance of the configurations. Future work includes optimizing search methods using a combination of Bayesian (Eggensperger et al., 2013) and Hyperband (Li et al., 2017) search algorithms. Users have the option of overriding the default search algorithm with their own search methods.

Figure 3. Screenshot of the EON Tuner. Features are annotated with color coded dotted boxes that correspond to the challenges in Figure 1. Purple (top right): The tuner allows users to select the target hardware, which will then inform the constraints set on the search. Blue (top left): The tuner computes the configuration’s accuracy and predicts the resource consumption of the DSP and NN components. Pink (bottom): The tuner searches for optimal DSP and NN combinations and displays their configuration.

4.8 Active Learning

Datasets can be iteratively improved by leveraging a partially trained model to aid in labeling and data cleaning in a process called active learning (Moreau, 2022). Edge Impulse employs an active learning loop for the embedded sensor ecosystem where you can: (1) train a model on a small, labeled subset of your data, (2) generate semantically meaningful embeddings using an intermediate layer of the trained model, (3) visualize the embeddings (non-labeled and labeled samples) in 2D space using a dimensionality reduction algorithm (UMAP (McInnes et al., 2018) or t-SNE (Van der Maaten & Hinton, 2008)), and (4) manually or automatically label or remove samples based on their proximity to existing class clusters. This process can drastically speed up the labeling and data cleaning processes, which can lead to major gains in model performance.

4.9 Extensibility

Edge Impulse supports the majority of the workflow shown in Figure 1, with the exception of IoT device management and production monitoring. However, all Edge Impulse functionality is exposed via publicly accessible REST APIs (ei2, 2020), which allows users to automate the data collection, model training, and deployment processes. This API can be integrated into custom workflows and third party solutions to augment IoT device management and production monitoring, such as Microsoft Azure IoT (Azu).

In addition, users are able to create their own blocks via Docker images to transform raw data taken from an existing store (e.g. AWS S3), perform feature extraction via DSP, train a custom ML model, or deploy a model. Finally, the Edge Impulse inferencing SDK library allows users to develop complete embedded ML solutions that include a variety of model compression and optimization techniques.

Platform                Processor        Clock     Flash   RAM
Nano 33 BLE Sense       Arm Cortex-M4    64 MHz    1 MB    256 kB
ESP-EYE (ESP32)         Tensilica LX6    160 MHz   4 MB    8 MB
Ras. Pi Pico (RP2040)   Arm Cortex-M0+   133 MHz   16 MB   264 kB

Table 1. Embedded platforms used for evaluation.

                  Nano 33 BLE Sense     ESP-EYE               Ras. Pi Pico
                  Float      Int8       Float      Int8       Float      Int8
Keyword Spotting (KWS) inference times
Preprocessing     141.65     138.76     305.53     304.11     590.74     590.87
Inference         2866.11    322.71     648.42     314.14     5700.03    1117.65
Total             3007.91    461.62     954.02     618.35     6290.95    1708.71
Visual Wake Words (VWW) inference times
Preprocessing     -          9.98       24.25      9.07       -          56.44
Inference         -          754.74     2309.15    662.85     -          2205.76
Total             -          816.56     2346.03    702.63     -          2286.68
Image Classification (IC) inference times
Preprocessing     1.36       1.14       1.09       1.03       4.57       6.46
Inference         1518.64    229.54     340.45     191.15     3048.05    554.04
Total             1520.25    232.56     341.62     197.36     3048.05    561.86

Table 2. Preprocessing and inference times (in milliseconds). ‘-’ indicates the model did not fit due to flash or RAM constraints.
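
As a worked reading of Table 2, the snippet below takes the KWS rows and computes, for each platform, the int8 speedup in inference time over float, the share of total int8 latency spent in preprocessing, and the residual overhead (total minus preprocessing and inference). The numbers are copied from Table 2; the variable names are only for illustration.

```python
# KWS timings from Table 2, in milliseconds: (preprocessing, inference, total).
KWS_MS = {
    "Nano 33 BLE Sense": {"float": (141.65, 2866.11, 3007.91),
                          "int8": (138.76, 322.71, 461.62)},
    "ESP-EYE": {"float": (305.53, 648.42, 954.02),
                "int8": (304.11, 314.14, 618.35)},
    "Ras. Pi Pico": {"float": (590.74, 5700.03, 6290.95),
                     "int8": (590.87, 1117.65, 1708.71)},
}

summary = {}
for platform, runs in KWS_MS.items():
    pre, inf, total = runs["int8"]
    summary[platform] = {
        "int8_speedup": runs["float"][1] / inf,  # float inference / int8 inference
        "dsp_share": pre / total,                # fraction of int8 latency in DSP
        "overhead_ms": total - pre - inf,        # time not attributed to either stage
    }

for platform, s in summary.items():
    print(f"{platform}: inference speedup {s['int8_speedup']:.2f}x, "
          f"DSP share {s['dsp_share']:.0%}, overhead {s['overhead_ms']:.2f} ms")
```

For these rows, quantization speeds up inference by roughly 2x to 9x while preprocessing time is essentially unchanged, so on the ESP-EYE the DSP stage accounts for about half of the int8 latency.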
Furthermore, Edge Impulse can integrate with existing ML development pipelines via the Edge Impulse Python SDK (Situnayake, 2023). This allows users to use specific features, such as profiling or deployment (Sec. 4.4 & 4.6), without needing to use the graphical interface.

4.10 Scalable Infrastructure

Edge Impulse employs AWS Elastic Kubernetes Service (Man) to dynamically scale compute resources based on workload requirements. All workloads are containerized, which has proven vital for efficient dependency management. Often, ML software infrastructure requires a wide range of dependencies and versions of dependencies that are not always mutually compatible. The choice of Kubernetes over a vendor-specific tool, such as AWS Elastic Container Service (ama), is to enable migration of the Edge Impulse infrastructure to a different cloud provider or on-premise with a reasonable (1-6 months) amount of effort.

5 PERFORMANCE EVALUATION

ML development often narrowly focuses on model performance in isolation (Richins et al., 2021) due to the complexity of co-optimization (Sec. 2.4). However, the DSP stage can be a dominant factor in the overall latency and memory consumption of a TinyML application. Edge Impulse is designed to quantify the DSP overhead and allow users to explore the rich DSP and NN co-design space.

In this section, we characterize the latency, SRAM, and flash consumption of TinyML workloads across multiple devices, optimizations, and AutoML-defined configurations, thereby showing Edge Impulse’s ability to address the challenges of

these platforms for their differences in clock speeds, flash storage and RAM capacity. We chose several models to evaluate the platforms to demonstrate the capabilities of each. These models were created to solve three tasks outlined in the MLPerf Tiny Benchmark (Banbury et al., 2021a): keyword spotting (KWS), visual wake words (VWW), and image classification. KWS is a common task in embedded devices that require wake word detection, such as “Alexa” or “OK Google.” We chose a DS-CNN model (Sørensen et al., 2020) that achieved at least 78% accuracy on a test set from the Google Speech Commands dataset (Warden, 2018). For the VWW task, MobileNetV1 was trained using the visual wake words dataset (Chowdhery et al., 2019), which was derived from the Microsoft COCO dataset (Lin et al., 2014). This dataset is a balanced set of “person” and “non-person” images used to train an image classification model. We achieved at least 72% accuracy on a hold-out set. Finally, we trained a simple convolutional neural network (CNN) on CIFAR-10 (Krizhevsky et al., 2009).

5.2 Cross-Hardware Inference Latency Comparison

Table 2 displays three sets of end-to-end timing results using provided hardware timers. The table shows the latency of the KWS, VWW, and image classification tasks for both floating point and quantized integer (8-bit) models across the three platforms. The preprocessing and classification tasks are timed from within the Edge Impulse SDK, and the total time is taken by measuring the difference between timestamps taken around the call to run classification, which is a combination of preprocessing and inference plus some overhead not measured in either preprocessing or inference. On some tasks, such as keyword spotting, the preprocessing time can easily equal or exceed the inference time of the
hardware heterogeneity (Sec. 2.2), software fragmentation unoptimized model. Therefore, optimizing the network
(Sec. 2.3), and cross-stack optimization (Sec. 2.4). inference via quantization, etc. will not yield the typical
magnitude of latency reduction. Edge Impulse allows users
5.1 Experimental Setup to look at the end-to-end performance of a task, rather than
focus only on isolated network performance.
We evaluated three representative hardware designs. The
details for the platforms are shown in Table 1. We chose Edge Impulse helps users experiment with preprocessing
Edge Impulse: An MLOps Platform for Tiny Machine Learning
Table 3. Preprocessing blocks and models explored with EON Tuner for the keyword spotting task. Latency, RAM, and flash estimates
from EON Tuner for the keyword spotting task on the Arduino Nano 33 BLE Sense (float32 inference, using TFLM).
and models to quickly iterate on designs to find acceptable solutions to such problems. Table 3 shows how users can choose to use different preprocessing blocks, Mel-filterbank energy (MFE) or MFCCs, and sweep different model architectures with the EON Tuner for optimizing latency, accuracy, RAM, and flash storage. There is no ideal solution; the ultimate choice is up to the user as they know their deployment constraints. Edge Impulse simply automates the possibilities and displays suggested configurations.

5.3 Memory Optimization with EON Compiler

Embedded systems are constrained by their memory and storage capacity (Section 2.1). Table 4 details the estimated memory usage, RAM and flash, for all three tasks. The Edge Impulse EON Compiler removes the need for the TFLM interpreter for on-device inference, thus reducing the required RAM and flash in most cases. A consistent decrease in memory utilization is seen when enabling the EON Compiler as well as quantizing to an INT8 model. Quantization can decrease the accuracy of the model due to the lower precision, but in some instances (e.g. the image classification task) it improves the accuracy due to regularization. These optimizations do not impact the preprocessing stage.

                Keyword Spotting         Visual Wake Words        Image Classification
                RAM     Flash   Acc.     RAM     Flash   Acc.     RAM     Flash   Acc.
Preprocessing   13.0    -       -        4.0     -       -        4.0     -       -
FP (TFLM)       115.8   148.0   78.5     398.4   904.4   81.1     195.8   107.5   70.9
FP (EON)        96.8    106.7   78.5     327.7   861.4   81.1     162.7   78.7    70.9
Int8 (TFLM)     38.5    98.1    78.5     124.8   361.2   79.9     51.9    63.1    71.1
Int8 (EON)      36.4    65.3    78.5     131.0   309.5   79.9     44.0    42.1    71.1

Table 4. Memory estimation (all memory estimates given in kilobytes, accuracy in percentage based on the holdout set). Accuracy is shared between the TFLM and EON rows of the same precision. Flash utilization estimation not provided for DSP preprocessing.

5.4 Design Space Exploration with EON Tuner

Table 3 shows a number of configurations that are searched by the EON Tuner in order to find an optimal keyword spotting model. A user can find a model that balances the resources allocated to the DSP and NN stages in order to meet the hardware constraints of the application while maximizing accuracy. For example, the configurations on the 3rd and 4th lines from the bottom balance the DSP and NN differently, leading to a slightly higher accuracy and lower latency (more NN) compared to less RAM and flash consumption (more DSP). This process accelerates the initial exploratory phase of ML development and makes cross-stack optimizations accessible for novice ML developers.

6 ECOSYSTEM ENABLEMENT

Since its launch in late 2019, Edge Impulse has seen excitement around embedded ML deployments in a variety of domains. We highlight a few exemplar use cases here.

6.1 Education and Learning

Edge Impulse provides both a graphical user interface through Studio as well as an extensible web-based API. Consequently, it is well suited for classroom activities, as it provides a series of parameters, plots, and visualizations within the Studio to assist newcomers in building end-to-end embedded/edge machine learning systems.

We saw a large interest in embedded machine learning courses upon the delivery of two separate massive open online courses (MOOCs). Between September 2020 and June 2021, over 43,000 students enrolled in three of the Tiny Machine Learning courses on EdX (Tin). In less than two years, over 75,000 students have benefited from these courses alone (Janapa Reddi et al., 2022). Additionally, over 30,000 students enrolled in the two Embedded Machine Learning courses that use Edge Impulse between February 2021 and October 2022 (Int). In addition, throughout 2021 and 2022, the TinyML4D Academic Network (tin, 2022), which focuses on improving access to edge ML education and technology in developing countries by running a series of workshops for Africa, Asia, and Latin America regions, used Edge Impulse to teach embedded ML to professors, lecturers, and students. These 2022 workshops had 216 attendees from 48 different countries. The high attendance in the MOOCs, along with the apparent growing desire of various schools to teach ML, demonstrates the need for approachable tools for education.
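Returning to the memory estimates of Table 4 (Section 5.3), the relative savings from the EON Compiler can be worked out directly from the reported figures. The following is a minimal Python sketch and not part of Edge Impulse: the dictionary layout and the helper names `percent_saved` and `eon_savings` are assumptions made for illustration, and the numbers are transcribed from Table 4.

```python
# Memory estimates in kilobytes, transcribed from Table 4.
# Key: (task, precision, runtime) -> (RAM_kB, flash_kB).
# The layout and names below are illustrative assumptions, not an Edge Impulse API.
TABLE4_KB = {
    ("KWS", "FP", "TFLM"): (115.8, 148.0),
    ("KWS", "FP", "EON"): (96.8, 106.7),
    ("KWS", "Int8", "TFLM"): (38.5, 98.1),
    ("KWS", "Int8", "EON"): (36.4, 65.3),
    ("VWW", "FP", "TFLM"): (398.4, 904.4),
    ("VWW", "FP", "EON"): (327.7, 861.4),
    ("VWW", "Int8", "TFLM"): (124.8, 361.2),
    ("VWW", "Int8", "EON"): (131.0, 309.5),
    ("IC", "FP", "TFLM"): (195.8, 107.5),
    ("IC", "FP", "EON"): (162.7, 78.7),
    ("IC", "Int8", "TFLM"): (51.9, 63.1),
    ("IC", "Int8", "EON"): (44.0, 42.1),
}


def percent_saved(baseline, optimized):
    """Relative reduction in percent; negative if usage went up."""
    return 100.0 * (baseline - optimized) / baseline


def eon_savings(task, precision):
    """Return (RAM %, flash %) saved by EON relative to TFLM."""
    base_ram, base_flash = TABLE4_KB[(task, precision, "TFLM")]
    eon_ram, eon_flash = TABLE4_KB[(task, precision, "EON")]
    return percent_saved(base_ram, eon_ram), percent_saved(base_flash, eon_flash)


if __name__ == "__main__":
    for task in ("KWS", "VWW", "IC"):
        for precision in ("FP", "Int8"):
            ram, flash = eon_savings(task, precision)
            print(f"{task:>3} {precision:<4} RAM saved: {ram:6.1f}%  "
                  f"flash saved: {flash:6.1f}%")
```

A negative value means the EON build is estimated to use more memory than the TFLM build. This happens for the Int8 VWW RAM estimate, which is consistent with the "in most cases" qualifier in Section 5.3.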
sor (uTe) and Microsoft’s ELL (The) are both TinyML inference engines that cross-compile to specific target embedded hardware, but both projects are no longer supported. TinyEngine (Lin et al., 2020a) and uTVM (mic) compile the network graph and perform inter-layer optimizations. TinyEngine is a research project and is therefore not designed for production scale, and uTVM currently supports very few boards. Vendor-specific inference engines, such as STM32Cube.AI (XCU), provide optimized inference but only support vendor-specific hardware, thus limiting application portability. All of these engines, however, can be adopted and integrated into Edge Impulse.

8 INDUSTRY CASE STUDIES

This section focuses on real-world insights from two industry case studies: sleep tracking with the Oura Ring and detecting heat exhaustion with the SlateSafety BAND V2.

8.1 Oura Ring

The Oura Ring is designed to track the user’s sleep patterns using an ML model that predicts the stages of sleep based on measured physiological signals. In order to improve their model, Oura conducted a large-scale sleep study (∼100 participants) and collected a large dataset for model training.

8.1.1 Challenge

Incomplete, noisy, and inconsistent data are unavoidable when collecting real-world datasets and, given the scale of the study, it would be an arduous process to aggregate, scrub, and analyze the data before using it to train the next-generation sleep tracking model. Many current platforms provide little assistance for the critical stages before training, and stand-alone data analysis and visualization tools require significant additional effort to set up ad hoc ML pipelines.

8.1.2 Edge Impulse Solution

Edge Impulse is able to ingest and align data from multiple sources and sensors by comparing sensor signatures, which simplifies and speeds up a typically manual and error-prone process. Edge Impulse integrates analysis tools that enable domain experts to make design decisions on which sensors and features are meaningful for sleep prediction. Through this process, Oura created a model that focuses only on heart rate, motion, and body temperature and achieves a best-in-class 79% correlation accuracy when compared to polysomnography and human scorers, which use expensive measurements, such as brain waves (de Zambotti et al., 2019).

8.1.3 Focus Areas for Future ML Systems Research

Current ML systems researchers too often ignore data-centric techniques and instead focus on stages, such as model design, with easier-to-quantify metrics such as accuracy and latency (Mazumder et al., 2022). Areas for high-impact systems research include: (1) Aggregating and aligning data from unstructured sources and multiple sensors. (2) Visualizing statistically meaningful correlations of data from multiple sensors without hallucinations. (3) Leveraging non-technical domain experts for data cleaning and selection.

8.2 SlateSafety

The SlateSafety BAND V2 is a wearable device that monitors physiological signals of first responders and industrial workers. Due to the lack of a reliable wireless connection in the deployment environment, SlateSafety required on-device inference to predict heat exhaustion in users.

8.2.1 Challenge

SlateSafety aimed to leverage existing hardware that was already in the field instead of going through the expensive process of developing an entirely new, ML-capable platform. Therefore, the resulting model had to run in real-time on an existing microcontroller with limited memory capacity.

8.2.2 Edge Impulse Solution

Edge Impulse’s EON Tuner and Compiler were able to automatically design a custom model and deploy it efficiently to the existing microcontroller via an automatic over-the-air update. This means that SlateSafety could deploy a new feature to its users seamlessly and without a long development cycle around new hardware.

8.2.3 Focus Areas for Future ML Systems Research

Much existing ML research targets state-of-the-art hardware that has been designed with ML deployment in mind. However, in practice, ML-based features are deployed to existing systems. In order to enable broader adoption of AI and prevent existing hardware from being replaced and thrown out (Prakash et al., 2023), ML researchers need to develop optimization techniques that are backward compatible with past generations of hardware.

9 CONCLUSION

Edge Impulse is a framework focused on building machine learning systems for resource-constrained devices. The framework is built around the principles of accessibility, data-centric co-optimization, and cross-stack collaboration. As a result, Edge Impulse has put AI in the hands of the individuals it impacts by reducing the expertise and computing resources required to participate. Edge Impulse has already been used in various industrial, research, and educational applications, and the lessons learned from these deployments can focus future systems research on high-impact problems.
Banbury, C., Reddi, V. J., Torelli, P., Holleman, J., Jeffries, N., Kiraly, C., Montino, P., Kanter, D., Ahmed, S., Pau, D., et al. Mlperf tiny benchmark. arXiv preprint arXiv:2106.07597, 2021a.

Banbury, C., Zhou, C., Fedorov, I., Matas, R., Thakker, U., Gope, D., Janapa Reddi, V., Mattina, M., and Whatmough, P. Micronets: Neural network architectures for deploying tinyml applications on commodity microcontrollers. Proceedings of Machine Learning and Systems, 3:517–532, 2021b.

Bergstra, J., Bardenet, R., Bengio, Y., and Kégl, B. Algorithms for hyper-parameter optimization. In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS’11, pp. 2546–2554, Red Hook, NY, USA, 2011. Curran Associates Inc. ISBN 9781618395993.

Chavarriaga, R., Sagha, H., Calatroni, A., Digumarti, S. T., Tröster, G., del R. Millán, J., and Roggen, D. The opportunity challenge: A benchmark database for on-body sensor-based activity recognition. Pattern Recognition Letters, 34(15):2033–2042, 2013. ISSN 0167-8655. doi: https://doi.org/10.1016/j.patrec.2012.12.014. URL https://www.sciencedirect.com/science/article/pii/S0167865512004205. Smart Approaches for Human Action Recognition.

Chollet, F. keras, 2015. URL https://github.com/fchollet/keras.

Chowdhery, A., Warden, P., Shlens, J., Howard, A., and Rhodes, R. Visual wake words dataset. CoRR, abs/1906.05721, 2019. URL http://arxiv.org/abs/1906.05721.

David, R., Duke, J., Jain, A., Janapa Reddi, V., Jeffries, N., Li, J., Kreeger, N., Nappier, I., Natraj, M., Wang, T., et al. Tensorflow lite micro: Embedded machine learning for tinyml systems. Proceedings of Machine Learning and Systems, 3:800–811, 2021.

de Zambotti, M., Rosas, L., Colrain, I. M., and Baker, F. C. The sleep of the ring: Comparison of the Ōura sleep tracker against polysomnography. Behavioral Sleep Medicine, 17(2):124–136, 2019. doi: 10.1080/15402002.2017.1300587. URL https:

Eggimann, M., Mach, S., Magno, M., and Benini, L. A risc-v based open hardware platform for always-on wearable smart sensing. In 2019 IEEE 8th International Workshop on Advances in Sensors and Interfaces (IWASI), pp. 169–174. IEEE, 2019.

Gruenstein, A., Alvarez, R., Thornton, C., and Ghodrat, M. A cascade architecture for keyword spotting on mobile devices. arXiv preprint arXiv:1712.03603, 2017.

Hołenko, M. renode, 2017. URL https://github.com/renode/renode.

Hutson, M. Artificial intelligence faces reproducibility crisis, 2018.

Hymel, S. Inferencing sdk, 2022. URL https://docs.edgeimpulse.com/reference/c++-inference-sdk-library/inferencing-sdk.

Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., and Kalenichenko, D. Quantization and training of neural networks for efficient integer-arithmetic-only inference, 2017. URL https://arxiv.org/abs/1712.05877.

Janapa Reddi, V., Plancher, B., Kennedy, S., Moroney, L., Warden, P., Suzuki, L., Agarwal, A., Banbury, C., Banzi, M., Bennett, M., Brown, B., Chitlangia, S., Ghosal, R., Grafman, S., Jaeger, R., Krishnan, S., Lam, M., Leiker, D., Mann, C., Mazumder, M., Pajak, D., Ramaprasad, D., Smith, J. E., Stewart, M., and Tingley, D. Widening access to applied machine learning with tinyml. Harvard Data Science Review, 2022. doi: 10.1162/99608f92.762d171a. URL https://hdsr.mitpress.mit.edu/pub/0gbwdele.

Jenny Plunkett, M. B. Introducing the eon tuner: Edge impulse’s new automl tool for embedded machine learning, July 2021. URL https://www.edgeimpulse.com/blog/introducing-the-eon-tuner-edge-impulses-new-automl-tool-for-embedded-machine-learning.

Jongboom, J. Introducing eon: Neural networks in up to 55% less ram and 35% less rom, 2022. URL https://www.edgeimpulse.com/blog/introducing-eon.
Klein, S. IoT Solutions in Microsoft’s Azure IoT Suite. Springer, 2017.

Koizumi, Y., Saito, S., Uematsu, H., Harada, N., and Imoto, K. Toyadmos: A dataset of miniature-machine operating sounds for anomalous sound detection. pp. 313–317, 10 2019. doi: 10.1109/WASPAA.2019.8937164.

Krizhevsky, A., Nair, V., and Hinton, G. Cifar-10 (canadian institute for advanced research). 2009. URL http://www.cs.toronto.edu/~kriz/cifar.html.

Lai, L., Suda, N., and Chandra, V. CMSIS-NN: efficient neural network kernels for arm cortex-m cpus. CoRR, abs/1801.06601, 2018. URL http://arxiv.org/abs/1801.06601.

Leroux, S., Simoens, P., Lootus, M., Thakore, K., and Sharma, A. Tinymlops: Operational challenges for widespread edge ai adoption. In 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1003–1010. IEEE, 2022.

Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., and Talwalkar, A. Hyperband: A novel bandit-based approach to hyperparameter optimization. The Journal of Machine Learning Research, 18(1):6765–6816, 2017.

Lin, J., Chen, W.-M., Lin, Y., Cohn, J., Gan, C., and Han, S. Mcunet: Tiny deep learning on iot devices, 2020a. URL https://arxiv.org/abs/2007.10319.

Lin, J., Chen, W.-M., Lin, Y., Gan, C., Han, S., et al. Mcunet: Tiny deep learning on iot devices. Advances in Neural Information Processing Systems, 33:11711–11722, 2020b.

Lin, T., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick, R. B., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. Microsoft COCO: common objects in context. CoRR, abs/1405.0312, 2014. URL http://arxiv.org/abs/1405.0312.

Mazumder, M., Banbury, C., Yao, X., Karlaš, B., Rojas, W. G., Diamos, S., Diamos, G., He, L., Kiela, D., Jurado, D., et al. Dataperf: Benchmarks for data-centric ai development. arXiv preprint arXiv:2207.10062, 2022.

McInnes, L., Healy, J., and Melville, J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.

Moreau, L. Create active learning pipelines with data sources and data explorer features, 2022. URL https://www.edgeimpulse.com/blog/create-active-learning-pipelines-with-data-sources-and-data-explorer-features.

Pan, S. J. and Yang, Q. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2009.

Prakash, S., Callahan, T., Bushagour, J., Banbury, C., Green, A. V., Warden, P., Ansell, T., and Reddi, V. J. Cfu playground: Full-stack open-source framework for tiny machine learning (tinyml) acceleration on fpgas. arXiv preprint arXiv:2201.01863, 2022.

Prakash, S., Stewart, M., Banbury, C., Mazumder, M., Warden, P., Plancher, B., and Reddi, V. J. Is tinyml sustainable? assessing the environmental impacts of machine learning on microcontrollers. arXiv preprint arXiv:2301.11899, 2023.

Qiao, N., Mostafa, H., Corradi, F., Osswald, M., Stefanini, F., Sumislawska, D., and Indiveri, G. A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128k synapses. Frontiers in neuroscience, 9:141, 2015.

Rauschmayr, N., Kumar, V., Huilgol, R., Olgiati, A., Bhattacharjee, S., Harish, N., Kannan, V., Lele, A., Acharya, A., Nielsen, J., et al. Amazon sagemaker debugger: a system for real-time insights into machine learning model training. Proceedings of Machine Learning and Systems, 3:770–782, 2021.

Richins, D., Doshi, D., Blackmore, M., Nair, A. T., Pathapati, N., Patel, A., Daguman, B., Dobrijalowski, D., Illikkal, R., Long, K., et al. Ai tax: The hidden cost of ai data center applications. ACM Transactions on Computer Systems (TOCS), 37(1-4):1–32, 2021.

Shamim, M. Z. M. Hardware deployable edge-ai solution for pre-screening of oral tongue lesions using tinyml on embedded devices. IEEE Embedded Systems Letters, pp. 1–1, 2022a. doi: 10.1109/LES.2022.3160281.

Shamim, M. Z. M. Tinyml model for classifying hazardous volatile organic compounds using low-power embedded edge sensors: Perfecting factory 5.0 using edge ai. IEEE Sensors Letters, 6(9):1–4, 2022b. doi: 10.1109/LSENS.2022.3201398.

Siekkinen, M., Hiienkari, M., Nurminen, J. K., and Nieminen, J. How low energy is bluetooth low energy? comparative measurements with zigbee/802.15.4. In 2012 IEEE wireless communications and networking conference workshops (WCNCW), pp. 232–237. IEEE, 2012.

Situnayake, D. Announcing performance calibration, October 2022. URL https://www.edgeimpulse.com/blog/announcing-performance-calibration.