Benchmarking TinyML Systems: Challenges and Direction
Colby R. Banbury, Vijay Janapa Reddi, Max Lam, William Fu, Amin Fazel, Jeremy Holleman, Xinyuan Huang, Robert Hurtado, David Kanter, Anton Lokhmotov, David Patterson, Danilo Pau, Jae-sun Seo, Jeff Sieracki, Urmish Thakker, Marian Verhelst, Poonam Yadav
Harvard University; Samsung Semiconductor, Inc.; Syntiant; University of North Carolina, Charlotte; Cisco Systems; California State Polytechnic University, Pomona; Real World Insights; dividiti; University of California, Berkeley; Google; STMicroelectronics, Italy; Arizona State University; Reality AI; Arm ML Research Lab; KU Leuven; Interuniversity Microelectronics Centre (IMEC); University of York. Correspondence to: Colby R. Banbury <[email protected]>.

Copyright 2020 by the authors.

Abstract
Recent advancements in ultra-low-power machine learning (TinyML) hardware promise to unlock an entirely new class of smart applications. However, continued progress is limited by the lack of a widely accepted benchmark for these systems. Benchmarking allows us to measure, and thereby systematically compare, evaluate, and improve, the performance of systems and is therefore fundamental to a field reaching maturity. In this position paper, we present the current landscape of TinyML and discuss the challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads. Furthermore, we present our four benchmarks and discuss our selection methodology. Our viewpoints reflect the collective thoughts of the TinyMLPerf working group, which is comprised of over 30 organizations.
1 Introduction
Machine learning (ML) inference on the edge is an increasingly attractive prospect due to its potential for increasing the energy efficiency (Fedorov et al., 2019), privacy, responsiveness (Zhang et al., 2017), and autonomy of edge devices. Thus far, the field of edge ML has predominantly focused on mobile inference, which has led to numerous advancements in machine learning models, such as exploiting pruning, sparsity, and quantization. But in recent years, there have been major strides in expanding the scope of edge systems. Interest is brewing in both academia (Fedorov et al., 2019; Zhang et al., 2017) and industry (Flamand et al., 2018; Warden, 2018a) towards expanding the scope of edge ML to microcontroller-class devices.

The goal of "TinyML" (tinyML Foundation, 2019) is to bring ML inference to ultra-low-power devices, typically under a milliwatt, and thereby break the traditional power barrier preventing widely distributed machine intelligence. By performing inference on-device, and near-sensor, TinyML enables greater responsiveness and privacy while avoiding the energy cost associated with wireless communication, which at this scale is far higher than that of compute (Warden, 2018b). Furthermore, the efficiency of TinyML enables a class of smart, battery-powered, always-on applications that can revolutionize the real-time collection and processing of data. This emerging field, which is the culmination of many innovations, is poised to only further accelerate its growth in the coming years.

To unlock the full potential of the field, hardware-software co-design is required. Specifically, TinyML models must be small enough to fit within the tight constraints of MCU-class devices (e.g., a few hundred kB of memory and limited onboard compute horsepower on the order of MHz processor clock speeds), thus limiting the size of the input and the number of layers (Zhang et al., 2017) or necessitating the use of lightweight, non-neural-network-based techniques (Kumar et al., 2017). TinyML tools are broadly defined as anything that enables the design, mapping, and deployment of TinyML algorithms, including aggressive quantization techniques (Wang et al., 2019), memory-aware neural architecture searches (Fedorov et al., 2019), frameworks (TensorFlow), and efficient inference libraries (Lai et al., 2018; Garofalo et al., 2019). Efforts in TinyML hardware include improving inference on the next generation of general-purpose MCUs (arm; Flamand et al., 2018), developing hardware specialized for low-power inference, and creating novel architectures intended only as inference engines for specific tasks (Moons et al., 2018).

The complexity and dynamism of the field obscure the measurement of progress and make design decisions intractable. To enable continued innovation, a fair and reliable method of comparison is needed. Since progress is often the result of increased hardware capability,
a reliable TinyML hardware benchmark is required.

In this paper, we discuss the challenges and opportunities associated with the development of a TinyML hardware benchmark. Our short paper is a call to action for establishing a common benchmark for TinyML workloads on emerging TinyML hardware to foster the development of TinyML applications. The points presented here reflect the ongoing effort of the TinyMLPerf working group, which is currently comprised of over 30 organizations and 75 members.

The rest of the paper is organized as follows. In Section 2, we discuss the application landscape of TinyML, including the existing use cases, models, and datasets. In Section 3, we describe the existing TinyML hardware solutions, including outlining improvements to general-purpose MCUs and the development of novel architectures. In Section 4, we discuss the inherent challenges of the field and how they complicate the development of a benchmark. In Section 5, we describe the existing benchmarks that relate to TinyML and identify the deficiencies that still need to be filled. In Section 6, we discuss the progress of the TinyMLPerf working group thus far and describe the four benchmarks. In Section 7, we conclude the paper and discuss future work.

2 Tiny Use Cases, Models & Datasets
In this section we attempt to summarize the field of TinyML by describing a set of representative use cases (Section 2.1), their relevant datasets (Section 2.2), and the model architectures commonly applied to these specific use cases (Section 2.3).

2.1 Use Cases
Despite the general lack of maturity within the field, there are a number of well-established TinyML use cases. We categorize the application landscape of TinyML by input type in Table 1, which in the context of TinyML systems plays a crucial role in the use case definition.

Audio wake words is already a fairly ubiquitous example of always-on ML inference. It is generally a speech classification problem that achieves very low-power inference by limiting the label space, often to two labels: "wake word" and "not wake word" (Zhang et al., 2017). Anomaly detection and predictive maintenance are commonly deployed on MCUs in factory settings, where audio, motor bearing, or IMU data can be used to detect faults in products or equipment.

Other deployed TinyML applications, like activity recognition from IMU data (Hassan et al., 2018), rely on low feature dimensionality to fit within the tight constraints of the platforms. Some use cases have been proven viable but have yet to reach end users because they are too new, like visual wake words (Chowdhery et al., 2019).

Many traditional ML use cases can be considered futuristic TinyML tasks. As ultra-low-power inference hardware continues to improve, the threshold of viability expands. Tasks like large-label-space image classification or object counting are well suited for low-power, always-on applications but are currently too compute- and memory-hungry for today's TinyML hardware.

Furthermore, TinyML has a significant role to play in future technology. For example, many of the fundamental features of augmented reality (AR) glasses are always-on and battery-powered. Due to tight real-time constraints, these devices cannot afford the latency of offloading computation to the cloud, an edge server, or even an accompanying mobile device. Thus, due to shared constraints, AR applications can benefit significantly from progress in the field of TinyML.
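To make the wake-word formulation concrete, here is a minimal sketch, assuming TensorFlow/Keras, of the depthwise-separable CNN (DS-CNN) style of model cited above (Zhang et al., 2017). The 49x10 MFCC input shape, filter counts, and layer count are illustrative choices, not a published configuration.

```python
# Minimal DS-CNN sketch for a two-label audio wake-word task.
# Shapes and filter counts are illustrative, not a published config.
import tensorflow as tf

inputs = tf.keras.Input(shape=(49, 10, 1))  # e.g., 49 MFCC frames x 10 coefficients
x = tf.keras.layers.Conv2D(64, (10, 4), strides=(2, 2),
                           padding="same", activation="relu")(inputs)
# Depthwise-separable block: per-channel spatial filter, then 1x1 mixing conv.
x = tf.keras.layers.DepthwiseConv2D((3, 3), padding="same", activation="relu")(x)
x = tf.keras.layers.Conv2D(64, (1, 1), activation="relu")(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)  # "wake word" / "not wake word"

model = tf.keras.Model(inputs, outputs)
model.summary()  # roughly 8k parameters, comfortably within a few hundred kB of flash
```

The depthwise-separable structure is what keeps the parameter and multiply counts small enough for MCU-class memory and compute budgets.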
2.2 Datasets

There are a number of open-source datasets that are relevant to TinyML use cases. Table 1 breaks them down by the type of data. Despite the availability of these datasets, the majority of deployed TinyML models are trained on much larger, proprietary datasets. The open-source datasets that are competitively large are not TinyML specific. The lack of large, TinyML-focused, open-source datasets slows the progress of academic research and limits the ability of a benchmark to represent real workloads accurately.
2.3 Models

Table 1 lists common model types for TinyML use cases. Although neural networks (NN) are a dominant force in traditional ML, it is common to use non-NN-based solutions, like decision trees (Kumar et al., 2017), for some TinyML use cases, due to their low compute and memory requirements.

Machine learning on MCU-class devices has only recently become feasible; therefore, the community has yet to produce models that have become as widely accepted as MobileNets have for mobile devices. This makes the task of selecting representative models challenging. However, immaturity also brings opportunity, as our decisions can help direct future progress. Selecting a subset of the currently available models, outlining the rules for quality versus accuracy trade-offs, and prescribing a measurement methodology that can be faithfully reproduced will encourage the community to develop new models, runtimes, and hardware that progressively outperform one another.
Table 1. Survey of TinyML Use Cases, Models, and Datasets

Audio
  Use cases: audio wake words, context recognition, control words, keyword detection
  Model types: DNN, CNN, RNN, LSTM
  Datasets: Speech Commands (Warden, 2018a); AudioSet (Gemmeke et al., 2017); ExtraSensory (Vaizman et al., 2017)

Image
  Use cases: visual wake words, object detection, image classification, gesture recognition, object counting, text recognition
  Model types: DNN, CNN, SVM, decision trees, KNN, linear
  Datasets: Visual Wake Words (Chowdhery et al., 2019); CIFAR10 (Krizhevsky et al., 2009b); MNIST (LeCun & Cortes, 2010); ImageNet (Deng et al., 2009); DVS128 Gesture (Amir et al., 2017)

Physiological/Behavioral Metrics
  Use cases: segmentation, forecasting, activity detection
  Model types: DNN, decision tree, SVM, linear
  Datasets: PhysioNet (Goldberger et al., 2000); HAR (Cramariuc, 2019); DSA (Altun et al., 2010); Opportunity (Roggen et al., 2010); UCI EMG (Lobov et al., 2018)

Industry Telemetry
  Use cases: sensing (light, temperature, etc.), anomaly detection, motor control, predictive maintenance
  Model types: DNN, decision tree, SVM, linear, naive Bayes
  Datasets: UCI Air Quality (De Vito et al., 2008); UCI Gas (Vergara et al., 2012); NASA's PCoE (Saxena & Goebel, 2008)
Figure 1. A logarithmic comparison of the active power consumption between TinyML systems and those supported by MLPerf. TinyML systems can have a power budget up to four orders of magnitude smaller than state-of-the-art MLPerf systems.
3 Tiny Hardware Constraints
TinyML hardware is defined by its ultra-low power consumption, which is often in the range of 1 mW and below. At the top of this range are efficient 32-bit MCUs, like those based on the Arm Cortex-M7 or RISC-V PULP processors, and at the bottom are novel ultra-low-power inference engines. Even the largest TinyML devices consume drastically less power than the smallest traditional ML devices. Figure 1 shows a logarithmic comparison of the active power consumption between TinyML devices and those currently supported by MLPerf (v0.5 inference results from the open and closed divisions). TinyML devices can have a power budget up to four orders of magnitude smaller than state-of-the-art MLPerf systems.

The advent of low-power, cheap 32-bit MCUs has revolutionized compute capability at the very edge. Cortex-M-based platforms are now regularly performing tasks that were previously infeasible at this scale, mostly due to support for single instruction, multiple data (SIMD) and digital signal processing (DSP) instructions. This fast vector math supports NN and highly efficient SVM implementations; it also accelerates many feature computations using 8-bit fixed-point arithmetic.

A feature of MCUs is the prevalence of on-chip SRAM and embedded flash. Thus, when models can fit within the tight on-chip memory constraints, they are free of the costly DRAM accesses that hamper traditional ML. Widespread adoption and dispersion of TinyML are reliant on the capability of these platforms.

Although general-purpose MCUs provide flexibility, the highest TinyML performance efficiency comes from specialized hardware. Novel architectures can achieve performance in the range of one microjoule per inference (Holleman, 2019). These specialized devices expand the boundaries of ML to the ultra-low-power end of TinyML processors.
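As a rough illustration of the arithmetic these instructions accelerate, the sketch below emulates an int8 multiply-accumulate dot product in NumPy, with the 32-bit accumulator that quantized NN and SVM kernels use; the quantization scales are placeholder values.

```python
# Emulation of the int8 MAC pattern behind SIMD/DSP-accelerated kernels.
# Scales are placeholders; on an MCU this loop maps to packed MAC instructions.
import numpy as np

def int8_dot(x_q, w_q, x_scale, w_scale):
    # Accumulate in int32 so the 8-bit products cannot overflow.
    acc = np.dot(x_q.astype(np.int32), w_q.astype(np.int32))
    return acc * (x_scale * w_scale)  # dequantize the result to real units

rng = np.random.default_rng(0)
x_q = rng.integers(-128, 128, size=64, dtype=np.int8)
w_q = rng.integers(-128, 128, size=64, dtype=np.int8)
print(int8_dot(x_q, w_q, x_scale=0.05, w_scale=0.01))
```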
4 Challenges
TinyML systems present a number of unique challenges to the design of a performance benchmark that can be used to systematically measure and quantify performance differences between systems. We discuss the four primary obstacles and postulate how they might be overcome.
4.1 Power

Low power consumption is one of the defining features of TinyML systems. Therefore, a useful benchmark should ostensibly profile the energy efficiency of each device. However, there are many challenges in fairly measuring energy consumption. Firstly, as illustrated in Figure 1, TinyML devices can consume drastically different amounts of power, which makes maintaining measurement accuracy across the range of devices difficult.

Secondly, it is difficult to determine what falls under the scope of the power measurement when data paths and pre-processing steps can vary significantly between devices. Other factors, like chip peripherals and underlying firmware, can impact the measurements. Unlike traditional high-power ML systems, TinyML systems do not have spare cores with which to load the System Under Test (SUT) with minimal overhead.
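As an illustration of one plausible approach, the sketch below integrates a sampled power trace over a fixed window and divides by the number of inferences performed; the trace, sampling rate, and window here are synthetic placeholders, and a real benchmark must still define what the window includes.

```python
# Sketch: energy per inference from a sampled power trace.
# `timestamps_s` and `power_w` stand in for readings from an external
# power monitor; all values below are synthetic placeholders.
import numpy as np

def energy_per_inference_j(timestamps_s, power_w, n_inferences):
    # Trapezoidal integration of power over time yields energy in joules.
    dt = np.diff(timestamps_s)
    energy_j = np.sum(0.5 * (power_w[1:] + power_w[:-1]) * dt)
    return energy_j / n_inferences

t = np.linspace(0.0, 1.0, 1000)   # 1 s window sampled at ~1 kHz
p = np.full_like(t, 800e-6)       # constant ~800 uW active power
print(f"{energy_per_inference_j(t, p, 10) * 1e6:.1f} uJ per inference")
```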
4.2 Memory

Due to their small size, TinyML systems often have tight memory constraints. While traditional ML systems like smartphones cope with resource constraints on the order of a few GB, TinyML systems typically cope with resources that are two orders of magnitude smaller.

Memory is one of the primary motivating factors for the creation of a TinyML-specific benchmark. Traditional ML benchmarks use inference models that have drastically higher peak memory requirements (on the order of gigabytes) than TinyML devices can provide. This also complicates the deployment of a benchmarking suite, as any overhead can significantly impact power consumption or even make the benchmark too big to fit. Individual benchmarks must also cover a wide range of devices; therefore, multiple levels of quantization and precision should be represented in the benchmarking suite. Finally, a variety of benchmarks should be chosen such that the diversity of the field is supported.
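The arithmetic behind these constraints is simple but worth making explicit. The sketch below estimates raw weight storage for a hypothetical 250k-parameter model at several precisions, showing why a benchmark suite must represent multiple quantization levels.

```python
# Back-of-the-envelope weight storage at different quantization levels.
# The 250k parameter count is a hypothetical, mid-sized TinyML model.
def model_size_kb(n_params, bits_per_weight):
    return n_params * bits_per_weight / 8 / 1024  # bits -> bytes -> kB

for bits in (32, 16, 8, 4):
    # fp32 (~977 kB) overflows a 256 kB part that int8 (~244 kB) nearly fits.
    print(f"{bits:2d}-bit weights: {model_size_kb(250_000, bits):7.1f} kB")
```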
4.3 Hardware Heterogeneity

Despite its nascency, TinyML systems are already diverse in their performance, power, and capabilities. Devices range from general-purpose MCUs to novel architectures, like event-based neural processors (Brainchip) or in-memory compute (Kim et al., 2019). This heterogeneity poses a number of challenges, as the system under test (SUT) will not necessarily include otherwise standard features, like a system clock or a debug interface. Furthermore, the task of normalizing performance results across heterogeneous implementations is a key challenge.

Today's state-of-the-art benchmarks are not designed to handle these challenges readily. They need careful re-engineering to be flexible enough to handle the extent of hardware heterogeneity that is commonplace in the TinyML ecosystem.
4.4 Software Heterogeneity

There are three distinct methods for deploying models onto TinyML systems: hand coding, code generation, and ML interpreters.

Hand coding often produces the best results, as it allows for low-level, application-specific optimizations; however, the task is time consuming, and the impact of the optimizations is often opaque to anyone but the original design team. Moreover, hand coding limits the ability to share knowledge and adopt new methods, which is detrimental to the rate of progress in TinyML. From a benchmarking perspective, hand-coded submissions will likely produce the best numerical results at the cost of reproducibility, comparability, and time.

Code generation methods produce well-optimized code without the significant effort of hand coding by abstracting and automating system-level optimizations. However, code generation does not address the issues with comparability, as each major vendor has its own set of proprietary tools and compilers, which also makes portability a challenge.

ML interpreters allow for significant portability, as their abstract structure is the same across platforms. TensorFlow Lite for Microcontrollers, a popular ML framework for TinyML, uses an interpreter to call individual kernels, like convolution, at run time. The framework is independent of the model architecture; therefore, new models can be easily swapped in. Additionally, the reference kernels can be individually optimized and changed to fit the platform. This method comes with a small overhead in binary size and performance. From a benchmarking perspective, this abstraction separates the impact of the model architecture on system-level performance, which makes results more generalizable.
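As a concrete example of this flow, the sketch below uses the public TensorFlow Lite converter API to produce a fully int8 flatbuffer of the kind the TensorFlow Lite for Microcontrollers interpreter executes on-device; the toy model and random calibration data are placeholders for a trained model and real samples.

```python
# Sketch: post-training int8 quantization with the TFLite converter.
# The model and calibration data below are placeholders.
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(49, 10, 1))
outputs = tf.keras.layers.Dense(2, activation="softmax")(
    tf.keras.layers.Flatten()(inputs))
model = tf.keras.Model(inputs, outputs)

def representative_data():
    for _ in range(100):  # calibration samples set the quantization ranges
        yield [np.random.rand(1, 49, 10, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # fully integer in/out
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:      # typically embedded as a C array
    f.write(tflite_model)
```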
Table 2. Existing Benchmarks

Benchmark             ML?   Power?   Tiny?
CoreMark              ×     √        √
MLMark                √     ×        ×
MLPerf Inference      √     √        ×
TinyML Requirements   √     √        √
A benchmark suite must balance optimality with portability, and comparability with representativeness. A TinyML benchmark should support many options for model deployment, but the impact of that choice on the results must be carefully evaluated.
5 Related Work
There are a number of ML-related hardware benchmarks; however, none accurately represents the performance of TinyML workloads on tiny hardware. Table 2 shows a sampling of the widely accepted industry benchmarks that are directly applicable to the discussion of TinyML systems.

EEMBC CoreMark (Gal-On & Levy) has become the standard performance benchmark for MCU-class devices due to its ease of implementation and use of real algorithms. Yet CoreMark does not profile full programs, nor does it accurately represent machine learning inference workloads.

EEMBC MLMark (Torelli & Bangale) addresses these issues by using actual ML inference workloads. However, the supported models are far too large for MCU-class devices and are not representative of TinyML workloads. They require far too much memory (GBs) and have significant run times. Additionally, while CoreMark supports power measurements with ULPMark-CM (EEMBC), MLMark does not, which is critical for a TinyML benchmark.

MLPerf, a community-driven benchmarking effort, has recently introduced a benchmarking suite for ML inference (Reddi et al., 2019) and has plans to add power measurements. However, much like MLMark, the current MLPerf inference benchmark precludes MCUs and other resource-constrained platforms due to a lack of small benchmarks and compatible implementations.

As Table 2 summarizes, there is a clear and distinct need for a TinyML benchmark that caters to the unique needs of ML workloads, makes power a first-class citizen, and prescribes a methodology that suits TinyML.
6 Benchmarks
To overcome these challenges, we adopt a set of principles for the development of a robust TinyML benchmarking suite and select a set of four benchmarks.
6.1 Open and Closed Divisions

As previously stated, TinyML is a diverse field; therefore, not all systems can be accommodated under strict rules. However, without strict rules, direct comparison of the hardware becomes more difficult. To address this issue, we adopt MLPerf's open and closed structure. More traditional TinyML solutions can submit to the closed division, where submissions must use a model that is considered equivalent to the reference model. TinyML systems that fall outside the bounds of the closed benchmark can submit results to the open division, which allows submissions to deviate as necessary from the closed reference. We believe this structure increases the inclusivity of the benchmarking suite while maintaining the comparability of the results.

Additionally, the open division allows submissions to demonstrate novel software optimizations. Software-based organizations can submit results using the reference platform while altering the model or inference engine to demonstrate the relative advantage of their unique solutions.
6.2 Use Cases

Our use case selection process prioritized diversity, feasibility, and industry relevance: diversity to ensure our benchmark suite covers as much of the field as possible, feasibility in terms of access to open-source datasets and models, and relevance to real-world applications.

The group has selected four use cases to target: audio wake words, visual wake words, image classification, and anomaly detection. Audio wake words refers to the common keyword-spotting task (e.g., "Alexa", "Ok Google", and "Hey Siri"). Visual wake words is a binary image classification task that indicates whether or not a person is visible in the image. The image classification use case targets image classification with a small label set. Anomaly detection is a broader use case that classifies time series data as "normal" or "abnormal". We specifically select audio anomaly detection as our use case due to the availability of a relevant dataset.

These use cases have been selected to represent the broad range of TinyML. They encompass three distinct input data types and range from relatively resource hungry (visual wake words) to lightweight (anomaly detection). Furthermore, the models traditionally used for these use cases are varied; therefore, the benchmarking suite can support a diverse set of ML techniques.
6.3 Datasets

The group has selected a dataset for each use case, as shown in Table 3. The datasets help specify the use cases, are used to train the reference models, and are sampled to create the test sets used during the measurement on device. Furthermore, the datasets can be used to train a new or modified model in the open division. We have selected datasets that are open, well known, and relevant to industry use cases.
Table 3. TinyMLPerf Benchmarking Suite

Use Case              Dataset                                              Model
Audio Wake Words      Speech Commands (Warden, 2018a)                      DS-CNN (Zhang et al., 2017)
Visual Wake Words     Visual Wake Words Dataset (Chowdhery et al., 2019)   DS-CNN (TFLM-Person-Detection)
Image Classification  CIFAR10 (Krizhevsky et al., 2009a)                   ResNet (He et al., 2016)
Anomaly Detection     ToyADMOS (Toy Car) (Koizumi et al., 2019)            Deep AutoEncoder (Koizumi et al., 2020)
6.4 Models

The group has selected four reference models. These reference models are the benchmark workloads in the closed division and act as a baseline for the open division. The DS-CNN described in Zhang et al. (2017) has been selected for audio wake words. The MobileNetV1 (Howard et al., 2017) used in the TensorFlow Lite for Microcontrollers person detection example (TFLM-Person-Detection) has been selected for visual wake words. An eight-layer ResNet model (He et al., 2016) has been selected for image classification. The baseline deep autoencoder from Task 2 of the DCASE2020 competition (Koizumi et al., 2020) has been selected for anomaly detection. The models were selected, based on industry input, to be representative of their respective use cases.
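For illustration, here is a minimal Keras sketch in the spirit of that fully connected autoencoder baseline; the 640-dimensional input (stacked log-mel frames) and hidden sizes are our assumptions, not the exact DCASE2020 configuration.

```python
# Fully connected autoencoder sketch for audio anomaly detection.
# Input dimension and hidden sizes are assumptions, not the DCASE2020 spec.
import tensorflow as tf

def build_autoencoder(input_dim=640):  # e.g., several stacked log-mel frames
    inputs = tf.keras.Input(shape=(input_dim,))
    h = inputs
    for units in (128, 128, 8, 128, 128):  # narrow bottleneck in the middle
        h = tf.keras.layers.Dense(units, activation="relu")(h)
    outputs = tf.keras.layers.Dense(input_dim)(h)  # reconstruct the input
    return tf.keras.Model(inputs, outputs)

model = build_autoencoder()
model.compile(optimizer="adam", loss="mse")  # trained on "normal" sounds only
# At test time the reconstruction error serves as the anomaly score:
# a frame the model cannot reconstruct well is flagged "abnormal".
```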
6.5 Metrics

The benchmarking suite will primarily measure inference latency, with the option to measure energy consumption. The scope of the measurements is determined by each benchmark. In the open division, the accuracy of the model must remain within a set threshold of the closed division model.
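A minimal latency-measurement loop might look like the sketch below; run_inference is a hypothetical stand-in for the device invocation, and the warm-up and run counts are arbitrary choices rather than prescribed benchmark values.

```python
# Sketch of a host-side latency measurement loop.
# `run_inference` is a hypothetical callable that triggers one on-device run.
import statistics
import time

def measure_latency_ms(run_inference, inputs, warmup=10, runs=100):
    for x in inputs[:warmup]:
        run_inference(x)  # warm-up runs excluded from the measurement
    samples = []
    for x in inputs[:runs]:
        start = time.perf_counter()
        run_inference(x)
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)  # median resists scheduling outliers
```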
Perfection is often the enemy of good; therefore, to fill the community's need for comparability, our priority is to quickly establish a set of minimum viable benchmarks and iteratively address deficiencies. The benchmarking suite will continue to evolve to meet the needs of the community. We plan to accept result submissions in March of 2021.
7 Conclusion
In conclusion, TinyML is an important and rapidly evolving field that requires comparability amongst hardware innovations to enable continued progress and stability. In this paper, we reviewed the current landscape of TinyML, including highlighting the need for a hardware benchmark. Additionally, we analyzed the challenges associated with developing said benchmark and discussed a path forward. Finally, we have selected use cases, datasets, and models for our four benchmarks.

If you would like to contribute to the effort, join the working group here: https://groups.google.com/u/4/a/mlcommons.org/g/tiny
The benchmark suite is available here: https://github.com/mlcommons/tiny

References
Helium: Enhancing the capabilities of the smallest devices.

Altun, K., Barshan, B., and Tunçel, O. Comparative study on classifying human activities with miniature inertial and magnetic sensors. Pattern Recognition, 43:3605–3620, 10 2010. doi: 10.1016/j.patcog.2010.04.019.

Amir, A., Taba, B., Berg, D., Melano, T., McKinstry, J., Nolfo, C. D., Nayak, T., Andreopoulos, A., Garreau, G., Mendoza, M., Kusnitz, J., Debole, M., Esser, S., Delbruck, T., Flickner, M., and Modha, D. A low power, fully event-based gesture recognition system. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7388–7397, July 2017. doi: 10.1109/CVPR.2017.781.

Brainchip. Akida neuromorphic system on chip.

Chowdhery, A., Warden, P., Shlens, J., Howard, A., and Rhodes, R. Visual wake words dataset. CoRR, abs/1906.05721, 2019. URL http://arxiv.org/abs/1906.05721.

Cramariuc, A.-C. P. I. M. B. Precis HAR, 2019. URL http://dx.doi.org/10.21227/mene-ck48.

De Vito, S., Massera, E., Piga, M., and Martinotto, L. On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sensors and Actuators B: Chemical, 129:750–757, 02 2008. doi: 10.1016/j.snb.2007.09.060.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In CVPR09, 2009.

EEMBC. ULPMark - an EEMBC benchmark.

Fedorov, I., Adams, R. P., Mattina, M., and Whatmough, P. SpArSe: Sparse architecture search for CNNs on resource-constrained microcontrollers. In Advances in Neural Information Processing Systems 32, pp. 4978–4990. Curran Associates, Inc., 2019.

Flamand, E., Rossi, D., Conti, F., Loi, I., Pullini, A., Rotenberg, F., and Benini, L. GAP-8: A RISC-V SoC for AI at the edge of the IoT. In IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 1–4, July 2018. doi: 10.1109/ASAP.2018.8445101.

Gal-On, S. and Levy, M. Exploring CoreMark - a benchmark maximizing simplicity and efficacy. Technical report.

Garofalo, A., Rusci, M., Conti, F., Rossi, D., and Benini, L. PULP-NN: Accelerating quantized neural networks on parallel ultra-low-power RISC-V processors. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 378(2164):20190155, Dec 2019. ISSN 1471-2962. doi: 10.1098/rsta.2019.0155. URL http://dx.doi.org/10.1098/rsta.2019.0155.

Gemmeke, J. F., Ellis, D. P. W., Freedman, D., Jansen, A., Lawrence, W., Moore, R. C., Plakal, M., and Ritter, M. Audio Set: An ontology and human-labeled dataset for audio events. In Proc. IEEE ICASSP 2017, New Orleans, LA, 2017.

Goldberger, A. L., Amaral, L. A. N., Glass, L., Hausdorff, J. M., Ivanov, P. C., Mark, R. G., Mietus, J. E., Moody, G. B., Peng, C.-K., and Stanley, H. E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, 2000. Circulation Electronic Pages: http://circ.ahajournals.org/content/101/23/e215.full; PMID: 1085218; doi: 10.1161/01.CIR.101.23.e215.

Hassan, M. M., Uddin, M. Z., Mohamed, A., and Almogren, A. A robust human activity recognition system using smartphone sensors and deep learning. Future Generation Computer Systems, 81:307–313, 2018.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.

Holleman, J. The speed and power advantage of a purpose-built neural compute engine, Jun 2019.

Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.

Kim, H., Chen, Q., Yoo, T., Kim, T. T.-H., and Kim, B. A 1-16b precision reconfigurable digital in-memory computing macro featuring column-MAC architecture and bit-serial computation. In ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference (ESSCIRC), pp. 345–348. IEEE, 2019.

Koizumi, Y., Saito, S., Uematsu, H., Harada, N., and Imoto, K. ToyADMOS: A dataset of miniature-machine operating sounds for anomalous sound detection. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 313–317. IEEE, 2019.

Koizumi, Y., Kawaguchi, Y., Imoto, K., Nakamura, T., Nikaido, Y., Tanabe, R., Purohit, H., Suefusa, K., Endo, T., Yasuda, M., and Harada, N. Description and discussion on DCASE2020 challenge task2: Unsupervised anomalous sound detection for machine condition monitoring. In arXiv e-prints: 2006.05822, pp. 1–4, June 2020. URL https://arxiv.org/abs/2006.05822.

Krizhevsky, A., Hinton, G., et al. Learning multiple layers of features from tiny images. 2009a.

Krizhevsky, A., Nair, V., and Hinton, G. CIFAR-10 (Canadian Institute for Advanced Research). 2009b.

Kumar, A., Goyal, S., and Varma, M. Resource-efficient machine learning in 2 KB RAM for the internet of things. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 1935–1944, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR. URL http://proceedings.mlr.press/v70/kumar17a.html.

Lai, L., Suda, N., and Chandra, V. CMSIS-NN: Efficient neural network kernels for Arm Cortex-M CPUs, 2018.

LeCun, Y. and Cortes, C. MNIST handwritten digit database. 2010. URL http://yann.lecun.com/exdb/mnist/.

Lobov, S., Krilova, N., Kastalskiy, I., Kazantsev, V., and Makarov, V. Latent factors limiting the performance of sEMG-interfaces. Sensors, 18:1122, 04 2018. doi: 10.3390/s18041122.

Moons, B., Bankman, D., Yang, L., Murmann, B., and Verhelst, M. BinarEye: An always-on energy-accuracy-scalable binary CNN processor with all memory on chip in 28nm CMOS. In IEEE Custom Integrated Circuits Conference (CICC), pp. 1–4. IEEE, 2018.

Reddi, V. J., Cheng, C., Kanter, D., Mattson, P., Schmuelling, G., Wu, C.-J., Anderson, B., Breughe, M., Charlebois, M., Chou, W., Chukka, R., Coleman, C., Davis, S., Deng, P., Diamos, G., Duke, J., Fick, D., Gardner, J. S., Hubara, I., Idgunji, S., Jablin, T. B., Jiao, J., John, T. S., Kanwar, P., Lee, D., Liao, J., Lokhmotov, A., Massa, F., Meng, P., Micikevicius, P., Osborne, C., Pekhimenko, G., Rajan, A. T. R., Sequeira, D., Sirasao, A., Sun, F., Tang, H., Thomson, M., Wei, F., Wu, E., Xu, L., Yamada, K., Yu, B., Yuan, G., Zhong, A., Zhang, P., and Zhou, Y. MLPerf inference benchmark, 2019.

Roggen, D., Calatroni, A., Rossi, M., Holleczek, T., Förster, K., Tröster, G., Lukowicz, P., Bannach, D., Pirkl, G., Ferscha, A., Doppler, J., Holzmann, C., Kurz, M., Holl, G., Chavarriaga, R., Sagha, H., Bayati, H., Creatura, M., and d. R. Millán, J. Collecting complex activity datasets in highly rich networked sensor environments. In Seventh International Conference on Networked Sensing Systems (INSS), pp. 233–240, June 2010. doi: 10.1109/INSS.2010.5573462.

Saxena, A. and Goebel, K. Turbofan engine degradation simulation data set, 2008. URL http://ti.arc.nasa.gov/project/prognostic-data-repository.

TensorFlow. TensorFlow Lite for Microcontrollers.

TFLM-Person-Detection. TensorFlow Lite for Microcontrollers person detection example. URL https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro/examples/person_detection.

tinyML Foundation. tinyML summit, 2019.

Torelli, P. and Bangale, M. Measuring inference performance of machine-learning frameworks on edge-class devices with the MLMark benchmark.

Vaizman, Y., Ellis, K., and Lanckriet, G. Recognizing detailed human context in the wild from smartphones and smartwatches. IEEE Pervasive Computing, 16(4):62–74, October 2017. ISSN 1558-2590. doi: 10.1109/MPRV.2017.3971131.

Vergara, A., Vembu, S., Ayhan, T., Ryan, M., Homer, M., and Huerta, R. Chemical gas sensor drift compensation using classifier ensembles. Sensors and Actuators B: Chemical, 166–167:320–329, 05 2012. doi: 10.1016/j.snb.2012.01.074.

Wang, K., Liu, Z., Lin, Y., Lin, J., and Han, S. HAQ: Hardware-aware automated quantization with mixed precision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8612–8620, 2019.

Warden, P. Speech commands: A dataset for limited-vocabulary speech recognition, 2018a.

Warden, P. Why the future of machine learning is tiny, 2018b. URL https://petewarden.com/2018/06/11/why-the-future-of-machine-learning-is-tiny/.