End devices, such as Internet-of-Things (IoT) sensors, generate much of the world's data. The amount of data created by these connected IoT devices will see a compound annual growth rate (CAGR) of 28.7% over the 2018-2025 forecast period, and IDC estimates that there will be 41.6 billion connected IoT devices, or "things," generating 79.4 zettabytes (ZB) of data in 2025 [1]. Much of that data needs to be analyzed in real time using deep learning models. However, deep learning inference and training require substantial computational resources to run quickly.
There are currently two research directions for deploying deep learning models onto IoT devices that address requirements such as scalability, privacy, network latency, and bandwidth efficiency.
The first works at the framework level: models are trained on devices with greater computational power and then compressed to fit onto computationally weak end devices. Examples include DeepSense [2], TinyML [3], DeepThings [4], and DeepIoT [5]. The second pursues novel system-on-chip designs that meet the demand for deep learning on edge devices at the ASIC level, building support for deep learning algorithms into the silicon itself. These chips tightly couple computation and memory resources, exploit the vast parallelism inherent in deep learning workloads, and compute only at the numerical precision required. Examples include Ergo [6], which can process large neural networks within a 20 mW power budget and supports a wide variety of currently popular network types, including CNNs, RNNs, and LSTMs, as well as the NDP100/101 [7] and the ECM3531/3532 [8].
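To make the framework-level direction more concrete, here is a minimal sketch of post-training quantization with TensorFlow Lite, the toolchain behind the TinyML workflow in [3]. The small convolutional model, its input shape, and the output filename are placeholders chosen for illustration; they are not taken from any of the cited systems.

```python
# Minimal sketch: compressing a trained model with TensorFlow Lite
# post-training quantization so it can run on a microcontroller-class device.
import tensorflow as tf

# Stand-in model (hypothetical); in practice this would be a network
# already trained on a more powerful machine.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 9)),           # e.g. a window of sensor readings
    tf.keras.layers.Conv1D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(6, activation="softmax"),  # e.g. six activity classes
])

# Convert with the default optimization, which quantizes weights to 8-bit
# integers and shrinks the serialized model considerably.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The resulting flatbuffer is what gets flashed onto the end device.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

With a representative dataset supplied to the converter, activations can be quantized as well, which is usually what makes the model small and fast enough for the computationally weak end devices discussed above.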
Both approaches have their advantages in terms of cost, efficiency, implementation effort, and so on. Which one do you think will come to dominate the development of deep learning on edge devices? I am not asking for a simple choice of A or B, but for a discussion of your preference if you are, or were, working on related topics.
[1] https://www.idc.com/getdoc.jsp?containerId=prUS45213219
[2] DeepSense: A unified deep learning framework for time-series mobile sensing data processing
[3] TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers
[4] DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters
[5] DeepIoT: Compressing Deep Neural Network Structures for Sensing Systems with a Compressor-Critic Framework
[6] https://perceive.io/product/
[7] https://www.syntiant.com/ndp101
[8] https://etacompute.com/tensai-soc/