
Read this to find out about:
- The advantages of performing AI inferencing at the edge
- The different hardware requirements of different inferencing techniques
- Why predictive maintenance applications can run well on an MCU, while image recognition is better suited to an FPGA
Sophisticated and complex applications of Artificial Intelligence (AI), such as autonomous vehicles, have to take safety-critical decisions in milliseconds without ever making a mistake. The lives of an autonomous vehicle’s occupants, as well as of other road users, depend on the ability of high-powered Graphics Processing Units (GPUs) and complex neural network algorithms to recognize objects such as pedestrians, cyclists, traffic lights and road warning signs in real time.
It is not only autonomous vehicles that use the most sophisticated hardware to run mathematically complex neural networking models. The most famous examples of AI in practice, such as ‘chatbot’ automated online customer services powered by IBM’s Watson platform, run on banks of servers which contain arrays of ultra-powerful processors.
But it does not follow that all AI implementations require the fastest, most powerful processing engine that the application can sustain. In fact, the correct principle to follow is the one that embedded developers have always followed: the application’s functional requirements should determine the hardware specifications, and not vice versa. Over-specifying the hardware is no more correct or sensible in the field of AI than it is in any other field of embedded computing.
And in fact, it is surprising how many AI systems can run at the edge, completely independently of cloud computing services, on low-cost hardware platforms such as a 32-bit microcontroller or a mid-density FPGA.
The Benefits of Edge Computing
Of course, developers of AI applications can choose to perform inferencing in the cloud, where hardware resources are hardly constrained at all. This is the architecture adopted, for instance, by Amazon’s Alexa Voice Service: a device such as a smart speaker will ‘hear’ the user speaking the ‘Alexa’ wake word, and then connect to Amazon’s speech-recognition cloud service to interpret the spoken command.
But there are often good reasons for embedded systems to perform inferencing locally, at the edge.
- When a network connection runs slowly, it can cause high latency. If the network connection goes down, the inferencing function will fail completely. Local inferencing eliminates this cause of unreliability.
- Inferencing is a data-intensive activity, and some network service providers will charge high prices for transferring large amounts of data.
- A network connection is a critical point of vulnerability to attack by malware, hackers or other threats. A stand-alone device performing inferencing at the edge is safe from network-borne attacks.
- Inferencing in the cloud calls for substantial infrastructure, including networking hardware, network service provision and cloud service provision. The OEM eliminates the need to specify and maintain this infrastructure by performing inferencing at the edge.
So there are strong reasons to prefer local inferencing performed on devices such as microcontrollers, applications processors and FPGAs. But does the developer’s preferred hardware platform affect the type of AI application that they will be able to successfully implement?
The Right Answer, Quickly
There are in fact two important parameters that determine the hardware requirement in an embedded AI design, as shown in Figure 1:
- Speed (or latency)
- Accuracy
Broadly speaking, the more time your system can take to reach a decision on the input presented to it, and the higher your tolerance for error, the less powerful your hardware needs to be. An autonomous vehicle’s object-detection system sits at the extreme end of the spectrum of use cases, requiring both millisecond-level latency and near-100% accuracy.
Fig. 1: Speed and accuracy determine hardware requirements for AI. (Image credit: Future Electronics)
By contrast, consider a smart cat flap, which unlocks for the owner’s pet and excludes all other animals, feline or non-feline. It would be reasonable for this device to take up to 500ms to process an input from the camera mounted at the exterior of the cat flap. And pet owners might accept accuracy of 98%, so that once in every 50 instances it fails to admit the pet at the first attempt, and so runs the recognition program a second time.
It would be possible to reduce the cat flap’s latency to 10ms and to raise its accuracy to 99%, but the additional bill-of-materials cost would be substantial. The question for the developer must be, will the improved performance make any difference to the value the consumer derives from it?
It is actually remarkable how much AI work can be done by how little hardware. Figure 1 implies that the minimum hardware requirement for AI is a 32-bit Arm® Cortex®-M-based MCU. In fact, STMicroelectronics has found a way to implement AI without an MCU at all.
ST has embedded a small computation block called a Machine Learning Core (MLC) in some of its MEMS motion sensors, such as the LSM6DSOX accelerometer/gyroscope. The sensor can be used to collect motion data for classes of activity such as jogging, walking, sitting and driving. Features of this data, such as mean, variance, energy and peak-to-peak values, are analyzed offline to produce a type of detection algorithm called a decision tree.
If the LSM6DSOX is embedded, for instance, in a sports wristband, its MLC is capable of applying the decision tree in real time to the motion measurements it takes and of classifying the activity of the wearer.
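To illustrate why this workload is so light, the sketch below shows in plain C the kind of computation a decision tree performs: a handful of statistical features extracted from a window of samples, followed by a few threshold comparisons. The features, thresholds and activity classes here are invented for the example and are not ST's actual MLC configuration.

```c
#include <stddef.h>

/* Illustrative only: a hand-written decision tree over accelerometer
 * features, similar in spirit to what ST's MLC evaluates in the sensor.
 * The features, thresholds and classes below are invented for this sketch. */

typedef enum { ACT_SITTING, ACT_WALKING, ACT_JOGGING, ACT_DRIVING } activity_t;

typedef struct {
    float mean;      /* mean of the acceleration norm over the window */
    float variance;  /* variance of the norm                          */
    float peak2peak; /* max - min of the norm                         */
} features_t;

/* Extract simple statistical features from one window of
 * acceleration-norm samples (in g). */
static features_t extract_features(const float *samples, size_t n)
{
    features_t f;
    float min = samples[0], max = samples[0], sum = 0.0f, sum_sq = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float s = samples[i];
        sum += s;
        sum_sq += s * s;
        if (s < min) min = s;
        if (s > max) max = s;
    }
    f.mean = sum / (float)n;
    f.variance = sum_sq / (float)n - f.mean * f.mean;
    f.peak2peak = max - min;
    return f;
}

/* Walk the tree: a few threshold comparisons and no multiply-heavy math,
 * which is why this kind of classifier fits in a sensor-level block. */
static activity_t classify(const features_t *f)
{
    if (f->variance < 0.01f)
        return (f->peak2peak < 0.05f) ? ACT_SITTING : ACT_DRIVING;
    return (f->peak2peak > 1.5f) ? ACT_JOGGING : ACT_WALKING;
}

int main(void)
{
    /* Synthetic, nearly static window (dummy data) to exercise the tree. */
    float window[50];
    for (size_t i = 0; i < 50; i++)
        window[i] = 1.0f + 0.02f * (float)(i % 2);

    features_t f = extract_features(window, 50);
    return (int)classify(&f);   /* expect ACT_SITTING (0) for this window */
}
```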
Programmable System-on-Chip (SoC) manufacturer QuickLogic uses a similar approach to enable sophisticated predictive maintenance applications on low-cost hardware such as its QuickAI™ platform or Arm Cortex-M4-based MCUs.
Predictive maintenance depends on the recognition of various types of time-series data, such as vibration and sound. Deep learning and neural networks are commonly used to detect fault indicators in the patterns of these time-series data. These algorithms are a subset of a broader set of algorithms known as classifiers. Classifiers transform available inputs into desired discrete output classifications through inferencing, a term which covers a broad range of methods.
QuickLogic’s SensiML AI development toolkit provides a complete environment for building intelligent IoT sensing end-points using the most appropriate of these machine learning techniques, as shown in Figure 2.
Fig. 2: The machine learning development flow using QuickLogic’s SensiML AI toolkit. (Image credit: SensiML)
The flow includes capturing and labelling raw sensor data, analyzing the dataset to find the most efficient algorithm that meets the design constraints, and auto-generating code from signal acquisition through to the classifier output, optimized for the target hardware, whether QuickAI or another supported platform.
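At run time, such an auto-generated end-point follows a simple pattern: acquire a window of raw sensor data, pass it to the generated classifier, and act on the discrete class it returns. The C sketch below illustrates that pattern only; the sensor driver and classifier functions are stand-ins, not the actual SensiML-generated API.

```c
#include <stdint.h>
#include <stdio.h>

#define WINDOW_LEN 128

/* Stand-in for the real sensor driver: in a deployed end-point this would
 * read a window of samples from an accelerometer or microphone. */
static int sensor_read_window(int16_t *buf, int len)
{
    for (int i = 0; i < len; i++)
        buf[i] = (int16_t)(i % 32);   /* dummy waveform */
    return len;
}

/* Stand-in for the auto-generated classifier: returns a discrete class ID.
 * The real generated function name, signature and model will differ. */
static int pipeline_classify(const int16_t *buf, int len)
{
    int32_t energy = 0;
    for (int i = 0; i < len; i++)
        energy += (int32_t)buf[i] * buf[i];
    return (energy > 30000) ? 1 : 0;  /* 1 = "fault-like" class (dummy rule) */
}

int main(void)
{
    int16_t window[WINDOW_LEN];

    /* Acquire a window, classify it, act on the discrete result. */
    for (int iter = 0; iter < 10; iter++) {
        if (sensor_read_window(window, WINDOW_LEN) != WINDOW_LEN)
            continue;                              /* incomplete window */
        if (pipeline_classify(window, WINDOW_LEN) != 0)
            printf("window %d: fault class detected\n", iter);
    }
    return 0;
}
```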
Fig. 3: Neural network inferencing involves millions of math calculations performed at high speed.
Some AI applications, then, can run on extremely constrained, low-power hardware. But not all can.
Image recognition and object detection, for instance, require a neural network, and performing local inferencing of a trained neural network model is a more processor-intensive exercise than running a decision tree.
Microchip Technology, with its high-performance mid-density PolarFire® FPGA family, argues that its FPGAs are inherently more efficient and faster at performing local inferencing of neural network algorithms than other digital devices. Like any FPGA, PolarFire devices support parallel processing, rather than the sequential processing performed by the CPU in an MCU or applications processor. The PolarFire FPGAs also provide huge DSP capacity in the form of 8-bit math blocks: the MPF500T PolarFire FPGA, for instance, contains 1,480 math blocks. And as Figure 3 shows, a neural network inferencing event essentially involves a vast number of calculations performed in parallel.
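To see where that volume of work comes from, consider a single fully-connected layer evaluated with 8-bit quantized weights, sketched below as plain sequential C. Every iteration of the inner loop is one multiply-accumulate (MAC); with the illustrative 1024 x 256 dimensions used here, one layer already needs 262,144 MACs, and a full network of convolution and dense layers runs into the millions. The dimensions and function are illustrative, not taken from a specific network.

```c
#include <stdint.h>
#include <stddef.h>

/* Sequential reference for one fully-connected layer with 8-bit weights.
 * Each inner-loop iteration is one multiply-accumulate (MAC); an FPGA can
 * spread these MACs across many hardware math blocks at once, whereas a
 * CPU core executes them largely one after another. */
static void dense_int8(const int8_t *input, size_t in_len,
                       const int8_t *weights,      /* out_len x in_len */
                       const int32_t *bias,
                       int32_t *output, size_t out_len)
{
    for (size_t o = 0; o < out_len; o++) {
        int32_t acc = bias[o];
        for (size_t i = 0; i < in_len; i++)
            acc += (int32_t)weights[o * in_len + i] * (int32_t)input[i];
        output[o] = acc;   /* requantization/activation omitted for brevity */
    }
}

int main(void)
{
    enum { IN = 1024, OUT = 256 };
    /* Zero-initialized dummy tensors: the point is the MAC count,
     * IN * OUT = 262,144 for this single illustrative layer. */
    static int8_t input[IN], weights[OUT * IN];
    static int32_t bias[OUT], output[OUT];
    dense_int8(input, IN, weights, bias, output, OUT);
    return 0;
}
```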
Microchip’s testing shows that an MPF300 PolarFire FPGA can run the open-source Tiny YOLO v2 image-recognition algorithm to detect animals such as cows and horses at a rate of 43 frames/s while consuming less than 3W of power.
This is testament to the efficiency with which a PolarFire FPGA implements neural network algorithms locally.
For the most advanced forms of embedded AI, such as voice control, face recognition and machine condition monitoring, both NXP and ST provide a comprehensive hardware and software offering, ranging from Arm Cortex-M-based microcontrollers up to applications processors: the STM32MP1 from ST features dual Arm Cortex-A7 processor cores, and NXP offers a broad range of i.MX applications processors based on Arm Cortex-A cores.
Both manufacturers back up these hardware offerings with AI enablement and development tools. NXP even provides turnkey reference designs which demonstrate that applications such as local voice control, anomaly detection and face recognition can be performed on its i.MX RT series of crossover processors – devices which feature a high-end Arm Cortex-M7 microcontroller core rather than a Cortex-A processor core.
Lattice Semiconductor also provides AI solutions based on its iCE40 and ECP5 families of FPGAs. For instance, a low-power platform which pairs a Himax HM01B0 image sensor with an iCE40 UltraPlus FPGA implements reference designs for hand gesture recognition and human presence detection. The human presence detection application consumes as little as 1mW in a small form-factor, low-cost device. The reference design comes with all the material a customer needs to recreate the code, including input tools, training datasets and bitstreams. The design can be scaled up to run on the ECP5 platform for higher performance and greater functionality, while retaining low power consumption.
Low-Cost Hardware Choices
For many applications, the right choice will be to perform AI inferencing at the edge rather than in the cloud. As the examples above show, the hardware to enable this exists today, and it is the same hardware with which embedded developers are already familiar. It is true that, for high-speed neural network inferencing, FPGAs such as the PolarFire devices offer certain speed and power advantages. But some applications do not even need a neural network, and where a simpler algorithm is effective, the hardware to support it can be simpler too.