By Wafy Butty, FPGA Specialist Engineer (Central Europe), Future Electronics
The application of Artificial Intelligence (AI) has gained broad public attention in a handful of high-performance systems: examples include machines that can beat humans at complex games such as chess and Go, and IBM's Watson AI platform, which provides, for instance, natural-language diagnostic support to medical practitioners.
These pioneering uses of AI are based in the computer science world, and draw on the massive computing resources of data centres operated by large companies such as IBM, Google and Microsoft.
Now, however, interest in the application of AI is growing in the embedded world, where edge computing resources are a tiny fraction of those of a data centre. Already, the sweet spots of AI for embedded developers are emerging. AI applications known to be feasible even with the constrained computing resources of a microcontroller, applications processor or FPGA include machine condition monitoring for predictive maintenance, and object classification or recognition.
Object recognition is a particularly exciting function for embedded AI, because of the wide range of use cases for it:
The creation of an embedded device which is capable of recognising a certain category or categories of objects involves machine learning, a technology which is now the subject of a large body of literature. It is not the purpose of this article to shed light on the process of machine learning itself, nor on the functions of data collection and labelling, model training, and optimisation of a neural network algorithm.
Instead, this article looks at the narrow question of hardware evaluation: which type of platform is best suited to the task of running an object-recognition algorithm and the associated system functions? And how might the developer expect a silicon manufacturer to support the compilation of the algorithm to the hardware target?
The embedded developer community tends to be divided into tribes: a developer is normally a user of either a microcontroller, or an applications processor, or an FPGA. When it comes to the implementation of an object-recognition system, the FPGA user enjoys some important advantages.
The first arises from the typical composition of the host system in which the object-recognition function is embedded. It might include:
In addition, the object-recognition algorithm itself is essentially a complex set of mathematical operations performed in parallel. This calls for extensive digital-signal processing resources and a large number of I/Os for shifting data into and out of memory at high speed.
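To make "a complex set of mathematical operations performed in parallel" concrete, here is a minimal Python sketch of the core operation of a convolutional layer. It is a deliberate simplification (one input channel, one kernel, no padding, stride 1) and the function name is illustrative only. The point is that every output pixel is an independent sum of multiply-accumulate (MAC) operations, which is exactly the kind of work that parallel hardware can spread across many multipliers.

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as used in CNNs) of a
    single-channel image with a single kernel, both given as lists of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # Each output pixel is an independent sum of kh * kw multiply-accumulates.
            # No output depends on another, so hardware may compute them concurrently.
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            output[y][x] = acc
    return output


# Usage: a 3x3 image and a 2x2 kernel give a 2x2 output.
print(conv2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, -1]]))
```

A processor executing this loop nest evaluates one MAC after another; an FPGA can instead instantiate many MAC units and evaluate several output pixels (or several kernel taps) in the same clock cycle.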
FPGAs are particularly well suited to this combination of requirements. Widely used in telecoms and networking equipment, FPGAs are excellent handlers of high-speed data streams. Their basic building blocks, Logic Elements (LEs), are readily configured to perform logic functions in parallel. This is the hallmark of the FPGA, distinguishing it from the sequential processing mode of a microcontroller or applications processor. It tends to mean that an FPGA can perform neural network functions faster, while using less power and fewer hardware resources, than an MCU or applications processor.
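The effect of parallelism on speed can be estimated with simple arithmetic. The sketch below counts the MACs in one convolutional layer and the clock cycles needed if a given number of MAC units work side by side. The layer dimensions in the example are hypothetical, and the figure of 924 parallel units simply borrows the MPF300T math-block count quoted later in this article; the estimate assumes perfect utilisation and ignores memory bandwidth, so it is an idealised upper bound rather than a performance prediction.

```python
def conv_layer_macs(out_h, out_w, out_ch, in_ch, k):
    """Multiply-accumulate operations in one conv layer with a k x k kernel."""
    return out_h * out_w * out_ch * in_ch * k * k


def cycles_needed(macs, parallel_units):
    """Clock cycles if `parallel_units` MACs complete every cycle (ceiling division).
    Assumes every unit is busy on every cycle, which real designs rarely achieve."""
    return -(-macs // parallel_units)


# Hypothetical layer: 100x100 output, 16 output channels, 3 input channels, 3x3 kernel.
macs = conv_layer_macs(100, 100, 16, 3, 3)
print(macs)                      # 4,320,000 MACs
print(cycles_needed(macs, 1))    # one MAC per cycle, as in a simple sequential core
print(cycles_needed(macs, 924))  # 924 MAC units working in parallel
```

Under these assumptions the parallel version needs roughly 900 times fewer cycles, which is why it can also run at a lower clock frequency, and hence lower power, for the same frame rate.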
Interestingly, for all the computational complexity of neural networks for object detection, they do not always require the massive array of LEs offered by high-end FPGAs from Xilinx or Altera. In fact, camera-based AI has been implemented successfully even on small FPGAs containing fewer than 10,000 LEs. Lattice Semiconductor, for instance, supplies the Himax HM01B0 Upduino shield, a modular development board for AI applications using visual and sound inputs, which runs on a Lattice UltraPlus FPGA containing just 5,300 Look-Up Tables (LUTs).
Fig. 1: the architecture of the PolarFire series of FPGAs from Microchip. (Image credit: Microchip)
For many object-recognition applications, mid-range FPGAs such as Microchip’s PolarFire series provide an ideal balance between capability, cost, size and power consumption. The features of the PolarFire MPF300T, for instance, include 300,000 LEs, 924 multiply-accumulate math blocks (18x18), and 20.6Mbits of RAM (see Figure 1). The biggest device in the PolarFire family has around 500,000 LEs, 1,480 math blocks, and 33Mbits of RAM.
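One practical consequence of the RAM figures above is whether a network's weights can be held entirely on chip or must be streamed from external memory. The sketch below performs that check. The parameter counts in the example are hypothetical, 1 Mbit is taken as 10^6 bits (an assumption), and only weight storage is counted; activation buffers, line buffers for the video stream and any other logic also consume RAM, so a "fits" result is optimistic.

```python
def weights_fit_on_chip(num_params, bits_per_weight, ram_mbits):
    """Return (fits, required_mbits) for storing all weights in on-chip RAM.

    Assumes 1 Mbit = 10**6 bits and ignores activations and other RAM users.
    """
    required_mbits = num_params * bits_per_weight / 1e6
    return required_mbits <= ram_mbits, required_mbits


# Hypothetical networks checked against the MPF300T's 20.6Mbits of RAM.
print(weights_fit_on_chip(2_000_000, 8, 20.6))  # small network, 8-bit weights
print(weights_fit_on_chip(5_000_000, 8, 20.6))  # larger network needs external memory
```

When the check fails, the weights would typically live in external DRAM and be fetched layer by layer, which is one reason a high-speed memory interface matters in these designs.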
The device’s features are closely aligned to the system requirements of machine vision equipment handling images at up to 4K resolution. An MPF300T can provide:
A particular feature of PolarFire FPGAs is the way that they perform math operations in DSP blocks: a PolarFire DSP block can perform up to four 9-bit operations per clock cycle, whereas the DSP blocks of some other FPGAs typically perform only two. This means that a PolarFire device can perform the same number of math operations at half the clock frequency. The biggest PolarFire device can perform around 1.48 tera-operations per second: 1,480 math blocks each performing four operations per cycle reach this figure at a clock frequency of 250MHz.
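Because a throughput figure only makes sense per unit of time, it helps to write the arithmetic out. The sketch below computes peak operations per second from the math-block count, the operations per block per cycle, and the clock frequency, and conversely the clock needed to hit a target throughput. The 250MHz clock is an assumption chosen because it reproduces 1.48 tera-operations per second for 1,480 blocks at four operations per cycle; it is not a datasheet maximum frequency.

```python
def peak_ops_per_second(math_blocks, ops_per_block_per_cycle, clock_hz):
    """Theoretical peak throughput, assuming every block is busy every cycle."""
    return math_blocks * ops_per_block_per_cycle * clock_hz


def clock_for_target(target_ops_per_second, math_blocks, ops_per_block_per_cycle):
    """Clock frequency (Hz) needed to reach a target peak throughput."""
    return target_ops_per_second / (math_blocks * ops_per_block_per_cycle)


# 1,480 blocks x 4 ops/cycle at an assumed 250MHz.
print(peak_ops_per_second(1480, 4, 250_000_000))    # 1.48e12 ops/s
# The same throughput with blocks that manage only 2 ops/cycle needs twice the clock.
print(clock_for_target(1_480_000_000_000, 1480, 2))
```

This is the "half the clock frequency" argument in numbers: doubling the operations per block per cycle halves the clock needed for a given throughput.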
The hardware configuration of PolarFire FPGAs, then, is ideally suited to embedded systems that perform object recognition. How well an FPGA implements object recognition also depends on how the neural network is compiled to it. The network will normally have been trained in a framework such as Caffe, TensorFlow or Keras, often using cloud-based computing resources.
For compilation to a PolarFire FPGA, Microchip collaborates with ASIC Design Services (ADS), which has developed Core Deep Learning (CDL), a scalable, flexible software framework optimised for convolutional neural networks (CNNs), the type of neural network commonly used for object recognition as well as other machine learning functions. CDL takes as its input a trained neural network from the Caffe framework and renders it as a SystemVerilog file to be implemented in the PolarFire logic fabric.
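A trained network's weights are usually floating-point numbers, whereas FPGA math blocks work on fixed-width integers. The article does not describe the number format CDL uses, so the following is only a generic illustration of symmetric linear quantisation, one common way of mapping float weights onto signed integers. The 9-bit width is an assumption chosen to match the 9-bit operations mentioned earlier, and the function name is hypothetical.

```python
def quantize_symmetric(weights, bits=9):
    """Map float weights to signed integers of `bits` width with one shared scale.

    Returns (integers, scale) such that weight is approximately integer * scale.
    This is a generic technique, not a description of CDL's internal method.
    """
    qmax = 2 ** (bits - 1) - 1  # 255 for 9-bit signed values (range kept symmetric)
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp into [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


q, scale = quantize_symmetric([0.3, -0.7, 0.1])
print(q, scale)
print([qi * scale for qi in q])  # dequantised values, close to the originals
```

Each dequantised weight differs from the original by at most half a quantisation step; a real flow would also need to handle activations, per-layer or per-channel scales and accumulator widths.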
The CDL framework’s functions include:
An important advantage of CDL is the scope to add constraints: the system developer can specify the features of the hardware target for which the neural network is being compiled. The PolarFire family, for instance, stretches from the MPF100T with 109,000 LEs to the MPF500T with 481,000 LEs. CDL then tries to compile the trained network, within the user's constraints, so that it fits in the target FPGA.
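The idea of fitting a network to a target device can be illustrated with a small selection helper. This is a hypothetical sketch, not CDL's interface: it picks the smallest PolarFire device, among the three whose LE counts are quoted in this article, that can hold a design of a given estimated size. The 80% utilisation ceiling is an assumption (designers commonly leave headroom for routing and timing closure), and a real fit also depends on math blocks, RAM and achievable clock frequency.

```python
# LE counts as quoted in this article; other family members are omitted.
POLARFIRE_LES = [
    ("MPF100T", 109_000),
    ("MPF300T", 300_000),
    ("MPF500T", 481_000),
]


def smallest_fitting_device(required_les, max_utilisation=0.8):
    """Return the smallest listed device whose usable LEs cover the estimate,
    or None if even the largest device is too small (hypothetical helper)."""
    for name, les in POLARFIRE_LES:  # list is ordered smallest first
        if required_les <= les * max_utilisation:
            return name
    return None


print(smallest_fitting_device(50_000))   # fits the smallest listed device
print(smallest_fitting_device(200_000))  # needs a mid-range device
print(smallest_fitting_device(400_000))  # exceeds 80% of the largest device
```

In practice the relationship runs both ways: if no device fits, the network itself can be slimmed down (fewer channels, narrower weights) and recompiled.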
The quickest way for developers to start experimenting with the object-recognition capability of the PolarFire family is to use the Avalanche board (part number AVMPF300TS-00) supplied by Future Electronics (see Figure 2). This is a complete object-recognition demonstration system based on an MPF300T PolarFire FPGA with 256Mbits of DDR3 memory, 64Mbits of serial Flash, a Gigabit Ethernet interface and a webcam-style image sensor.
A PC application supplied with the board prepares the video stream and encapsulates it for transmission to the board over an Ethernet link. The Avalanche board itself is programmed with the Tiny YOLOv2 convolutional neural network, which is pre-trained for object detection on the Pascal VOC dataset of 20 classes of common objects.
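The article does not describe how the board's raw network output is turned into detections, but YOLOv2-style networks share a published output format. For Pascal VOC, each anchor box in each grid cell produces 25 values: four box offsets, one objectness score and 20 class scores. The sketch below decodes one such group using the YOLOv2 parameterisation (sigmoid for the centre offsets and objectness, exponential scaling of the anchor size, softmax over classes). The 13 x 13 grid is an assumption matching the common 416 x 416 Tiny YOLOv2 VOC configuration, and the function and parameter names are hypothetical.

```python
import math

VOC_CLASSES = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
               "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
               "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(raw, cx, cy, anchor_w, anchor_h, grid=13):
    """Decode one anchor's 25 raw outputs (tx, ty, tw, th, to, 20 class logits).

    cx, cy: grid cell column and row; anchor_w, anchor_h: anchor size in cells.
    Returns (class_name, score, (x_center, y_center, width, height)), with the
    box normalised to the 0..1 range of the input image.
    """
    if len(raw) != 5 + len(VOC_CLASSES):
        raise ValueError("expected 25 values per anchor for Pascal VOC")
    tx, ty, tw, th, to = raw[:5]
    logits = raw[5:]
    x = (sigmoid(tx) + cx) / grid
    y = (sigmoid(ty) + cy) / grid
    w = anchor_w * math.exp(tw) / grid
    h = anchor_h * math.exp(th) / grid
    # Softmax over class logits (subtracting the max for numerical stability).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    best = max(range(len(exps)), key=lambda i: exps[i])
    # Final score = objectness x probability of the best class.
    score = sigmoid(to) * exps[best] / sum(exps)
    return VOC_CLASSES[best], score, (x, y, w, h)


# Usage: a cell in the middle of the grid whose strongest class logit is "cat".
raw = [0.0] * 5 + [10.0 if c == "cat" else 0.0 for c in VOC_CLASSES]
print(decode_cell(raw, cx=6, cy=6, anchor_w=1.0, anchor_h=1.0))
```

A full decoder repeats this for every anchor in every cell and discards detections whose score falls below a chosen threshold.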
The Avalanche board returns the detection results produced by the Tiny YOLOv2 network to the user interface, which maps them onto the real-time video stream.
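Mapping detections onto live video typically involves two steps: converting each box from normalised coordinates to pixel coordinates of the displayed frame, and suppressing duplicate boxes that several anchors report for the same object. The article does not say where these steps run in the Avalanche demo, so the following is a hedged sketch assuming boxes arrive in normalised centre format, as YOLO-style networks produce. It applies per-class non-maximum suppression (NMS), a standard YOLO post-processing step; the 0.45 overlap threshold and all function names are assumptions.

```python
def to_pixels(box, frame_w, frame_h):
    """Convert a normalised (x_center, y_center, w, h) box into clipped pixel
    corners (left, top, right, bottom) for drawing on a frame."""
    x, y, w, h = box
    left = max(0, int((x - w / 2) * frame_w))
    top = max(0, int((y - h / 2) * frame_h))
    right = min(frame_w - 1, int((x + w / 2) * frame_w))
    bottom = min(frame_h - 1, int((y + h / 2) * frame_h))
    return left, top, right, bottom


def iou(a, b):
    """Intersection-over-union of two (left, top, right, bottom) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.45):
    """Keep the highest-scoring box among overlapping boxes of the same class.
    detections: list of (class_name, score, pixel_box)."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(k[0] != det[0] or iou(k[2], det[2]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


print(to_pixels((0.5, 0.5, 0.5, 0.5), 640, 480))
dets = [("cat", 0.9, (0, 0, 100, 100)),
        ("cat", 0.8, (10, 10, 110, 110)),  # duplicate of the first cat
        ("dog", 0.7, (0, 0, 100, 100))]    # different class, so kept
print(non_max_suppression(dets))
```

Running NMS per class means that a cat and a dog occupying the same region are both kept, while two overlapping "cat" boxes collapse to the more confident one.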
Fig. 2: the Future Electronics Avalanche board is supplied with an example neural network capable of recognising common objects such as cows and cats. (Image credit: Future Electronics)
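This MPF300T-based system can reliably recognise, in still or video images, the 20 object classes of the Pascal VOC dataset: person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa and TV monitor.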
Users of the Avalanche board can start by exploring the Future Electronics demonstration and its supporting documentation, and then go on to experiment with different neural network implementations to see the variations in performance (speed, accuracy) and resource usage that result from changes in the way that the neural network is optimised in the Caffe training framework, or in the training data on which it learns.
Interested developers may apply for an Avalanche board from any branch of Future Electronics, and the company’s team of machine learning and FPGA specialists will be pleased to provide advice on starting a new object-recognition application.