The AIE program is one key element of DARPA’s broader AI investment strategy that will help ensure the U.S. maintains a technological advantage in this critical area. Past DARPA AI investments facilitated the advancement of “first wave” (rule based) and “second wave” (statistical learning based) AI technologies. DARPA-funded R&D enabled some of the first successes in AI, such as expert systems and search, and more recently has advanced machine learning algorithms and hardware. DARPA is now interested in researching and developing “third wave” AI theory and applications that address the limitations of first and second wave technologies.
The DoD collects vast amounts of data from satellites, drones, and Internet-of-Things devices, but it needs help making sense of that intelligence and analyzing it quickly enough for use in combat operations. The sheer volume of video produced makes identifying, assembling, and delivering actionable intelligence, from multiple sources and across thousands of hours of footage, a habitually long and laborious process.
Defense and intelligence agencies are now leveraging artificial intelligence (AI) and machine learning to automatically identify objects of interest in video. They need the powerful AI software tools that the tech industry is advancing at a fast pace. The U.S. military has already spent $7.4 billion on AI to streamline and speed up video analysis in the conflict against ISIS.
Convolutional neural networks (CNNs) are at the heart of state-of-the-art object detection methods, where they are used to extract features. Several CNN architectures are available, for instance AlexNet, VGGNet, and ResNet. These networks are mainly used for object classification tasks and have been evaluated on widely used benchmarks and datasets such as ImageNet.
In image classification (or image recognition), the classifier labels a single object in the image, outputs one category per image, and gives the probability that the image matches that class. In object detection, by contrast, the model must recognize several objects in a single image and provide the coordinates that locate each of them. Detection is therefore generally a harder task than classification. To achieve high accuracy in such NNs today, AI processing for video is largely performed in the data center by spatial algorithms that lack temporal representation between frames.
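The difference between the two outputs can be made concrete with a minimal Python sketch. The class names, scores, and box coordinates below are made-up illustrative values, and `Classification`, `Detection`, and `classify` are hypothetical helper names, not part of any particular library.

```python
import math
from dataclasses import dataclass


@dataclass
class Classification:
    label: str          # one category for the whole image
    probability: float  # confidence that the image matches that category


@dataclass
class Detection:
    label: str
    probability: float
    box: tuple          # (x_min, y_min, x_max, y_max) in pixels


def classify(logits, labels):
    """Softmax over per-class scores; return the single most likely class."""
    peak = max(logits)  # subtract the peak for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return Classification(labels[best], probs[best])


labels = ["car", "truck", "person"]

# Classification: one label and one probability per image.
image_result = classify([2.0, 0.5, 0.1], labels)

# Detection: a variable-length list, one entry (with coordinates) per object.
detections = [
    Detection("car", 0.91, (34, 120, 310, 260)),
    Detection("person", 0.78, (400, 95, 450, 240)),
]

print(image_result)
for d in detections:
    print(d)
```

A detector therefore has to solve a localization problem for every object in addition to the classification problem, which is one reason its output and its compute cost are larger.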
AI processing of video presents a challenging problem because high resolution, high dynamic range, and high frame rates generate significantly more data in real time than other edge sensing modalities.
The number of parameters and the memory requirement for state-of-the-art (SOA) AI algorithms are typically proportional to the input dimensionality and scale exponentially with accuracy requirements. To accommodate this large data stream, current SOA deep neural networks (DNNs) require hundreds of millions of parameters and tens of billions of operations to produce a single accurate AI inference. This approach is incompatible with embedded implementation at the sensor edge because of power and latency constraints. As a result, embedded solutions for vision sensing at the mobile edge abandon SOA accuracy in favor of marginally accurate solutions that can operate within the available size and power envelope.
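To give a feel for how the cost grows with input size, here is a rough Python sketch that counts the cost of one convolutional layer. It assumes a stride-1, same-padded 3x3 convolution with 64 input and 64 output channels; the layer shape and the two frame sizes are illustrative choices, not figures from the DARPA announcement. For a convolutional layer the weight count stays fixed, while the operation count and the activation memory grow linearly with the number of pixels.

```python
def conv_layer_cost(height, width, in_channels, out_channels, kernel=3):
    """Cost of one stride-1, same-padded convolutional layer."""
    weights = kernel * kernel * in_channels * out_channels
    params = weights + out_channels              # weights plus one bias per output channel
    macs = weights * height * width              # one multiply-accumulate per weight per output pixel
    activations = height * width * out_channels  # output values that must be stored
    return params, macs, activations


small = conv_layer_cost(224, 224, 64, 64)    # typical image-classification input size
large = conv_layer_cost(1080, 1920, 64, 64)  # one full-HD video frame

for name, (params, macs, activations) in (("224x224", small), ("1920x1080", large)):
    print(f"{name}: {params} params, {macs / 1e9:.2f} GMACs, "
          f"{activations / 1e6:.1f} M activations")
```

Under these assumptions a single layer already needs about 1.85 billion multiply-accumulates on a 224x224 input and about 76 billion on a full-HD frame, which is consistent with whole networks reaching tens of billions of operations per inference.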
DARPA In-Pixel Intelligent Processing (IP2)
In May 2021, the Defense Advanced Research Projects Agency (DARPA) issued an Artificial Intelligence Exploration (AIE) opportunity for In-Pixel Intelligent Processing (IP2). Proposals in response to this notice are due no later than 4:00 p.m. Eastern on June 3.
The objective of the In-Pixel Intelligent Processing (IP2) exploration is to reclaim the accuracy and functionality of deep neural networks (NNs) in power-constrained sensing platforms.
Technical Challenge 1 – Data Complexity: Today’s large-format, high-frame-rate, high-dynamic-range sensors produce massive quantities of data. Both military-relevant and commercially available formats already produce upwards of 30 Gb/sec, and developers seek to roughly triple that to 100 Gb/sec in the coming few years. Moving data at such rates from the pixels to a processing unit is prohibitively expensive in energy and delay. To address this challenge, researchers have moved processing units closer to the sensor via near-sensor processing, processing in the periphery, or macropixel processing, where groups of pixels are processed directly underneath the sensing array. Although the data travels a shorter distance, its complexity is not reduced, and power and latency remain excessive because processing is still performed after data movement.
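The data rates quoted above follow directly from sensor format, frame rate, and bit depth. The short Python sketch below works through the arithmetic for a hypothetical sensor; the 4096 x 4096 format, 120 frames per second, and 16 bits per pixel are assumed values chosen for illustration, not specifications from the announcement.

```python
def sensor_data_rate_gbps(width, height, frames_per_second, bits_per_pixel):
    """Raw (uncompressed) sensor output in gigabits per second."""
    bits_per_second = width * height * frames_per_second * bits_per_pixel
    return bits_per_second / 1e9


# Hypothetical sensor: 4096 x 4096 pixels, 120 frames/s, 16 bits per pixel.
rate = sensor_data_rate_gbps(4096, 4096, 120, 16)
print(f"{rate:.1f} Gb/s")
```

With these assumed parameters the raw stream comes to roughly 32 Gb/s, in line with the "upwards of 30 Gb/sec" figure; every bit of it has to leave the pixel array before any conventional processing can begin.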
Technical Challenge 2 – Accurate, Low Latency, Low Size, Weight, and Power (SWaP) Backend AI Algorithms: SOA NNs typically require billions of operations per inference and vast quantities of memory, making them poor candidates for embedded implementation.
Rather than in embedded systems, the highest accuracy NNs are typically implemented in the cloud, where they consume megawatts of power. Small traditional DNNs cannot operate with high accuracy on high dimensionality input data because they do not contain enough parameters to train to high accuracy. Instead, current SOA approaches to AI processing rely on exponentially larger networks that yield only marginal improvements in accuracy.
To move beyond this paradigm, IP2 will seek to solve two key elements required to embed AI at the sensor edge: data complexity and implementation of accurate, low-latency, low size, weight, and power (SWaP) AI algorithms. First, IP2 will address data complexity by bringing the front end of the neural network into the pixel, reducing dimensionality locally and thereby increasing the sparsity of high-dimensional video data. This “curated” datastream will enable more efficient back-end processing without any loss of accuracy.
Second, IP2 will develop new, closed-loop, task-oriented predictive recurrent neural network (RNN) algorithms, implemented on a field-programmable gate array (FPGA), to exploit structure and saliency from the in-pixel neurons produced by the NN front end. The NN front end will identify and pass only salient information into the RNN back end to enable high-throughput, high-accuracy 3rd-wave vision functionality. By immediately moving the data stream to a sparse feature representation, reduced-complexity NNs will train to high accuracy while reducing overall compute operations by 10x.
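The idea of a front end that reduces dimensionality and passes only salient values downstream can be illustrated with a toy Python sketch. This is not the IP2 design: the 2x2 block-contrast feature, the threshold value, and the function names `front_end` and `sparsity` are all assumptions made purely for illustration, standing in for whatever the in-pixel neurons would actually compute.

```python
def front_end(frame, threshold):
    """Toy 'in-pixel' stage: contrast of each 2x2 block, zeroed below a threshold."""
    rows, cols = len(frame), len(frame[0])
    features = []
    for r in range(0, rows - 1, 2):
        out_row = []
        for c in range(0, cols - 1, 2):
            block = [frame[r][c], frame[r][c + 1],
                     frame[r + 1][c], frame[r + 1][c + 1]]
            contrast = max(block) - min(block)
            out_row.append(contrast if contrast > threshold else 0)
        features.append(out_row)
    return features


def sparsity(grid):
    """Fraction of entries that are exactly zero."""
    values = [v for row in grid for v in row]
    return sum(1 for v in values if v == 0) / len(values)


# 8x8 frame: uniform background with one small bright object.
frame = [[10] * 8 for _ in range(8)]
for r in (3, 4):
    for c in (3, 4, 5):
        frame[r][c] = 200

features = front_end(frame, threshold=20)

# Only non-zero features (with their positions) are handed to the back end.
salient = [(r, c, v) for r, row in enumerate(features)
           for c, v in enumerate(row) if v != 0]

print(f"input values:  {len(frame) * len(frame[0])}, sparsity {sparsity(frame):.2f}")
print(f"feature map:   {len(features) * len(features[0])}, sparsity {sparsity(features):.2f}")
print(f"values passed to back end: {len(salient)}")
```

In this toy case 64 dense pixel values become a 4x4 feature map in which only the four blocks touching the object's edges are non-zero, so the back end receives 4 values instead of 64. A real front end would learn its features rather than use a fixed contrast rule.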
IP2 will require performers to demonstrate SOA accuracy with a 20x reduction in AI algorithm processing energy-delay product (EDP) while processing complex datasets such as UC Berkeley BDD100K. BDD100K is a self-driving car dataset that incorporates geographic, environmental, and weather diversity, intentional occlusions, and a large number of classification tasks, which makes it well suited to demonstrating 3rd-wave functionality for future large-format embedded sensors.
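The energy-delay product is the energy consumed per inference multiplied by the time that inference takes, so a 20x target can be met by any combination of energy and latency savings whose product reaches 20. The Python sketch below checks a candidate against a baseline; the energy and latency numbers are invented for illustration, and `meets_edp_target` is a hypothetical helper name.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """EDP in joule-seconds: energy per inference times latency per inference."""
    return energy_joules * delay_seconds


def meets_edp_target(baseline_edp, candidate_edp, required_reduction=20.0):
    """True if the candidate's EDP is at least `required_reduction` times lower."""
    return baseline_edp / candidate_edp >= required_reduction


# Illustrative numbers only: 2 J and 50 ms baseline, 0.4 J and 10 ms candidate.
baseline = energy_delay_product(2.0, 0.050)
candidate = energy_delay_product(0.4, 0.010)

print(f"EDP reduction: {baseline / candidate:.1f}x")
print("meets 20x target:", meets_edp_target(baseline, candidate))
```

Here a 5x energy saving combined with a 5x latency saving gives a 25x EDP reduction. The program requirement adds the condition that accuracy stays at the SOA level, which this arithmetic does not capture.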
The IP2 program seeks to demonstrate revolutionary advances in embedded 3rd-wave AI functionality at the edge by developing a new mesh NN hardware scheme that brings the front end of the NN directly into the sensor pixels. This NN will enable significant advances in both data stream complexity reduction and back-end 3rd-wave AI processing efficiency and functionality, without loss of accuracy.