Today, large amounts of data are collected from numerous sources, such as social media, sensor feeds (e.g. cameras), and scientific instruments. There are over 1 billion websites on the world wide web today, and annual global IP traffic will reach 3.3 ZB per year by 2021, or 278 exabytes (EB) per month. In 2016, the annual run rate for global IP traffic was 1.2 ZB per year, or 96 EB per month. The notion of big data emerges from the observation that 90 percent of the data available today has been created in just the past two years. From devices at the edge to large data centers crunching everything from corporate clouds to future energy technology simulations, the world is awash in data being stored, indexed, and accessed, says Intel. The goal of DARPA’s Hierarchical Identify Verify Exploit (HIVE) program is to explore new and more efficient methods of processing large amounts of complex data.
In the big data era, information is often linked to form large-scale graphs. Graph analytics has emerged as a way to understand the relationships among these heterogeneous types of data, allowing analysts to draw conclusions from patterns in the data and to answer previously unthinkable questions. By understanding the complex relationships between different data feeds, analysts can form a more complete picture of the problem and infer some degree of causality.
There is also an increasing need to make decisions in real time, which requires understanding how the inherent relationships in the graph evolve over time. This emerging data analytics technology is also used for applications like cyber defense and critical infrastructure protection that require analyzing huge data sets in real time.
The DoD has to make sense of, and make decisions based on, the large amounts of data it collects, such as communications; intelligence, surveillance, and reconnaissance feeds from drones; and automated cybersecurity systems. Real-time predictive large-scale data analytics can provide decisive advantage to commanders across a range of military operations in the homeland and abroad, delivering information supremacy, enhanced autonomy technologies, and vastly improved situational awareness to aid warfighters and intelligence analysts, according to ARL.
Currently, much of graph analytics is performed in large data centers on large cached or static data sets, and the amount of processing required is a function not only of the size of the graph but also of the type of data being processed. DARPA’s Hierarchical Identify Verify Exploit (HIVE) program seeks to develop a generic and scalable graph processor that specializes in processing sparse graph primitives and achieves a 1,000-fold improvement in processing efficiency over standard processors.
In combination with emerging machine learning and other artificial intelligence techniques that can categorize raw data elements, and by updating the elements in the graph as new data becomes available, a powerful graph analytics processor could discern otherwise hidden causal relationships and stories among the data elements in the graph representations.
Most graph processing problems require server-class computers with large size, weight, and power (SWaP) requirements, and the scale required limits what can be done in a tactical environment. HIVE is expected to overcome that challenge and enable processing of information at the tactical edge, Boyle said. “The hard problem is getting the processor down into a form factor and a SWaP footprint that is compatible with a tactical environment and then using it in an environment where you are really working towards this future of cognitive autonomy and intelligent systems,” he added.
“You are seeing companies shift from general-purpose computing devices to purpose built; that is what the HIVE chip is, it is a purpose-built chip just for graph processing,” he said. “This is a key enabler to the future that you hear the customers talking about. It is not just about being able to process graphs, this is one of the core technologies required for cognitive systems. That technology more broadly speaking will impact just about every aspect of war fighting in the future.”
The concept phase of the HIVE program extends through next year, with initial prototyping beginning in fiscal 2019. Chip fabrication could begin as early as fiscal 2020, the agency said.
Current Hardware Inefficient for Graph Analytics
Unlike traditional analytics tools that study “one to one” or “one to many” relationships, graph analytics uses algorithms to construct and process the world’s data organized in “many to many” relationships, moving from immediate connections to multiple layers of indirect relationships. Examples of these relationships among data elements and categories include person-to-person interactions as well as seemingly disparate links between, say, geography and changes in doctor visit trends or social media and regional strife. This applies to a wide array of applications such as transportation routing, genomics processing, financial transaction optimization, and consumer purchasing analysis.
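The move the passage describes, from immediate connections to layers of indirect relationships, can be sketched with a breadth-first traversal over a toy adjacency list (a minimal illustration; the node names here are purely hypothetical):

```python
from collections import deque

# Toy "many to many" relationship graph as an adjacency list.
# The entities and links are illustrative, not from the HIVE program.
graph = {
    "alice":  ["bob", "clinic"],
    "bob":    ["alice", "forum"],
    "clinic": ["alice", "region"],
    "forum":  ["bob", "region"],
    "region": ["clinic", "forum"],
}

def relationship_layers(graph, start):
    """Group nodes by hop distance from `start`: layer 1 holds direct
    connections, deeper layers hold increasingly indirect relationships."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    layers = {}
    for node, depth in seen.items():
        layers.setdefault(depth, set()).add(node)
    return layers

layers = relationship_layers(graph, "alice")
```

Here layer 1 is the "one to many" view a traditional tool would give; the deeper layers are the indirect relationships that make graph analytics useful.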
Processing connected big data has been a major challenge. With the emergence of data and network science, graph computing is becoming one of the most important techniques for processing, analyzing, and visualizing connected data. “The challenges in graph computing come from multiple key issues like frameworks, data representations, computation types, and data sources,” Georgia Tech researchers led by Lifeng Nai noted in a paper delivered at the 2015 supercomputing conference.
Previous research has been done on streaming graph analytics, but it has been hampered by the amount of processing required to pinpoint which part of the graph needs to be updated based on the new data. The update has to be done at the speed of the incoming data and cannot be performed offline, because the graph is developing or changing in real time.
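The incremental-update idea, touching only the portion of the graph affected by new data instead of recomputing everything, can be shown with a toy structure that maintains a simple per-node statistic (degree) as edges stream in (an illustrative sketch, not the HIVE approach):

```python
from collections import defaultdict

class StreamingGraph:
    """Toy streaming graph: each arriving edge updates only its two
    endpoints, rather than triggering a whole-graph recomputation."""

    def __init__(self):
        self.adj = defaultdict(set)     # node -> set of neighbors
        self.degree = defaultdict(int)  # node -> incrementally kept degree

    def add_edge(self, u, v):
        # Skip duplicate edges so the incremental statistic stays correct.
        if v not in self.adj[u]:
            self.adj[u].add(v)
            self.adj[v].add(u)
            self.degree[u] += 1
            self.degree[v] += 1

g = StreamingGraph()
g.add_edge("a", "b")
g.add_edge("a", "c")
g.add_edge("a", "b")  # duplicate edge in the stream: no change
```

Real streaming analytics maintain far richer state (components, clusters, centralities), but the principle is the same: the cost per update is proportional to the affected neighborhood, not the graph size.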
The graph itself can be very sparse, as the number of relationships between entities is not known or clear. The sparseness of the data and the requirement to process it in real time make graph analytics on standard processors extremely inefficient, DARPA officials say. Graph analytics shifts the processing workload to locating and moving data; only about 4 percent of processing time and power goes toward actual computation. Such inefficiency either limits the size of the graph to what the chip can hold, or requires an extremely large cluster of computers.
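Sparse graphs are typically stored in compressed formats that keep only the nonzero entries; a common choice is compressed sparse row (CSR). A minimal sketch with a hypothetical 4x4 adjacency matrix:

```python
def to_csr(dense):
    """Convert a dense adjacency matrix to Compressed Sparse Row form,
    storing only the nonzero entries plus per-row offsets."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v:
                values.append(v)   # nonzero edge weight
                col_idx.append(j)  # its column (destination node)
        row_ptr.append(len(values))  # where the next row starts
    return values, col_idx, row_ptr

# A sparse 4x4 adjacency matrix: 16 dense slots, only 3 nonzeros.
dense = [
    [0, 1, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
    [3, 0, 0, 0],
]
values, col_idx, row_ptr = to_csr(dense)
```

The compression is what makes large graphs storable at all, but it also scatters the data: following an edge means an indirect lookup through `col_idx`, which is exactly the irregular, small-transfer access pattern that cache-oriented standard processors handle poorly.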
Graph Analytics Processor
To take on that technology shortfall, MTO last summer unveiled its Hierarchical Identify Verify Exploit (HIVE) program, which seeks to develop a generic and scalable graph processor that specializes in processing sparse graph primitives and achieves a 1,000-fold improvement in processing efficiency over standard processors.
If HIVE is successful, it could deliver a graph analytics processor that achieves a thousandfold improvement in processing efficiency over today’s best processors, enabling the real-time identification of strategically important relationships as they unfold in the field rather than relying on after-the-fact analysis in data centers.
“This should empower data scientists to make associations previously thought impractical due to the amount of processing required,” said Tran. These could include the ability to spot, for example, early signs of an Ebola outbreak, the first digital missives of a cyberattack, or even the plans to carry out such an attack before it happens.
HIVE is a non-von Neumann architecture
The classical von Neumann architecture, in which the processing of information and the storage of information are kept separate, now faces a performance bottleneck. Data travels to and from the processor and memory, but the computer cannot process and store at the same time. By the nature of the architecture, it is a linear process, which ultimately leads to the von Neumann “bottleneck.”
Trung Tran, a DARPA program manager, said that CPUs and GPUs have gone parallel, but their cores are still von Neumann. HIVE is non-von Neumann in that it simultaneously performs different processes on different areas of memory. This approach allows one big map that can be accessed by multiple processors at the same time, Tran said.
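The "one big map, many processors" model Tran describes can be loosely illustrated in software with threads sharing a single in-memory graph, each scanning its own region at the same time (a toy analogy, not the HIVE hardware design):

```python
import threading

# One shared in-memory graph (the "big map") read by several workers
# at once. Node IDs and edges are illustrative only.
shared_graph = {0: [1, 2], 1: [2], 2: [0], 3: []}
results = {}
lock = threading.Lock()

def count_edges(worker_id, nodes):
    # Each worker processes a different area of the shared structure.
    total = sum(len(shared_graph[n]) for n in nodes)
    with lock:
        results[worker_id] = total

threads = [
    threading.Thread(target=count_edges, args=(0, [0, 1])),
    threading.Thread(target=count_edges, args=(1, [2, 3])),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The point of the analogy is the data layout, not the threading: there is one graph, never copied, and the parallelism comes from pointing independent workers at disjoint regions of it.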
Hierarchical Identify Verify Exploit (HIVE) program
The program has now signed on five performers to carry out HIVE’s mandate: to develop a powerful new data-handling and computing platform specialized for analyzing and interpreting huge amounts of data with unprecedented deftness.
The program includes the development of chip prototypes, development of software tools to support programming of the new hardware, and design of a system architecture to support efficient multi-node scaling. Specifically, the chip development will focus on improving the efficiency of random access memory transactions to limit data movement, efficient parallelism to improve scalability, and new accelerators designed specifically for graph computation.
The HIVE project will be performed in three phases over the next four and a half years, with three technical areas:
- Graph analytics processor: The role of TA1 is to research and design a new chip architecture from scratch. Performers are expected to tackle the twin challenges of the memory wall and of true parallelization of multi-node systems. The memory wall has vexed programmers for the last 20 years and has forced them to find new and creative ways to deal with memory access and memory bandwidth bottlenecks, bottlenecks caused by serial memory access patterns that rely on uniform memory placement. New memory architectures are expected to allow for non-uniform memory access (NUMA).
- True parallelization has also been hampered by the inability to maintain coherent memory access between nodes and the lack of multi-master, multi-drop bus architectures. This leads to machines running in parallel but independently; true parallelization would allow those machines to work more closely in concert. In essence, TA1 has to move from today’s single instruction, multiple data (SIMD) world to one that allows for multiple instruction, multiple data (MIMD) execution.
- Graph analytics toolkits
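The SIMD-to-MIMD shift mentioned above can be shown in miniature: SIMD applies one operation uniformly across many data elements, while MIMD pairs each worker's own instruction stream with its own data (a conceptual sketch; real MIMD hardware runs the streams concurrently on independent cores):

```python
# SIMD: one instruction applied uniformly across many data elements.
data = [1, 2, 3, 4]
simd_result = [x * 2 for x in data]  # same operation, every element

# MIMD: each worker pairs its own instruction with its own data
# partition. (Written sequentially here for clarity.)
tasks = [
    (lambda x: x * 2, [1, 2]),    # worker 0: scales its partition
    (lambda x: x + 10, [3, 4]),   # worker 1: offsets its partition
]
mimd_result = [[f(x) for x in part] for f, part in tasks]
```

Graph workloads fit the MIMD side: different vertices need different amounts and kinds of work, which is why lockstep SIMD execution wastes cycles on them.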
DARPA has outlined the HIVE architectural goals as follows:
- Create an accelerator architecture and processor pipeline which supports the processing of identified graph primitives in a native sparse matrix format.
- Develop a chip architecture that supports the rapid and efficient movement of data from memory or I/Os to the accelerators based on an identified data flow model. Emphasis should be on redefining cache based architectures so that they address both sparse and dense data sets.
- Develop an external memory controller designed to ensure efficient use of the identified data mapping tools. The controller should be able to efficiently handle random and sequential memory accesses on memory transfers as small as 8 to 32 bytes.
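A canonical sparse graph primitive such an accelerator would target is sparse matrix-vector multiplication over a CSR matrix; note how each step touches only a few small, scattered locations, the 8-to-32-byte access pattern the memory-controller goal above addresses (a software sketch of the primitive, not the HIVE design):

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Sparse matrix-vector multiply over a CSR matrix -- the kernel at
    the heart of primitives like one PageRank or BFS frontier step.
    Only stored nonzeros are visited, via small indirect lookups."""
    y = [0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A hypothetical 4-node weighted adjacency matrix in CSR form:
# nonzeros 1, 2, 3 at positions (0,1), (1,3), (3,0).
values  = [1, 2, 3]
col_idx = [1, 3, 0]
row_ptr = [0, 1, 2, 2, 3]
y = spmv_csr(values, col_idx, row_ptr, [1, 1, 1, 1])
```

Each inner iteration reads one weight, one column index, and one entry of `x` at an unpredictable address: tiny transfers that waste most of the bandwidth of a conventional cache-line-oriented memory system.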
According to Dhiraj Mallick, vice president of the Data Center Group and general manager of the Innovation Pathfinding and Architecture Group at Intel, by the middle of 2021, they and their HIVE contract partners will deliver “a 16-node demonstration platform showcasing 1,000x performance-per-watt improvement over today’s best-in-class hardware and software for graph analytics workloads.”
There are two initial challenges:
The first is a static graph problem focused on subgraph isomorphism: the ability to search a large graph in order to identify a particular subsection of that graph.
The second is a dynamic graph problem focused on finding optimal clusters of data within the graph. Both will include a small-graph problem in the billions of nodes and a large-graph problem in the trillions of nodes.
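For a sense of the first challenge, subgraph isomorphism can be shown in miniature with a brute-force search; its cost grows exponentially even on toy inputs, which hints at why billion- and trillion-node versions demand specialized hardware (illustrative code, not a HIVE benchmark):

```python
from itertools import permutations

def find_subgraph(big_edges, pattern_edges, big_nodes, pattern_nodes):
    """Brute-force subgraph isomorphism: try every mapping of pattern
    nodes onto graph nodes and return one under which every pattern
    edge exists in the graph. Exponential in the pattern size."""
    big = set(big_edges)
    for combo in permutations(big_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, combo))
        if all((mapping[u], mapping[v]) in big for u, v in pattern_edges):
            return mapping
    return None

# Hypothetical directed graph containing a 3-cycle among nodes 1, 2, 3.
big_edges = [(0, 1), (1, 2), (2, 3), (3, 1)]
pattern = [("a", "b"), ("b", "c"), ("c", "a")]   # search for a 3-cycle
match = find_subgraph(big_edges, pattern, [0, 1, 2, 3], ["a", "b", "c"])
```

Practical solvers prune this search aggressively, but the underlying pattern-matching workload, many small dependent lookups over sparse structure, is the same one HIVE targets.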
The quintet of performers includes a mix of large commercial electronics firms, a national laboratory, a university, and a veteran defense-industry company: Intel Corporation (Santa Clara, California), Qualcomm Intelligent Solutions (San Diego, California), Pacific Northwest National Laboratory (Richland, Washington), Georgia Tech (Atlanta, Georgia), and Northrop Grumman (Falls Church, Virginia).
HIVE is centered on three areas. The first has two teams, one led by Intel and the other by Qualcomm, developing the specialised graph processor chip. Two other teams, led by Pacific Northwest National Laboratory and Georgia Tech respectively, are developing the software and analytic tools.
Qualcomm Intelligent Solutions, Inc. (QISI) is one of just two silicon technology providers selected by DARPA to perform breakthrough architectural work on a graph analytics processor as a part of the HIVE (Hierarchical Identify Verify Exploit) project.
QISI has kicked off an initiative called Project Honeycomb to support this important effort. QISI’s goal with Project Honeycomb is to develop a domain-specific processor design and scalable multi-node architecture for the HIVE project. The work is intended to produce a hardware accelerator for graph computation primitives, a memory controller that optimizes data movement based on sparse mapping, and network architecture to avoid congestion in data movement. QISI plans to deliver the Project Honeycomb architecture specification and simulator to DARPA and other HIVE project performers in 12 months. The next two phases entail the design and fabrication of the graph analytics processor and delivery of a functioning 16-node system to DARPA for evaluation.
“We are excited about the innovation potential of this research project, which we believe will help define future architectures for advanced deep learning,” the company said. “In addition, we expect Project Honeycomb and the HIVE project as a whole will help accelerate the development of commercial products using a new, innovative architectural approach for many areas related to data analytics and artificial intelligence.”
The third area is led by Northrop Grumman, which will integrate the hardware and software and test it against a variety of relevant use cases and other technologies, Vern Boyle, vice-president for cyber and advanced processing for Northrop Grumman, told Jane’s. The company has been setting up the test environment and working with both the hardware and software providers to look at the designs and algorithms to understand how those will be tested, measured, and evaluated, Boyle said.
In order to evaluate HIVE processors, the program intends to compare the performance of prototypes to current state-of-the-art, multi-GPU systems, Wade Shen, DARPA program manager for HIVE, told Jane’s. “We are working with [US Department of Defense (DoD) and US government] partners to collect and benchmark how these systems will compare on real graph problems,” Shen said.