The world of high-performance computing is particularly challenging, driven by new flavors of sophisticated sensors, complex cameras, and the requirements of advanced computations, which include software-defined radio, cryptography, and other types of arithmetic-intensive algorithms. Recent developments in artificial intelligence (AI) and growing demands imposed by both the enterprise and military sectors call for heavy-lift tasks such as rendering images for video; analyzing high volumes of data to detect risks and generate the real-time countermeasures that cybersecurity demands; and abstracting complex hardware systems in software.
General-purpose processors (GPPs), also known as “traditional CPUs,” feature ever-improving performance and well-understood, mature software development tools, and are available in many form factors. While higher transistor counts and frequency increases are brute-force approaches to CPU scaling, the limits of those methods are becoming increasingly apparent. The downsides of using GPPs in embedded high-performance applications can include limited product lifespan, brought about by end-of-life commercial components or changes in platform support, along with latency issues that are always a concern in real-time applications (particularly vehicle systems or soldier-worn equipment). Meanwhile, environmental conditions can cause thermal issues and reduced quality of service in cold temperatures, and power draw can run high.
One alternative to the GPP is the classic digital signal processor (DSP), which typically offers lower-power, low-latency components with high-throughput potential. However, the steep learning curve of the development tools and the relatively slower processing performance mean that DSPs are not always practical for real-world military applications. Networks of DSPs are a tried-and-true solution for parallel processing requirements but magnify the shortcomings of the technology in general. Long development cycles abound, and deployment and maintenance can be difficult, sometimes limiting the usefulness of the technology. Furthermore, DSPs, like other processors, are becoming more power-hungry as their performance increases, meaning that heat dissipation and power usage must be addressed.
Another alternative to GPPs is the field-programmable gate array (FPGA), which has found a niche in the high-performance computing world, often utilized as a coprocessing device to massage data. The FPGA’s inherently parallel architecture and its performance, in terms of processing power, latency, and throughput, are well suited to many types of mission-critical signal processing applications. The field-programmable aspect of the FPGA’s processing unit is also highly beneficial, as updates can be implemented in near real time.
Although excessive power consumption can be an issue with FPGAs in embedded applications, power usage can usually be managed according to a given technical requirement. However, FPGAs are not always available in extended-temperature or rugged packaging, limiting their use in systems designed for harsh environments. FPGAs can also have longer product life cycles; if a particular device is end-of-life, functional deployed application firmware can usually be employed on a newer part with little additional effort.
A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at computer graphics and image processing. Their highly parallel structure makes them more efficient than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel. In a personal computer, a GPU can be present on a video card or embedded on the motherboard; in certain CPUs, it is embedded on the CPU die.
Thus, general-purpose GPU (GPGPU) computing was born. GPGPU architecture involves many parallel computing pipelines, each of which can run a small program called a shader. As a customizable program, a shader can perform a wide variety of tasks. NVIDIA has capitalized deeply on this ability for more than a decade with its Compute Unified Device Architecture (CUDA) software platform. NVIDIA® CUDA® provides an application programming interface (API) that lets programs written in a variety of languages (e.g., C, C++, Fortran, and many more via third-party wrappers) access GPU functions.
CUDA is not the only available API for GPGPU software. For example, Open Computing Language (OpenCL) is an open, cross-platform alternative to CUDA maintained by the Khronos Group. OpenCL helps programs execute on heterogeneous platforms, that is, systems that compute across CPUs, GPUs, and other types of processors. OpenGL, also managed by Khronos, offers a cross-platform API specifically made for vector graphics and leverages the GPU to accelerate rendering. Ultimately, CUDA remains popular for its easy programmability. Field-programmable gate array (FPGA) chips can be customized to accelerate certain tasks, but CUDA remains the more user-friendly and efficient approach for many programmers.
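The programming model described above, one small program applied across many data elements, can be illustrated without a GPU. The following is a minimal sketch in plain Python, not CUDA or OpenCL code: `saxpy_kernel` and `launch` are hypothetical names chosen for illustration, and the loop runs sequentially where a GPU would execute the invocations in parallel.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work done by one 'thread': compute a single output element."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Stand-in for a GPU kernel launch: one kernel invocation per index.

    This loop is sequential; on a real GPU the n invocations would be
    scheduled across many parallel pipelines.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

Because each invocation touches only its own index, no invocation depends on another, which is the property that lets this kind of workload spread across parallel hardware.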
Additionally, GPUs are available in extended-temperature and rugged packages, making them suitable for deployment on airborne or other environmentally challenging platforms. The projected GPU lifespan can be limited, but with careful material planning, this can be managed. As with GPPs, care must also be taken with power management and heat dissipation, particularly in small form factor systems.
While CPU performance advances have recently rolled off, the development of GPUs continues to track more closely to Moore’s Law. As such, GPUs stand more ready than ever to help process the very large data sets commonly seen in data centers and IoT hubs. However, GPUs alone are not the answer for next-gen computation. While no single processor type serves as the best remedy, modern GPUs provide a viable one when combined with their computing “rivals”: the general-purpose processor (GPP), DSP, and FPGA.
This alternative, also known as heterogeneous computing, looks to be the most effective approach for tackling tomorrow’s large computing needs. Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding more processors of the same type, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks. With heterogeneous computing, different chip types perform complementary tasks. For example, a heterogeneous platform might have FPGAs manage information as it flows into the system. Output from the FPGAs could then stream into GPUs for processing, and CPUs would handle data management and routing. APIs help programmers devise software that lets the chip types work in concert while leveraging each processor’s strengths.
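The division of labor in that example can be sketched as a three-stage pipeline. This is an illustrative model only: the stage functions, the block size, the mean-power computation, and the routing threshold are all assumptions made for the sketch, and every stage here runs on the host CPU rather than on the device it is named after.

```python
from typing import Iterable, Iterator, List, Tuple


def fpga_ingest(samples: Iterable[float], block_size: int) -> Iterator[List[float]]:
    """Stage 1 (FPGA role): frame the incoming stream into fixed-size blocks."""
    block: List[float] = []
    for sample in samples:
        block.append(sample)
        if len(block) == block_size:
            yield block
            block = []
    # A trailing partial block is dropped in this sketch.


def gpu_process(blocks: Iterable[List[float]]) -> Iterator[float]:
    """Stage 2 (GPU role): bulk arithmetic, here the mean power of each block."""
    for block in blocks:
        yield sum(v * v for v in block) / len(block)


def cpu_route(powers: Iterable[float], threshold: float) -> List[Tuple[int, str]]:
    """Stage 3 (CPU role): control-flow-heavy decisions about where results go."""
    return [(i, "alert" if p > threshold else "archive")
            for i, p in enumerate(powers)]


stream = [0.1, -0.1, 0.2, -0.2, 3.0, -3.0, 2.5, -2.5]
routes = cpu_route(gpu_process(fpga_ingest(stream, block_size=4)), threshold=1.0)
print(routes)
```

Chaining generators keeps each stage unaware of the others' internals, which mirrors the way an API layer lets each processor type be swapped or scaled independently.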
Different embedded solutions bring their own strengths and weaknesses. Central processing units (CPUs) are well suited to handling sequential workloads and benefit from a large existing code base; the drawbacks are power consumption and the latency involved in getting data to be processed. Graphics processing units (GPUs) are better at processing vector data and parallel processing, but at the cost of high power consumption. Field-programmable gate arrays (FPGAs) require lower power and are low latency, but more money and higher complexity are involved in system designs, explained Adam Smith, a sales professional at Alpha Data in Denver, during a talk at VITA’s Embedded Tech Trends in Phoenix. Smith recommends creating hybrid systems to benefit from the best characteristics of each (FPGA, GPU, and CPU), including low latency, power efficiency, attractive performance per dollar, longer product life, customization, and the efficient use of diverse processors.
Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allows for the integration of central processing units and graphics processors on the same bus, with shared memory and tasks. HSA is being developed by the HSA Foundation, which includes (among many others) AMD and ARM. The platform’s stated aim is to reduce communication latency between CPUs, GPUs, and other compute devices, and to make these various devices more compatible from a programmer’s perspective, relieving the programmer of the task of planning the movement of data between devices’ disjoint memories (as must currently be done with OpenCL or CUDA).
In digital warfare, attacks can take many forms: from disinformation campaigns, to espionage, to the widespread exploitation of network infrastructure, which can then be used to attack other infrastructure such as energy grids. Defending against such attacks requires a formidable amount of real-time network monitoring, event analysis, and the ability to inject countermeasures instantaneously. Some countermeasures may involve massive amounts of processing for traffic handling. Heterogeneous computing can address these various stages of digital warfare and process them with maximum efficacy.
Armed forces can reap benefits from heterogeneous computing in numerous ways. For instance, even though radar processing systems often deploy on large cruisers, submarines, and similar platforms, such vehicles must still deal with the same size, weight, and power (SWaP) constraints as the rest of the military. A legacy radar processing system might require four cubic feet to house an 18-blade server weighing over 50 kg and consuming 2000 W to achieve a peak processing speed of 576 GFLOPS. Compare that with a modern VITA-75 system, such as one of ADLINK’s HPERC family of platforms. To reach a nearly identical 574 GFLOPS, ADLINK’s fanless HPERC measures only 0.8 cubic feet while weighing less than 5 kg and consuming just 200 W. This is made possible in part by the onboard GPU taking over a large portion of the radar signal processing workload.
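The SWaP comparison can be made concrete with a little arithmetic. The sketch below uses only the figures quoted above; the `System` class and its field names are hypothetical, and the quoted mass bounds (“over 50 kg”, “less than 5 kg”) are treated as nominal values, so the mass ratio is approximate.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    volume_ft3: float
    mass_kg: float      # quoted bound, used here as a nominal value
    power_w: float
    peak_gflops: float

    @property
    def gflops_per_watt(self) -> float:
        return self.peak_gflops / self.power_w


legacy = System("legacy 18-blade server", 4.0, 50.0, 2000.0, 576.0)
hperc = System("fanless HPERC", 0.8, 5.0, 200.0, 574.0)

efficiency_gain = hperc.gflops_per_watt / legacy.gflops_per_watt
volume_ratio = legacy.volume_ft3 / hperc.volume_ft3
power_ratio = legacy.power_w / hperc.power_w

for s in (legacy, hperc):
    print(f"{s.name}: {s.gflops_per_watt:.3f} GFLOPS/W")
print(f"efficiency gain: {efficiency_gain:.2f}x, "
      f"volume: {volume_ratio:.1f}x smaller, power: {power_ratio:.1f}x lower")
```

On these figures the legacy system delivers about 0.29 GFLOPS per watt and the HPERC about 2.87, roughly a tenfold gain in compute per watt in one fifth of the volume.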
One near-future application of GPGPU technology in the military/aerospace sector will be the forthcoming F-35 Block 4 updates, expected in 2023, which must be applied simultaneously across the platform’s Autonomic Logistics Information System (ALIS) fleet management backbone. Upgrades are expected to include 11 radar and electrooptical system enhancements as well as the addition of a wide-area high-resolution synthetic aperture radar (SAR) mode to the Northrop Grumman APG-81’s active electronically scanned array (AESA) radar. Taken together, the Block 4 updates represent a significant jump in signal processing demand that will require a commensurate upgrade to the processing infrastructure. The acceleration inherent in GPU-driven parallel processing will likely be essential.
Military threats are constantly evolving, and our ability to intercept, monitor, and decode communications must keep pace. With traditional hardware components abstracted as software functions (amplifiers, mixers, filters, etc.), software-defined radio (SDR) can address a wide variety of radio protocols and perform operations on radio signals in real time. Naturally, such operations rely heavily on signal analysis and filtering. SDR products are widely and increasingly adopted in the military, where the ability to upgrade and update existing software to meet and beat new and emerging threats can give warfighters the combat edge. SDR is a baseline technology that enables dynamic spectrum access systems with cognitive or “smart radio” functionality. Predictably, GPUs are adept at performing this kind of software-based computation.
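To show what “hardware components abstracted as software functions” can look like, here is a minimal sketch of two such functions, a mixer and a low-pass filter, written with the Python standard library only. The function names, sample rate, carrier frequency, and tap count are assumptions chosen so the arithmetic works out cleanly; a fielded SDR would use optimized, typically GPU- or FPGA-accelerated, implementations.

```python
import cmath
import math


def mix_down(samples, carrier_hz, sample_rate_hz):
    """Software mixer: multiply by a complex exponential to shift the
    carrier frequency down to 0 Hz (baseband)."""
    return [s * cmath.exp(-2j * math.pi * carrier_hz * n / sample_rate_hz)
            for n, s in enumerate(samples)]


def moving_average(samples, taps):
    """Software low-pass filter: a simple FIR filter with equal weights.
    Returns only outputs computed from a full window of `taps` samples."""
    return [sum(samples[n - taps + 1:n + 1]) / taps
            for n in range(taps - 1, len(samples))]


FS = 8000.0   # sample rate in Hz (assumed)
FC = 1000.0   # carrier frequency in Hz (assumed)

# A unit-amplitude real tone at the carrier frequency.
rf = [math.cos(2 * math.pi * FC * n / FS) for n in range(64)]

# Mixing yields a DC term (0.5) plus an image at twice the carrier.
# That image repeats every 4 samples here, so a 4-tap average nulls it.
baseband = moving_average(mix_down(rf, FC, FS), taps=4)
print(abs(baseband[-1]))
```

Each output sample depends only on a short window of inputs, so the work parallelizes naturally across a GPU’s pipelines.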
From cybersecurity to transportation logistics to target identification, AI is now permeating an increasing array of military, aerospace, industrial, and similar market applications. According to Analytics Insight, 51% of IT professionals report having to work with data for days to multiple weeks before it becomes actionable. A similar number said that they only “sometimes” have the resources needed to act on data. Extrapolate these limitations to military scenarios and the potential for AI to assist in decision making becomes a mission-critical advantage.
GPU-enhanced heterogeneous computing can bring critical high-load applications such as AI to many arenas. Data centers, mobile troop encampments, robotics, and autonomous vehicles can all take advantage of accelerated, low-power computing, resulting in faster time-to-decision. As the NVIDIA Quadro Embedded Partner with the most market-proven experience and broadest embedded product portfolio, ADLINK continues to excel in making some of the world’s most rugged, dependable, high-performance embedded computing solutions. With NVIDIA embedded technology, ADLINK solutions are now even more capable with integrated GPGPU acceleration.
In a distributed system, middleware is the software layer that lies between the operating system and applications. It enables the various components of a system to more easily communicate and share data. Middleware simplifies the development of distributed systems by letting software developers focus on the specific purpose of their applications rather than the mechanics of passing information between applications and systems. Such functionality becomes particularly important at the edge, where multiple AI and conventional applications may need to collaborate as part of a larger edge strategy.
The Vortex data distribution service (DDS) is middleware that enables a secure publish/subscribe pattern for sending and receiving data, events, and commands among network nodes. DDS addresses the needs of applications in aerospace and defense, air-traffic control, autonomous vehicles, medical devices, robotics, power generation, simulation and testing, smart grid management, transportation systems, and other applications that require real-time data exchange.
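The publish/subscribe pattern itself is simple to sketch. The example below is an in-memory toy, not the Vortex DDS API: the `Bus` class, the topic name, and the sample fields are hypothetical, and it omits everything that makes real DDS middleware useful across a network (discovery, quality-of-service policies, security, and transport).

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class Bus:
    """Toy topic-based publish/subscribe bus (single process, synchronous)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, sample: Any) -> int:
        """Deliver a sample to every subscriber of the topic.
        Returns the number of subscribers that received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(sample)
        return len(callbacks)


bus = Bus()
track_log: List[dict] = []
close_contacts: List[int] = []


def flag_close_contact(sample: dict) -> None:
    if sample["range_km"] < 10.0:
        close_contacts.append(sample["id"])


# Two independent consumers of the same topic; the publisher knows neither.
bus.subscribe("radar/track", track_log.append)
bus.subscribe("radar/track", flag_close_contact)

bus.publish("radar/track", {"id": 1, "range_km": 42.0})
delivered = bus.publish("radar/track", {"id": 2, "range_km": 7.5})
print(delivered, close_contacts)
```

The publisher addresses a topic rather than a recipient, which is what lets applications be added or removed without changing the code that produces the data.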
Flexible GPU system architecture
A system of GPUs combined with traditional CPUs, DSPs, or FPGAs would allow a developer’s specific signal processing or other algorithms to deploy effectively. These GPU-based systems can be designed to deliver high levels of usable processing throughput, whether measured in OPS, FLOPS, or teraflops. Moreover, the architecture could comprise a highly customized suite of 6U VPX boards, consisting of one or more compute nodes, an input/output board, an InfiniBand switch, and a management node, all housed in a conduction-cooled chassis and supported by a Linux-based operating system. Estimated system-level performance could then be as high as 1.55 GFLOPS per watt (GF/W) at theoretical peak/Thermal Design Point (TDP). Fully populated, a GPU-based high-performance computing system could provide a theoretical 1.94 TFLOPS (TF) of floating-point computation in a physical package of less than 2 cubic feet, with a power draw of only slightly more than 1 kilowatt.
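The quoted figures can be cross-checked against one another. The short calculation below assumes that the 1.55 GF/W efficiency and the 1.94 TF peak describe the same fully populated configuration, which the text implies but does not state outright; the variable names are illustrative.

```python
peak_tflops = 1.94                 # theoretical peak, from the text
efficiency_gflops_per_w = 1.55     # system-level efficiency, from the text
max_volume_ft3 = 2.0               # "less than 2 cubic feet", used as an upper bound

# Power implied by peak throughput and efficiency:
# (1.94 TF * 1000 GF/TF) / 1.55 GF/W
implied_power_w = peak_tflops * 1000.0 / efficiency_gflops_per_w

# Lower bound on compute density, since the package is under 2 cubic feet.
min_density_gflops_per_ft3 = peak_tflops * 1000.0 / max_volume_ft3

print(f"implied power draw: {implied_power_w:.0f} W")
print(f"compute density: at least {min_density_gflops_per_ft3:.0f} GFLOPS per cubic foot")
```

Under that assumption the implied draw is roughly 1.25 kW, in line with the stated power draw of a little over 1 kilowatt.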
Such a GPU-based high-performance computing system could additionally include a 6U VPX carrier card, rendering it a configurable SBC with graphics capability. These flexible SBCs could then be installed in an air-cooled chassis with forced-air cooling, or could be used in a rugged conduction-cooled chassis suitable for harsh environments such as airborne or naval deployment. As technology advances, the COTS modules can be upgraded, extending the system’s life cycle. Because the deployed form factor does not need to change upon upgrade, deployed systems can more easily be upgraded quickly, or even downsized, to fit power budgets or other environmental constraints. An example of a GPU-based high-performance system suitable for military deployment is Quantum3D’s Katana system, based on the Tanto compute node.