
Supercomputers optimized for Artificial Intelligence and Machine Learning applications

One of the major factors accounting for the recent success of deep neural networks is the significant leap in the availability of computational processing power. When Google’s computers roundly beat the world-class Go champion Lee Sedol, it marked a milestone in artificial intelligence. The winning computer program, created by researchers at Google DeepMind in London, used an artificial neural network that took advantage of what’s known as deep learning, a strategy by which neural networks involving many layers of processing are configured in an automated fashion to solve the problem at hand. In addition, the computers Google used to defeat Sedol contained special-purpose hardware, a computer card Google calls its Tensor Processing Unit, which reportedly uses an application-specific integrated circuit (ASIC) to speed up deep-learning calculations.

 

A neural network is typically organized into layers, and each layer contains a large number of processing nodes. Data come in and are divided up among the nodes in the bottom layer. Each node manipulates the data it receives and passes the results on to nodes in the next layer, which manipulate the data they receive and pass on the results, and so on. The output of the final layer yields the solution to some computational problem. Although such networks outperform more conventional algorithms on many visual-processing tasks, they require much greater computational resources.
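To make the layer-by-layer flow concrete, here is a minimal sketch of a feed-forward network in NumPy. The layer sizes, random weights, and ReLU activation are illustrative assumptions, not details of any system described in this article.

```python
# Minimal sketch of a layered feed-forward network (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, layers):
    """Pass data through each layer in turn; the last layer's output is the result."""
    for w, b in layers:
        x = relu(x @ w + b)
    return x

# Three layers of processing nodes: 784 inputs -> 256 -> 128 -> 10 outputs.
sizes = [784, 256, 128, 10]
layers = [(rng.normal(scale=0.01, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

batch = rng.normal(size=(32, 784))   # a batch of 32 input examples
out = forward(batch, layers)
print(out.shape)                     # (32, 10): one output vector per example
```

Even this toy version hints at the cost: every layer is a matrix multiplication over every input, which is exactly the kind of arithmetic that scales up quickly on real problems.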

 

Much of that leap in processing power has come from researchers taking advantage of graphics processing units (GPUs), chips designed for high performance in processing the huge amount of visual content needed for video games. GPUs have long been used to accelerate traditional HPC workloads. In recent years, however, they’ve also emerged as the go-to acceleration chip for training deep-learning models for AI applications.
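As a rough illustration of how a GPU is enlisted for training, the hedged sketch below uses PyTorch and assumes a CUDA-capable GPU is available; the tiny model and random data are placeholders, not any workload from this article.

```python
# Hedged sketch: offloading a small training loop to a GPU with PyTorch.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 784, device=device)          # batch of inputs on the GPU
y = torch.randint(0, 10, (64,), device=device)   # random class labels

for step in range(10):                           # a few illustrative steps
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                              # gradients computed on the GPU
    optimizer.step()

print(float(loss))
```

The point of the sketch is simply that the heavy linear-algebra work (the forward pass and the gradients) runs on the accelerator while the CPU orchestrates the loop.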

 

Most work on scaling up deep learning has taken place inside the data centers of internet companies, where problems are split up across servers that are connected relatively loosely rather than bound into one giant computer. More recently, however, companies have been developing supercomputers specialized for AI and machine-learning applications.

 

“A supercomputer takes very complex problems and breaks them down into parts that are worked on simultaneously by thousands of processors, instead of being worked on individually in a single system, like a regular computer. Thanks to parallel processing, researchers and scientists can generate insight much faster considering a laptop might take days or weeks to solve what a supercomputer can solve in minutes or hours,” explained Scott Tease, Lenovo’s executive director of High Performance Computing and Artificial Intelligence.
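The divide-and-conquer idea in the quote can be sketched in a few lines: split one large computation into chunks and work on them simultaneously. The toy example below uses Python's multiprocessing on a single machine purely as a stand-in for the thousands of processors in a real supercomputer.

```python
# Toy illustration of parallel processing: break a problem into parts and
# compute them simultaneously (real supercomputers use MPI across many nodes).
from multiprocessing import Pool
import numpy as np

def partial_sum(chunk):
    # Each worker handles its own slice of the data independently.
    return float(np.sum(np.sqrt(chunk)))

if __name__ == "__main__":
    data = np.arange(1, 10_000_001, dtype=np.float64)
    chunks = np.array_split(data, 8)          # break the problem into 8 parts
    with Pool(processes=8) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)
```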

 

Supercomputers have also become essential for national security: decoding encrypted messages, simulating complex ballistics models, modeling nuclear weapon detonations and other weapons of mass destruction, developing new kinds of stealth technology, and running cyber defence and attack simulations.

 

Deep learning stretches up to scientific supercomputers

Machine learning, a form of artificial intelligence, enjoys unprecedented success in commercial applications. However, the use of machine learning in high performance computing for science has been limited. Why? Advanced machine learning tools weren’t designed for big data sets, like those used to study stars and planets. A team from Intel, National Energy Research Scientific Computing Center (NERSC), and Stanford changed that situation. They developed the first 15-petaflop deep-learning software. They demonstrated its ability to handle large data sets via test runs on the Cori supercomputer.

 

Using machine learning techniques on supercomputers, scientists can extract insights from the large, complex data sets produced by powerful instruments such as accelerators. The new software could allow the world’s largest supercomputers to apply deep learning to such data. The resulting insights could benefit Earth systems modeling, fusion energy, and astrophysics.

 

Machine learning techniques hold potential for enabling scientists to extract valuable insights from large, complex data sets being produced by accelerators, light sources, telescopes, and computer simulations. While these techniques have had great success in a variety of commercial applications, their use in high performance computing for science has been limited because existing tools were not designed to work with the terabyte- to petabyte-sized data sets found in many science domains.

 

To address this problem, a collaboration among Intel, the National Energy Research Scientific Computing Center, and Stanford University has been working to solve the problems that arise when using deep learning techniques, a form of machine learning, on terabyte- and petabyte-scale data sets.
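One common way to cope with data sets far larger than memory is to stream fixed-size batches from disk rather than load everything at once. The sketch below shows that general pattern with a memory-mapped NumPy array; the file name, shapes, batch size, and the train_step helper are hypothetical, and this is not the Intel/NERSC/Stanford software itself.

```python
# Hedged sketch: stream batches from a file too large to fit in memory.
import numpy as np

def batches(path, n_samples, n_features, batch_size=1024):
    data = np.memmap(path, dtype=np.float32, mode="r",
                     shape=(n_samples, n_features))       # nothing loaded yet
    for start in range(0, n_samples, batch_size):
        yield np.asarray(data[start:start + batch_size])  # load one batch at a time

# Hypothetical usage:
# for batch in batches("climate_features.f32", n_samples=10**9, n_features=128):
#     train_step(batch)   # hypothetical training step
```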

 

The team developed the first 15-petaflop deep-learning software. They demonstrated its scalability for data-intensive applications by executing a number of training runs using large scientific data sets. The runs used physics- and climate-based data sets on Cori, a supercomputer located at the National Energy Research Scientific Computing Center. They achieved a peak rate between 11.73 and 15.07 petaflops (single-precision) and an average sustained performance of 11.41 to 13.47 petaflops. (A petaflop is a million billion, or 10^15, calculations per second.)

 

Summit

Supercomputers like Summit have a different architecture, with specialized high-speed connections linking their thousands of processors into a single system that can work as a whole.

 

Designed by IBM and Nvidia, Summit is powered by the former’s Power 9 processors and the latter’s Tensor Core GPUs. The machine takes up an area about the size of two tennis courts at the DOE’s Oak Ridge National Lab in Tennessee and can draw up to 13MW of power when firing on all cylinders.

 

The new computing architecture developed for the system combines AI computing capabilities with traditional high-performance computing ones. That means it can be used to build scientific models and simulations while at the same time processing machine-learning workloads, greatly opening up the possibilities in supercomputer-assisted scientific research.

 

Summit is “the most powerful and the smartest supercomputer in the world,” said Paresh Kharya, a product management and marketing director for accelerated computing at Nvidia. “It’s also the world’s largest GPU-accelerated supercomputer.”

 

In late 2018, a climate research project run on Summit by the US government exceeded the scale of any corporate AI lab. As part of the project, the giant computer booted up a machine-learning experiment that ran faster than any before.

 

Summit used more than 27,000 powerful graphics processors in the project. It tapped their power to train deep-learning algorithms, the technology driving AI’s frontier, chewing through the exercise at a rate of a billion billion operations per second, a pace known in supercomputing circles as an exaflop.

 

Fittingly, the world’s most powerful computer’s AI workout was focused on one of the world’s largest problems: climate change. Tech companies train algorithms to recognize faces or road signs; the government scientists trained theirs to detect weather patterns like cyclones in the copious output from climate simulations that spool out a century’s worth of three-hour forecasts for Earth’s atmosphere.

 

The project demonstrates the scientific potential of adapting deep learning to supercomputers, which traditionally simulate physical and chemical processes such as nuclear explosions, black holes, or new materials. It also shows that machine learning can benefit from more computing power—if you can find it—boding well for future breakthroughs.

 

Summit is also being used for research into disease and addiction. Researchers will use AI to identify patterns in the function and evolution of human proteins and cellular systems. These patterns can help us better understand Alzheimer’s disease, heart disease, and addiction, and inform the drug discovery process.

 

World’s First and Fastest Exascale Supercomputer “Frontier” for the U.S. Department of Energy’s Oak Ridge National Laboratory, reported in May 2022

Frontier breaks the exascale barrier at 1.1 exaflops, faster than the next seven most powerful supercomputers in the world combined.

Frontier also ranked number one in a category, called mixed-precision computing, that rates performance in formats commonly used for artificial intelligence, with a performance of 6.88 exaflops. The new supercomputer’s monumental performance marks a new era in artificial intelligence, scientific research, and innovation: it can solve calculations up to 10X faster than today’s top supercomputers and tackle problems that are 8X more complex.

Additionally, the new supercomputer claimed the number one spot on the Green500 list as the world’s most energy-efficient supercomputer, at 52.23 gigaflops per watt, making it 32% more energy efficient than the previous number one system.

 

China claims to have run an AI breakthrough on Sunway, reported in June 2022

In the latest series of claims and counterclaims, Chinese scientists recently claimed to have run an artificial intelligence program with architecture as complicated as the human brain. The AI model, dubbed “BaGuaLu” or “alchemist’s pot,” was run on the latest generation of the Sunway supercomputer at the National Supercomputing Center.

The report, citing Chinese scientists, said that the AI model with 174 trillion parameters could be used for several applications ranging from autonomous vehicles to scientific study. The data was presented in April during a virtual meeting of Principles and Practice of Parallel Programming 2022, an international conference held by the Association for Computing Machinery (ACM) in the United States.

The result places the newest-generation Sunway supercomputer on a level with Frontier, the latest machine developed by the US Department of Energy.

 

Russia’s New ‘AI Supercomputer’

Russia’s latest supercomputer is unique in some ways — it is the country’s first to be devoted to “solving problems in the field of artificial intelligence” — but not in others. Like some other Russian defense and dual-use projects, the Zhores computer is built on Western technology.

 

Its development began in 2017 at the Skolkovo Institute of Science and Technology, an 8-year-old private institute founded in partnership with the Massachusetts Institute of Technology. In January, Skolkovo engineers told state media that Zhores had reached computing speeds of about one petaflop, making it Russia’s sixth-fastest supercomputer. (The country’s fastest supercomputer is about 50 times more powerful — and it’s ranked just 77th in the world.)

 

Plans call for boosting Zhores’ speed to 2 or 3 petaflops, officials said, improving its ability to tackle problems in “a wide range of interdisciplinary tasks at the interface of machine learning, data science and mathematical modeling in biomedicine, image processing, development and search for new drugs, photonics, predictive maintenance, and development of new x-ray and gamma radiation sources.” All this may give the new supercomputer a prominent place in the country’s ever-more-coordinated attempts to advance AI research.

 

Yet perhaps the most remarkable thing about Zhores — named for the Russian physicist who won his field’s Nobel Prize in 2000 — is how open its designers have been about their reliance on imported technology. “The system incorporates 26 nodes with the most powerful graphics accelerators with tensor cores today—each has four NVIDIA Tesla V100 cards,” TASS reported.

 

Benchmarking Supercomputers for AI and Machine Learning


Scientific supercomputing is not immune to the wave of machine learning that’s swept the tech world. Those using supercomputers to uncover the structure of the universe, discover new molecules, and predict the global climate are increasingly using neural networks to do so. And as is long-standing tradition in the field of high-performance computing, it’s all going to be measured down to the last floating-point operation.

Twice a year, Top500.org publishes a ranking of raw computing power using a value called Rmax, derived from benchmark software called Linpack. By that measure, it’s been a bit of a dull year. The rankings of the top nine systems are unchanged from June, with Japan’s Supercomputer Fugaku on top at 442,010 trillion floating-point operations per second. That leaves the Fujitsu-built system a bit shy of the long-sought goal of exascale computing: one million trillion 64-bit floating-point operations per second, or an exaflop.

But by another measure, one more related to AI, Fugaku and its competitor, the Summit supercomputer at Oak Ridge National Laboratory, have already passed the exascale mark. That benchmark, called HPL-AI, measures a system’s performance using the lower-precision numbers (16 bits or less) common to neural-network computing. Using that yardstick, Fugaku hits 2 exaflops (no change from June 2021) and Summit reaches 1.4 (a 23 percent increase).
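The difference HPL-AI probes can be illustrated with a small experiment: compute the same matrix product in 16-bit and 64-bit floating point and compare the results. This NumPy sketch runs on a CPU; on GPUs with tensor cores the 16-bit path is also far faster, which is what the benchmark rewards.

```python
# Compare a matrix product done in 16-bit and 64-bit floating point.
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=(512, 512))
b = rng.normal(size=(512, 512))

exact = a.astype(np.float64) @ b.astype(np.float64)
half  = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float64)

rel_err = np.linalg.norm(exact - half) / np.linalg.norm(exact)
print(f"relative error of the float16 result: {rel_err:.2e}")  # small, but nonzero
```

Neural-network training tolerates this modest loss of precision, which is why AI-oriented benchmarks can report such large exaflop figures on hardware whose 64-bit rating is far lower.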


But HPL-AI isn’t really how AI is done in supercomputers today. Enter MLCommons, the industry organization that’s been setting realistic tests for AI systems of all sizes. It released results from version 1.0 of its high-performance computing benchmarks, called MLPerf HPC, in November 2021.

The suite of benchmarks measures the time it takes to train real scientific machine learning models to agreed-on quality targets. Compared to MLPerf HPC version 0.7, basically a warmup round from last year, the best results in version 1.0 showed a 4- to 7-fold improvement. Eight supercomputing centers took part, producing 30 benchmark results.

As in MLPerf’s other benchmarking efforts, there were two divisions: “Closed” submissions all used the same neural network model to ensure a more apples-to-apples comparison; “open” submissions were allowed to modify their models.

The three neural networks trialed were:

  • CosmoFlow uses the distribution of matter in telescope images to predict things about dark energy and other mysteries of the universe.
  • DeepCAM tests the detection of cyclones and other extreme weather in climate data.
  • OpenCatalyst, the newest benchmark, predicts the quantum mechanical properties of catalyst systems to discover and evaluate new catalyst materials for energy storage.

In the closed division, there were two ways of testing these networks. Strong scaling allowed participants to use as much of the supercomputer’s resources as they wanted to achieve the fastest neural-network training time. Because it’s not really practical to devote an entire supercomputer’s worth of CPUs, accelerator chips, and bandwidth to a single neural network, strong scaling shows what researchers think the optimal distribution of resources can do. Weak scaling, in contrast, breaks up the entire supercomputer into hundreds of identical versions of the same neural network to figure out what the system’s AI abilities are in total.
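Expressed as simple formulas, the two closed-division metrics are time-to-train (strong scaling, lower is better) and aggregate throughput in models per minute (weak scaling, higher is better). The sketch below is illustrative, not MLPerf's official scoring code; the check uses the Fugaku weak-scaling figures reported below.

```python
# The two closed-division metrics, as simple formulas.
def strong_scaling_time(train_minutes: float) -> float:
    return train_minutes                      # time to train one model; lower is better

def weak_scaling_throughput(models_trained: int, total_minutes: float) -> float:
    return models_trained / total_minutes     # models per minute; higher is better

# Check against the Fugaku weak-scaling result reported below:
# 637 CosmoFlow instances trained in 495.66 minutes.
print(round(weak_scaling_throughput(637, 495.66), 2))   # ~1.29 models per minute
```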

Here’s a selection of results:

Argonne National Laboratory used its Theta supercomputer to measure strong scaling for DeepCAM and OpenCatalyst. Using 32 CPUs and 129 Nvidia GPUs, Argonne researchers trained DeepCAM in 32.19 minutes and OpenCatalyst in 256.7 minutes. Argonne says it plans to use the results to develop better AI algorithms for two upcoming systems, Polaris and Aurora.

The Swiss National Supercomputing Centre used Piz Daint to train OpenCatalyst and DeepCAM. In the strong scaling category, Piz Daint trained OpenCatalyst in 753.11 minutes using 256 CPUs and 256 GPUs. It finished DeepCAM in 21.88 minutes using 1024 of each. The center will use the results to inform algorithms for its upcoming Alps supercomputer.

Fujitsu and RIKEN used 512 of Fugaku’s custom-made processors to perform CosmoFlow in 114 minutes. They then used half of the complete system, 82,944 processors, to perform the weak-scaling benchmark on the same neural network. That meant training 637 instances of CosmoFlow, which they managed to do at an average of 1.29 models per minute for a total of 495.66 minutes (not quite 8 hours).

 

Helmholtz AI, a joint effort of Germany’s largest research centers, tested both the JUWELS and HoreKa supercomputers. HoreKa’s best effort was to chug through DeepCAM in 4.36 minutes using 256 CPUs and 512 GPUs. JUWELS did it in as little as 2.56 minutes using 1024 CPUs and 2048 GPUs. For CosmoFlow, its best effort was 16.73 minutes using 512 CPUs and 1024 GPUs. In the weak-scaling benchmark, JUWELS used 1536 CPUs and 3072 GPUs to plow through DeepCAM at a rate of 0.76 models per minute.

Lawrence Berkeley National Laboratory used the Perlmutter supercomputer to conquer CosmoFlow in 8.5 minutes (256 CPUs and 1024 GPUs), DeepCAM in 2.51 minutes (512 CPUs and 2048 GPUs), and OpenCatalyst in 111.86 minutes (128 CPUs and 512 GPUs). It used 1280 CPUs and 5120 GPUs for the weak scaling effort, yielding 0.68 models per minute for CosmoFlow and 2.06 models per minute for DeepCAM.

 

The (U.S.) National Center for Supercomputing Applications did its benchmarks on the Hardware Accelerated Learning (HAL) system. Using 32 CPUs and 64 GPUs they trained OpenCatalyst in 1021.18 minutes and DeepCAM in 133.91 minutes.

Nvidia, which made the GPUs used in every entry except Riken’s, tested its DGX A100 systems on CosmoFlow (8.04 minutes using 256 CPUs and 1024 GPUs) and DeepCAM (1.67 minutes with 512 CPUs and 2048 GPUs). In weak scaling the system was made up of 1024 CPUs and 4096 GPUs and it plowed through 0.73 CosmoFlow models per minute and 5.27 DeepCAM models per minute.

Texas Advanced Computing Center’s Frontera-Longhorn system tackled CosmoFlow in 140.45 minutes and DeepCAM in 76.9 minutes using 64 CPUs and 128 GPUs.

 

 

References and resources also include:

https://spectrum.ieee.org/ai-supercomputer