The 2020 coronavirus pandemic threatens the lives of many people and hinders economic and social activity in countries all over the world. As a result, it has attracted significant attention from many research groups. Finding treatments to prevent and mitigate the negative impact of COVID-19 is now the highest priority in the scientific community. Developing drugs that mitigate the disease and reduce the risk of severe complications is one of the most important tasks until a coronavirus vaccine is widely adopted. Computer simulations deliver valuable information on viral activity at the atomic level and can be used to predict the efficiency of potential drugs. Such calculations are extremely demanding and can be done only with the most powerful supercomputers. HPC systems are widely used in simulations of biochemical processes; these simulations reduce the number of experiments that would otherwise be needed to obtain the same results. Leading global pharmaceutical and research centers use molecular modeling in the initial steps of drug development, when a massive number of chemical substances must be screened for specific activity.
An international research team of scientists from Russia, Finland, Italy, China, Japan and Canada is using a recently upgraded HPC system at the Joint Supercomputer Center of the Russian Academy of Sciences to develop diagnostics and treatments against the COVID-19 coronavirus infection that caused the global pandemic.
“Rapid global spread of the COVID-19 coronavirus pandemic has shown that there are no clear global emergency response plans against threats to humankind caused by new viruses,” said Anna Kichkailo, Head of the Laboratory for Digital Controlled Drugs and Theranostics at the Krasnoyarsk Federal Science Center. “One of the obvious shortcomings is the lack of technologies for the quick development of medicines for diagnostics and therapy. To help solve this problem, an international team of scientists from Russia, Finland, Italy, China, Japan and Canada was formed. We all have different competences, knowledge, skills and resources. Our geographically distributed team includes virologists, biologists, chemists, mathematicians and physical scientists. International cooperation is extremely important to achieve quick progress and react rapidly to the ever-changing situation of the global COVID-19 pandemic. We hope that our research will actually help to fight the spread of such infections.”
The White House has announced the launch of the COVID-19 High Performance Computing Consortium, a collaboration among various industry, government, and academic institutions that aims to make their supercomputing resources available to the wider research community, in an effort to speed up the search for solutions to the evolving COVID-19 pandemic. Fighting COVID-19 will require extensive research in areas like bioinformatics, epidemiology, and molecular modeling to understand the threat we are facing and form strategies to address it. This work demands a massive amount of computational capacity. The COVID-19 High Performance Computing Consortium aggregates computing capabilities from the world's most powerful and advanced computers so that COVID-19 researchers can execute complex computational research programs to help fight the virus.
Key government partners so far include Argonne National Laboratory, Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Oak Ridge National Laboratory, Sandia National Laboratories, the National Science Foundation, and NASA. Among industry partners are IBM, HPE, Amazon Web Services, Google Cloud, and Microsoft. A few examples from academia include MIT, Rensselaer Polytechnic Institute, the University of Chicago, and Northwestern University. IBM is also hosting a central portal. In fact, several COVID-19 projects are already underway. Paul Dabbar, DOE Under Secretary for Science, referenced an Oak Ridge National Laboratory project in which researchers explored 8,000 compounds of interest, narrowing them down to 77 promising small-molecule drug compounds. Not surprisingly, early COVID-19 drug research is focused on already-approved drugs (roughly 10,000 of them), because they have already passed safety hurdles and more is known about them.
DOE and other government agencies already have aggressive computational life science projects. The CANDLE project, run with the National Cancer Institute, is a good example; it is focused on building machine learning tools for use in cancer research. There is also the ATOM (Accelerating Therapeutics for Opportunities in Medicine) project at LLNL. Both CANDLE and ATOM are pivoting their efforts toward COVID-19.
Computational biology is a branch of science that uses computers to understand the structures and processes of life, and to build models of them. It involves the development and application of data-analytical and theoretical methods, mathematical modeling, and computational simulation techniques to the study of biological, ecological, behavioral, and social systems. It aims to develop and use simulations to assess the complexity of biological systems, which is accomplished through specialized algorithms and visualization software. These computational methods include algorithms for the representation and simulation of biological systems and for the interpretation of experimental data, often on a very large scale.
These models allow for prediction of how systems will react under different environments. This is useful for determining whether a system is robust. A robust biological system is one that maintains its state and functions despite external and internal perturbations, which is essential for a biological system to survive. Computational biology has been used to help sequence the human genome, create accurate models of the human brain, and assist in modeling biological systems.
Computational biomodeling generates a large archive of such data, allowing for analysis by multiple users. While current techniques focus on small biological systems, researchers are working on approaches that will allow larger networks to be analyzed and modeled. A majority of researchers believe that this will be essential in developing modern medical approaches to creating new drugs and gene therapy. A useful modelling approach is to use Petri nets via tools such as esyN.
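The Petri net idea can be sketched in a few lines of Python. The toy two-reaction enzyme network below is entirely illustrative (the place and transition names are invented for this example, not taken from esyN); it shows the core mechanism: tokens sit in places, and a transition fires only when all of its input places hold enough tokens.

```python
# Minimal Petri net sketch: a toy enzyme-substrate network (illustrative only).
# A marking maps each place to its current token count.
marking = {"substrate": 3, "enzyme": 1, "complex": 0, "product": 0}

# Each transition: (inputs, outputs), both dicts of place -> tokens consumed/produced.
transitions = {
    "bind":    ({"substrate": 1, "enzyme": 1}, {"complex": 1}),
    "release": ({"complex": 1}, {"enzyme": 1, "product": 1}),
}

def enabled(name):
    """A transition is enabled when every input place has enough tokens."""
    inputs, _ = transitions[name]
    return all(marking[p] >= n for p, n in inputs.items())

def fire(name):
    """Firing consumes input tokens and produces output tokens."""
    inputs, outputs = transitions[name]
    for p, n in inputs.items():
        marking[p] -= n
    for p, n in outputs.items():
        marking[p] += n

# Fire enabled transitions (deterministic order, for clarity) until none remain.
while True:
    ready = [t for t in transitions if enabled(t)]
    if not ready:
        break
    fire(ready[0])

print(marking)  # all substrate is eventually converted to product
```

Real biomodeling tools add stochastic firing rates and much larger networks, but the enabled/fire loop above is the essence of the formalism.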
Computational genomics is a field within genomics that studies the genomes of cells and organisms. It is sometimes referred to as Computational and Statistical Genetics and encompasses much of bioinformatics. The Human Genome Project is one example of computational genomics: a project to sequence the entire human genome into a set of data. Once fully implemented, this could allow doctors to analyze the genome of an individual patient. This opens the possibility of personalized medicine, prescribing treatments based on an individual's pre-existing genetic patterns. The project has inspired many similar programs, and researchers are looking to sequence the genomes of animals, plants, bacteria, and all other types of life.
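To give a small flavor of what "analyzing a genome" means computationally, the sketch below computes two elementary sequence statistics. The DNA string is made up for illustration; real analyses run over billions of bases pulled from repositories such as GenBank.

```python
# Elementary genome statistics on a toy DNA sequence (illustrative only).
from collections import Counter

def gc_content(seq):
    """Fraction of G and C bases, a basic genome-wide statistic."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def codon_counts(seq, frame=0):
    """Count codons (non-overlapping base triplets) in a chosen reading frame."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))

dna = "ATGGCGTACGCTTAGGCGCGATAA"  # invented example sequence
print(f"GC content: {gc_content(dna):.2f}")
print(codon_counts(dna).most_common(3))
```

Scaled to three billion base pairs per human genome, even statistics this simple become I/O- and compute-heavy, which is one reason sequencing pipelines lean on HPC storage and parallelism.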
Computational neuroscience is the study of brain function in terms of the information processing properties of the structures that make up the nervous system. It is a subset of the field of neuroscience, and looks to analyze brain data to create practical applications. It looks to model the brain in order to examine specific aspects of the neurological system. Various types of models of the brain include:
Realistic Brain Models: These models look to represent every aspect of the brain, including as much detail at the cellular level as possible. Realistic models provide the most information about the brain, but also have the largest margin for error. More variables in a brain model create the possibility for more error to occur. These models do not account for parts of the cellular structure that scientists do not know about. Realistic brain models are the most computationally heavy and the most expensive to implement.
Simplifying Brain Models: These models look to limit the scope of a model in order to assess a specific physical property of the neurological system. This allows intensive computational problems to be solved, and reduces the amount of potential error relative to a realistic brain model.
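A classic example of a simplifying brain model is the leaky integrate-and-fire neuron, which reduces an entire cell to a single membrane-voltage equation. The sketch below uses illustrative parameter values (not taken from any specific study) and simple Euler integration:

```python
# Leaky integrate-and-fire neuron: a "simplifying" brain model.
# Parameter values are illustrative, not from any specific study.
tau_m   = 10.0   # membrane time constant (ms)
v_rest  = -70.0  # resting potential (mV)
v_th    = -55.0  # spike threshold (mV)
v_reset = -75.0  # reset potential after a spike (mV)
r_m     = 10.0   # membrane resistance (MOhm)
dt      = 0.1    # integration step (ms)

def simulate(i_ext, t_max=100.0):
    """Euler integration of dV/dt = (-(V - v_rest) + R*I) / tau; returns spike times."""
    v, t, spikes = v_rest, 0.0, []
    while t < t_max:
        dv = (-(v - v_rest) + r_m * i_ext) / tau_m
        v += dv * dt
        if v >= v_th:          # threshold crossed: record a spike and reset
            spikes.append(t)
            v = v_reset
        t += dt
    return spikes

# A constant 2 nA input drives the neuron above threshold, so it spikes
# repeatedly; 1 nA settles at -60 mV, below threshold, so it stays silent.
print(len(simulate(2.0)), len(simulate(1.0)))
```

The contrast with a realistic model is stark: this runs in microseconds per neuron, whereas cellular-level models of the same cell can require dedicated supercomputer time, at the cost of discarding everything but the spiking behavior.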
Experimental data about coronavirus activity at the molecular level is very limited and has been produced in vitro. For example, the viral protein structure corresponds to the crystallized protein and not to a virus in solution. Moreover, there is not enough experimental data on complexes between viral and human proteins, or between viral proteins and potential drugs. On the other hand, supercomputer calculations can provide all the structural data and the details of the binding process. The computing part is therefore critically important, as is subsequent experimental verification.
The military's needs and requirements in biomedical informatics exceed those of the civilian setting. The military health system supports a continuum of care starting at the point of injury on the battlefield (echelon I), where resources are scarce and the working environment is dynamic and unknown in advance, all the way to Continental United States (CONUS)-based military and civilian hospitals (echelon V), resulting in additional requirements and system functionalities unequaled in the civilian environment, write researchers from the Telemedicine and Advanced Technology Research Center at Fort Detrick.
For example, the capability to perform remote life-sign detection of trauma casualties through nearly undetectable wireless networks, which would help reduce morbidity and mortality of wounded soldiers in the battlefield, may have no parallel in the civilian setting. Furthermore, the combat medic (who is the “doctor” in the battlefield) has limited medical training, carries limited resources, and works in an unknown, often hostile, environment. Hence, additional medical informatics technologies addressing specific requirements and possessing unique functionalities are clearly needed for the military field environment and for linkage to upper echelons of care.
The authors have identified research needs structured into the following four technology areas: data capture; data integration and representation; improved decision support tools; and patient/provider access to data and knowledge. These four areas represent the natural progression of patient information flow in hospital and clinical settings, starting with the capture of a patient's records and ending with access to structured data, clinical analysis, and medical knowledge by patients and providers.
The U.S. Army Medical Research and Materiel Command (USAMRMC) Biomedical Informatics Roadmap lays out a strategic plan in four focus areas: Hospital and Clinical Informatics, E-Health, Combat Health Informatics, and Bioinformatics and Biomedical Computation.
The scale at which data are generated today was, until recently, unimaginable. For instance, a decade ago, sequencing the human genome took eight years, thousands of researchers and about $1 billion. Now, we can do it in a few days at a cost approaching $1,000: Three billion base pairs—the entire genetic code each human harbours—delivered in the same time, and even the same cost, as some packages from Amazon Prime. The ease with which we can acquire biological information is enabling some of the most exciting advances of our time, from precision cancer medicines to a deeper understanding of how the bacteria in our gut keep us healthy. But before any of that can happen, trillions upon trillions of data points must be processed into meaningful information by powerful supercomputers, like the ones available at UT Austin.
Supercomputing—or High Performance Computing—is the process of aggregating computing power to generate performance that far exceeds that of your average desktop computer. “In the life sciences, none of the recent technological advances would have been possible without it,” says Hans Hofmann, Director of the Center for Computational Biology and Bioinformatics, which is hosting today’s conference. “These are calculations that can’t be done on your laptop or a piece of paper. At least not in a human’s lifetime.”
“When people think about supercomputers they usually think about CPU time,” says Ben Liebeskind, a graduate student in the Ecology, Evolution and Behavior program. “But storage is a huge deal, especially for all that data being produced, and you need supercomputers for that. Otherwise, all that knowledge is wasted.” Liebeskind's graduate research relied predominantly on genetic information created by other people and made publicly available on GenBank, a giant world repository of DNA sequence data.
For many, the fundamental benefit of high-performance computing comes down to speed. An analysis that takes months on a laptop can be run in a day on a supercomputer, and in some situations, like developing a safe and effective vaccine for Ebola, this can make all the difference. In early 2015, mathematical biologists at UT Austin began helping the U.S. Centers for Disease Control and Prevention (CDC) develop trials for an Ebola vaccine in Sierra Leone. Using statistical tests, they aimed to determine which trial design, out of two possible options, would give the best chance of finding a vaccine that would prevent another devastating outbreak.
“At the time, Ebola cases in West Africa were starting to decline, which was obviously very good,” says Spencer Fox, a graduate student working on the project. “But in terms of a vaccine trial, this is an issue because there needs to be enough cases to really test if a vaccine is going to work. That’s why we had to deliver results to the CDC before it was too late in the epidemic.”
To see how effective a trial design was at detecting whether a vaccine did or did not work, the scientists fit about 500 million models, which took the equivalent of 250 days of computing time on one node on TACC. But by running these on hundreds of nodes simultaneously—known as parallelization—the analyses were done overnight. Within two weeks, the team delivered results to the CDC and a recommendation about which method to use in conducting the vaccine trials. High-performance computing, says Fox, was key in allowing them to respond to the rapidly changing conditions and ultimately make a difference to human health. “Otherwise, the epidemic would have been over before we were finished.”
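The parallelization strategy described above is "embarrassingly parallel": each model fit is independent, so fits can simply be distributed across workers. The sketch below illustrates the pattern with Python's multiprocessing; the "fit" is a stand-in toy computation, not the actual trial-design models used in the study.

```python
# Sketch of embarrassingly parallel model fitting (the fit itself is a toy
# stand-in, not the CDC trial-design models from the study).
from multiprocessing import Pool
import random

def fit_one_model(seed):
    """Simulate one (toy) trial: did the simulated vaccine effect get detected?
    Case rates of 2% vs 5% over 500 subjects per arm are illustrative."""
    rng = random.Random(seed)
    cases_vaccinated   = sum(rng.random() < 0.02 for _ in range(500))
    cases_unvaccinated = sum(rng.random() < 0.05 for _ in range(500))
    return cases_vaccinated < cases_unvaccinated  # effect "detected"

if __name__ == "__main__":
    n_models = 1000  # the real analysis fit about 500 million models
    with Pool() as pool:  # independent fits -> distribute across all cores
        detected = pool.map(fit_one_model, range(n_models))
    print(f"estimated power: {sum(detected) / n_models:.2f}")
```

On an HPC system the same idea is scaled out across hundreds of nodes rather than the cores of one machine, which is how 250 node-days of work collapsed into an overnight run.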
As our ability to sequence genomes and analyze them improves, some researchers are able to fill in missing pieces of older puzzles, like figuring out how our genes influence whether we become addicted to alcohol. “Before supercomputers, we were limited to looking at one gene at a time, even though we knew that thousands have some importance to alcohol addiction,” says Sean Farris, a postdoctoral fellow at The University of Texas at Austin’s Waggoner Center for Alcohol and Addiction Research. In the end, Farris says, that’s the promise of supercomputers. “They are allowing us to delve deeper and ask bigger questions than we ever knew possible.”
Software and tools
Computational Biologists use a wide range of software. These range from command line programs to graphical and web-based programs.
Open source software

Open source software provides a platform for developing computational biology methods. Specifically, open source means that every person and/or entity can access and benefit from software developed in research. PLOS cites four main reasons for the use of open source software:
Reproducibility: This allows researchers to use the exact methods used to calculate the relations between biological data.
Faster development: Developers and researchers do not have to reinvent existing code for minor tasks. Instead, they can use pre-existing programs to save time on the development and implementation of larger projects.
Increased quality: Having input from multiple researchers studying the same topic provides a layer of assurance that errors in the code will be caught.
Long-term availability: Open source programs are not tied to any business or patent. This allows them to be posted to multiple web pages, ensuring that they remain available in the future.
Computational Biology market emerging trends to 2027
Medical Market Research has released the “Global Computational Biology Market Analysis to 2027,” a specialized and in-depth study of the medical device industry with a focus on global market trends. The report aims to provide an overview of the global market with detailed market segmentation by tools, application, and geography.
The global market is expected to witness high growth during the forecast period. The report provides key statistics on the market status of the leading computational biology market players and offers key trends and opportunities in the market.
The global computational biology market is segmented on the basis of tools and application. Based on tools, the market is segmented into analysis software & services, hardware, and databases. Similarly, based on application, the market is categorized into cellular & biological simulation, drug discovery and disease modelling, preclinical drug development, clinical trials, and other applications.
The rise in the number of clinical studies in the field of pharmacogenomics and the growing number of clinical trials are expected to fuel the growth of the computational biology market during the forecast period. Moreover, various technological advancements in drug development are anticipated to offer significant growth opportunities in the market.
North America is expected to contribute to the largest share in the computational biology market in the coming years, owing to rising investments made in R&D activities for the discovery as well as the development of novel drugs. Also, Asia Pacific is anticipated to witness steady growth during the forecast period, due to increasing clinical trials in the region.
Key players include Certara, Chemical Computing Group, Compugen Ltd., Dassault Systèmes, Genedata AG, Insilico Biotechnology AG, Leadscope, Inc., and Nimbus Therapeutics.