Computational biology is a branch of biology that aims to better understand and model biological structures and processes by using computers and computer science. It entails the application of computational methods, such as algorithms, to the representation and simulation of biological systems and to the large-scale analysis of experimental data.
The field involves the development and application of data-analytical and theoretical methods, mathematical modeling, and computational simulation techniques to the study of biological, ecological, behavioral, and social systems. It also aims to develop and use visual simulations to assess the complexity of biological systems, which is accomplished through specialized algorithms and visualization software.
Such models allow researchers to predict how a system will respond to different environments, which is useful for determining whether the system is robust. Robust biological systems are those that “maintain their state and functions against external and internal perturbations,” a property essential for survival. Computational biology has been used to help sequence the human genome, create accurate models of the human brain, and model many other biological systems.
Computational biomodeling generates a large archive of such modeling data that can be analyzed by multiple users. While current techniques focus on small biological systems, researchers are working on approaches that will allow larger networks to be analyzed and modeled. Many researchers believe this will be essential for developing modern medical approaches to new drugs and gene therapy. One useful modeling approach is to describe reaction networks as Petri nets, using tools such as esyN; a minimal sketch of the idea follows.
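To make the Petri-net idea concrete, the sketch below encodes a tiny, hypothetical enzymatic reaction network as places (token counts) and transitions (firing rules). It is written in plain Python rather than esyN itself, and the reaction names and token counts are illustrative only.

```python
# Minimal, illustrative Petri-net simulator (plain Python, no external libraries).
# The places, transitions, and token counts below are hypothetical examples,
# not taken from esyN or from any published model.
import random

places = {"substrate": 5, "enzyme": 1, "complex": 0, "product": 0}

# Each transition consumes one token per input place and adds one per output place.
transitions = {
    "bind":    {"in": ["substrate", "enzyme"], "out": ["complex"]},
    "convert": {"in": ["complex"],             "out": ["enzyme", "product"]},
}

def enabled(t):
    """A transition is enabled when every input place holds at least one token."""
    return all(places[p] > 0 for p in transitions[t]["in"])

def fire(t):
    """Fire a transition: remove one token per input arc, add one per output arc."""
    for p in transitions[t]["in"]:
        places[p] -= 1
    for p in transitions[t]["out"]:
        places[p] += 1

for step in range(20):
    ready = [t for t in transitions if enabled(t)]
    if not ready:
        break                       # deadlock: nothing left to fire
    fire(random.choice(ready))      # pick one enabled transition at random

print(places)   # e.g. {'substrate': 0, 'enzyme': 1, 'complex': 0, 'product': 5}
```

The same place/transition structure is essentially what graphical Petri-net tools let users build and share without writing code.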
Computational genomics is a field within genomics that studies the genomes of cells and organisms. It is sometimes referred to as computational and statistical genetics and encompasses much of bioinformatics. The Human Genome Project is one example of computational genomics: it set out to sequence the entire human genome as a data set. Techniques built on this work could allow doctors to analyze the genome of an individual patient, opening the possibility of personalized medicine, in which treatments are prescribed based on an individual’s pre-existing genetic patterns. The project has inspired many similar programs, and researchers are now looking to sequence the genomes of animals, plants, bacteria, and many other forms of life.
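As a small illustration of routine computational genomics work, the snippet below reads sequences from a FASTA file and reports basic per-record statistics. It assumes the Biopython library is available; the filename is a hypothetical placeholder.

```python
# Minimal sketch of a routine genomic-data task: reading sequences from a FASTA
# file and computing length and GC content per record. Assumes Biopython is
# installed; the filename "genome_contigs.fasta" is a hypothetical example.
from Bio import SeqIO

def gc_content(seq):
    """Fraction of G and C bases, a basic sequence statistic."""
    seq = str(seq).upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

for record in SeqIO.parse("genome_contigs.fasta", "fasta"):
    print(f"{record.id}\tlength={len(record.seq)}\tGC={gc_content(record.seq):.3f}")
```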
Computational neuroscience is the study of brain function in terms of the information-processing properties of the structures that make up the nervous system. A subset of neuroscience, it analyzes brain data to create practical applications and models the brain in order to examine specific aspects of the neurological system. Various types of brain models include:
Realistic Brain Models: These models aim to represent every aspect of the brain, in as much cellular-level detail as possible. They provide the most information about the brain but also have the largest margin for error, since every additional variable introduces another opportunity for error, and the models cannot account for parts of the cellular structure that scientists do not yet understand. Realistic brain models are also the most computationally demanding and the most expensive to implement.
Simplifying Brain Models: These models deliberately limit their scope in order to assess a specific physical property of the neurological system. This makes the underlying computational problems tractable and reduces the potential for error relative to a realistic brain model; a minimal example of such a simplified model is sketched below.
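The following is a minimal sketch of such a simplified model: a single leaky integrate-and-fire neuron, one of the standard reduced descriptions used in computational neuroscience. The parameter values are generic textbook-style choices, not drawn from any particular study, and NumPy is assumed to be available.

```python
# Minimal sketch of a "simplifying" brain model: a single leaky integrate-and-fire
# neuron driven by a constant input current. All parameter values are illustrative.
import numpy as np

dt       = 0.1          # time step (ms)
t_max    = 200.0        # total simulated time (ms)
tau_m    = 10.0         # membrane time constant (ms)
v_rest   = -70.0        # resting potential (mV)
v_reset  = -75.0        # reset potential after a spike (mV)
v_thresh = -55.0        # spike threshold (mV)
r_m      = 10.0         # membrane resistance (MOhm)
i_ext    = 2.0          # constant external current (nA)

steps = int(t_max / dt)
v = np.full(steps, v_rest)
spike_times = []

for i in range(1, steps):
    # Leaky integration: dV/dt = (-(V - V_rest) + R*I) / tau_m
    dv = (-(v[i - 1] - v_rest) + r_m * i_ext) / tau_m
    v[i] = v[i - 1] + dv * dt
    if v[i] >= v_thresh:           # threshold crossing: emit a spike and reset
        spike_times.append(i * dt)
        v[i] = v_reset

print(f"{len(spike_times)} spikes in {t_max:.0f} ms "
      f"(firing rate ~{1000 * len(spike_times) / t_max:.1f} Hz)")
```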
Experimental data about coronavirus activity at the molecular level are very limited and have mostly been produced in vitro. For example, published viral protein structures correspond to the crystallized protein rather than to the virus in solution. Moreover, there are not enough experimental data on complexes between viral and human proteins, or between viral proteins and potential drugs. Supercomputer calculations, on the other hand, can provide the structural data and the details of the binding process. The computing part is therefore critically important, as is subsequent experimental verification.
The blending of biology and computer technology has also reshaped how people understand their place in the world as modern humans: computational analysis of genetic and anatomical data gives researchers new insights into human evolution. Data exchange, storage, and high-performance computing have all advanced to make working with large data volumes easier, while pharmaceutical and biotech companies continually search for new drugs and for better techniques to understand how proteins interact.
Furthermore, the advancement and deployment of advanced analytics, robotics, and automation in pharmaceutical production is one of the market’s promising trends. Automation, digitization, and robotics have combined to create highly adaptable laboratories capable of performing routine, high-volume investigations as well as highly specialized testing at a reasonable cost. Analytical testing in particular is likely to benefit significantly from increased levels of laboratory automation.
The coronavirus pandemic that began in 2020 threatened the lives of many people and hindered economic and social activity in countries all over the world, and as a result it attracted significant attention from many research groups. Finding treatments to prevent and mitigate the negative impact of COVID-19 became the scientific community’s highest priority. Developing drugs to mitigate the disease and reduce the risk of severe complications was one of the most important tasks before coronavirus vaccines could be widely adopted. Computer simulations deliver valuable information on viral activity at the atomic level and can be used to predict the efficacy of potential drugs. Such calculations are extremely demanding and can only be performed on the most powerful supercomputers. HPC systems are widely used in simulations of biochemical processes, and these simulations reduce the number of experiments that would otherwise be needed to obtain the same results. Leading global pharmaceutical and research centers use molecular modeling at the initial steps of drug development, when a massive number of chemical substances must be screened for specific activity.
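As an illustration of the earliest, cheapest stage of such screening, the sketch below filters candidate molecules with Lipinski’s rule of five before any expensive docking or molecular dynamics would be run on an HPC system. It assumes the open-source RDKit toolkit is installed; the molecules listed are arbitrary well-known examples, not actual COVID-19 candidates.

```python
# Minimal sketch of an early screening step: filtering candidate molecules with
# simple drug-likeness rules (Lipinski's "rule of five") before docking or MD.
# Assumes RDKit is installed; the SMILES strings are arbitrary known compounds.
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

candidates = {
    "aspirin":   "CC(=O)Oc1ccccc1C(=O)O",
    "caffeine":  "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "ibuprofen": "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
}

def passes_rule_of_five(mol):
    """Keep molecules that satisfy Lipinski's rule of five."""
    return (Descriptors.MolWt(mol) <= 500
            and Descriptors.MolLogP(mol) <= 5
            and Lipinski.NumHDonors(mol) <= 5
            and Lipinski.NumHAcceptors(mol) <= 10)

for name, smiles in candidates.items():
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:                      # skip anything that fails to parse
        continue
    print(f"{name}: {'keep' if passes_rule_of_five(mol) else 'reject'}")
```

In real pipelines, the molecules that survive filters like this one go on to the supercomputer-scale docking and molecular dynamics calculations described above.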
An international team of scientists from Russia, Finland, Italy, China, Japan and Canada is using a recently upgraded HPC system at the Joint Supercomputer Center of the Russian Academy of Sciences to develop diagnostics and treatments against the COVID-19 coronavirus infection that caused the global pandemic.
“The rapid global spread of the COVID-19 coronavirus pandemic has shown that there are no clear global emergency response plans against threats to humankind caused by new viruses,” said Anna Kichkailo, Head of the Laboratory for Digital Controlled Drugs and Theranostics at the Krasnoyarsk Federal Science Center. “One of the obvious shortcomings is the lack of technologies for the quick development of medicines for diagnostics and therapy. To help solve this problem, an international team of scientists – from Russia, Finland, Italy, China, Japan and Canada – was formed. We all have different competences, knowledge, skills and resources. Our geographically distributed team includes virologists, biologists, chemists, mathematicians and physical scientists. International cooperation is extremely important to achieve quick progress and to react rapidly to the ever-changing situation with the global COVID-19 pandemic. We hope that our research will actually help to fight the spread of such infections.”
The White House announced the launch of the COVID-19 High Performance Computing Consortium, a collaboration among industry, government, and academic institutions that aims to make their supercomputing resources available to the wider research community in an effort to speed up the search for solutions to the evolving COVID-19 pandemic. Fighting COVID-19 requires extensive research in areas like bioinformatics, epidemiology, and molecular modeling to understand the threat and form strategies to address it. This work demands a massive amount of computational capacity. The consortium aggregates computing capabilities from the world’s most powerful and advanced computers so that COVID-19 researchers can execute the complex computational research programs needed to fight the virus.
Key government partners so far include Argonne National Laboratory, Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Oak Ridge National Laboratory, Sandia National Laboratories, the National Science Foundation, and NASA. Industry partners include IBM, HPE, Amazon Web Services, Google Cloud, and Microsoft. Academic members include MIT, Rensselaer Polytechnic Institute, the University of Chicago, and Northwestern University. IBM is also hosting a central portal. Several COVID-19 projects are already underway: Paul Dabbar, the Department of Energy’s Under Secretary for Science, referenced an Oak Ridge National Lab project in which researchers screened 8,000 compounds of interest, narrowing the list to 77 promising small-molecule drug compounds. Not surprisingly, early COVID-19 drug research has focused on the roughly 10,000 already-approved drugs, because they have passed safety hurdles and more is known about them.
DOE and other government agencies already have aggressive computational life science projects. The CANDLE (CANcer Distributed Learning Environment) project, run in partnership with the National Cancer Institute, is a good example: it is focused on building machine learning tools for use in cancer research. There is also the ATOM (Accelerating Therapeutics for Opportunities in Medicine) project at LLNL. Both CANDLE and ATOM are pivoting efforts toward COVID-19.
Military needs
Military needs and requirements in biomedical informatics exceed those in the civilian setting, according to researchers from the Telemedicine and Advanced Technology Research Center at Fort Detrick. The military health system supports a continuum of care starting at the point of injury on the battlefield (echelon I) – where resources are scarce and the “working” environment is dynamic and unknown in advance – all the way to Continental United States (CONUS)-based military and civilian hospitals (echelon V), resulting in additional requirements and system functionalities unequaled in the civilian environment.
For example, the capability to perform remote life-sign detection of trauma casualties through nearly undetectable wireless networks, which would help reduce morbidity and mortality among wounded soldiers on the battlefield, may have no parallel in the civilian setting. Furthermore, the combat medic (who is the “doctor” on the battlefield) has limited medical training, carries limited resources, and works in an unknown, often hostile, environment. Hence, additional medical informatics technologies addressing specific requirements and possessing unique functionalities are clearly needed for the military field environment and for linkage to upper echelons of care.
The authors have identified research needs structured into four technology areas: data capture, data integration and representation, improved decision support tools, and patient/provider access to data/knowledge. These four areas represent the natural progression of patient information flow in hospital and clinical settings, starting with the capture of a patient’s records and ending with access to structured data, clinical analysis, and medical knowledge by patients and providers.
The U.S. Army Medical Research and Materiel Command (USAMRMC) Biomedical Informatics Roadmap lays out a strategic plan in four focus areas: Hospital and Clinical Informatics, E-Health, Combat Health Informatics, and Bioinformatics and Biomedical Computation.
Supercomputers
The scale at which data are generated today was, until recently, unimaginable. For instance, a decade ago, sequencing the human genome took eight years, thousands of researchers and about $1 billion. Now, we can do it in a few days at a cost approaching $1,000: Three billion base pairs—the entire genetic code each human harbours—delivered in the same time, and even the same cost, as some packages from Amazon Prime. The ease with which we can acquire biological information is enabling some of the most exciting advances of our time, from precision cancer medicines to a deeper understanding of how the bacteria in our gut keep us healthy. But before any of that can happen, trillions upon trillions of data points must be processed into meaningful information by powerful supercomputers, like the ones available at UT Austin.
Supercomputing—or High Performance Computing—is the process of aggregating computing power to generate performance that far exceeds that of your average desktop computer. “In the life sciences, none of the recent technological advances would have been possible without it,” says Hans Hofmann, Director of the Center for Computational Biology and Bioinformatics, which is hosting today’s conference. “These are calculations that can’t be done on your laptop or a piece of paper. At least not in a human’s lifetime.”
“When people think about supercomputers they usually think about CPU time,” says Ben Liebeskind, a graduate student in the Ecology, Evolution and Behavior program. “But storage is a huge deal, especially for all that data being produced, and you need supercomputers for that. Otherwise, all that knowledge is wasted.” Liebeskind’s graduate research relied predominantly on genetic information created by other people and made publicly available on GenBank, a giant world repository of DNA sequence data.
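As a minimal sketch of how such publicly shared data can be pulled programmatically, the snippet below fetches a single GenBank record through NCBI’s Entrez service using Biopython (assumed to be installed). The accession NM_000546, the human TP53 mRNA RefSeq entry, is used purely as an example.

```python
# Minimal sketch of retrieving one public record from GenBank via NCBI Entrez.
# Assumes Biopython is installed; supply your own e-mail address, as required
# by NCBI's usage policy. The accession NM_000546 is just an example.
from Bio import Entrez, SeqIO

Entrez.email = "your.name@example.org"   # identify yourself to NCBI

# Download one nucleotide record in GenBank format and parse it.
handle = Entrez.efetch(db="nucleotide", id="NM_000546",
                       rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()

print(record.id, record.description)
print(f"sequence length: {len(record.seq)} bases, "
      f"{len(record.features)} annotated features")
```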
For many, the fundamental benefit of high-performance computing comes down to speed. An analysis that takes months on a laptop can be run in a day on a supercomputer, and in some situations, like developing a safe and effective vaccine for Ebola, this can make all the difference. In early 2015, mathematical biologists at UT Austin began helping the U.S. Centers for Disease Control and Prevention (CDC) develop trials for an Ebola vaccine in Sierra Leone. Using statistical tests, they aimed to determine which trial design, out of two possible options, would give the best chance of finding a vaccine that would prevent another devastating outbreak.
“At the time, Ebola cases in West Africa were starting to decline, which was obviously very good,” says Spencer Fox, a graduate student working on the project. “But in terms of a vaccine trial, this is an issue because there needs to be enough cases to really test if a vaccine is going to work. That’s why we had to deliver results to the CDC before it was too late in the epidemic.”
To see how effective a trial design was at detecting whether a vaccine did or did not work, the scientists fit about 500 million models, which took the equivalent of 250 days of computing time on a single node at TACC, the Texas Advanced Computing Center. But by running these on hundreds of nodes simultaneously – a technique known as parallelization – the analyses were done overnight. Within two weeks, the team delivered results to the CDC and a recommendation about which method to use in conducting the vaccine trials. High-performance computing, says Fox, was key in allowing them to respond to the rapidly changing conditions and ultimately make a difference to human health. “Otherwise, the epidemic would have been over before we were finished.”
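The sketch below illustrates the same embarrassingly parallel pattern on a single machine: many independent model fits distributed across worker processes. The fit_model function and job counts are toy placeholders, not the CDC analysis itself; on a cluster, the same pattern is scaled out across nodes by a scheduler such as SLURM.

```python
# Minimal sketch of the "embarrassingly parallel" pattern: fitting many
# independent models and spreading them across worker processes.
# fit_model is a stand-in, not the actual trial-design analysis.
from multiprocessing import Pool
import random

def fit_model(seed):
    """Stand-in for one independent model fit / simulated trial."""
    rng = random.Random(seed)
    # pretend "fitting" is summing noisy samples
    return sum(rng.random() for _ in range(10_000))

if __name__ == "__main__":
    n_models = 1_000                       # the real study fit ~500 million
    with Pool() as pool:                   # one worker per available CPU core
        results = pool.map(fit_model, range(n_models))
    print(f"fitted {len(results)} models; mean statistic = "
          f"{sum(results) / len(results):.2f}")
```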
As our ability to sequence genomes and analyze them improves, some researchers are able to fill in missing pieces of older puzzles, like figuring out how our genes influence whether we become addicted to alcohol. “Before supercomputers, we were limited to looking at one gene at a time, even though we knew that thousands have some importance to alcohol addiction,” says Sean Farris, a postdoctoral fellow at The University of Texas at Austin’s Waggoner Center for Alcohol and Addiction Research. In the end, Farris says, that’s the promise of supercomputers. “They are allowing us to delve deeper and ask bigger questions than we ever knew possible.”
Computational Biology market emerging trends to 2027
The global computational biology market was worth USD 5,350 million in 2021, and it is projected to be worth USD 27,425 million by 2030, registering a CAGR of 19.9% during the forecast period (2022–2030).
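As a quick sanity check on figures like these, the snippet below applies the standard compound-annual-growth-rate formula to the quoted start value and rate and confirms that the projected 2030 figure is consistent.

```python
# Consistency check of the quoted market figures: a compound annual growth
# rate (CAGR) of r over n years implies end ≈ start * (1 + r)**n.
start_value = 5_350        # USD million, 2021
cagr = 0.199               # 19.9% per year
years = 9                  # forecast period 2022-2030

projected = start_value * (1 + cagr) ** years
print(f"projected 2030 value: USD {projected:,.0f} million")   # ~27,400 million

# Inverting the formula recovers the growth rate from the two endpoints.
end_value = 27_425
implied_cagr = (end_value / start_value) ** (1 / years) - 1
print(f"implied CAGR: {implied_cagr:.1%}")                     # ~19.9%
```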
The global computational biology market is predicted to rise due to the increase in bioinformatics research, an increase in the number of clinical studies in pharmacogenomics and pharmacokinetics, and the growth of drug design and drug modeling. Moreover, various technological advancements in drug development are anticipated to offer significant growth opportunities in the market.
Demand for bioinformatics tools and skills has increased as genome sequencing programs have driven the exponential growth of sequence databases. Data mining, also termed data discovery, and large-scale data analysis for finding trends have risen in prominence over time, helping to extract nontrivial, implicit, previously unknown, and potentially relevant information from data. The development of computational biology facilities in developing countries further assists in market expansion.
The global computational biology market is segmented on the basis of tools and application. Based on tools, the market is segmented into analysis software & services, hardware, and databases. Similarly, based on application, the market is categorized into cellular & biological simulation, drug discovery and disease modeling, preclinical drug development, clinical trials, and other applications.
By application, the global computational biology market is segmented into Cellular and Biological Simulation, Drug Discovery and Disease Modelling, Preclinical Drug Development, Clinical Trials, and Human Body Simulation Software. The Drug Discovery and Disease Modelling segment holds the largest share and is estimated to grow at a CAGR of 19.6%. It is further segmented into Target Identification, Target Validation, Lead Discovery, and Lead Optimization. Within Drug Discovery and Disease Modelling, target identification dominated the market. The increase in computational methods for the target identification stage of the drug discovery process is expected to boost segment growth in the near future.
By tool, the computational biology market is segmented into Databases, Infrastructure (Hardware), and Analysis Software and Services. The Databases segment accounts for the largest market share and is projected to register the highest CAGR of 19.3% during the forecast period, reaching USD 9,830 million. The immense size of biological databases provides a resource for answering biological questions about molecular modeling, molecular evolution, mapping, and gene expression patterns, and helps in the structure-based design of therapeutic drugs. Thus, the increased use of databases for various research applications is expected to boost market growth and demand in the database segment.
By service, the global computational biology market is segmented into In-House and Contract. The contract segment is the dominant and fastest-growing one, expected to reach a value of USD 18,850 million by 2030 at a CAGR of 19.6%. Contracted software is typically more affordable than in-house development, as outsourced development teams bring experience from a wider range of fields.
By end-user, the global computational biology market is segmented into Academics and Industrial and Commercial. The industrial and commercial segment is the largest and fastest-growing segment, expected to reach USD 14,215 million by 2030. Computational biology databases and solutions are increasingly being used in industry to analyze diverse data sets in the production of biological products, which is expected to boost market growth during the forecast period.
North America holds the most commanding share in the regional market. It is estimated to grow at a CAGR of 19.7%. The computational biology market is expected to see rapid growth due to increased bioinformatics research, an increasing number of pharmacogenomics and pharmacokinetics studies, and drug discovery and disease modeling processes. An increase in investments by major companies and a higher number of product launches are also expected to boost the market growth.
Europe is the second-largest region and is estimated to reach a value of USD 7,060 million by 2030. The computational biology market in Germany is projected to grow due to an increasing focus on drug discovery, a rising number of clinical trials, growing research and development spending, and rising healthcare expenditure, so the German market is expected to witness steady growth.
Asia-Pacific is the fastest-growing region and is expected to grow at a CAGR of 20.6%. The Chinese computational biology market is expanding due to increased bioinformatics research, more clinical studies in pharmacogenomics and pharmacokinetics, and growth in drug design and disease modeling.
Key players include Certara, Chemical Computing Group, Compugen Ltd., Dassault Systèmes, Genedata AG, Insilico Biotechnology AG, Instem Plc. (Leadscope Inc.), Nimbus Therapeutics, Nimbus Discovery LLC, Rosa & Co. LLC, Strand Life Sciences, Schrödinger, and Simulations Plus Inc.