Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships.
Computational biology is the science that answers the question “How can we learn and use models of biological systems constructed from experimental measurements?” explains Robert F. Murphy, Ray and Stephanie Lane Professor of Computational Biology Emeritus. These models may describe what biological tasks are carried out by particular nucleic acid or peptide sequences, which gene (or genes), when expressed, produce a particular phenotype or behavior, what sequence of changes in gene or protein expression or localization leads to a particular disease, and how changes in cell organization influence cell behavior.
This field is sometimes referred to as bioinformatics, but many scientists use the latter term to describe the field that answers the question “How can I efficiently store, annotate, search and compare information from biological measurements and observations?” In any case, the two fields are closely linked, since “bioinformatics” systems typically are needed to provide data to “computational biology” systems that create models, and the results of those models are often returned for storage in “bioinformatics” databases.
Computational biology is a very broad discipline, in that it seeks to build models for diverse types of experimental data (e.g., concentrations, sequences, images, etc.) and biological systems (e.g., molecules, cells, tissues, organs, etc.), and that it uses methods from a wide range of mathematical and computational fields (e.g., complexity theory, algorithmics, machine learning, robotics, etc.).
Perhaps the most important task that computational biologists carry out (and that training in computational biology should equip prospective computational biologists to do) is to frame biomedical problems as computational problems. This often means looking at a biological system in a new way, challenging current assumptions or theories about the relationships between parts of the system, or integrating different sources of information to make a more comprehensive model than had been attempted before. In this context, it is worth noting that the primary goal need not be to increase human understanding of the system; even small biological systems can be sufficiently complex that scientists cannot fully comprehend or predict their properties. Thus the goal can be the creation of the model itself; the model should account for as much currently available experimental data as possible. Note that this does not mean that the model has been proven, even if the model makes one or more correct predictions about new experiments. With the exception of very restricted cases, it is not possible to prove that a model is correct, only to disprove it and then improve it by modifying it to incorporate the new results.
This view emphasizes the importance of machine learning for constructing models. In most current machine learning applications, statistical and computational methods are used to construct models from large existing datasets, and those models are then used to process new data. Examples include learning to classify spam emails, to enable fingerprint access to your phone, and to recognize human speech. However, an increasing number of machine learning applications don’t stop learning after their initial training. They can either learn from additional data as it becomes available or even choose which additional data they would like to learn from. This last approach is termed active machine learning, and it promises to play a very important role in biomedical research in the coming years.
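The active-learning idea above can be sketched in a few lines. The following is a minimal, hypothetical illustration (not from the text): a classifier is trained on a small labeled set, then repeatedly "chooses what additional data it would like to learn from" by querying the example it is least certain about, as an experimenter might prioritize the most informative measurement. The data are synthetic and the uncertainty-sampling strategy is one of several possible choices; scikit-learn is used purely for convenience.

```python
# Hypothetical sketch of pool-based active learning with uncertainty
# sampling. Data, names, and the query strategy are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "pool" of measurements with two classes.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Start with a few labeled examples from each class.
labeled = [int(i) for i in np.where(y == 0)[0][:5]] \
        + [int(i) for i in np.where(y == 1)[0][:5]]
pool = [i for i in range(200) if i not in labeled]

model = LogisticRegression()
for _ in range(20):  # 20 rounds of active learning
    model.fit(X[labeled], y[labeled])
    # Query the pool example the model is least certain about
    # (predicted probability closest to 0.5).
    probs = model.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(probs - 0.5)))]
    labeled.append(query)  # "run the experiment" to obtain its label
    pool.remove(query)

accuracy = model.score(X, y)
```

In a biomedical setting, the "pool" would be candidate experiments (e.g., untested compounds or conditions) and obtaining a label would mean actually performing a measurement, so querying the most informative examples first can substantially reduce experimental cost.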
Once the problem has been framed, the second major task of computational biologists begins. This is to borrow, refine, or invent methods to solve the problem. Current computational biology research can be divided into a number of broad areas, mainly based on the type of experimental data that is analyzed or modeled. Among these are analysis of protein and nucleic acid structure and function, gene and protein sequence, evolutionary genomics and proteomics, population genomics, regulatory and metabolic networks, biomedical image analysis and modeling, gene-disease associations, and development and spread of disease.
Software and tools
Computational biologists use a wide range of software, from command-line tools to graphical and web-based applications.
Open source software
Open source software provides a platform for developing computational biology methods. Specifically, open source means that anyone can access and benefit from software developed in research. PLOS cites four main reasons for the use of open source software:
Reproducibility: This allows researchers to use the exact methods used to calculate the relationships between biological data.
Faster development: Developers and researchers do not have to reinvent existing code for minor tasks. Instead, they can reuse pre-existing programs, saving time on the development and implementation of larger projects.
Increased quality: Having input from multiple researchers studying the same topic reduces the chance that errors remain in the code.
Long-term availability: Open source programs are not tied to any business or patent, so they can be mirrored across multiple websites and remain available in the future.
Computational Biology Software Market
Notable companies and software tools in this market include Insilico Biotechnology, AutoDock, AMPHORA, Genedata, Entelos, .NET Bio, Leadscope, Anduril, Accelrys, and Simulations Plus.