Synthetic biology is defined as the application of science, technology and engineering to facilitate and accelerate the design, manufacture and/or modification of genetic materials in living organisms. It brings together disciplines including biology, chemistry, computer science and engineering with the aim of redesigning natural biological systems for greater efficiency, and of creating new organisms as well as biological components that do not yet exist in the natural world. For example, it combines knowledge of genomics with the chemical synthesis of DNA to enable rapid production of catalogued DNA sequences.
This type of revolution is founded on several core biological technologies, “but it is all about being able to read, write and edit the code of life,” said Tara O’Toole, former undersecretary of Homeland Security for science and technology and current executive vice president and senior fellow at In-Q-Tel, an Arlington, Virginia-based investment firm that works with defense and intelligence organizations. One core technology is DNA sequencing, the ability to read DNA. Another is DNA synthesis, the ability to write DNA code, she noted. “Our ability to write it, to synthesize DNA … is less advanced,” she said. “It’s slower, it’s more expensive, but again we are getting better and better.” Gene editing is a third core biotechnology: it allows scientists to alter a DNA sequence by adding, swapping or removing genes.
Among the potential applications of this new field is the creation of bioengineered microorganisms (and possibly other life forms) that can produce pharmaceuticals, detect toxic chemicals, break down pollutants, repair defective genes, destroy cancer cells, and generate hydrogen for the post-petroleum economy. Synthetic biology has many applications, ranging from drug and vaccine development to food and agriculture, manufacturing, and diagnostic tests. In the fight against the current pandemic, scientists are turning to synthetic biology to speed up the development of a vaccine. Synthetic biology can also be applied to diagnostics, including toehold circuits and in vitro CRISPR-based detection approaches that could be used to test COVID-19 patients.
This discipline is enjoying exponential growth, as it heavily benefits from the byproducts of the genomic revolution: high-throughput multi-omics phenotyping, accelerating DNA sequencing and synthesis capabilities, and CRISPR-enabled genetic editing. This growth is reflected in private investment in the field, which totalled ~$12B in the 2009–2018 period and is rapidly accelerating (from ~$2B in 2017 to ~$4B in 2018).
Synthetic biology is also predicted to transform defense and security. New techniques to edit and modify the genome may allow scientists to harness organisms or biological systems as weapons, or to perform engineering tasks that are typically impractical with conventional methods. DARPA wants to exploit the potential of synthetic biology to provide on-demand bio-production of novel drugs, new materials, food, fuels, sensors and coatings, whatever suits the military's needs. Future advances might include the construction of new biological parts and brain-computer interfaces.
The importance of synthetic biology in commercial markets and defense applications has spurred competition among countries, including the US and China, for leadership in the field. “Biotechnologies, including synthetic biology, are going to be foundational to the 21st century economy and they’re also going to be a critical arena for global competition in the geopolitical realm,” said Tara O’Toole. Synthetic biology is being used widely, particularly in China, because it is fast, cheap and relatively easy to use. Synthetic biology uses the aforementioned technologies to manipulate multi-cell systems in organisms in a way that can construct new biological parts.
Further, researchers are employing AI to accelerate and advance synthetic biology by improving its accuracy and speed and decreasing its cost. Each of these biotechnologies is being accelerated and improved by artificial intelligence methods, O’Toole noted. “AI is going to fundamentally improve the accuracy and the speed and decrease the cost of all of these four biotechnologies,” she said. “It is already happening.” “China in particular, is pursuing a very aggressive strategy to become the world leader in biotechnology,” she added. A deepened understanding of biotech has moved the world towards a “biorevolution,” O’Toole said during a webinar hosted by the Center for Strategic and International Studies.
China is partly accomplishing this by combining its internet giants, such as Alibaba, with its biotech companies. The combined strength of these companies’ research focuses on the industrialization of artificial intelligence, which China is “institutionalizing” while the United States is only “experimenting with it,” O’Toole added. China set a goal of making biotechnology 5 percent of the country’s GDP by 2020. China has changed regulations for its own version of the Food and Drug Administration to be more like that of the United States, in order to market to the world more easily. The country has created a talent pipeline that incentivizes its own students to go into the life sciences and bioengineering. China also has at least 20 programs intended to bring scientific talent from the rest of the world.
Part of the problem is that the United States has not done a good job of translating biology into products, O’Toole said, or of building infrastructure for securing and promoting the bioeconomy. Translational infrastructure for biology comes mostly from small private-sector start-up companies, which are the innovation engines for biology but do not provide the robust infrastructure needed to manage epidemics, whether deliberate or natural.
Engineering the behavior of cells by modification of their genetic machinery holds the potential for revolutionary advances in many important application areas, including medical therapies, vaccination, manufacturing of proteins and other organic compounds, and environmental remediation. As capabilities and potential applications grow, the complexity and cross-disciplinary knowledge required to employ them is also growing rapidly. Managing the complexity of biological engineering is thus a problem of increasing importance. The rapid pace of advancement makes it important to have good methods for integration of new knowledge and procedures into organism engineering workflows.
Although biological organisms are complex and not entirely understood, there are many opportunities for AI techniques to make a major difference in the efficacy of organism engineering. Tools that support or carry out information integration and informed decision-making can improve the efficiency and speed of organism engineering, and enable better results.
Artificial intelligence (AI) in Synthetic Biology
In the past, the typical synthetic biology workflow for organism engineering was viewed as a cycle of three stages: design maps a behavior specification to a nucleic acid sequence intended to realize this behavior; build draws on synthesis and/or assembly protocols to fabricate said nucleic acid sequence; and test assays (measures) the behavior of cells modified to include the sequence, feeding this information back into the design step and completing the cycle.
The synthetic biology engineering Design-Build-Test-Learn (DBTL) cycle
Modern synthetic biology engineering principles recognize the Design-Build-Test-Learn (DBTL) cycle—a loop used recursively to obtain a design that satisfies the desired specifications (e.g., a particular titer, rate, yield or product). The DBTL cycle’s first step is to design (D) a biological system expected to meet the desired outcome. That design is built (B) in the next phase from DNA parts into an appropriate microbial chassis using synthetic biology tools. The next phase involves testing (T) whether the built biological system indeed works as desired in the original design, via a variety of assays: e.g., measurement of production and/or profiling of omics (transcriptomics, proteomics, metabolomics) data. It is extremely rare that the first design behaves as desired, and further attempts are typically needed to meet the desired specification. The Learn (L) step leverages the data previously generated to inform the next Design step so as to converge to the desired specification faster than through a random search process.
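To make the loop concrete, here is a minimal Python sketch of DBTL as an iterative optimization. Everything in it is a hypothetical stand-in: the design space, the build_and_test "assay," and the improvement rule would all be replaced by real lab protocols and statistical models.

```python
# Minimal DBTL-loop sketch. All functions are hypothetical placeholders.
import random

def propose_designs(best, n=8):
    """Design: random designs at first, then single-slot tweaks of the best."""
    if best is None:
        return [tuple(random.choice("ABCDEF") for _ in range(5)) for _ in range(n)]
    tweaks = []
    for _ in range(n):
        d = list(best)
        d[random.randrange(5)] = random.choice("ABCDEF")  # mutate one slot
        tweaks.append(tuple(d))
    return tweaks

def build_and_test(design):
    """Build + Test: stand-in for strain construction and a titer assay."""
    return sum(ord(c) for c in design) + random.gauss(0, 2)  # fake titer

results, best = {}, None
for cycle in range(5):
    for design in propose_designs(best):          # Design
        results[design] = build_and_test(design)  # Build + Test
    best = max(results, key=results.get)          # Learn: keep the best so far
    print(f"cycle {cycle}: best {best}, titer {results[best]:.1f}")
```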
Design: At the most abstract level, the engineer must determine the arrangement of sensors, actuators, regulatory relationships, and/or enzymatic pathways that will be used to implement a desired behavior. An arrangement is then mapped onto the set of DNA or RNA components that are available, or new components are engineered with the desired specifications, while ensuring that there are no conflicts between the components selected in the arrangement. Finally, the components in the arrangement must be linearized, i.e., an order must be determined for the genes to appear in the DNA sequence.
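As a toy illustration of the final mapping and linearization step, the sketch below represents a design as named parts and concatenates them in the chosen order; all part names and sequences are invented for illustration.

```python
# Toy illustration: map an arrangement onto parts and linearize it.
# Part names and sequences are invented, not from a real part library.
parts = {
    "promoter_1":   "TTGACAAT",
    "reporter_gfp": "ATGAGTAAAGGA",
    "terminator_1": "AAAAAAGCCCGC",
}

arrangement = ["promoter_1", "reporter_gfp", "terminator_1"]

# Crude conflict check: no part may appear twice in the arrangement.
assert len(set(arrangement)) == len(arrangement)

linear_sequence = "".join(parts[name] for name in arrangement)
print(linear_sequence)
```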
The design stage of synthetic biology involves model construction; data mining; the sequence design of synthetic promoters, terminators, and enzymes; the metabolic design of pathways and metabolisms; and the process design of cell production and fermentation.
With the massive amounts of omics and biofoundry data available, model construction tools have been developed, including COBRA for constructing biochemical constraint-based models and FluxML for constructing 13C metabolic flux analysis models.
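As an example of the constraint-based side, a few lines with COBRApy (the Python implementation of the COBRA toolbox) suffice to load a genome-scale model and run flux balance analysis; the SBML file name below is an assumption, standing in for any model, e.g. one downloaded from the BiGG database.

```python
# Constraint-based modeling with COBRApy: load a model and run FBA.
import cobra

model = cobra.io.read_sbml_model("e_coli_core.xml")  # assumed local SBML file
solution = model.optimize()                          # flux balance analysis
print("predicted growth rate:", solution.objective_value)
print(solution.fluxes.head())                        # flux through reactions
```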
Moreover, PartsGenie is open-source online software for optimizing synthetic biology parts and bridging design, optimization, application, storage algorithms and databases. MAPPs can be used to map reference networks into a graph and search for the shortest pathways between two metabolites, and novoPathFinder can be used to design pathways based on stoichiometric networks under specific constraints. The robot programming language PR-PR can be used for procedure standardization and sharing among biofoundries, and eases communication between protocols and equipment.
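The graph-search idea behind pathway tools like MAPPs can be illustrated with a generic graph library: treat metabolites as nodes, reactions as edges, and search for a shortest path. The sketch below uses networkx on an invented toy network; it is an illustration of the concept, not MAPPs itself.

```python
# Illustration of shortest-pathway search over a toy metabolic network.
import networkx as nx

G = nx.Graph()
G.add_edges_from([               # edges = reactions (invented for illustration)
    ("glucose", "glucose-6-P"),
    ("glucose-6-P", "fructose-6-P"),
    ("fructose-6-P", "pyruvate"),
    ("glucose-6-P", "6-P-gluconate"),
    ("6-P-gluconate", "ribulose-5-P"),
    ("ribulose-5-P", "pyruvate"),
])

path = nx.shortest_path(G, source="glucose", target="pyruvate")
print(" -> ".join(path))
```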
Build: The build stage creates organisms modified with the designed nucleic sequence(s). First, the sequence(s) are synthesized (created) or assembled to produce actual physical samples, and the host organisms are cultured (grown) to be ready to receive these sequences. The sequences are then delivered to the organism by one of a variety of protocols.
Both of these steps suffer from issues of yield and quality assurance. Many protocols require a “magic touch” by which some practitioners get reliable results while others frequently build systems with problematic flaws. Next-generation sequencing may help to address issues of quality control, but planning, resourcing, and executing build protocols effectively is still an open and challenging problem.
The build stage of synthetic biology involves DNA assembly, genome editing, genome regulation, and automation. Recently developed automation platforms have substantially accelerated our capabilities in reconstructing engineered strains, but automation requires development of technologies that are simple, modular, multiplexable, and efficient.
Automation-friendly DNA assembly tools include: the methyltransferase-assisted BioBrick method, which uses a site-specific DNA methyltransferase together with endonucleases and allows consecutive constructions without gel purification; Twin-Primer Assembly (TPA), an enzyme-free in vitro DNA assembly method that can assemble 10 fragments with no sensitivity to junction errors or GC content; Gibson and NEBuilder assembly, homology-based in vitro methods able to clone large DNA parts with high GC content; Ligase Cycling Reaction (LCR), which employs bridging oligonucleotides to provide overlaps and allows automated assembly in consecutive steps; and yeast in vivo assembly, which relies on the high homologous recombination efficiency of S. cerevisiae.
Many of these DNA assembly tools have already been utilized in automation. For example, the Q-metric has been developed to standardize automated DNA assembly methods and to compute suitable robotic assembly practices, metrics and protocols based on output, cost and time. Amyris Inc. managed to use transformation-associated recombination (TAR)-based biofoundries to assemble 1,500 DNA constructs per week with fidelities over 90%.
Efficient and multiplexable genome engineering tools include multiplexed genome disruption, integration, base editing, SCRaMbLE, and automation. For example, Zhang et al. reported the efficient GTR-CRISPR system, which managed to simultaneously disrupt six genes in three days and improve yeast production of free fatty acid 30-fold in 10 days.
Test: Finally, the behavior of the newly constructed organism or organisms is assayed (measured) to determine how well it corresponds with the original specification, and to help debug misbehavior such that the next iteration of the design can be closer to the desired behavior.
Here, one of the biggest challenges is in relating assay data to the original specification: many assays produce data in great volume, but the mapping back to the original specification is often qualitative or relative. Likewise, it is often not clear how to relate the observed behavior to predictive models that can provide principled guidance in how to adjust the design phase in order to produce improved results.
The test stage of synthetic biology involves cell culture, cell sorting and cell analysis, and automation has also imposed special requirements on the test workflow.
Learn: The Learn phase of the DBTL cycle has traditionally been the most weakly supported and developed, despite its critical importance to accelerate the full cycle. The reasons are multiple, although their relative importance is not entirely clear. Arguably, the main drivers of the lack of emphasis on the L phase are: the lack of predictive power for biological systems behavior, the reproducibility problems plaguing biological experiments, and the traditionally moderate emphasis on mathematical training for synthetic biologists.
The learn stage of synthetic biology involves systems biology analysis and machine learning. Automation platforms can generate massive amounts of data that need to be analyzed and integrated back into the design stage to refine the models and guide the following iterative DBTL cycles through standardized procedures.
Artificial intelligence (AI) for synthetic biology
Artificial intelligence (AI) is the science and engineering of making intelligent machines and programming them with reasoning, learning and decision-making behaviours. Machine learning (ML), a subfield of AI comprising algorithms that extrapolate patterns from data and then use that analysis to make predictions, has now become a pervasive technology, underlying many modern applications including internet search, fraud detection, gaming, face detection, image tagging, brain mapping, check processing and computer server health-monitoring.
There is a wide variety of algorithms and processes for implementing ML systems, and most of what we hear about artificial intelligence refers to machine learning. The more data these algorithms collect, the more accurate their predictions become. Deep learning is a more powerful subcategory of machine learning in which many computational layers, organized into neural networks (inspired by the structure of the brain), operate in tandem to increase processing depth, facilitating technologies like advanced facial recognition (including FaceID on your iPhone).
Artificial intelligence (AI) has the potential to revolutionize the way physicians treat patients and deliver care. An AI system can assist physicians by providing up-to-date medical information from journals, textbooks and clinical practices to inform proper patient care. In addition, an AI system can help to reduce the diagnostic and therapeutic errors that are inevitable in human clinical practice. AI is generating more and better drug candidates (Insitro), sequencing your genome (Veritas Genetics), and detecting your cancer earlier and earlier (Freenome).
Biology, in particular, is one of the most promising beneficiaries of artificial intelligence. From investigating genetic mutations that contribute to obesity to examining pathology samples for cancerous cells, biology produces an inordinate amount of complex, convoluted data. But the information contained within these datasets often offers valuable insights that could be used to improve our health.
A study has found that artificial intelligence (AI) is well placed to be used in synthetic biology to automatically design new DNA sequences with specific desirable qualities. The findings, published in the leading genomics journal Nature Genetics, suggest that AI could be used to predict the effects of changes in DNA sequences, which is crucial for medical diagnostics and vaccine development. Genome analysis is typically done in the lab at a comparatively small scale. By using deep learning techniques, AI is able to spot patterns in large volumes of data that humans would find nearly impossible to process. This makes AI-driven predictions a much cheaper and quicker alternative for genomic research.
Co-author Mikael Huss, data scientist at AI-firm Peltarion and professor at the Karolinska Institutet in Sweden, said, “In genomics, researchers need to understand the effects of changing a piece of DNA, especially when multiple changes are done. Traditionally this is done with lab experiments, but evaluating all possible changes is both costly and time-consuming.” “As work reviewed in this paper shows, deep learning based AI methods could in many cases be used to predict effects of changing DNA much faster and at much lower cost. “For instance, if you want to create a new antimicrobial drug, you can use AI techniques to synthesize a novel compound with specifically desired properties. “This is potentially game-changing.”
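Deep learning models of this kind typically consume DNA not as a character string but as a one-hot encoded matrix. The minimal numpy sketch below shows that standard encoding; the example sequence is arbitrary.

```python
# One-hot encoding of DNA: the standard input representation for deep
# learning models that predict the effects of sequence changes.
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Return a (len(seq), 4) matrix with a 1 marking each base."""
    idx = [ALPHABET.index(base) for base in seq]
    out = np.zeros((len(seq), 4), dtype=np.float32)
    out[np.arange(len(seq)), idx] = 1.0
    return out

print(one_hot("ACGTTGCA"))
```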
Machine learning (ML) arises as an effective tool to predict biological system behavior and empower the Learn phase, enabled by emerging high-throughput phenotyping technologies. By learning the underlying regularities in experimental data, machine learning can provide predictions without a detailed mechanistic understanding. Training data are used to statistically link an input (i.e., features or independent variables) to an output (i.e., response or dependent variables) through models that are expressive enough to represent almost any relationship. After this training, the models can be used to predict the outputs for inputs that the model has never seen before. Machine learning has been used to, e.g., predict the use of addictive substances and political views from Facebook profiles, automate language translation, predict pathway dynamics, optimize pathways through translational control, diagnose skin cancer, detect tumors in breast tissues, predict DNA and RNA protein-binding sequences, and predict drug side effects. However, the practice of machine learning requires statistical and mathematical expertise that is scarce and in high demand.
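The sketch below shows this train-then-predict pattern on synthetic data, with a scikit-learn random forest standing in for whatever model a real Learn step would use; the features are random stand-ins for, e.g., proteomics measurements, and the response for a titer.

```python
# Learn step as supervised regression on synthetic data: link design
# features to a response, then predict designs the model has never seen.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                # 200 designs, 10 features each
y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.1, size=200)  # fake titer

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:150], y[:150])                   # train on 150 measured designs
print("held-out R^2:", model.score(X[150:], y[150:]))  # predict unseen ones
```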
Machine learning and quantitative biology based on constraint-based models have also made considerable progress in identifying correlations between genotype and phenotype. Various machine learning techniques have been developed to analyze the massive amounts of data, including unsupervised learning and dimensionality reduction. Radivojević et al. developed an automated recommendation tool based on machine learning and probabilistic modeling techniques and used it to improve the production of limonene, bisabolene and dodecanol. Moreover, the development of automated learning technologies is particularly important for realizing the iterative engineering of microbial cell factories within automated procedures.
To address this need, in 2019 HamediRad et al. developed BioAutomata, a fully automated platform that integrated machine learning algorithms with the iBioFAB robotic system. As a compelling proof of concept, this system can automatically guide iterative DBTL cycles to accumulate beneficial modifications for bioproduction. The BioAutomata platform designs experiments, executes them and analyzes the data to optimize a user-specified biological process in an iterative manner. It trains a probabilistic model on initially generated (or available) data and decides the best points of the optimization space to evaluate, i.e., the points that are most likely to result in an improved biosystem. This reduces the total number of experiments needed to find the maximum of the optimization space.
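Conceptually, this is Bayesian optimization. The sketch below illustrates the idea with a Gaussian process surrogate and an expected-improvement rule on a made-up one-dimensional objective; it is a generic illustration of the technique, not the BioAutomata code, and the objective f stands in for a real build-and-test experiment.

```python
# Bayesian-optimization sketch: fit a probabilistic model to the data so
# far, then pick the candidate most likely to improve on the best result.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def f(x):                                    # hypothetical "experiment"
    return -(x - 0.6) ** 2 + 0.05 * np.sin(20 * x)

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(5, 1))           # initial experiments
y = f(X).ravel()
candidates = np.linspace(0, 1, 200).reshape(-1, 1)

for cycle in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    imp = mu - y.max()
    z = imp / np.maximum(sigma, 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    x_next = candidates[np.argmax(ei)]             # best point to evaluate
    X = np.vstack([X, x_next.reshape(1, 1)])       # run the "experiment"
    y = np.append(y, f(x_next))

print("best input found:", X[np.argmax(y), 0], "value:", y.max())
```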
A machine learning Automated Recommendation Tool for synthetic biology
Scientists at Lawrence Berkeley National Laboratory (Berkeley Lab) in California have created a patent-pending machine learning algorithm for synthetic biology called ART (Automated Recommendation Tool), and published their study in September 2020 in Nature Communications. “Here, we present the Automated Recommendation Tool (ART), a tool that leverages machine learning and probabilistic modeling techniques to guide synthetic biology in a systematic fashion, without the need for a full mechanistic understanding of the biological system.” The Berkeley Lab research team of Hector Garcia Martin, Kenneth Workman, Zak Costello and Tijana Radivojević designed ART to accelerate development through guided bioengineering, quantification of uncertainty, and access to AI machine learning techniques.
ART enables researchers to better predict outcomes by using training instances consisting of vectors of measurements and their associated system responses. The algorithm brings together various machine learning models from the scikit-learn library using a Bayesian ensemble approach to predict the output’s probability distribution. As a proof of concept, the researchers then collaborated with scientists at the Novo Nordisk Foundation Center for Biosustainability at the Technical University of Denmark, TeselaGen Biotechnology, and their global research colleagues. Together the researchers used ART to manage the metabolic engineering process in an effort to increase tryptophan production in baker’s yeast (Saccharomyces cerevisiae), and published the joint study on September 25 in Nature Communications.
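A heavily simplified sketch of the ensemble idea follows: several scikit-learn regressors are trained on the same (measurement, response) data and their predictions pooled to approximate a predictive distribution. ART itself infers the ensemble weights with Bayesian (MCMC) techniques; here the models are pooled equally, and all data are synthetic.

```python
# Simplified ensemble sketch: pool several regressors' predictions to get
# a mean prediction plus a crude uncertainty estimate (the spread).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))                  # measurement vectors
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=100)  # responses

ensemble = [Ridge(),
            RandomForestRegressor(random_state=0),
            GradientBoostingRegressor(random_state=0)]
for m in ensemble:
    m.fit(X, y)

x_new = rng.normal(size=(1, 6))                # an unseen design
preds = np.array([m.predict(x_new)[0] for m in ensemble])
print("mean prediction:", preds.mean(), "spread:", preds.std())
```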
Tryptophan is an essential amino acid used for making and managing neurotransmitters, muscles, enzymes, and proteins; it is required for normal growth in infants and is not produced by the body. First, the Danish researchers and their colleagues created a combinatorial library, a collection of chemicals or molecules synthesized by combinatorial chemistry, and set up a large phenotypic dataset. Combinatorial chemistry is a synthetic method that enables the production of large numbers of compounds in a single process.
“To construct a combinatorial library targeting equal representation of 30 promoters expressing five target genes, we harnessed high-fidelity homologous recombination in yeast together with the targetability of CRISPR/Cas9 genome engineering for a one-pot assembly of a maximum of 7,776 (6⁵) different combinatorial designs,” wrote the researchers. The researchers trained ART to associate amino acid production with gene expression using experimental data on a small subset, just 250 genotypes, out of the 7,776 possible combinations of the five target genes’ pathways as the input training dataset. ART extrapolated how the remaining thousands of combinations would impact tryptophan production, then produced ranked design recommendations for high tryptophan production.
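The sketch below mimics that extrapolation step on synthetic stand-in data: enumerate all 6⁵ = 7,776 promoter combinations, train on a 250-design subset, predict the rest and rank them. The encoding and the fake "measurements" are illustrative, not the study's data.

```python
# Train on a small measured subset of a combinatorial design space, then
# extrapolate and rank every unmeasured design by predicted production.
import itertools
import numpy as np
from sklearn.ensemble import RandomForestRegressor

designs = list(itertools.product(range(6), repeat=5))   # 6^5 = 7,776 designs
X = np.array(designs, dtype=float)

rng = np.random.default_rng(3)
train_idx = rng.choice(len(X), size=250, replace=False)  # 250 "measured"
y_train = 0.1 * X[train_idx].sum(axis=1) + rng.normal(size=250)  # fake titers

model = RandomForestRegressor(random_state=0).fit(X[train_idx], y_train)
scores = model.predict(X)                                # extrapolate to all
ranked = np.argsort(scores)[::-1]                        # best designs first
print("top predicted designs:", [designs[i] for i in ranked[:3]])
```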
“From a single data-generation cycle, this enables successful forward engineering of complex aromatic amino acid metabolism in yeast, with the best machine learning-guided design recommendations improving tryptophan titer and productivity by up to 74 and 43%, respectively, compared to the best designs used for algorithm training,” wrote the researchers.
The researchers demonstrated the capability of machine learning to accelerate metabolic engineering. The worldwide synthetic biology market is expected to reach USD 18.9 billion, growing at a compound annual growth rate (CAGR) of 28.8 percent over the period 2019–2024, according to BCC Research. Artificial intelligence and synthetic biology are innovative technologies whose intersection amplifies their potential benefit to humanity.
Artificial intelligence makes enzyme engineering easy
Enzymes perform impressive functions, enabled by the unique arrangement of their constituent amino acids, but usually only within a specific cellular environment. When you change the cellular environment, the enzyme rarely functions well—if at all. Thus, a long-standing research goal has been to retain or even improve upon the function of enzymes in different environments; for example, conditions that are favorable for biofuel production. Traditionally, such work has involved extensive experimental trial-and-error that might have little assurance of achieving an optimal result.
Artificial intelligence (a computer-based tool) can minimize this trial-and-error, but still relies on experimentally obtained crystal structures of enzymes—which can be unavailable or not especially useful. Thus, “the pertinent amino acids one should mutate in the enzyme might be only best-guesses,” says Teppei Niide, co-senior author. “To solve this problem, we devised a methodology of ranking amino acids that depends only on the widely available amino acid sequence of analogous enzymes from other living species.”
The researchers focused on the amino acids that are involved in the specificity of the malic enzyme to the molecule that the enzyme transforms (i.e., the substrate) and to the substance that helps the transformation proceed (i.e., the cofactor). By identifying the amino acid sequences that did not change over the course of evolution, the researchers identified the amino acid mutations that are adaptations to different cellular conditions in different species.
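A minimal version of such a conservation analysis scores each column of a multiple sequence alignment by Shannon entropy: low-entropy (conserved) positions are candidates for core function, while variable positions may reflect species-specific adaptation. The toy alignment below is invented; real analyses run on alignments of homologous enzyme sequences from many species.

```python
# Per-column Shannon entropy over a toy multiple sequence alignment.
import math
from collections import Counter

alignment = ["MKTAYIA",
             "MKSAYIA",
             "MKTGYLA",
             "MKTAYVA"]

for pos in range(len(alignment[0])):
    column = [seq[pos] for seq in alignment]
    counts = Counter(column)
    n = len(column)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    print(f"position {pos}: residues {dict(counts)}, entropy {entropy:.2f}")
```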
“By using artificial intelligence, we identified unexpected amino acid residues in malic enzyme that correspond to the enzyme’s use of different redox cofactors,” says Hiroshi Shimizu, co-senior author. “This helped us understand the substrate specificity mechanism of the enzyme and will facilitate optimal engineering of the enzyme in laboratories.”
This work succeeded in using artificial intelligence to dramatically accelerate and improve the success of substantially reconfiguring an enzyme’s specific mode of action, without fundamentally altering the enzyme’s function. Future advances in enzyme engineering will greatly benefit fields such as pharmaceutical and biofuel production that require carefully tuning the versatility of enzymes to different biochemical environments—even in the absence of corresponding enzymes’ crystal structures.
References and Resources also include:
https://www.sciencedirect.com/science/article/pii/S1369527421001466
https://www.eurekalert.org/news-releases/969812