While silicon microprocessors have so far been at the heart of the computing world, they are approaching their physical limits. Given the ongoing challenges in speed, energy efficiency, and miniaturization, there is a need to find alternatives.
Similarly, in a world flooded with data, figuring out where and how to store it efficiently and inexpensively becomes a larger problem every day. Demand for data storage is growing exponentially, but the capacity of existing storage media is not keeping up. Most of the world’s data today is stored on magnetic and optical media. Despite improvements in optical discs, storing a zettabyte of data would still take many millions of units, and use significant physical space.
Facebook’s new cold storage facility in Fort Worth, Texas, is a 2.6 million-square-foot facility spanning 150 acres. It is scheduled to be completed in 2022 at a total cost of $1.5 billion. However, space, power, and cooling requirements are not the only challenges: synchronization across multiple exabyte (EB) archives is currently impossible. If we are to preserve the world’s data, we need to seek significant advances in storage density and durability.
DNA is emerging as an alternative and has the potential to take computing and storage to new levels. Compared to conventional computers, DNA used as a computing medium may prove to be a billion times more energy-efficient and to have a trillion times more data-storage capacity. (DNA stores information at a density of about 1 bit/nm³, about a trillion times as efficient as videotape.)
Biocomputers have recently become feasible thanks to advances in nanobiotechnology and synthetic biology. Biocomputers use systems of biologically derived molecules, such as DNA and proteins, to perform computations. The most significant advantage of a DNA chip is expected to be parallel processing: DNA computers will be able to perform many calculations at once. Conventional computers, by contrast, operate largely sequentially and can take hundreds of years to perform complex calculations. It is this promise of parallel computing that could prove genuinely revolutionary.
Moreover, with the cheap supply of DNA and evolving DNA manufacturing processes, developing a DNA chip is becoming much cleaner and more realistic. DNA-based computers will not only be smaller, but they will hold more data as well. DNA microprocessors could be transformative.
Scientists are also using DNA for digital storage, which is a requirement for building biological computers. The use of DNA for digital storage is appealing in theory because DNA is ultracompact and can store, replicate, and transmit massive amounts of information. While this is not practical yet due to the current state of DNA synthesis and sequencing, these technologies are improving quite rapidly with advances in the biotech industry.
DNA offers the potential to store massive amounts of digital data at densities about one million times greater than flash drives, and data stored in DNA can be preserved for over 1,000 years. Using DNA to archive data is an attractive possibility because it is extremely dense (up to about 1 exabyte per cubic millimeter) and durable (half-life of over 500 years). Studies show that DNA properly encapsulated with a salt remains stable for decades at room temperature and should last much longer in the controlled environs of a data center. DNA doesn’t require maintenance, and files stored in DNA are easily copied at negligible cost.
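As a rough sanity check on the density figures, the short calculation below converts the 1 bit per cubic nanometer figure quoted earlier into exabytes per cubic millimeter. It assumes a decimal exabyte (10^18 bytes) and is only an order-of-magnitude comparison with the "up to about 1 exabyte per cubic millimeter" figure, not a measurement.

```python
# Order-of-magnitude check: 1 bit per cubic nanometer expressed in EB per mm^3.
# Assumption: 1 EB = 10**18 bytes (decimal exabyte).

NM_PER_MM = 10**6                      # 1 mm = 1,000,000 nm
NM3_PER_MM3 = NM_PER_MM ** 3           # cubic nanometers in one cubic millimeter

bits_per_mm3 = 1 * NM3_PER_MM3         # at 1 bit per nm^3
bytes_per_mm3 = bits_per_mm3 / 8
exabytes_per_mm3 = bytes_per_mm3 / 10**18

print(exabytes_per_mm3)                # 0.125
```

The result, about 0.125 EB per cubic millimeter, is within an order of magnitude of the upper-bound figure above.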
To store data in DNA, a data file is first converted from its digital sequence of 0’s and 1’s into a DNA sequence of A’s, C’s, T’s and G’s. The DNA data file is synthesized in short segments of DNA from 200 to 300 bases long, then stored. Each short segment contains an index to indicate its place within the overall data file. To retrieve the data, the segments are sequenced and decoded back into the original file.
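The encode and decode steps above can be sketched in a few lines of Python. This is a minimal illustration, not the scheme used by any particular lab: the two-bits-per-base mapping (00→A, 01→C, 10→G, 11→T), the 2-byte index, and the 48-byte payload (chosen so that a full segment is 200 bases) are all assumptions, and real encodings add constraints and error correction that are omitted here.

```python
# Minimal sketch of binary <-> DNA conversion with indexed segments.
# Assumptions (not from the article): 2 bits per base, 2-byte index,
# 48-byte payload per segment, no error correction.

BASES = "ACGT"          # assumed mapping: 00->A, 01->C, 10->G, 11->T
INDEX_BYTES = 2         # hypothetical index size
PAYLOAD_BYTES = 48      # (2 + 48) bytes * 4 bases per byte = 200 bases


def bytes_to_bases(data: bytes) -> str:
    """Turn each byte into four bases, most significant bits first."""
    return "".join(BASES[(b >> shift) & 0b11] for b in data for shift in (6, 4, 2, 0))


def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases; expects a length that is a multiple of 4."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def encode_file(data: bytes) -> list:
    """Split data into chunks and prefix each chunk with its position index."""
    segments = []
    for n, start in enumerate(range(0, len(data), PAYLOAD_BYTES)):
        chunk = data[start:start + PAYLOAD_BYTES]
        segments.append(bytes_to_bases(n.to_bytes(INDEX_BYTES, "big") + chunk))
    return segments


def decode_file(segments: list) -> bytes:
    """Reorder segments by their index and join the payloads."""
    indexed = {}
    for seg in segments:
        raw = bases_to_bytes(seg)
        indexed[int.from_bytes(raw[:INDEX_BYTES], "big")] = raw[INDEX_BYTES:]
    return b"".join(indexed[i] for i in sorted(indexed))


message = b"DNA as a storage medium " * 8
strands = encode_file(message)
# Segments come back from sequencing in no particular order.
assert decode_file(list(reversed(strands))) == message
```

Because every segment carries its own index, the decoder does not depend on the order in which segments are read.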
The DNA indexing system allows part of the file to be biologically selected (“random access”) before sequencing, so only the data of interest is sequenced. Error-correcting algorithms are used during the encode/decode process, enabling all data to be recovered error-free.
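The random-access idea can be mimicked in software. In the sketch below, each strand carries a file-specific tag at its start, and simple string prefix matching stands in for the biological selection step. The tag sequences and function names are hypothetical and purely illustrative.

```python
# Toy model of random access: select only the strands that carry a given tag.
# The tags below are hypothetical; prefix matching stands in for the
# biological selection that happens before sequencing.

def tag_segments(tag: str, segments: list) -> list:
    """Prefix every segment of one file with that file's tag."""
    return [tag + seg for seg in segments]


def random_access(pool: list, tag: str) -> list:
    """Return the payloads of strands whose prefix matches the tag."""
    return [strand[len(tag):] for strand in pool if strand.startswith(tag)]


TAG_FILE_A = "ACGTACGTAC"   # hypothetical tag for file A
TAG_FILE_B = "TGCATGCATG"   # hypothetical tag for file B

pool = tag_segments(TAG_FILE_A, ["AAAC", "CCGT"]) + tag_segments(TAG_FILE_B, ["GGTA"])
print(random_access(pool, TAG_FILE_B))   # ['GGTA']
```

Only the selected strands would then need to be sequenced and decoded.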
Furthermore, attempts to modify data stored in DNA are easier to prevent or detect. Research and commercialization efforts are currently underway. For instance, in September 2018, the Arch Mission Foundation partnered with Microsoft, the University of Washington, and Twist Bioscience to archive 10,000 crowdsourced images and the full text of 20 important books, among other items, on Astrobotic’s 2020 mission to the Moon. DNA-based data storage allows data to be encoded into billions of synthetic DNA molecules and encapsulated for long-term preservation.
DNA storage breakthroughs
Researchers at Microsoft and the University of Washington reached an early but important milestone in DNA storage by storing a record 200 megabytes of data on the molecular strands. Microsoft stored more than 100 books and a music video in DNA, occupying a spot in a test tube “much smaller than the tip of a pencil,” said Douglas Carmean. Writing the data involved translating it from 1s and 0s into the letters of the four nucleotide bases of DNA, synthesizing those letters as molecules, and then reading them back.
“DNA is an amazing information storage molecule that encodes data about how a living system works. We’re repurposing that capacity to store digital data — pictures, videos, documents,” said Ceze, who is conducting research in the team’s Molecular Information Systems Lab (MISL), which is housed in a basement on the University of Washington campus. “This is one important example of the potential of borrowing from nature to build better computer systems.”
Scientists from Columbia University and the New York Genome Center have created the highest-density DNA data storage ever invented, surpassing the earlier work of Church and his team. Led by Yaniv Erlich, the team successfully stored and retrieved about 2.14 megabytes of data in DNA, at a density of roughly 215 petabytes (215 million gigabytes) per gram. They took advantage of the structure of DNA molecules, which look like twisting ladders whose rungs are built from four nucleotide bases denoted by the letters A, C, G, and T. This genetic sequence typically acts as a building block for living things, and if binary 0s and 1s can be mapped onto it, DNA molecules can encode almost anything. Of course, the process is not that easy because not all DNA sequences are robust enough, said Erlich. What’s more, not all data stored in DNA can be retrieved successfully.
Calling their process a “DNA Fountain,” the researchers first compressed all the data into a single master archive and split it into short strings of binary digits, made up of ones and zeros. Next, the duo used an “erasure-correcting algorithm called fountain codes” to randomly package the strings into droplets. Each droplet contains a barcode in the sequence that helped the researchers reassemble the file.
The researchers then “mapped the ones and zeros in each droplet to the four nucleotide bases in DNA: A, G, C and T,” and ended up with a digital list of 72,000 DNA strands that contained the encoded data. This way, the data can still be decoded even if a few strands are lost. If stored appropriately, DNA can last hundreds of thousands of years and hold vast amounts of data. “DNA won’t degrade over time like cassette tapes and CDs,” said Erlich. “[I]t won’t become obsolete.”
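The droplet idea can be illustrated with a toy fountain code in Python. This is a simplified sketch, not the DNA Fountain implementation. Each droplet is the XOR of a random subset of chunks, and its seed plays the role of the barcode. As assumptions for brevity, the subset size is drawn uniformly rather than from the degree distribution a production fountain code would use, decoding uses Gaussian elimination over GF(2), and the mapping to bases and any screening of sequences are left out.

```python
import random

# Toy fountain code: droplets are XORs of random chunk subsets; the seed is the
# "barcode" that lets the decoder regenerate which chunks went into a droplet.
# Assumptions: uniform degree choice and Gaussian elimination for decoding.


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_chunks(data: bytes, size: int) -> list:
    """Zero-pad data to a multiple of size and cut it into equal chunks."""
    padded = data + bytes(-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def droplet_indices(seed: int, k: int) -> list:
    """Chunk indices combined in the droplet with this seed (deterministic)."""
    rng = random.Random(seed)
    degree = rng.randint(1, k)
    return rng.sample(range(k), degree)


def make_droplet(chunks: list, seed: int) -> tuple:
    payload = bytes(len(chunks[0]))
    for i in droplet_indices(seed, len(chunks)):
        payload = xor_bytes(payload, chunks[i])
    return seed, payload


def decode(droplets: list, k: int):
    """Recover the k chunks, or return None if too few useful droplets arrived."""
    pivots = {}  # pivot bit -> (mask of chunks still mixed in, payload)
    for seed, payload in droplets:
        mask = 0
        for i in droplet_indices(seed, k):
            mask |= 1 << i
        # Remove chunks that are already isolated by existing pivot rows.
        for bit, (pmask, ppayload) in pivots.items():
            if (mask >> bit) & 1:
                mask ^= pmask
                payload = xor_bytes(payload, ppayload)
        if mask == 0:
            continue  # droplet carried no new information
        bit = mask.bit_length() - 1
        # Clear the new pivot bit from every existing row.
        for b, (pmask, ppayload) in list(pivots.items()):
            if (pmask >> bit) & 1:
                pivots[b] = (pmask ^ mask, xor_bytes(ppayload, payload))
        pivots[bit] = (mask, payload)
    if len(pivots) < k:
        return None
    return [pivots[i][1] for i in range(k)]


original = bytes(range(64))
chunks = split_chunks(original, 8)
droplets = [make_droplet(chunks, seed) for seed in range(64)]
survivors = [d for d in droplets if d[0] % 4 != 0]   # simulate lost strands
recovered = decode(survivors, len(chunks))
print(recovered is not None and b"".join(recovered) == original)
```

The decoder never needs any specific droplet, only enough of them, which is why losing a few strands does not prevent recovery.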
Roswell, a privately-held biotechnology company in the United States, appears to be on its way to revolutionizing DNA reading technology. The significance of this technology is enormous as it will unlock limitless possibilities for the advancement of a DNA economy through the power of Molecular Electronics.
Given the impending limits of silicon technology (end of Moore’s Law), we believe hybrid silicon and biochemical systems are worth serious consideration. Biotechnology has benefitted tremendously from progress in silicon technology developed by the computer industry; now is the time for computer architects to consider incorporating biomolecules as an integral part of computer design.
IARPA announces the launch of the MIST program in Jan 2018
The Intelligence Advanced Research Projects Activity (IARPA), within the Office of the Director of National Intelligence, announced in 2018 the launch of the Molecular Information Storage (MIST) program. MIST is a multi-year research effort to develop next-generation data storage technologies that can scale into the exabyte (1 million terabytes) regime, and beyond, with significantly reduced physical footprint, power and cost requirements, relative to conventional approaches. The program will pursue this goal by using synthetic DNA as a data storage medium and developing a new category of devices that can write information to, and read from, synthetic DNA media at scale.
The program aims to develop a storage technology that eventually can scale into the exabyte regime and beyond. It must meet reduced footprint, power, and cost requirements, without degradation of data. Specifically, it must demonstrate the writing of 1 TB (1 TB = 1,000 gigabytes) and reading of 10 TB in 24 hours for $1,000 operational cost.
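To give a feel for what these targets imply, the short calculation below converts them into sustained rates. The 2 bits per base figure is an assumption for illustration (a theoretical maximum with four bases and no redundancy), not a MIST requirement.

```python
# Sustained rates implied by "write 1 TB and read 10 TB in 24 hours".
# Assumption: 2 bits per base, with no overhead for indexing or error correction.

SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12                      # 1 TB = 1,000 gigabytes, as stated above
BITS_PER_BASE = 2                # assumed, for illustration only

write_bytes_per_s = 1 * TB / SECONDS_PER_DAY
read_bytes_per_s = 10 * TB / SECONDS_PER_DAY
write_bases_per_s = 1 * TB * 8 / BITS_PER_BASE / SECONDS_PER_DAY

print(f"write: {write_bytes_per_s / 1e6:.1f} MB/s")                  # write: 11.6 MB/s
print(f"read:  {read_bytes_per_s / 1e6:.1f} MB/s")                   # read:  115.7 MB/s
print(f"bases written: {write_bases_per_s / 1e6:.1f} million/s")     # bases written: 46.3 million/s
```

Under that assumption, meeting the write target would mean synthesizing on the order of tens of millions of bases every second for a full day.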
DNA computation and storage technologies fall under IARPA’s four areas of focus: analysis, anticipatory intelligence, collection, and computing. Generally, Dixon said, improving analytical capabilities is about making better use of data to, in turn, make better decisions.
To find out whether that is practically attainable, IARPA has launched a four-year competition between two research teams. Each is multidisciplinary, made up of industry, university and research institute members with expertise in biological systems, chemistry, data storage systems, and statistics.
Los Alamos National Laboratory is playing the role of referee, providing the testing and evaluation of the many facets of this challenge and helping researchers refine their work. The winning technologies will demonstrate an end-to-end storage and retrieval workflow, with the potential to provide the United States with an overwhelming intelligence advantage. The success of the program will change the way stakeholders can archive the incredible amounts of information our modern society is generating.
“The goal of the MIST program is to dramatically advance data storage technology in terms of density and cost. LANL’s role is to test the systems that the development teams produce, and evaluate them for aspects such as capacity, density, and cost,” said Tracy Erkkila, Program Manager and lead scientist for the MIST Test and Evaluation Team.
Together with Sandia National Laboratories and the U.S. Army Research Laboratory South (ARL-S), the T&E team will review and evaluate deliverables, participate in monthly progress and technical program reviews, and will have an on-site presence during milestone demonstrations, with access to each team’s system. The T&E team will then collaborate to develop milestone demonstration test plans and evaluate the researchers’ results.
The three laboratories bring a broad array of unique expertise to the program. Los Alamos brings deep experience in genomics and bioinformatics with experts in biochemistry and synthesis. Sandia’s expertise for this venture is in microsystems design and fabrication along with extensive microfabrication capabilities, including die level processing. And ARL’s expertise lies in nucleic acid synthesis, gene synthesis and assembly, directed evolution, and protein engineering.
The scale and complexity of the world’s big data problems are rapidly increasing. Use cases requiring storage and random access from an exabyte of data are well-established in the private sector and increasingly relevant to the public sector. Meeting these requirements poses logistical and financial challenges. Today’s exabyte-scale data centers, for example, occupy large warehouses, consume megawatts of power, and cost hundreds of millions of dollars to build, operate and maintain. This resource-intensive model limits the availability of exascale storage and future scalability.
“The MIST program is a data storage moonshot to develop technologies that allow us to shrink an exabyte-scale data warehouse down to a tabletop form factor, with equally large reductions in operation and maintenance costs,” said IARPA Program Manager David Markowitz. “This would be a transformative capability for big data stakeholders in government and industry.”
Through a competitive Broad Agency Announcement, IARPA awarded MIST research contracts to teams led by the Georgia Tech Research Institute, as well as the Broad Institute of MIT and Harvard University. Los Alamos National Laboratory, Sandia National Laboratories and the U.S. Army Research Laboratory will work together to independently test the new systems — drawing on expertise in DNA synthesis, sequencing, nanofabrication, information theory and large-scale file systems.
Illumina, Microsoft, Twist Lead New DNA Data Storage Alliance
Fifteen tech-based companies and institutions formed an alliance in November 2020 aimed at advancing DNA data storage by agreeing upon a “roadmap” of definitions and standards to help the industry achieve interoperability between solutions.
Illumina, Microsoft, Twist Bioscience, and data storage giant Western Digital are leading the effort as founding members of the DNA Data Storage Alliance. The four founding members and 11 other member companies and institutions have committed to addressing the explosive growth of digital data by establishing the foundations of a cost-effective commercial archival storage ecosystem. That ecosystem, the Alliance asserts, could potentially deliver a low-cost archival data storage alternative to current storage technologies, which have limited longevity and require data migration to achieve long-term data storage.
By contrast, DNA provides a stable storage medium that is durable for thousands of years when properly stored, the Alliance says. DNA enables cost-effective and rapid duplication within a tiny space: ten full-length digital movies can be stored within the equivalent volume of a single grain of salt. Digital data preserved in DNA can be encased in glass beads or stored in capsules or pellets.
The Alliance cited a figure from Gartner projecting that by 2024, 30% of digital businesses will mandate DNA storage trials to address the exponential growth of data that is poised to overwhelm existing storage technology. “In collaboration with University of Washington, we have demonstrated a fully automated end-to-end system capable of storing and retrieving data from DNA, and we have separately stored 1GB of data in DNA synthesized by Twist and recovered data from it,” Karin Strauss, PhD, senior principal research manager at Microsoft, said in a statement. In addition to developing an industry roadmap, the Alliance plans to develop use cases in various markets and industries as well as promote adoption of this future solution through efforts to educate the broader data storage community.
“We’re encouraged by the potential for more sustainable data storage with DNA and look forward to collaborating with others in the industry to explore early commercialization of this technology,” Strauss added.
Joining Illumina, Microsoft, Twist, and Western Digital as members of the Alliance are:
- Ansa Biotechnologies, a DNA synthesis service provider for synthetic biology research.
- CATALOG, developer of what it says is the world’s first DNA-based digital data storage and computation platform.
- The Claude Nobs Foundation, focused on digital preservation of the audiovisual collection of its namesake, the founder of the Montreux Jazz festival.
- DNA Script, developer of SYNTAX™, the world’s first benchtop DNA printer powered by enzymatic technology.
- EPFL (École Polytechnique Fédérale de Lausanne) – Cultural Heritage & Innovation Center (Montreux Jazz Digital Project)
- ETH Zurich – The Swiss Federal Institute of Technology
- Interuniversity Microelectronics Centre (Imec), an R&D hub for nano- and digital technologies
- Iridia, established in 2016 to develop the world’s first commercially attractive, DNA-based data storage solution
- Molecular Assemblies, developer of an enzymatic DNA synthesis technology designed to power DNA-based products for industrial synthetic biology, precision medicine, and emerging applications that include DNA for data information storage
- Molecular Information Systems Lab at the University of Washington (UW), a partnership between UW Computer Science, Electrical Engineering, and Microsoft Research
“DNA is an incredible molecule that, by its very nature, provides ultra-high-density storage for thousands of years,” said Emily M. Leproust, PhD, CEO and co-founder of Twist Bioscience. “By joining with other technology leaders to develop a common framework for commercial implementation, we drive a shared vision to build this new market solution for digital storage.”