DNA Storage To Solve Digital Data Storage Shortages

Humanity generates data at a very fast pace, and existing storage technologies cannot keep up. The quantity of data we produce (including videos and photos) grows tenfold every five years. Even though not everything needs to be stored, data storage manufacturers are not making enough flash chips and hard drives to hold even the data we want to preserve. And since humanity is not going to stop taking photos and recording movies, new ways of saving them have to be developed.

Over millions of years, an incredible data storage medium has evolved in nature: DNA. Though it evolved to store genetic information, it can be used for other purposes as well. This material is much denser than modern storage media: a matchbox-sized package of DNA can hold data that would otherwise fill hundreds of thousands of DVDs. It is also much more durable: it can last for thousands of years, compared with the decades-long life of today's hard drives. Further, DNA will never become outdated, unlike hard drive formats and connection standards.

The concept of using DNA for digital data storage is not new; it emerged some decades ago. However, recent work has shown that the idea could materialize today. Recent research by the European Bioinformatics Institute and Harvard has shown that advances in modern DNA manipulation methods could make digital data storage in DNA both possible and practical. Several research groups, including those at the University of Illinois at Urbana-Champaign, ETH Zurich, and Columbia University, are working on this problem. Researchers at Microsoft and the University of Washington hold the world record for the quantity of data successfully stored in and recovered from DNA: 200 megabytes.

Preparing bits to become atoms
Conventional storage devices such as DVDs, thumb drives, and hard drives store digital data as 0s and 1s by modifying the optical, electrical, or magnetic properties of a material.

To store digital data in DNA, the same idea applies, but a different process has to be used. DNA molecules consist of long sequences of units called nucleotides. These nucleotides are named adenine, thymine, guanine and cytosine, generally designated A, T, G and C. So information has to be stored in DNA as sequences of nucleotides, instead of the sequences of 0s and 1s used in electronic media.

Several methods can be employed for this, but their common concept is to assign digital data patterns to DNA nucleotides. For example, A can be coded as 00, T as 10, C as 01, and G as 11. To store a digital image in JPEG format, the long string of 0s and 1s that makes up the file has to be encoded. If the first eight bits of the file are 01111000, they are split into pairs: 01 11 10 00. These correspond to C-G-T-A. So, nucleotides are joined in this order to form a DNA strand.
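The pairing described above can be sketched in a few lines of Python. This is only an illustration of the example mapping given in the text (A=00, T=10, C=01, G=11); the function name `bytes_to_bases` is made up for this sketch, and real encoding schemes are more elaborate.

```python
# Example mapping from the text: A=00, T=10, C=01, G=11.
BITS_TO_BASE = {"00": "A", "10": "T", "01": "C", "11": "G"}


def bytes_to_bases(data: bytes) -> str:
    """Translate raw bytes into a string of nucleotide letters (hypothetical helper)."""
    # Write every byte as eight binary digits, most significant bit first.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Read the bit string two digits at a time and look up the matching letter.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


# The eight bits used in the text, 01111000, form a single byte.
print(bytes_to_bases(bytes([0b01111000])))  # prints CGTA
```

Since each nucleotide carries two bits, one byte always becomes four letters under this mapping.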

Unlike digital files, which can be quite large (even terabytes in size), individual DNA strands have to be much shorter (only some 20 bytes each). This is because it is difficult to chemically build longer strands of DNA.

The data therefore has to be broken into smaller chunks, each carrying an indicator that marks where it falls in the sequence. This indicator ensures that all the data chunks can be put back in the right order when the DNA-stored data is read.
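A rough sketch of this chunking step is shown below. The layout is an assumption made for illustration only: each 20-byte chunk is taken to consist of a 2-byte position indicator followed by up to 18 bytes of data. The function name `split_into_chunks` and both sizes are hypothetical, not taken from any particular research group's scheme.

```python
def split_into_chunks(data: bytes, payload_size: int = 18, index_size: int = 2) -> list[bytes]:
    """Cut data into small pieces, each prefixed with its position (assumed layout)."""
    chunks = []
    for position, start in enumerate(range(0, len(data), payload_size)):
        # The indicator is the chunk's position, stored as a fixed-width number.
        # With 2 bytes this supports at most 65,536 chunks; more would raise OverflowError.
        index = position.to_bytes(index_size, "big")
        chunks.append(index + data[start:start + payload_size])
    return chunks


chunks = split_into_chunks(bytes(range(50)))
print(len(chunks))              # number of chunks produced
print([len(c) for c in chunks])  # size of each chunk, indicator included
```

Each chunk would then be translated into nucleotides and synthesized as its own strand.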

Storing the data
Once the order of letters in the DNA sequences has been determined, the sequences have to be physically produced, letter by letter, through chemical reactions. Specialized equipment drives these reactions, drawing on bottles of A's, T's, G's and C's and combining them in a chemical solution in the specified order to generate the physical DNA strands.

A further benefit of this process is that it also produces several backup copies. The DNA strands thus created are then dried and kept in protective containers.

Reading the data back
DNA-stored data is useful only if it can be retrieved later. A sequencing machine is used to read back the stored data. It identifies the molecules and generates a letter sequence for each one, which is then decoded into an ordered binary sequence of 0s and 1s. If the DNA gets destroyed in this process, the backup copies can be used. And if the backup copies run low, duplicates can easily be generated to refill the storage.
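The decoding step can be sketched as the reverse of the encoding. The code below assumes the example mapping from earlier in the article (A=00, T=10, C=01, G=11), strands that each encode a whole number of bytes, and a 2-byte position indicator at the start of each strand. That layout and the names `bases_to_bytes` and `reassemble` are assumptions for illustration, and no error correction is attempted.

```python
# Reverse of the example mapping from the text: A=00, T=10, C=01, G=11.
BASE_TO_BITS = {"A": "00", "T": "10", "C": "01", "G": "11"}


def bases_to_bytes(strand: str) -> bytes:
    """Turn a letter sequence back into bytes (assumes length is a multiple of 4)."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def reassemble(strands: list[str], index_size: int = 2) -> bytes:
    """Decode strands read in any order and restore the original byte sequence."""
    chunks = [bases_to_bytes(strand) for strand in strands]
    # Sort by the position indicator stored at the start of each chunk.
    chunks.sort(key=lambda chunk: int.from_bytes(chunk[:index_size], "big"))
    # Drop the indicators and join the payloads.
    return b"".join(chunk[index_size:] for chunk in chunks)


# Two strands as a sequencer might return them, here out of order.
print(reassemble(["AAAAAAACATAC", "AAAAAAAACATACTTC"]))  # prints b'Hi!'
```

Because every strand carries its own position, the order in which the sequencing machine happens to read the molecules does not matter.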

Currently, DNA storage is experimental. Before it can be widely used, the process needs to be fully automated, and the methods for building and reading DNA have to be improved. These challenges are likely to be overcome as the underlying technologies improve rapidly. Further, since DNA data storage does not need the near-perfect accuracy that biology requires, researchers are likely to find faster and cheaper ways to store information in DNA.
