Since time immemorial, mankind has wanted to store and share information for later use. First, it was through cave paintings and symbols. Then we invented alphabets, ideograms, numbers and other symbols. Using these, books were written and stored for future generations, on palm leaves, papyrus sheets or paper. The invention of printing brought the Gutenberg revolution, making it easy to produce multiple copies and spreading education to millions of people.
Printed books occupy space. Libraries and archives are bursting at the seams. Enter the computer age and digitization: letters and other symbols are written in a binary code of zeros and ones (0, 1) and read using on-off electrical signals. This has made electronic storage possible, cutting down the size and space needed for ‘hard copies’. Integrated circuits, processors and related electronic wizardry have shrunk computers and storage devices from room size to fingernail size.
But even so, the amount of information that needs to be stored (from a printed book to an Amazon Kindle e-book, or from the Encyclopaedia Britannica to Google) is growing exponentially. “That means the cost of storage is rising but our budgets are not”, as Dr. Nick Goldman of the European Bioinformatics Institute at Hinxton, UK told The Economist (in its January 26, 2013 issue). Goldman (together with 4 colleagues at Hinxton and 2 from Agilent Technologies, California, U.S.) decided to use DNA (yes, the molecule which stores the code that makes life possible) as the information storage device, rather than electronics. Their paper titled “Towards practical, high-capacity, low-maintenance information storage in synthesized DNA” was published in the journal Nature two weeks ago (doi:10.1038/nature11875).
Why DNA? Indeed the question should be ‘why not DNA?’. It is a long chain built from an alphabet of 4 letters (chemical units called bases and referred to as A, G, C and T) put together in a sequence, similar to what the English language does with its 26 letters and punctuation marks, or digital computers with zeros and ones in chosen sequences. DNA has been used since life was born over 2 billion years ago to store and transfer information right through evolution. It is small in size: the entire information content of a human is stored in a sequence 3 billion letters long of A, G, C and T, and packed into the nucleus of a cell, which is only a few microns across (a micron is a thousandth of a millimetre). It is stable and has an admirable shelf life. People have isolated DNA from the bones of dinosaurs dead about 65 million years ago, read the sequence of bases in it and understood much information about the animal. The animal (shall we say the ‘host’ of the DNA) is long since dead but the information lives on.
DNA is thus a long-lived, stable and easily synthesized storage ‘hard drive’. While current electronic storage devices require active and continued maintenance and regular transfer between storage media (punched cards to magnetic tapes to floppy disks to CDs...), DNA-based storage needs no active maintenance. Just store in a cool, dark and dry place!
The Goldman group is not the first to think of DNA as a storage device. Dr E.B. Baum wrote in 1995 about building an associative memory vastly larger than the brain, Dr C.T. Clelland and others ‘hid’ messages in DNA microdots in 1999, J.P.L. Cox wrote in 2001 on long-term data storage in DNA, Ailenberg and Rotstein came up with a coding method for archiving text, images and music characters in DNA, and in 2012 Church, Gao and Kosuri discussed next-generation digital information storage in DNA.
What is novel in the Hinxton method is that they moved away from the conventional binary (0 and 1) code and used a ternary code (three numerals, 0, 1 and 2, written using combinations of the bases A, G, C and T) to encode the information into DNA. This helps avoid reading errors, which arise particularly when the sequencer encounters repetitive base sequences. Also, rather than synthesize one long string of DNA to code for an entire item of information, they broke the file down into smaller chunks, to keep errors out of synthesis and read-out. These chunks are then read back and reassembled following a set protocol, which gave 100 per cent accuracy.
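To make the idea concrete, here is a minimal sketch in Python of how a ternary code can be written in four bases so that the same base never appears twice in a row. It is an illustration only, not the Hinxton group's actual scheme: the fixed six-numerals-per-byte conversion, the imaginary starting base "A", the 25-base chunk size and all the function names are assumptions made for this example.

```python
BASES = "ACGT"

def bytes_to_trits(data):
    """Write each byte as six base-3 numerals (3**6 = 729 covers 0..255).
    Assumption: a fixed-width conversion chosen for simplicity."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))  # most significant numeral first
    return trits

def trits_to_dna(trits, start="A"):
    """Each numeral picks one of the three bases that differ from the
    previous base, so no base is ever repeated back to back."""
    previous, out = start, []
    for trit in trits:
        choices = [b for b in BASES if b != previous]
        previous = choices[trit]
        out.append(previous)
    return "".join(out)

def dna_to_trits(dna, start="A"):
    """Reverse of trits_to_dna: recover the numerals from the bases."""
    previous, trits = start, []
    for base in dna:
        choices = [b for b in BASES if b != previous]
        trits.append(choices.index(base))
        previous = base
    return trits

def trits_to_bytes(trits):
    """Reverse of bytes_to_trits: every six numerals give one byte."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)

def split_into_chunks(dna, size=25):
    """Cut the sequence into short, numbered pieces (hypothetical size)."""
    return [(i // size, dna[i:i + size]) for i in range(0, len(dna), size)]

if __name__ == "__main__":
    message = b"Shall I compare thee to a summer's day?"
    dna = trits_to_dna(bytes_to_trits(message))
    for index, chunk in split_into_chunks(dna):
        print(index, chunk)
    print(trits_to_bytes(dna_to_trits(dna)) == message)
```

Because each numeral only chooses among the three bases that differ from the one before it, runs such as AAAA cannot occur, and the number attached to each chunk is what lets the pieces be put back in order when they are read.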
How much information can be stored in DNA? Goldman and colleagues have been able to store 2.2 petabytes (peta means a million billion, or 10 raised to the power 15) in one gram of DNA (and as The Economist says “enough, in other words, to fit all of the world’s digital information into the back of a lorry”). What about the speed? And how does one read the files?
Today, the speed is slow and reading with DNA sequencers is expensive, but in time the speed will improve and the cost will come down considerably. Recall that it cost $3 billion, and took months, to read out the entire human genome a decade ago. Today, the speed has improved, and it is predicted that in a couple of years, the human genome can be read for $1000. But even today, DNA-based information storage is a realistic option for archiving long-term, infrequently accessed material.
What did Goldman and group store in DNA? For starters, they stored all 154 sonnets of Shakespeare (in ASCII text), the 1953 Watson-Crick paper on the DNA double helix (in PDF format), a colour photograph of Hinxton (in JPEG) and a clip from the “I Have a Dream” speech of Martin Luther King (in MP3 format).
Natural selection and evolution have used DNA to store and read out the information needed to make our bodies. And we are now using DNA to store and archive the products of our brains. What a twist!