Storing data in DNA sounds like science fiction, but it’s happening in the near future. Professor Tom de Greef expects the first DNA data center to be operational within five to ten years. Data will not be stored as zeros and ones on the hard drive but in the base pairs that make up DNA: AT and CG. Such a data center would be in the form of a lab, many times smaller than today’s. De Greef could already picture it all. In one part of the building, new files will be encoded through DNA synthesis. Another section will contain a large field of capsules, each capsule packed with a file. The robotic arm will eject the capsule, read its contents, and place it back.
We are talking about synthetic DNA. In the laboratory, bases stick together in a specific order to form synthetically produced DNA strands. Files and photos currently stored in the data center can then be stored in DNA. For now, this technique is only suitable for archival storage. This is because readings of stored data are very expensive, so you want to consult the DNA files as little as possible.
Energy-wasting Big Data Centers Become Obsolete
Storing data in DNA offers many advantages. DNA files can be stored much more compactly, for example, and the data age is many times longer. But perhaps most importantly, this new technology is making large, energy-intensive data centers obsolete. And this is urgently needed, De Greef warns, “because in three years, we’re going to generate so much data around the world that we won’t be able to store half of it.”
Together with PhD student Bas Bögels, Microsoft and a group of university partners, De Greef has developed a new technique to make data storage innovations with synthetic DNA scalable. The results have been published today in the journal Nature Nanotechnology. De Greef works in the Department of Biomedical Engineering and the Institute for Complex Molecular Systems (ICMS) at TU Eindhoven and is a visiting professor at Radboud University.
The idea of using DNA strands for data storage emerged in the 1980s but was too difficult and expensive at the time. It became technically possible three decades later, when DNA synthesis started to take off. George Church, a geneticist at Harvard Medical School, outlined the idea in 2011. Since then, the synthesis and reading of data has become much cheaper, eventually bringing the technology to market.
In recent years, De Greef and his group especially looked into reading stored data. For now, this is the biggest problem facing the new technique. The PCR method currently used for this, called ‘random access’, is very error prone. Therefore, you can only read one file at a time and, in addition, the quality of the data degrades too much each time you read the file. Not scalable.
Here’s how it works: PCR (Polymerase Chain Reaction) makes millions of copies of the piece of DNA you need by adding a primer with the desired DNA code. Laboratory tests for Corona, for example, are based on this: even tiny amounts of coronavirus material from your nose can be detected when copied many times. But if you want to read multiple files simultaneously, you need multiple pairs of primaries doing their thing simultaneously. This creates a lot of errors in the copying process.
Each Capsule Contains One File
This is where capsules come into play. De Greef’s group developed protein and polymer microcapsules and then tethered one tuft per capsule. De Greef: “The capsule has thermal properties that we can use to our advantage.” Above 50 degrees Celsius, the capsules close on their own, allowing the PCR process to take place separately in each capsule. Not much room for error. De Greef calls it ‘thermo-confined PCR’. In the lab, so far it’s managed to read 25 files simultaneously without any significant errors.
If you then lower the temperature again, the copy will detach from the capsule and the tethered original will remain, meaning that the quality of your original will not suffer. De Greef: “We are currently losing 0.3 percent after three readings, compared to 35 percent with the existing method.”
Searched By Fluorescence
And that’s not all. De Greef also made data libraries more searchable. Each file is labeled neon and each capsule is its own color. The device can then recognize the colors and separate them from one another. This brings us back to the imaginary robotic arm at the start of this story, which would neatly select the desired files from a collection of future capsules.
This solves the problem of reading data. De Greef: “Now it’s just a matter of waiting for the cost of DNA synthesis to drop. The technique will then be ready for implementation.” As a result, he hopes that the Netherlands will soon be able to open its first DNA data center – the first in the world.
The paper appears in the journal Nature Nanotechnology under the title ‘Storage of DNA in thermoresponsive microcapsules for repeat random multiplex data access’. DOI: 10.1038/s41565-023-01377-4. Industry partner: Microsoft. University partners: University of Washington, Radboud University, University of Bristol, Shanghai Jiao Tong University. Partnerships: Center for Living Technology, Eindhoven-Wageningen-Utrecht Alliance