Encoding techniques for long-term storage of digital images into synthetic DNA

Abstract: Data explosion is one of the greatest challenges of the digital evolution: storage demand is growing at a rate that the capacity of current storage devices cannot match. The digital universe is forecast to grow to over 175 zettabytes by 2025, and 80% of this data is infrequently accessed (“cold” data), yet is safely archived on off-line tape drives for security and regulatory compliance reasons. At the same time, conventional storage devices have a limited lifespan of 10 to 20 years and must therefore be replaced frequently to ensure data reliability, a process that is expensive in terms of both money and energy. Recent studies have shown that, thanks to its biological properties, DNA is a very promising candidate for archiving “cold” digital data for centuries or even longer, provided that the information is encoded in a quaternary stream made up of the symbols A, T, C and G, which represent the four nucleotides of the DNA molecule, and that some important encoding constraints are respected. Pioneering works have proposed different algorithms for DNA coding, leaving room for further improvement. In this thesis, we present novel image coding techniques for the efficient storage of digital images in DNA. We implemented a novel fixed-length algorithm for constructing a robust quaternary code that respects the biological constraints, and we proposed two different mapping functions to allow flexibility according to the encoding needs. Furthermore, since one of the main challenges of DNA data storage is the high cost of DNA synthesis, we make a first attempt to introduce controlled compression into the proposed encoding workflow. The proposed codec is competitive with the state of the art. Finally, our end-to-end coding/decoding solution was tested in a wet-lab experiment to demonstrate the practical feasibility of the theoretical study.
