Optimal Codes Correcting a Single Indel / Edit for DNA-Based Data Storage

by Kui Cai, et al.

An indel refers to a single insertion or deletion, while an edit refers to a single insertion, deletion or substitution. In this paper, we investigate codes that combat either a single indel or a single edit and provide linear-time algorithms that encode binary messages into these codes of length n. Over the quaternary alphabet, we provide two linear-time encoders. One corrects a single edit with log n + O(log log n) redundant bits, while the other corrects a single indel with log n + 2 redundant bits. Both encoders are order-optimal. The former is the first known order-optimal encoder that corrects a single edit, while the latter (which corrects a single indel) reduces the redundancy of the best known encoder, due to Tenengolts (1984), by at least four bits. Over the DNA alphabet, we impose an additional constraint, the GC-balanced constraint, which requires that exactly half of the symbols of any DNA codeword be either C or G. In particular, via a modification of Knuth's balancing technique, we provide a linear-time map that translates binary messages into GC-balanced codewords, and the resulting codebook is able to correct a single indel or a single edit. These are the first known constructions of GC-balanced codes that correct a single indel or a single edit.
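The indel-correcting constructions above build on Varshamov-Tenengolts (VT) style checksums. As a minimal illustration of how one checksum can locate and undo a single deletion, the sketch below implements the classic binary VT code VT_a(n) = {x : sum of i*x_i over i = 1..n is congruent to a mod (n+1)} together with Levenshtein's decoding rule. It is not the paper's quaternary or GC-balanced encoder, and the function names `vt_syndrome` and `vt_correct_deletion` are hypothetical, chosen only for this example.

```python
def vt_syndrome(x):
    """Weighted checksum sum(i * x_i) mod (n + 1) for a binary list x of length n."""
    n = len(x)
    return sum(i * b for i, b in enumerate(x, start=1)) % (n + 1)


def vt_correct_deletion(y, a):
    """Recover x in VT_a(n) from y, obtained by deleting exactly one bit of x.

    Levenshtein's rule: the checksum deficiency d reveals whether a 0 or a 1
    was lost and where it must be put back. Runs in linear time.
    """
    n = len(y) + 1                       # length of the original codeword
    w = sum(y)                           # number of ones in the received word
    d = (a - sum(i * b for i, b in enumerate(y, start=1))) % (n + 1)
    if d <= w:
        # A 0 was deleted: reinsert 0 so that exactly d ones lie to its right.
        pos, ones_right = len(y), 0
        while ones_right < d:
            pos -= 1
            ones_right += y[pos]
        return y[:pos] + [0] + y[pos:]
    # A 1 was deleted: reinsert 1 so that exactly d - w - 1 zeros lie to its left.
    target = d - w - 1
    pos, zeros_left = 0, 0
    while zeros_left < target:
        zeros_left += 1 - y[pos]
        pos += 1
    return y[:pos] + [1] + y[pos:]


# Usage: x has ones at positions 2, 3, 5, so its syndrome is 10 mod 9 = 1.
x = [0, 1, 1, 0, 1, 0, 0, 0]
a = vt_syndrome(x)
y = x[:2] + x[3:]                        # delete the bit at position 3
print(a, vt_correct_deletion(y, a) == x)  # prints "1 True"
```

The two cases can be told apart because deleting a 0 lowers the checksum by the number of ones to its right (at most w), whereas deleting a 1 lowers it by its position plus the ones to its right, which always exceeds w. Since some residue class a contains at least 2^n/(n+1) words, a VT code spends roughly log(n+1) redundant bits, which is the "log n" term in the redundancies quoted above.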


Every Bit Counts: A New Version of Non-binary VT Codes with More Efficient Encoder

In this work, we present a new version of non-binary VT codes that are c...

Beyond Single-Deletion Correcting Codes: Substitutions and Transpositions

We consider the problem of designing low-redundancy codes in settings wh...

Properties and constructions of constrained codes for DNA-based data storage

We describe properties and constructions of constraint-based codes for D...

Average Redundancy of Variable-Length Balancing Schemes à la Knuth

We study and propose schemes that map messages onto constant-weight code...

Low-redundancy codes for correcting multiple short-duplication and edit errors

Due to its higher data density, longevity, energy efficiency, and ease o...

Balanced reconstruction codes for single edits

Motivated by the sequence reconstruction problem initiated by Levenshtei...

Near-Linear Time Insertion-Deletion Codes and (1+ε)-Approximating Edit Distance via Indexing

We introduce fast-decodable indexing schemes for edit distance which can...
