Database Matching Under Column Repetitions
Motivated by synchronization errors in the sampling of time-indexed databases, the matching of random databases under random column repetitions (including deletions) is investigated. Column histograms are used as a permutation-invariant feature to detect the repetition pattern, and their asymptotic uniqueness is proved using information-theoretic tools. Repetition detection is followed by a row matching scheme. For this overall scheme, sufficient conditions for successful database matching are derived in terms of the database growth rate. A modified version of Fano's inequality yields a tight necessary condition for successful matching, establishing the matching capacity under column repetitions. This capacity equals the erasure bound, which assumes the repetition locations are known a priori.
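To make the histogram idea concrete, the following is a minimal toy sketch in Python, not the scheme from the paper. It assumes noiseless repeated columns, a small integer alphabet, and pairwise distinct column histograms in the original database (the property the abstract describes as holding asymptotically). The function names `column_histograms`, `apply_repetitions`, and `detect_repetitions` are hypothetical and introduced only for illustration.

```python
import numpy as np

def column_histograms(db, alphabet_size):
    # One histogram per column: the count of each symbol.
    # Shuffling the rows does not change these counts.
    return [tuple(np.bincount(db[:, j], minlength=alphabet_size).tolist())
            for j in range(db.shape[1])]

def apply_repetitions(db, pattern):
    # Repeat column j exactly pattern[j] times; 0 deletes the column.
    return np.repeat(db, pattern, axis=1)

def detect_repetitions(hist_x, hist_y):
    # Left-to-right scan: count how many consecutive columns of the
    # second database share the histogram of each original column.
    # Assumption: the histograms in hist_x are pairwise distinct.
    pattern, k = [], 0
    for h in hist_x:
        count = 0
        while k < len(hist_y) and hist_y[k] == h:
            count += 1
            k += 1
        pattern.append(count)
    return pattern

# Toy database: 4 rows, 3 columns, alphabet {0, 1, 2}.
x = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 1, 1],
              [0, 1, 2]])
true_pattern = [2, 0, 3]              # repeat, delete, repeat
y = apply_repetitions(x, true_pattern)
y = y[[2, 0, 3, 1], :]                # unknown row permutation

estimate = detect_repetitions(column_histograms(x, 3),
                              column_histograms(y, 3))
print(estimate)  # [2, 0, 3]
```

Because the histograms survive the row shuffle, the repetition pattern can be read off before any rows are matched; once it is known, the retained columns can be aligned and row matching can proceed as in the known-locations (erasure) setting.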