Estimating the Longest Increasing Subsequence in Nearly Optimal Time

by Alexandr Andoni, et al.

Longest Increasing Subsequence (LIS) is a fundamental statistic of a sequence and has been studied for decades. While the LIS of a sequence of length n can be computed exactly in time O(n log n), the complexity of estimating the (length of the) LIS in sublinear time, especially when LIS ≪ n, is still open. We show that for any integer n and any λ = o(1), there exists a (randomized) non-adaptive algorithm that, given a sequence of length n with LIS ≥ λn, approximates the LIS up to a factor of 1/λ^{o(1)} in n^{o(1)}/λ time. Our algorithm improves upon prior work substantially in terms of both approximation and run-time: (i) we provide the first sub-polynomial approximation for LIS in sublinear time; and (ii) our run-time complexity essentially matches the trivial sample complexity lower bound of Ω(1/λ), which is required to obtain any non-trivial approximation of the LIS. As part of our solution, we develop two novel ideas which may be of independent interest. First, we define a new Genuine-LIS problem, where each sequence element may be either genuine or corrupted. In this model, the user receives unrestricted access to the actual sequence, but does not know a priori which elements are genuine. The goal is to estimate the LIS using genuine elements only, with the minimal number of "genuineness tests". The second idea, Precision Forest, enables accurate estimation of compositions of general functions from "coarse" (sub-)estimates. Precision Forest essentially generalizes classical precision sampling, which works only for summations. As a central tool, the Precision Forest is initially pre-processed on a set of samples, which is thereafter repeatedly reused by multiple sub-parts of the algorithm, improving their amortized complexity.
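For context on the O(n log n) exact baseline mentioned in the abstract, the following is a minimal sketch of the classical patience-sorting approach to computing the LIS length. It is not the sublinear estimation algorithm of the paper; the function name `lis_length` is a hypothetical choice for illustration, and the sketch assumes a strictly increasing subsequence is wanted.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq.

    Classical exact baseline (patience sorting with binary search),
    running in O(n log n) time. This is NOT the paper's sublinear
    estimator; the name lis_length is hypothetical.
    """
    # tails[k] holds the smallest possible last element of any
    # strictly increasing subsequence of length k + 1 seen so far.
    tails = []
    for x in seq:
        # First position whose tail is >= x (keeps subsequences strict).
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence found
        else:
            tails[i] = x      # x gives a smaller tail for length i + 1
    return len(tails)


if __name__ == "__main__":
    print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))
```

Because this baseline reads every element, it takes at least linear time; the result described above instead targets roughly n^{o(1)}/λ time when LIS ≥ λn, at the cost of returning only an approximation.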




Minimal Roman Dominating Functions: Extensions and Enumeration

Roman domination is one of the many variants of domination that keeps mo...

MCMC for Hierarchical Semi-Markov Conditional Random Fields

Deep architecture such as hierarchical semi-Markov models is an importan...

Approximating the Longest Common Subsequence problem within a sub-polynomial factor in linear time

The Longest Common Subsequence (LCS) of two strings is a fundamental str...

Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing

We develop two methods for the following fundamental statistical task: g...

Longest Increasing Subsequence under Persistent Comparison Errors

We study the problem of computing a longest increasing subsequence in a ...

A Fast Algorithm for Adaptive Private Mean Estimation

We design an (ε, δ)-differentially private algorithm to estimate the mea...

A Theory of Selective Prediction

We consider a model of selective prediction, where the prediction algori...
