Scalable Data Series Subsequence Matching with ULISSE

09/22/2020
by Michele Linardi, et al.

Data series similarity search is an important operation at the core of several analysis tasks and applications related to data series collections. Although data series indexes enable fast similarity search, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this work, we propose ULISSE, the first data series index structure designed for answering similarity search queries of variable length (within some range). Our contribution is two-fold. First, we introduce a novel representation technique that effectively and succinctly summarizes multiple sequences of different lengths. Second, based on the proposed index, we describe efficient algorithms for approximate and exact similarity search that combine disk-based index visits with in-memory sequential scans. Our approach supports non-Z-normalized and Z-normalized sequences, and works unchanged with both Euclidean Distance and Dynamic Time Warping, answering both k-NN and epsilon-range queries. We experimentally evaluate our approach using several synthetic and real datasets. The results show that ULISSE is several times, and up to orders of magnitude, more efficient in terms of both space and time cost than competing approaches. (Paper published in VLDBJ 2020)
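The abstract names the query types ULISSE answers (variable-length k-NN and epsilon-range search, on raw or Z-normalized data, under Euclidean distance or DTW) but not the index internals. As a point of reference only, here is a minimal sketch of the naive in-memory sequential scan that such an index is meant to avoid, assuming a single in-memory series of floats. It is not ULISSE's algorithm; the names `znorm`, `knn`, `eps_range` and `dtw` are hypothetical, and the DTW shown is unconstrained (no warping window), which is an assumption.

```python
import heapq
import math
from typing import Callable, Iterator, List, Sequence, Tuple

Distance = Callable[[Sequence[float], Sequence[float]], float]


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize: subtract the mean, divide by the population std. dev."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sd == 0.0:  # constant sequence: map it to all zeros
        return [0.0] * n
    return [(v - mu) / sd for v in x]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained DTW (assumed: no warping window); sqrt of summed squared costs."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return math.sqrt(d[n][m])


def _candidates(series, query, normalize, dist) -> Iterator[Tuple[float, int]]:
    """Yield (distance, offset) for every subsequence of length len(query)."""
    if not query:
        raise ValueError("query must be non-empty")
    length = len(query)
    q = znorm(query) if normalize else list(query)
    for off in range(len(series) - length + 1):
        s = series[off:off + length]
        yield dist(q, znorm(s) if normalize else s), off


def knn(series, query, k, normalize=True, dist: Distance = euclidean):
    """Exact k-NN by full scan; the query length is taken from each query."""
    return heapq.nsmallest(k, _candidates(series, query, normalize, dist))


def eps_range(series, query, eps, normalize=True, dist: Distance = euclidean):
    """All (distance, offset) pairs with distance <= eps, sorted by distance."""
    return sorted(c for c in _candidates(series, query, normalize, dist) if c[0] <= eps)


if __name__ == "__main__":
    series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
    # Two queries of different lengths against the same data, no re-indexing.
    print(knn(series, [1, 2, 3], 2, normalize=False))     # [(0.0, 1), (0.0, 7)]
    print(knn(series, [0, 1, 2, 3], 2, normalize=False))  # [(0.0, 0), (0.0, 6)]
    # With Z-normalization, [0, 1, 2] matches [1, 2, 3] exactly (offset-invariant).
    print(knn(series, [1, 2, 3], 4))  # [(0.0, 0), (0.0, 1), (0.0, 6), (0.0, 7)]
```

Because the scan reads the length from the query itself, it handles any query length, but it pays roughly O(nℓ) work per query for Euclidean distance (n offsets, query length ℓ) and O(nℓ²) for unconstrained DTW. Avoiding this full scan across a whole range of query lengths is the motivation for an index such as ULISSE.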

research
10/14/2021

Fast Data Series Indexing for In-Memory Data

Data series similarity search is a core operation for several data serie...
research
12/26/2022

ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees

Existing systems dealing with the increasing volume of data series canno...
research
09/02/2020

MESSI: In-Memory Data Series Indexing

Data series similarity search is a core operation for several data serie...
research
04/17/2023

Dumpy: A Compact and Adaptive Index for Large Data Series Collections

Data series indexes are necessary for managing and analyzing the increas...
research
01/26/2023

Odyssey: A Journey in the Land of Distributed Data Series Similarity Search

This paper presents Odyssey, a novel distributed data-series processing ...
research
09/22/2020

Effective and Efficient Variable-Length Data Series Analytics

In the last twenty years, data series similarity search has emerged as a...
research
04/19/2021

Local Similarity Search on Geolocated Time Series Using Hybrid Indexing

Geolocated time series, i.e., time series associated with certain locati...