Locality-Sensitive Hashing for Earthquake Detection: A Case Study of Scaling Data-Driven Science

by Kexin Rong, et al.

In this work, we report on a novel application of Locality-Sensitive Hashing (LSH) to seismic data at scale. Based on the high waveform similarity between reoccurring earthquakes, our application identifies potential earthquakes by searching for similar time series segments via LSH. However, a straightforward implementation of this LSH-enabled application has difficulty scaling beyond 3 months of continuous time series data measured at a single seismic station. As a case study of a data-driven science workflow, we illustrate how domain knowledge can be incorporated into the workload to improve both efficiency and result quality. We describe several end-to-end optimizations of the analysis pipeline, from pre-processing to post-processing, which allow the application to scale to time series data measured at multiple seismic stations. Together, these optimizations yield a speedup of over 100x in the end-to-end analysis pipeline. This improved scalability enabled seismologists to perform seismic analysis on more than ten years of continuous time series data from over ten seismic stations, and directly enabled the discovery of 597 new earthquakes near the Diablo Canyon nuclear power plant in California and 6123 new earthquakes in New Zealand.
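
To make the core idea concrete, the following is a minimal sketch of a similarity search over time series segments using MinHash-based LSH with banding. It is an illustration only, not the authors' pipeline: the `fingerprint` function (positions where the signal rises), the function names, and all parameter values (`window_len`, `step`, `n_bands`, `rows_per_band`) are assumptions chosen for brevity, and the abstract does not specify how the paper fingerprints waveforms or which hash family it uses.

```python
import random
from collections import defaultdict
from itertools import combinations


def fingerprint(window):
    """Toy binary fingerprint (an assumption): positions where the signal rises."""
    return frozenset(i for i in range(len(window) - 1) if window[i + 1] > window[i])


def minhash_signature(items, hash_params, prime=2_147_483_647):
    """One min-hash value per (a, b) pair, using h(x) = (a*x + b) mod prime."""
    return tuple(
        min(((a * x + b) % prime for x in items), default=prime)
        for a, b in hash_params
    )


def find_similar_windows(signal, window_len, step, n_bands=8, rows_per_band=4, seed=0):
    """Return pairs of window start indices whose signatures collide in any band."""
    rng = random.Random(seed)
    hash_params = [
        (rng.randrange(1, 2**31 - 1), rng.randrange(0, 2**31 - 1))
        for _ in range(n_bands * rows_per_band)
    ]
    buckets = defaultdict(list)
    for start in range(0, len(signal) - window_len + 1, step):
        window = signal[start:start + window_len]
        sig = minhash_signature(fingerprint(window), hash_params)
        # Split the signature into bands; each band is hashed to its own bucket.
        for band in range(n_bands):
            key = (band, sig[band * rows_per_band:(band + 1) * rows_per_band])
            buckets[key].append(start)
    # Any two windows sharing at least one bucket become a candidate pair.
    candidates = set()
    for starts in buckets.values():
        candidates.update(combinations(sorted(set(starts)), 2))
    return candidates


# Usage: plant the same short waveform twice in noise and look for the repeat.
noise = random.Random(1)
signal = [noise.uniform(-1.0, 1.0) for _ in range(300)]
pattern = [float((i * 7) % 11 - 5) for i in range(32)]
signal[64:96] = pattern
signal[200:232] = pattern
pairs = find_similar_windows(signal, window_len=32, step=8)
```

Windows with identical fingerprints always land in the same buckets, so the planted repeat at offsets 64 and 200 is guaranteed to appear in `pairs`. Other pairs may also appear, since LSH returns candidates, not confirmed matches, and a real workflow would verify and filter them in post-processing.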



