DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift

by Defu Cao, et al.
JPMorgan Chase & Co.
University of Southern California

In electronic trading markets, limit order books (LOBs) provide information about pending buy and sell orders at various price levels for a given security. Recently, there has been growing interest in using LOB data for downstream machine learning tasks such as forecasting. However, dealing with out-of-distribution (OOD) LOB data is challenging because distributional shifts are unlabeled in currently available public LOB datasets. It is therefore critical to build a synthetic LOB dataset with labeled OOD samples that can serve as a testbed for developing models that generalize well to unseen scenarios. In this work, we use a multi-agent market simulator to build a synthetic LOB dataset, named DSLOB, with and without market stress scenarios, which allows for the design of controlled distributional shift benchmarks. Using the proposed synthetic dataset, we provide a holistic analysis of the forecasting performance of three different state-of-the-art forecasting methods. Our results highlight the need for further research on algorithms that are robust to distributional shifts in high-frequency time series data.
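To make the two central ideas concrete (an LOB snapshot as pending quantities at each price level, and a dataset whose OOD samples carry a label), the Python sketch below models both under our own assumptions. The `LOBSnapshot` class, its `stressed` flag, and the `split_by_shift` helper are hypothetical illustrations, not part of the DSLOB release or of the simulator's API. The mid-price (the average of the best bid and best ask) is shown only as a common example of a forecasting target.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LOBSnapshot:
    """Hypothetical LOB snapshot at a single timestamp.

    bids and asks map each price level to the total pending quantity there.
    """
    bids: dict[float, int] = field(default_factory=dict)
    asks: dict[float, int] = field(default_factory=dict)
    # Hypothetical OOD label: True if the snapshot was generated
    # under a market stress scenario.
    stressed: bool = False

    def best_bid(self) -> float:
        # Highest price a buyer is currently willing to pay.
        return max(self.bids)

    def best_ask(self) -> float:
        # Lowest price a seller is currently willing to accept.
        return min(self.asks)

    def mid_price(self) -> float:
        # Average of the best bid and best ask.
        return (self.best_bid() + self.best_ask()) / 2


def split_by_shift(snapshots: list[LOBSnapshot]) -> tuple[list[LOBSnapshot], list[LOBSnapshot]]:
    """Hypothetical controlled split: train on unstressed (in-distribution)
    snapshots and evaluate on stressed (labeled OOD) snapshots."""
    train = [s for s in snapshots if not s.stressed]
    test = [s for s in snapshots if s.stressed]
    return train, test


if __name__ == "__main__":
    normal = LOBSnapshot(bids={100.0: 5, 99.5: 10}, asks={100.5: 3, 101.0: 7})
    stress = LOBSnapshot(bids={98.0: 1}, asks={102.0: 2}, stressed=True)
    print(normal.mid_price())  # 100.25
    train, test = split_by_shift([normal, stress])
    print(len(train), len(test))  # 1 1
```

Because the shift label is stored with each sample, the train/test split can be made explicitly by regime rather than by time alone, which is the kind of controlled comparison that unlabeled real-market datasets do not directly support.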
