Dynamic Feature Engineering and model selection methods for temporal tabular datasets with regime changes

12/30/2022
by Thomas Wong, et al.

Applying deep learning algorithms to temporal panel datasets is difficult because of heavy non-stationarity, which can lead to over-fitted models that under-perform under regime changes. In this work, we propose a new machine learning pipeline for ranking predictions on temporal panel datasets that is robust to regime changes in the data. Different machine learning models, including Gradient Boosting Decision Trees (GBDTs) and neural networks, with and without simple feature engineering, are evaluated in the pipeline under different settings. We find that GBDT models with dropout display high performance, robustness and generalisability with relatively low complexity and reduced computational cost. We then show that online learning techniques can be used in post-prediction processing to enhance the results. In particular, dynamic feature neutralisation, an efficient procedure that requires no retraining of models and can be applied post-prediction to any machine learning model, improves robustness by reducing drawdown during regime changes. Furthermore, we demonstrate that creating model ensembles through dynamic model selection based on recent model performance improves on the baseline, raising the Sharpe and Calmar ratios of out-of-sample predictions. We also evaluate the robustness of our pipeline across different data splits and random seeds and find good reproducibility of results.
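The two post-prediction steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not define either procedure in detail. The sketch assumes that feature neutralisation means subtracting the least-squares projection of the predictions onto the features within one time period, and that dynamic model selection means picking the model with the best mean score over a recent lookback window. The function names, the `proportion` and `lookback` parameters, and the drawdown convention (largest peak-to-trough fall of the cumulative sum of per-period returns) are all hypothetical choices made for illustration.

```python
import numpy as np


def neutralise(predictions, features, proportion=1.0):
    """Remove the part of `predictions` that is linearly explained by `features`.

    Assumption: one call handles a single time period (era); `features` has
    shape (n_samples, n_features) and `predictions` has shape (n_samples,).
    """
    # Least-squares fit of the predictions on the features.
    beta, *_ = np.linalg.lstsq(features, predictions, rcond=None)
    # Subtract a chosen proportion of the fitted (feature-explained) component.
    return predictions - proportion * (features @ beta)


def sharpe(returns):
    """Mean of per-period returns divided by their standard deviation."""
    return np.mean(returns) / np.std(returns)


def max_drawdown(returns):
    """Largest peak-to-trough fall of the cumulative sum of returns."""
    cumulative = np.cumsum(returns)
    running_max = np.maximum.accumulate(cumulative)
    return np.max(running_max - cumulative)


def calmar(returns):
    """Mean return divided by maximum drawdown (undefined if there is no drawdown)."""
    return np.mean(returns) / max_drawdown(returns)


def select_model(past_scores, lookback):
    """Return the index of the model with the best recent mean score.

    Assumption: `past_scores` has shape (n_periods, n_models), ordered in time,
    and contains only periods whose outcomes are already known.
    """
    recent = past_scores[-lookback:]
    return int(np.argmax(recent.mean(axis=0)))
```

With `proportion=1.0` the neutralised predictions are orthogonal to every feature column, which is why no retraining is needed: the step only post-processes existing predictions. In a walk-forward setting, `select_model` would be called at each period on the scores observed so far, so the choice of model never uses future information.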


research
12/02/2019

Proving Data-Poisoning Robustness in Decision Trees

Machine learning models are brittle, and small changes in the training d...
research
05/01/2020

DriveML: An R Package for Driverless Machine Learning

In recent years, the concept of automated machine learning has become ve...
research
07/23/2021

Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings

Reinforcement learning (RL) can be used to learn treatment policies and ...
research
02/01/2019

The Spatially-Conscious Machine Learning Model

Successfully predicting gentrification could have many social and commer...
research
03/14/2023

Understanding Model Complexity for temporal tabular and multi-variate time series, case study with Numerai data science tournament

In this paper, we explore the use of different feature engineering and d...
research
10/20/2020

Model-specific Data Subsampling with Influence Functions

Model selection requires repeatedly evaluating models on a given dataset...
research
06/27/2023

Non-parametric online market regime detection and regime clustering for multidimensional and path-dependent data structures

In this work we present a non-parametric online market regime detection ...
