Survival regression with accelerated failure time model in XGBoost
Survival regression is used to estimate the relation between time-to-event and feature variables, and is important in application domains such as medicine, marketing, risk management and sales management. Nonlinear tree-based machine learning algorithms, as implemented in libraries such as XGBoost, scikit-learn, LightGBM, and CatBoost, are often more accurate in practice than linear models. However, existing implementations of tree-based models have offered limited support for survival regression. In this work, we propose and implement loss functions for learning accelerated failure time (AFT) models in XGBoost, extending its support for survival modeling under different kinds of label censoring. The AFT model assumes that feature effects directly accelerate or decelerate the survival time, and it applies to data sets with different kinds of censoring. We demonstrate with real and simulated experiments the effectiveness of AFT in XGBoost relative to a number of baselines in two respects: generalization performance and training speed. Furthermore, we take advantage of the support for NVIDIA GPUs in XGBoost to achieve substantial speedup over multi-core CPUs. To our knowledge, our work is the first implementation of AFT that utilizes the processing power of NVIDIA GPUs.
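To make the AFT idea concrete: the model writes the log survival time as ln Y = T(x) + σZ, where T(x) is the model output (for a boosted model, the tree ensemble's prediction), σ is a scale parameter, and Z is a noise term with a chosen distribution. Because Y = exp(T(x)) · exp(σZ), raising T(x) stretches survival time and lowering it shrinks it. A censored label can be written as an interval [lower, upper]: equal bounds for an uncensored time, an infinite upper bound for right censoring, a zero lower bound for left censoring. The sketch below computes the per-example negative log-likelihood under the assumption Z ~ N(0, 1). The function names are hypothetical, and this is an illustrative sketch, not the paper's or XGBoost's implementation (which also supports other noise distributions and computes gradients and Hessians for boosting).

```python
import math

# Hypothetical helpers: a minimal sketch assuming Z ~ N(0, 1) in ln Y = T(x) + sigma * Z.
EPS = 1e-12  # floor inside log() to avoid log(0) for extreme predictions


def _std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def aft_nll(lower: float, upper: float, pred: float, sigma: float = 1.0) -> float:
    """Negative log-likelihood of one label interval [lower, upper].

    pred plays the role of T(x).
    lower == upper     -> uncensored
    upper == math.inf  -> right-censored
    lower == 0         -> left-censored
    0 < lower < upper  -> interval-censored
    """
    if lower == upper:
        # Uncensored: log-normal density f_Y(y) = phi(z) / (sigma * y),
        # with z = (ln y - T(x)) / sigma.
        y = lower
        z = (math.log(y) - pred) / sigma
        return -math.log(max(_std_normal_pdf(z) / (sigma * y), EPS))
    # Censored: probability mass P(lower < Y <= upper) = F(upper) - F(lower).
    if math.isinf(upper):
        cdf_upper = 1.0
    else:
        cdf_upper = _std_normal_cdf((math.log(upper) - pred) / sigma)
    if lower <= 0.0:
        cdf_lower = 0.0
    else:
        cdf_lower = _std_normal_cdf((math.log(lower) - pred) / sigma)
    return -math.log(max(cdf_upper - cdf_lower, EPS))


def mean_aft_nll(lowers, uppers, preds, sigma=1.0):
    """Average AFT loss over a data set; preds[i] plays the role of T(x_i)."""
    losses = [aft_nll(lo, up, p, sigma) for lo, up, p in zip(lowers, uppers, preds)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy labels: uncensored, right-censored, left-censored, interval-censored.
    lowers = [2.0, 3.0, 0.0, 1.0]
    uppers = [2.0, math.inf, 4.0, 5.0]
    preds = [math.log(2.0), math.log(6.0), math.log(2.0), math.log(2.0)]
    print(mean_aft_nll(lowers, uppers, preds, sigma=1.2))
```

For a right-censored label, predicting a longer survival time (larger T(x)) lowers the loss, which is how censored examples still guide training. In recent XGBoost releases the analogous choices are exposed through `objective='survival:aft'`, the `aft_loss_distribution` and `aft_loss_distribution_scale` parameters, and the `label_lower_bound` / `label_upper_bound` fields of a `DMatrix`.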