A flexible, semi-parametric, cluster-based approach for predicting wildfire extremes across the contiguous United States
This paper details the methodology proposed by the Lancaster Ducks team for the EVA 2021 conference data challenge. The aim of this challenge was to predict the number and size of wildfires over the contiguous US between 1993 and 2015, with more importance placed on extreme events. Our approach proposes separate methods for modelling the bodies and tails of the distributions of both wildfire variables. For the former, a hierarchical clustering technique is proposed to first group similar locations, with a non-parametric approach subsequently used to model the non-extreme data. To capture tail behaviour, separate techniques derived from univariate extreme value theory are proposed for both variables. For the count data, a generalised Pareto distribution with a generalised additive model structure is used to capture effects from covariates on values above a high threshold. For burnt area, a non-stationary generalised Pareto distribution enables us to capture the tail behaviour of proportions obtained through a transformation of observed area data. The resulting predictions are shown to perform reasonably well, improving on the benchmark method proposed in the challenge outline. We also provide a discussion about the limitations of our modelling framework and evaluate ways in which it could be extended.
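To illustrate the peaks-over-threshold idea underlying the tail models, the following is a minimal sketch that fits a stationary generalised Pareto distribution to exceedances of a high empirical quantile. It is a simplification and not the method of the paper, which lets the parameters vary with covariates. The function names, the 0.95 threshold quantile, and the synthetic lognormal sample are all assumptions made for illustration only.

```python
import numpy as np
from scipy.stats import genpareto


def fit_gpd_tail(values, threshold_quantile=0.95):
    """Fit a stationary GPD to excesses above an empirical quantile (illustrative)."""
    values = np.asarray(values, dtype=float)
    u = np.quantile(values, threshold_quantile)   # high threshold (assumed choice)
    excesses = values[values > u] - u             # positive excesses over u
    # Fix the location at 0 so only the shape and scale are estimated.
    shape, _, scale = genpareto.fit(excesses, floc=0.0)
    rate = excesses.size / values.size            # empirical P(X > u)
    return u, shape, scale, rate


def tail_probability(x, u, shape, scale, rate):
    """Estimate P(X > x) for x >= u as P(X > u) * P(X - u > x - u | X > u)."""
    return rate * genpareto.sf(x - u, shape, loc=0.0, scale=scale)


# Synthetic heavy-ish tailed data standing in for a wildfire variable (not real data).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

u, shape, scale, rate = fit_gpd_tail(sample)
p_at_threshold = tail_probability(u, u, shape, scale, rate)
p_beyond = tail_probability(2.0 * u, u, shape, scale, rate)
```

In a combined body-and-tail model of the kind described above, values below the threshold `u` would be handled by a separate model for the bulk of the data, with the fitted GPD used only for probabilities beyond `u`.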