Modelling heterogeneity in the classification process in multi-species distribution models can improve predictive performance

by Kwaku Peprah Adjei, et al.

1. Species distribution models and maps from large-scale biodiversity data are necessary for conservation management. One current issue is that biodiversity data are prone to taxonomic misclassifications. Methods to account for these misclassifications in multispecies distribution models have assumed that the classification probabilities are constant throughout the study. In reality, classification probabilities are likely to vary with several covariates. Failure to account for such heterogeneity can lead to bias in parameter estimates.

2. Here we present a general multispecies distribution model that accounts for heterogeneity in the classification process. The proposed model assumes a multinomial generalised linear model for the classification confusion matrix. We compare the heterogeneous classification model with the homogeneous classification model by assessing how well each estimates the model parameters and how well each predicts hold-out samples. We applied the model to gull data from Norway, Denmark and Finland, obtained from GBIF.

3. Our simulation study showed that accounting for heterogeneity in the classification process increased precision by 30%. Applying the model framework to the gull dataset did not improve the predictive performance of the heterogeneous model over the homogeneous model, because the misclassified sample sizes were small. However, when machine learning predictive scores are used as weights to inform the species distribution models about the classification process, the precision increases by 70%.

4. We recommend multiple multinomial regression to model the variation in the classification process when the data contain relatively large misclassified samples. Machine prediction scores should be used when the data contain relatively small misclassified samples.
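To make the idea of a multinomial generalised linear model for the classification confusion matrix more concrete, the sketch below builds a confusion matrix whose rows depend on a single covariate through a softmax link. It is only an illustration under stated assumptions, not the authors' implementation: the function name `confusion_matrix`, the coefficient arrays `beta0` and `beta1`, the single covariate `x`, and all numeric values are hypothetical. Setting `beta1` to zero recovers the homogeneous case, in which the classification probabilities are constant.

```python
import numpy as np


def confusion_matrix(x, beta0, beta1):
    """Row-stochastic classification matrix for one covariate value x.

    Entry [k, j] is the assumed probability that an individual of true
    species k is reported as category j. Each row is a softmax of a
    linear predictor, i.e. a multinomial GLM with a logit-type link.
    """
    eta = beta0 + beta1 * x                     # linear predictor, shape (K, J)
    eta = eta - eta.max(axis=1, keepdims=True)  # subtract row max for stability
    expeta = np.exp(eta)
    return expeta / expeta.sum(axis=1, keepdims=True)


# Hypothetical coefficients: 2 true species, 2 reported categories.
beta0 = np.array([[2.0, 0.0],
                  [0.0, 2.0]])
beta1 = np.array([[1.0, 0.0],
                  [0.0, 1.0]])

# Heterogeneous case: classification probabilities change with x.
omega_low = confusion_matrix(-1.0, beta0, beta1)
omega_high = confusion_matrix(1.0, beta0, beta1)

# Homogeneous case: zero slopes give the same matrix for every x.
omega_const_a = confusion_matrix(-1.0, beta0, np.zeros_like(beta1))
omega_const_b = confusion_matrix(1.0, beta0, np.zeros_like(beta1))
```

With these illustrative coefficients, the probability of a correct classification (the diagonal) rises as `x` increases, whereas the homogeneous matrices are identical regardless of `x`.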


