Predicting milk traits from spectral data using Bayesian probabilistic partial least squares regression

by Szymon Urbas, et al.

High-dimensional spectral data, routinely generated in dairy production, are used to predict a range of traits in milk products. Partial least squares regression (PLSR) is ubiquitous in these prediction tasks. However, PLSR is not typically viewed as arising from statistical inference of a probabilistic model, and parameter uncertainty is rarely quantified. Additionally, PLSR does not easily lend itself to model-based modifications, coherent prediction intervals are not readily available, and the process of choosing the latent-space dimension, Q, can be subjective and sensitive to data size. We introduce a Bayesian latent-variable model that emulates the desirable properties of PLSR while accounting for parameter uncertainty. The need to choose Q is avoided through a nonparametric shrinkage prior. The flexibility of the proposed Bayesian partial least squares regression (BPLSR) framework is exemplified by considering sparsity modifications and allowing for multivariate response prediction. The BPLSR framework is used in two motivating settings: 1) trait prediction from mid-infrared spectral analyses of milk samples, and 2) milk pH prediction from surface-enhanced Raman spectral data. The prediction performance of BPLSR at least matches that of PLSR. Additionally, the provision of correctly calibrated prediction intervals provides richer, more informative inference for stakeholders in dairy production.
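For readers unfamiliar with the baseline that the abstract contrasts against, the following is a minimal sketch of classical single-response PLSR (the PLS1 recursion) in NumPy. It is not the authors' BPLSR method and makes no attempt to reproduce it. The function names `pls1_fit` and `pls1_predict`, the simulated data, and the choice of Q = 3 are all assumptions made for illustration. Note that the sketch returns only point predictions and requires Q to be fixed in advance, which are the two limitations the abstract highlights.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLSR with the classical PLS1 recursion.

    Returns the regression coefficients plus the centring constants.
    n_components plays the role of the latent dimension Q.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # centred working copies
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))   # weight vectors
    P = np.zeros((n_features, n_components))   # X loadings
    c = np.zeros(n_components)                 # y loadings
    for k in range(n_components):
        w = Xk.T @ yk                          # direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                             # latent score for this component
        tt = t @ t
        P[:, k] = Xk.T @ t / tt
        c[k] = yk @ t / tt
        W[:, k] = w
        Xk = Xk - np.outer(t, P[:, k])         # deflate X
        yk = yk - c[k] * t                     # deflate y
    beta = W @ np.linalg.solve(P.T @ W, c)     # coefficients on centred X
    return beta, x_mean, y_mean


def pls1_predict(X_new, beta, x_mean, y_mean):
    """Point predictions only: classical PLSR gives no prediction intervals."""
    return (X_new - x_mean) @ beta + y_mean


# Simulated stand-in for spectral data (hypothetical, not the paper's data):
# 3 latent factors drive both the 50 "wavelengths" and the response.
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 3))
loadings = rng.normal(size=(50, 3))
X_all = Z @ loadings.T + 0.05 * rng.normal(size=(300, 50))
y_all = Z @ np.array([1.0, -0.5, 2.0]) + 0.05 * rng.normal(size=300)

X_train, y_train = X_all[:200], y_all[:200]
X_test, y_test = X_all[200:], y_all[200:]

beta, x_mean, y_mean = pls1_fit(X_train, y_train, n_components=3)
rmse = np.sqrt(np.mean((pls1_predict(X_test, beta, x_mean, y_mean) - y_test) ** 2))
print(f"held-out RMSE with Q=3: {rmse:.3f}")
```

Rerunning the fit with different values of `n_components` shows how the held-out error depends on Q, which is the tuning decision the proposed shrinkage prior is meant to avoid.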


