Three approaches to supervised learning for compositional data with pairwise logratios

11/17/2021
by Germà Coenders, et al.

The common approach to compositional data analysis is to transform the data by means of logratios. Logratios between pairs of compositional parts (pairwise logratios) are the easiest to interpret in many research problems. When the number of parts is large, some form of logratio selection is necessary. One option is an unsupervised learning method based on a stepwise selection of the pairwise logratios that explain the largest percentage of the logratio variance in the compositional dataset. In this article we present three alternative stepwise supervised learning methods to select the pairwise logratios that best explain a dependent variable in a generalized linear model, each geared to a specific problem. The first method features unrestricted search, where any pairwise logratio can be selected. This method is complex to interpret if some of the selected pairs of parts overlap, but it leads to the most accurate predictions. The second method restricts each part to occur only once, which makes the corresponding logratios intuitively interpretable. The third method uses additive logratios, so that K-1 selected logratios involve exactly K parts. In effect, this method searches for the subcomposition with the highest explanatory power. Once the subcomposition is identified, the researcher's favourite logratio representation may be used in subsequent analyses, not only pairwise logratios. Our methodology allows logratios or non-compositional covariates to be forced into the models based on theoretical knowledge, and offers various stopping criteria based on information measures or on statistical significance with the Bonferroni correction. We present an illustration of the three approaches on a dataset from a study predicting Crohn's disease. The first method excels in terms of predictive power, and the other two in interpretability.
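
The three search strategies differ only in which pairs of parts are admissible at each forward step. The sketch below illustrates this in Python with NumPy. It is not the authors' implementation and rests on several assumptions. First, it uses a Gaussian GLM (ordinary least squares) in place of a general GLM. Second, it stops on AIC rather than on the Bonferroni-corrected significance test that is also mentioned. Third, the names `stepwise_plr`, `gaussian_aic`, `pairwise_logratio` and the mode labels `"unrestricted"`, `"disjoint"` and `"additive"` are hypothetical. For the third approach we assume that each new logratio must pair one already-selected part with one new part. That is one way to guarantee that K-1 logratios always involve exactly K parts.

```python
import itertools

import numpy as np


def pairwise_logratio(X, j, k):
    """Pairwise logratio log(x_j / x_k) for every row of a strictly positive matrix X."""
    return np.log(X[:, j] / X[:, k])


def gaussian_aic(cols, y):
    """AIC, up to an additive constant, of an OLS fit of y on an intercept plus cols.

    Hypothetical stand-in for a general GLM likelihood: Gaussian errors only.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), *cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    n_params = A.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * n_params


def stepwise_plr(X, y, mode="unrestricted", forced=None, max_steps=None):
    """Forward stepwise selection of pairwise logratios (hypothetical sketch).

    mode:
      "unrestricted" - any pair of parts may be selected
      "disjoint"     - each part may appear in at most one selected logratio
      "additive"     - after the first step, each new logratio must pair one
                       already-used part with one new part, so K-1 logratios
                       always involve exactly K parts (our assumption)
    forced: optional (n, p) array of covariates kept in every model.
    Stops when no admissible pair lowers the AIC.
    """
    if mode not in {"unrestricted", "disjoint", "additive"}:
        raise ValueError(f"unknown mode: {mode!r}")
    D = X.shape[1]
    base = [] if forced is None else [forced[:, i] for i in range(forced.shape[1])]
    selected = []  # chosen (j, k) pairs, in order of entry
    used = set()  # parts appearing in any chosen logratio
    best_aic = gaussian_aic(base, y)
    while max_steps is None or len(selected) < max_steps:
        candidates = []
        for j, k in itertools.combinations(range(D), 2):
            if (j, k) in selected:
                continue
            overlap = len({j, k} & used)
            if mode == "disjoint" and overlap > 0:
                continue
            if mode == "additive" and selected and overlap != 1:
                continue
            cols = base + [pairwise_logratio(X, a, b) for a, b in selected + [(j, k)]]
            candidates.append((gaussian_aic(cols, y), (j, k)))
        if not candidates:
            break
        aic, pair = min(candidates)
        if aic >= best_aic:  # no admissible pair improves the AIC
            break
        best_aic = aic
        selected.append(pair)
        used.update(pair)
    return selected, best_aic


if __name__ == "__main__":
    # Synthetic 5-part composition; y depends only on log(x_0 / x_1).
    rng = np.random.default_rng(0)
    X = np.exp(rng.normal(size=(200, 5)))
    X /= X.sum(axis=1, keepdims=True)
    y = 2.0 * np.log(X[:, 0] / X[:, 1]) + rng.normal(scale=0.1, size=200)
    for mode in ("unrestricted", "disjoint", "additive"):
        pairs, aic = stepwise_plr(X, y, mode=mode)
        print(mode, pairs, round(aic, 1))
```

The unrestricted mode scores every remaining pair at each step. The two restricted modes prune the pairs that would break their interpretability constraint. For a binary outcome such as the Crohn's disease example, the least-squares scorer would have to be replaced by a logistic-regression likelihood.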

research
10/21/2019

Generative Hierarchical Models for Parts, Objects, and Scenes

Compositional structures between parts and objects are inherent in natur...
research
05/02/2022

Reproducing Kernels and New Approaches in Compositional Data Analysis

Compositional data, such as human gut microbiomes, consist of non-negati...
research
04/16/2020

A Transformation-free Linear Regression for Compositional Outcomes and Predictors

Compositional data are common in many fields, both as outcomes and predi...
research
10/14/2020

Estimations of means and variances in a Markov linear model

Multivariate regression models and ANOVA are probably the most frequentl...
research
06/12/2018

Evaluation of Unsupervised Compositional Representations

We evaluated various compositional models, from bag-of-words representat...
research
05/15/2022

Supervised Learning and Model Analysis with Compositional Data

The compositionality and sparsity of high-throughput sequencing data pos...
research
07/07/2023

GeoCoDA: Recognizing and Validating Structural Processes in Geochemical Data. A Workflow on Compositional Data Analysis in Lithogeochemistry

Geochemical data are compositional in nature and are subject to the prob...
