Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity

07/02/2023
by Hananeh Aliee, et al.

This paper presents a novel approach that leverages domain variability to learn representations that are conditionally invariant to unwanted variability, or distractors. By placing distinct conditional priors on the latent features, our approach identifies both the spurious and the invariant latent features needed for accurate reconstruction. The invariant signals are disentangled from noise by enforcing independence, which facilitates the construction of an interpretable model with causal semantics. By exploiting the interplay between data domains and labels, our method simultaneously identifies invariant features and builds invariant predictors. We apply our method to grand biological challenges such as data integration in single-cell genomics, where the aim is to capture biological variation across datasets with many samples obtained under different conditions or in multiple laboratories. Our approach allows specific biological mechanisms, including gene programs, disease states, and treatment conditions, to be incorporated into the data integration process, bridging the gap between theoretical assumptions and real biological applications. Specifically, it helps to disentangle biological signals from data biases that are unrelated to the target task or the causal explanation of interest. Through extensive benchmarking on large-scale human hematopoiesis and human lung cancer data, we show that our approach outperforms existing methods and that it can provide deeper insights into cellular heterogeneity and the identification of disease cell states.
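The two ingredients named in the abstract, distinct conditional priors on the latent features and an independence constraint, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diagonal Gaussian posteriors and priors, assumes that the invariant block is conditioned on the label and the spurious block on the domain, and swaps in a simple cross-covariance penalty as a weak proxy for independence. All function and parameter names are hypothetical.

```python
import math

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ) for diagonal Gaussians."""
    return 0.5 * sum(
        math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )

def conditional_prior_kl(mu_inv, var_inv, mu_spur, var_spur,
                         label, domain, label_prior, domain_prior):
    """KL regulariser with a separate conditional prior per latent block.

    Assumption (not stated in the abstract): the invariant block uses a prior
    looked up by label, the spurious block a prior looked up by domain.
    Each prior table maps a key to a (mean, variance) pair of lists.
    """
    p_inv_mu, p_inv_var = label_prior[label]
    p_spur_mu, p_spur_var = domain_prior[domain]
    return (gaussian_kl(mu_inv, var_inv, p_inv_mu, p_inv_var)
            + gaussian_kl(mu_spur, var_spur, p_spur_mu, p_spur_var))

def cross_covariance_penalty(z_inv, z_spur):
    """Squared Frobenius norm of the empirical cross-covariance of two blocks.

    This only penalises linear correlation, so it is a stand-in for a true
    independence constraint, not an equivalent of it.
    z_inv and z_spur are lists of per-sample latent vectors of equal length.
    """
    n = len(z_inv)
    mean_inv = [sum(col) / n for col in zip(*z_inv)]
    mean_spur = [sum(col) / n for col in zip(*z_spur)]
    total = 0.0
    for i in range(len(mean_inv)):
        for j in range(len(mean_spur)):
            cov = sum((a[i] - mean_inv[i]) * (b[j] - mean_spur[j])
                      for a, b in zip(z_inv, z_spur)) / n
            total += cov ** 2
    return total
```

In a training loop, these two terms would be added to a reconstruction loss with weights chosen by the user. The prior tables here are fixed lookups for simplicity; a learned model would typically produce the prior parameters with a small network.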

research
03/11/2022

ZIN: When and How to Learn Invariance by Environment Inference?

It is commonplace to encounter heterogeneous data, of which some aspects...
research
11/07/2022

Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling

Latent variable models such as the Variational Auto-Encoder (VAE) have b...
research
01/25/2019

Finding Archetypal Spaces for Data Using Neural Networks

Archetypal analysis is a type of factor analysis where data is fit by a ...
research
11/28/2022

Regression-based heterogeneity analysis to identify overlapping subgroup structure in high-dimensional data

Heterogeneity is a hallmark of complex diseases. Regression-based hetero...
research
03/29/2022

Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets

Pooling multiple neuroimaging datasets across institutions often enables...
research
09/14/2022

Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs

Single-cell RNA-seq datasets are growing in size and complexity, enablin...
research
11/14/2021

Invariant Risk Minimisation for Cross-Organism Inference: Substituting Mouse Data for Human Data in Human Risk Factor Discovery

Human medical data can be challenging to obtain due to data privacy conc...