Harmonization with Flow-based Causal Inference
Heterogeneity in medical data, e.g., from data collected at different sites and with different protocols in a clinical study, is a fundamental hurdle for accurate prediction using machine learning models, as such models often fail to generalize well. This paper presents a normalizing-flow-based method to perform counterfactual inference upon a structural causal model (SCM) to harmonize such data. We formulate a causal model for observed effects (brain magnetic resonance imaging data) that result from known confounders (site, gender and age) and exogenous noise variables. Our method exploits the bijection induced by flow for harmonization. We can infer the posterior of exogenous variables, intervene on observations, and draw samples from the resultant SCM to obtain counterfactuals. We evaluate on multiple, large, real-world medical datasets to observe that this method leads to better cross-domain generalization compared to state-of-the-art algorithms. Further experiments that evaluate the quality of confounder-independent data generated by our model using regression and classification tasks are provided.
READ FULL TEXT