A Framework for Inferring Causality from Multi-Relational Observational Data using Conditional Independence
The study of causality or causal inference - how much a given treatment causally affects a given outcome in a population - goes way beyond correlation or association analysis of variables, and is critical in making sound data driven decisions and policies in a multitude of applications. The gold standard in causal inference is performing "controlled experiments", which often is not possible due to logistical or ethical reasons. As an alternative, inferring causality on "observational data" based on the "Neyman-Rubin potential outcome model" has been extensively used in statistics, economics, and social sciences over several decades. In this paper, we present a formal framework for sound causal analysis on observational datasets that are given as multiple relations and where the population under study is obtained by joining these base relations. We study a crucial condition for inferring causality from observational data, called the "strong ignorability assumption" (the treatment and outcome variables should be independent in the joined relation given the observed covariates), using known conditional independences that hold in the base relations. We also discuss how the structure of the conditional independences in base relations given as graphical models help infer new conditional independences in the joined relation. The proposed framework combines concepts from databases, statistics, and graphical models, and aims to initiate new research directions spanning these fields to facilitate powerful data-driven decisions in today's big data world.
READ FULL TEXT