Dependence versus Conditional Dependence in Local Causal Discovery from Gene Expression Data

by Eric V. Strobl, et al.

Motivation: Algorithms that discover variables which are causally related to a target may inform the design of experiments. With observational gene expression data, many methods discover causal variables by measuring each variable's degree of statistical dependence with the target using dependence measures (DMs). However, other methods measure each variable's ability to explain the statistical dependence between the target and the remaining variables in the data using conditional dependence measures (CDMs), since this strategy is guaranteed to find the target's direct causes, direct effects, and direct causes of the direct effects in the infinite sample limit. In this paper, we design a new algorithm to systematically compare the relative abilities of DMs and CDMs in discovering causal variables from gene expression data.

Results: The proposed algorithm using a CDM is sample efficient, since it consistently outperforms other state-of-the-art local causal discovery algorithms when sample sizes are small. However, the proposed algorithm using a CDM outperforms the proposed algorithm using a DM only when sample sizes are above several hundred. These results suggest that accurate causal discovery from gene expression data using current CDM-based algorithms requires datasets with at least several hundred samples.

Availability: The proposed algorithm is freely available at
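The distinction between a DM and a CDM can be illustrated with a small simulation. The sketch below is not the proposed algorithm. It assumes linear-Gaussian data and uses absolute Pearson correlation as a stand-in DM and absolute partial correlation (given all other variables) as a stand-in CDM. The function names `marginal_scores` and `conditional_scores` and the toy causal chain are hypothetical, chosen only for illustration.

```python
import numpy as np


def marginal_scores(X, t):
    """Stand-in DM: absolute Pearson correlation of each column of X with t."""
    Xc = X - X.mean(axis=0)
    tc = t - t.mean()
    return np.abs(Xc.T @ tc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(tc))


def conditional_scores(X, t):
    """Stand-in CDM: absolute partial correlation of each column of X with t,
    conditioning on all remaining columns (read off the precision matrix)."""
    data = np.column_stack([X, t])
    prec = np.linalg.pinv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(prec))
    partial = -prec / np.outer(d, d)
    return np.abs(partial[:-1, -1])


# Toy causal chain (an assumption for illustration): x3 -> x1 -> t -> x2
rng = np.random.default_rng(0)
n = 5000
x3 = rng.normal(size=n)        # indirect cause of t
x1 = x3 + rng.normal(size=n)   # direct cause of t
t = x1 + rng.normal(size=n)    # target
x2 = t + rng.normal(size=n)    # direct effect of t

X = np.column_stack([x1, x2, x3])
dm = marginal_scores(X, t)
cdm = conditional_scores(X, t)

for name, a, b in zip(["x1", "x2", "x3"], dm, cdm):
    print(f"{name}: DM={a:.2f}  CDM={b:.2f}")
```

In this toy setting the indirect cause `x3` receives a clearly nonzero DM score but a CDM score close to zero, while the direct cause `x1` and direct effect `x2` keep nonzero scores under both measures. With far fewer samples the CDM estimates become noisier, which is consistent with the sample-size trade-off described in the Results.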




