This repository contains the implementation for the paper "Testing Causal Models with Hidden Variables in Polynomial Delay via Conditional Independencies" by Hyunchai Jeong*, Adiba Ejaz*, Jin Tian, and Elias Bareinboim. More specifically, the algorithms for listing all conditional independence relations (CIs) implied by a causal model. These authors* contributed equally.
Please run the following commands to install the package:
git clone https://github.com/CausalAILab/ListConditionalIndependencies.git
cd ListConditionalIndependencies
pip3 install -r requirements.txt
pip3 install -e .
To run some examples, try running the following command:
python3 main.py clmp graphs/paper/fig4b.txt
- First argument:
gmp,lmp, orclmp. Specifies a listing algorithm that outputs all CIs invoked by the following property.
gmp: The global Markov property (runs the algorithm ListGMP)lmp: The ordered local Markov property (runs the algorithm ListCIBF)clmp: The c-component local Markov property (runs the algorithm ListCI)
- Second argument:
graphs/paper/fig4b.txt. The path to a text file that contains graph information, such as nodes and edges of a graph. Please check the formatting for details.
Consider graphs/paper/fig4b.txt as an example.
<NODES>
Each line represents the name of a variable/node (e.g., A and B).
<EDGES>
Each line describes an edge. Two types of edges are supported:
- Directed edge:
A -> Brepresents a directed edge fromAtoB. - Bidirected edge:
A -- Fmeans there exists some unmeasured confounder (i.e., latent variable) betweenAandF.
Try running the following command:
python3 test_model.py graphs/sachs/graph_gt.txt datasets/sachs/dataset.csv
The example tests a ground-truth graph provided by experts (11 nodes and 17 edges) against a real-world protein signaling dataset (853 samples) by (Sachs et al. 2005). The details are provided in Appendix F.2 in the paper.
- First argument:
graphs/sachs/graph_gt.txt. The path to a graph file. - Second argument:
datasets/sachs/dataset.csv. The path to a dataset file.
We consider experiments shown in Appendix F.1 and F.3.
-
Appendix F.1: Comparison of ListCI and two other algorithms - ListGMP and ListCIBF.
- Running bnlearn instances (with varying projection levels) over three algorithms. The original bnlearn instances are available here.
Try running the following command:
python3 run_bnlearn_graph.py clmp graphs/bnlearn/sm/asia.txtWhen the experiment finishes, a report file named
bnlearn_report_asia_clmp.csvwill be generated. Please check the heading Parameters of a report file for more details.- First argument:
gmp,lmp, orclmp. An algorithm to run. - Second argument:
graphs/bnlearn/sm/asia.txt. The path to a bnlearn graph file.
All bnlearn graphs are placed in the directory
graphs/bnlearn. There are three types of graphs, classified based onn: number of nodes.-
Small graphs,
n: [1,20)Placed in
graphs/bnlearn/smsubdirectory. -
Medium graphs,
n: [21,50)Placed in
graphs/bnlearn/mdsubdirectory. -
Large graphs,
n: [51,100)Placed in
graphs/bnlearn/lgsubdirectory.
Measured parameters for all three algorithms.
n: number of nodes.m: number of edges.md: number of directed edges.mu: number of bidirected edges.# CI: number of CIs invoked by a Markov property.runtime:running time of an algorithm in seconds (rounded to the nearest second).
Two additional paramters are measured for ListCIBF.
# S: number of ancestral sets.# S+: number of maximal ancestral sets (MASs).
One additional paramter is measured for both ListCIBF and ListCI.
s: size of the largest c-component.
The following parameter may be modified in
run_bnlearn_graph.pyto change experiment settings.numBatches: number of batches to run for an experiment. For each single batch, 10 random graphs are generated in total: one graph for each projection levelU(forU = 0, 10, ..., 90). Default is set to10, totalling 100 sample graphs. Located at line 355 ofrun_bnlearn_graph.py.fixOrdering: If set toTrue, a topological order over variables will be fixed (across all projections) for more consistent results. Default isTrue.randomSeed: A random seed to be provided torandom.random(). Default isNone.
-
Appendix F.3: Analysis of C-LMP
- Running experiments over random graphs to understand the total number of valid CIs invoked by C-LMP.
Try running the following command:
python3 run_case.py 1aA report file named
report_case_1a.csvwill be generated.- First argument: Any one string from the list
['1a', '1b', '1v', '1r', '2a', '2b', '3a'].
- Case 1
1a: Experiments corresponding to plots shown in Fig. F.3.1.1v,1r: Fig. F.3.2a and Fig. F.3.2b, respectively.1b: Fig. F.3.3.
- Case 2
2a: Fig. F.3.4.2b: Fig. F.3.5.
- Case 3
3a: Fig. F.3.6.
The following parameters may be modified to change experiment settings. The mentioned variables are defined below the line:
if __name__ == '__main__':.n: number of nodes.numBatches: number of batches to run for an experiment. For each single batch, 10 random graphs are generated. Default is set to10, totalling 100 sample graphs.randomSeed: A random seed to be provided torandom.random(). Default isNone.
- Jeong, H., Ejaz, A., Tian, J., and Bareinboim, E. 2024. Testing Causal Models with Hidden Variables in Polynomial Delay via Conditional Independencies. Causal Artificial Intelligence Lab, Columbia University.
- Scutari, M. 2010. Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, 35(3): 1–22.
- Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D. A., and Nolan, G. P. 2005. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721): 523–529.