| Category | Sub-category | Variable | Dim. | Type | Range | Description |
|---|---|---|---|---|---|---|
| Global | Scene | scene | (1,) | D | 6 types | Scene name/identifier |
| Global | Scene | gravity | (1,) | C | - | Acceleration of gravity |
| Global | Object | render_asset | (1,) | D | 90 types | Specifies visual appearance |
| Dynamic | Object | position | (T,3) | C | - | 3D coordinates across time |
| Dynamic | Object | rotation | (T,3) | C | - | Euler angles across time |
TL;DR
CausalVerse is the first comprehensive benchmark for causal representation learning with controllable high-fidelity simulations. It allows users to inspect, modify, and configure causal graphs to match various CRL assumptions and tasks, and provides empirical insights to guide researchers in selecting or improving CRL frameworks for real-world causal reasoning.
Dataset Overview
Static Image Generation
- Human in Retail Store (11.2k images)
- Indoor environments
- Varying poses & appearances
- Multiple lighting conditions
Physical Simulation
- Cylinder Spring (40k images)
- Simple Collision (20k videos)
- Projectile motion
- Object interactions
Robotic Manipulation
- Robot in Kitchen (2.7k videos)
- Multi-view capture
- Object-centric tasks
- Embodied agents
Traffic Analysis
- Traffic in Town01 (1.97k videos)
- Multi-agent interactions
- Urban environments
- Variable conditions
Key Features: 3-129 latent variables per scene | 1024×1024 and 800×600 resolutions | 3-32 second video durations | Multi-camera viewpoints
Ground Truth Access
Complete access to causal variables, structures, and generation processes with high-fidelity visual data
Diverse Scenarios
From static to dynamic, single to multi-agent, covering physical simulations, robotics, and traffic
Configurable Settings
Flexible control over causal assumptions, domain labels, temporal dependencies, and interventions
Rigorous Evaluation
Test CRL methods under both satisfied and unmet assumptions with standardized metrics
Configuration Example
Each scene provides detailed ground-truth variables. The variable table on this page shows an example of the available metadata structure, which is consistent across the dataset.
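As a rough illustration of how the variables in the metadata table might be held in memory, the sketch below builds a placeholder record and checks its shapes. The variable names, dimensions, and types come from the table; the dictionary layout, the helper names `make_dummy_record` and `validate_record`, the reading of "D" and "C" as discrete and continuous, and all values are assumptions made for illustration, not the dataset's actual file format or API.

```python
import numpy as np

# (category, sub-category, variable, shape, type) rows copied from the table.
# "T" stands for the number of frames; "D"/"C" are read here as
# discrete/continuous, which is an assumption.
VARIABLE_SPECS = [
    ("Global", "Scene", "scene", (1,), "D"),
    ("Global", "Scene", "gravity", (1,), "C"),
    ("Global", "Object", "render_asset", (1,), "D"),
    ("Dynamic", "Object", "position", ("T", 3), "C"),
    ("Dynamic", "Object", "rotation", ("T", 3), "C"),
]


def concrete_shape(shape, num_frames):
    """Replace the symbolic 'T' dimension with an actual frame count."""
    return tuple(num_frames if dim == "T" else dim for dim in shape)


def make_dummy_record(num_frames, seed=0):
    """Build a hypothetical record with placeholder values (not real data)."""
    rng = np.random.default_rng(seed)
    record = {}
    for _, _, name, shape, kind in VARIABLE_SPECS:
        size = concrete_shape(shape, num_frames)
        if kind == "D":
            record[name] = rng.integers(0, 6, size=size)  # placeholder category index
        else:
            record[name] = rng.normal(size=size)  # placeholder continuous value
    return record


def validate_record(record, num_frames):
    """Check that every variable has the shape and kind listed in the table."""
    for _, _, name, shape, kind in VARIABLE_SPECS:
        value = np.asarray(record[name])
        expected = concrete_shape(shape, num_frames)
        if value.shape != expected:
            raise ValueError(f"{name}: expected shape {expected}, got {value.shape}")
        if kind == "D" and not np.issubdtype(value.dtype, np.integer):
            raise ValueError(f"{name}: expected an integer-coded discrete variable")
    return True


record = make_dummy_record(num_frames=5)
validate_record(record, num_frames=5)
```

A check of this kind can catch mismatched frame counts early when records from different scenes are batched together.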
Data Showcase
Sample videos from different domains in CausalVerse, showcasing the variety of scenes and viewpoints available.
Static Image Generation
Scenes: Scene1, Scene2, Scene3, Scene4
Physical Simulation (Image)
Scenes: Fall, Refraction, Slope, Spring
Physical Simulation (Video)
Scene: Projectile_Hard
Views: Birdview, Frontview, Leftview, Rightview
Robotic Manipulation
Scene: Kitchen
Views: Agentview, Birdview, Frontview, Eyeview, Sideview
Traffic Situation Analysis
Scenes: Town1, Town2
Evaluation
Methods are evaluated with the Mean Correlation Coefficient (MCC) and the Coefficient of Determination (R²) on both image and video data.
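For readers unfamiliar with the two metrics, here is a minimal sketch of how they are commonly computed. It is not the benchmark's own evaluation code. It assumes the true and estimated latents have the same dimension, uses a brute-force permutation search for the MCC matching (feasible only for a handful of latents; larger problems usually use the Hungarian algorithm), and uses a plain linear regression for R². The function names are hypothetical.

```python
import itertools

import numpy as np


def mean_correlation_coefficient(z_true, z_est):
    """Mean absolute Pearson correlation under the best one-to-one matching.

    z_true, z_est: arrays of shape (num_samples, d) with the same d.
    """
    d = z_true.shape[1]
    # Cross-correlation block: rows are true latents, columns are estimates.
    corr = np.abs(np.corrcoef(z_true.T, z_est.T)[:d, d:])
    # Brute-force search over all permutations (only sensible for small d).
    best_total = max(
        sum(corr[i, perm[i]] for i in range(d))
        for perm in itertools.permutations(range(d))
    )
    return float(best_total / d)


def r2_linear(z_true, z_est):
    """Average R² of a least-squares linear map from z_est to z_true."""
    design = np.column_stack([z_est, np.ones(len(z_est))])  # add intercept column
    coef, *_ = np.linalg.lstsq(design, z_true, rcond=None)
    residual = z_true - design @ coef
    ss_res = (residual ** 2).sum(axis=0)
    ss_tot = ((z_true - z_true.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))


rng = np.random.default_rng(0)
z_true = rng.normal(size=(500, 3))
# An estimate that recovers the latents up to permutation and scaling.
z_est = z_true[:, [2, 0, 1]] * np.array([-2.0, 0.5, 3.0])

mcc = mean_correlation_coefficient(z_true, z_est)
r2 = r2_linear(z_true, z_est)
```

MCC rewards recovery up to permutation and component-wise rescaling, while the linear R² here also tolerates linear mixing of the latents, so the two scores can diverge for entangled representations.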
Citation
If you find our work useful, please consider citing our paper:
@inproceedings{chen2025causalverse,
title = {CausalVerse: Benchmarking Causal Representation Learning with Configurable High-Fidelity Simulations},
author = {Chen, Guangyi and Deng, Yunlong and Zhu, Peiyuan and Li, Yan and Shen, Yifan and Li, Zijian and Zhang, Kun},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025}
}