LLCP: Learning Latent Causal Processes for Reasoning-based Video Question Answer
This repository contains the implementation for the ICLR 2024 paper LLCP: Learning Latent Causal Processes for Reasoning-based Video Question Answer [pdf]
LLCP is a causal framework designed to enhance video reasoning by focusing on the spatial-temporal dynamics of objects within events, without the need for extensive data annotations. By employing self-supervised learning and leveraging the modularity of causal mechanisms, LLCP learns a multivariate generative model of spatial-temporal dynamics, enabling effective accident attribution and counterfactual prediction for reasoning-based VideoQA.
Environment
First, please install recent versions of PyTorch and TorchVision with pip install torch torchvision. Then install the remaining packages by running pip install -r requirements.txt.
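The setup steps above can be run together as a short shell session (the two pip commands are taken from this README; pin versions as your CUDA setup requires):

```shell
# Install PyTorch and TorchVision first
pip install torch torchvision
# Then install the remaining dependencies listed by the repository
pip install -r requirements.txt
```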
Download Data
We provide the processed features used in our experiments. Please download the data and models from this link1 and this link2. Then decompress the archives into ./data/ and ./results/, replacing the original folders with the downloaded ones.
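A minimal sketch of the extraction step, assuming the downloaded archives are zip files named data.zip and results.zip (hypothetical names; use whatever files link1 and link2 actually provide):

```shell
# Extract into the repo root so it contains ./data/ and ./results/,
# overwriting the original placeholder folders.
if [ -f data.zip ]; then unzip -q -o data.zip -d .; fi
if [ -f results.zip ]; then unzip -q -o results.zip -d .; fi
# List the expected folders if present (no error if they are missing yet)
ls -d ./data/ ./results/ 2>/dev/null || true
```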
If you find our work useful in your research, please consider citing:
@inproceedings{chen2024llcp,
title={LLCP: Learning Latent Causal Processes for Reasoning-based Video Question Answer},
author={Chen, Guangyi and Li, Yuke and Liu, Xiao and Li, Zijian and Al Surad, Eman and Wei, Donglai and Zhang, Kun},
booktitle={ICLR},
year={2024}
}
Acknowledgement
Our implementation is mainly based on SUTD-TrafficQA and Tem-Adapter; we thank the authors for releasing their code.