[CVPR 2025] CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
TL;DR: CityWalker leverages thousands of hours of online city walking and driving videos to train autonomous agents for robust, generalizable navigation in dynamic urban environments through scalable, data-driven imitation learning.
The project should be compatible with the latest PyTorch and CUDA versions. The code is tested with Python 3.11, PyTorch 2.5.0, and CUDA 12.1. To install the dependencies, set up an environment along the lines sketched below.
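For example, a minimal Conda-based setup might look like the following; the environment name, the requirements.txt file, and the CUDA 12.1 wheel index are illustrative assumptions rather than the repository's confirmed instructions:

```bash
# Assumed setup; defer to the repository's actual instructions where they differ.
conda create -n citywalker python=3.11
conda activate citywalker

# Tested PyTorch build with CUDA 12.1 support (wheel index is an assumption).
pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu121

# Remaining dependencies; requirements.txt is a hypothetical file name.
pip install -r requirements.txt
```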
Citation

@inproceedings{liu2025citywalker,
title={CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos},
author={Liu, Xinhao and Li, Jintong and Jiang, Yicheng and Sujay, Niranjan and Yang, Zhicheng and Zhang, Juexiao and Abanes, John and Zhang, Jing and Feng, Chen},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={6875--6885},
year={2025}
}
Acknowledgements
This work was supported by NSF grants 2238968, 2121391, 2322242, and 2345139, and in part by NYU IT High Performance Computing resources, services, and staff expertise. We thank Xingyu Liu and Zixuan Hu for their help with data collection.
We also thank the authors of the following repositories for their open-source implementations: