Anurag Arnab
I am a Research Scientist at Google DeepMind working primarily on multimodal understanding and generation. I completed my PhD with Philip Torr at the University of Oxford, where I focused on deep structured models for pixel-level scene understanding. Prior to that, I completed my undergraduate degree at the University of Cape Town.
Publications
2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Gagan Jain, Nidhi Hegde, Aditya Kusupati, Arsha Nagrani, Shyamal Buch, Prateek Jain, Anurag Arnab, Sujoy Paul
Neural Information Processing Systems (NeurIPS), 2024
@inproceedings{mone_neurips_2024,
title={Mixture of Nested Experts: Adaptive Processing of Visual Tokens},
author={Gagan Jain and Nidhi Hegde and Aditya Kusupati and Arsha Nagrani and Shyamal Buch and Prateek Jain and Anurag Arnab and Sujoy Paul},
booktitle={NeurIPS},
year={2024}
}
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Heeseong Shin, Chaehyun Kim, Sunghwan Hong, Seokju Cho, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim
Neural Information Processing Systems (NeurIPS), 2024
@inproceedings{shin_neurips_2024,
title={Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels},
author={Heeseong Shin and Chaehyun Kim and Sunghwan Hong and Seokju Cho and Anurag Arnab and Paul Hongsuck Seo and Seungryong Kim},
booktitle={NeurIPS},
year={2024}
}
Streaming Dense Video Captioning
Xingyi Zhou*, Anurag Arnab*, Shyamal Buch, Shen Yan, Austin Myers, Xuehan Xiong, Arsha Nagrani, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2024
@inproceedings{streaming_dvc_cvpr_2024,
title={Streaming Dense Video Captioning},
author={Xingyi Zhou and Anurag Arnab and Shyamal Buch and Shen Yan and Austin Myers and Xuehan Xiong and Arsha Nagrani and Cordelia Schmid},
booktitle={CVPR},
year={2024}
}
Time-, Memory- and Parameter-Efficient Visual Adaptation
Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid, Anurag Arnab
Computer Vision and Pattern Recognition (CVPR), 2024
@inproceedings{losa_cvpr_2024,
title={Time-, Memory- and Parameter-Efficient Visual Adaptation},
author={Otniel-Bogdan Mercea and Alexey Gritsenko and Cordelia Schmid and Anurag Arnab},
booktitle={CVPR},
year={2024}
}
Highlight paper
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Alexey A Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lucic, Cordelia Schmid, Anurag Arnab
Computer Vision and Pattern Recognition (CVPR), 2024
@inproceedings{star_cvpr_2024,
title={End-to-end Spatio-Temporal Action Localisation with Video Transformers},
author={Alexey A Gritsenko and Xuehan Xiong and Josip Djolonga and Mostafa Dehghani and Chen Sun and Mario Lucic and Cordelia Schmid and Anurag Arnab},
booktitle={CVPR},
year={2024}
}
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Seokju Cho, Heeseong Shin, Sunghwan Hong, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim
Computer Vision and Pattern Recognition (CVPR), 2024
@inproceedings{cat-seg_cvpr_2024,
title={CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation},
author={Seokju Cho and Heeseong Shin and Sunghwan Hong and Anurag Arnab and Paul Hongsuck Seo and Seungryong Kim},
booktitle={CVPR},
year={2024}
}
Highlight paper
Pixel Aligned Language Models
Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2024
@inproceedings{pixel_llm_cvpr_2024,
title={Pixel Aligned Language Models},
author={Jiarui Xu and Xingyi Zhou and Shen Yan and Xiuye Gu and Anurag Arnab and Chen Sun and Xiaolong Wang and Cordelia Schmid},
booktitle={CVPR},
year={2024}
}
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Google
Computer Vision and Pattern Recognition (CVPR), 2024
@inproceedings{palix_cvpr_2024,
title={Pali-x: On scaling up a multilingual vision and language model},
author={Chen, Xi and Djolonga, Josip and Padlewski, Piotr and Mustafa, Basil and Changpinyo, Soravit and Wu, Jialin and Ruiz, Carlos Riquelme and Goodman, Sebastian and Wang, Xiao and Tay, Yi and others},
booktitle={CVPR},
year={2024}
}
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael Ryoo
Computer Vision and Pattern Recognition (CVPR), 2024
@inproceedings{victr_cvpr_2024,
title={VicTR: Video-conditioned Text Representations for Activity Recognition},
author={Kumara Kahatapitiya and Anurag Arnab and Arsha Nagrani and Michael Ryoo},
booktitle={CVPR},
year={2024}
}
2023
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu, Eduardo Fonseca, Radu Tudor Ionescu, Mario Lucic, Cordelia Schmid, Anurag Arnab
International Conference on Computer Vision (ICCV), 2023
@inproceedings{avmae_iccv_2023,
title={Audiovisual Masked Autoencoders},
author={Mariana-Iuliana Georgescu and Eduardo Fonseca and Radu Tudor Ionescu and Mario Lucic and Cordelia Schmid and Anurag Arnab},
booktitle={ICCV},
year={2023}
}
UnLoc: A Unified Framework for Video Localization Tasks
Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid
International Conference on Computer Vision (ICCV), 2023
@inproceedings{unloc_iccv_2023,
title={UnLoc: A Unified Framework for Video Localization Tasks},
author={Shen Yan and Xuehan Xiong and Arsha Nagrani and Anurag Arnab and Zhonghao Wang and Weina Ge and David Ross and Cordelia Schmid},
booktitle={ICCV},
year={2023}
}
Does Visual Pretraining Help End-to-End Reasoning?
Chen Sun, Calvin Luo, Xingyi Zhou, Anurag Arnab, Cordelia Schmid
Neural Information Processing Systems (NeurIPS), 2023
@inproceedings{sun_neurips_2023,
title={Does Visual Pretraining Help End-to-End Reasoning?},
author={Chen Sun and Calvin Luo and Xingyi Zhou and Anurag Arnab and Cordelia Schmid},
booktitle={NeurIPS},
year={2023}
}
How Can Objects Help Action Recognition?
Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2023
@inproceedings{zhou_cvpr_2023,
title={How Can Objects Help Action Recognition?},
author={Zhou, Xingyi and Arnab, Anurag and Sun, Chen and Schmid, Cordelia},
booktitle={CVPR},
year={2023}
}
Token Turing Machines
Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab
Computer Vision and Pattern Recognition (CVPR), 2023
@inproceedings{ryoo_cvpr_2023,
title={Token Turing Machines},
author={Ryoo, Michael S and Gopalakrishnan, Keerthana and Kahatapitiya, Kumara and Xiao, Ted and Rao, Kanishka and Stone, Austin and Lu, Yao and Ibarz, Julian and Arnab, Anurag},
booktitle={CVPR},
year={2023}
}
Scaling Vision Transformers to 22 Billion Parameters
Google Research
International Conference on Machine Learning (ICML), 2023
@inproceedings{dehghani_icml_2023,
title={Scaling Vision Transformers to 22 Billion Parameters},
author={Dehghani, Mostafa and Djolonga, Josip and Mustafa, Basil and Padlewski, Piotr and Heek, Jonathan and Gilmer, Justin and Steiner, Andreas and Caron, Mathilde and Geirhos, Robert and Alabdulmohsin, Ibrahim and others},
booktitle={ICML},
year={2023}
}
Adaptive Computation with Elastic Input Sequence
Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You
International Conference on Machine Learning (ICML), 2023
@inproceedings{xue_icml_2023,
title={Adaptive Computation with Elastic Input Sequence},
author={Xue, Fuzhao and Likhosherstov, Valerii and Arnab, Anurag and Houlsby, Neil and Dehghani, Mostafa and You, Yang},
booktitle={ICML},
year={2023}
}
2022
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov*, Anurag Arnab*, Krzysztof Marcin Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani*
Transactions on Machine Learning Research (TMLR), 2022
@article{likhosherstov_tmlr_2022,
title={PolyViT: Co-training Vision Transformers on Images, Videos and Audio},
author={Likhosherstov, Valerii and Arnab, Anurag and Choromanski, Krzysztof and Lucic, Mario and Tay, Yi and Weller, Adrian and Dehghani, Mostafa},
journal={TMLR},
year={2022}
}
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer*, Alexey Gritsenko*, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby
European Conference on Computer Vision (ECCV), 2022
@inproceedings{minderer_eccv_2022,
title={Simple Open-Vocabulary Object Detection with Vision Transformers},
author={Matthias Minderer and Alexey Gritsenko and Austin Stone and Maxim Neumann and Dirk Weissenborn and Alexey Dosovitskiy and Aravindh Mahendran and Anurag Arnab and Mostafa Dehghani and Zhuoran Shen and Xiao Wang and Xiaohua Zhai and Thomas Kipf and Neil Houlsby},
booktitle={ECCV},
year={2022}
}
M&M Mix: A Multimodal Multiview Transformer Ensemble
Xuehan Xiong, Anurag Arnab, Arsha Nagrani, Cordelia Schmid
Winner of the Epic Kitchens Action Recognition Challenge at CVPR 2022
@inproceedings{xiong_cvprw_2022,
title={M&M Mix: A Multimodal Multiview Transformer Ensemble},
author={Xiong, Xuehan and Arnab, Anurag and Nagrani, Arsha and Schmid, Cordelia},
booktitle={CVPR Workshop},
year={2022}
}
Multiview Transformers for Video Recognition
Shen Yan, Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2022
@inproceedings{yan_cvpr_2022,
title={Multiview Transformers for Video Recognition},
author={Yan, Shen and Xiong, Xuehan and Arnab, Anurag and Lu, Zhichao and Zhang, Mi and Sun, Chen and Schmid, Cordelia},
booktitle={CVPR},
year={2022}
}
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2022
@inproceedings{seo_cvpr_2022,
title={End-to-end Generative Pretraining for Multimodal Video Captioning},
author={Seo, Paul Hongsuck and Nagrani, Arsha and Arnab, Anurag and Schmid, Cordelia},
booktitle={CVPR},
year={2022}
}
Learning with Neighbor Consistency for Noisy Labels
Ahmet Iscen, Jack Valmadre, Anurag Arnab, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2022
@inproceedings{iscen_cvpr_2022,
title={Learning with Neighbor Consistency for Noisy Labels},
author={Iscen, Ahmet and Valmadre, Jack and Arnab, Anurag and Schmid, Cordelia},
booktitle={CVPR},
year={2022}
}
The Efficiency Misnomer
Mostafa Dehghani*, Anurag Arnab*, Lucas Beyer*, Ashish Vaswani, Yi Tay*
International Conference on Learning Representations (ICLR), 2022
@inproceedings{dehghani_iclr_2022,
title={The Efficiency Misnomer},
author={Dehghani, Mostafa and Arnab, Anurag and Beyer, Lucas and Vaswani, Ashish and Tay, Yi},
booktitle={ICLR},
year={2022}
}
Scenic: A JAX library for Computer Vision Research and Beyond
Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay
Computer Vision and Pattern Recognition (CVPR) Demo, 2022
@inproceedings{dehghani_cvprw_2022,
title={Scenic: A JAX library for Computer Vision Research and Beyond},
author={Dehghani, Mostafa and Gritsenko, Alexey and Arnab, Anurag and Minderer, Matthias and Tay, Yi},
booktitle={CVPR Demo},
year={2022}
}
2021
ViViT: A Video Vision Transformer
Anurag Arnab*, Mostafa Dehghani*, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Schmid
International Conference on Computer Vision (ICCV), 2021
@inproceedings{arnab2021vivit,
title={ViViT: A video vision transformer},
author={Arnab, Anurag and Dehghani, Mostafa and Heigold, Georg and Sun, Chen and Lu{\v{c}}i{\'c}, Mario and Schmid, Cordelia},
booktitle={ICCV},
year={2021}
}
Unified Graph Structured Models for Video Understanding
Anurag Arnab, Chen Sun, Cordelia Schmid
International Conference on Computer Vision (ICCV), 2021
@inproceedings{arnab2021unified,
title={Unified Graph Structured Models for Video Understanding},
author={Arnab, Anurag and Sun, Chen and Schmid, Cordelia},
booktitle={ICCV},
year={2021}
}
Compressive Visual Representations
Kuang-Huei Lee*, Anurag Arnab*, Sergio Guadarrama, John Canny, Ian Fischer*
Conference on Neural Information Processing Systems (NeurIPS), 2021
@inproceedings{lee2021compressive,
title={Compressive Visual Representations},
author={Lee, Kuang-Huei and Arnab, Anurag and Guadarrama, Sergio and Canny, John and Fischer, Ian},
booktitle={NeurIPS},
year={2021}
}
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova
Conference on Neural Information Processing Systems (NeurIPS), 2021
@inproceedings{ryoo2021tokenlearner,
title={TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?},
author={Ryoo, Michael S and Piergiovanni, AJ and Arnab, Anurag and Dehghani, Mostafa and Angelova, Anelia},
booktitle={NeurIPS},
year={2021}
}
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun
Conference on Neural Information Processing Systems (NeurIPS), 2021
@inproceedings{nagrani2021attention,
title={Attention bottlenecks for multimodal fusion},
author={Nagrani, Arsha and Yang, Shan and Arnab, Anurag and Jansen, Aren and Schmid, Cordelia and Sun, Chen},
booktitle={NeurIPS},
year={2021}
}
2020
Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos
Anurag Arnab, Chen Sun, Arsha Nagrani, Cordelia Schmid
European Conference on Computer Vision (ECCV), 2020
@inproceedings{arnab_eccv_2020,
author = {Anurag Arnab and Chen Sun and Arsha Nagrani and Cordelia Schmid},
title = {Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
Dynamic Graph Message Passing Networks
Li Zhang, Dan Xu, Anurag Arnab, Philip H.S. Torr
Computer Vision and Pattern Recognition (CVPR), 2020
Oral presentation
@inproceedings{zhang_cvpr_2020,
author = {Li Zhang and Dan Xu and Anurag Arnab and Philip H.S. Torr},
title = {Dynamic Graph Message Passing Networks},
booktitle = {Computer Vision and Pattern Recognition (CVPR)},
year = {2020}
}
Meta-Learning Deep Visual Words for Fast Video Object Segmentation
Harkirat Singh Behl, Mohammad Najafi, Anurag Arnab, Philip H.S. Torr.
Intelligent Robots and Systems (IROS), 2020
NeurIPS Machine Learning for Autonomous Driving Workshop, 2019
@inproceedings{behl_iros_2020,
author = {Harkirat Singh Behl and Mohammad Najafi and Anurag Arnab and Philip H.S. Torr},
title = {Meta-Learning Deep Visual Words for Fast Video Object Segmentation},
booktitle = {IROS},
year = {2020}
}
2019
Exploiting Temporal Context for 3D Human Pose Estimation In The Wild
Anurag Arnab*, Carl Doersch*, Andrew Zisserman
Computer Vision and Pattern Recognition (CVPR), 2019
@inproceedings{arnab_cvpr_2019,
author = {Anurag Arnab and Carl Doersch and Andrew Zisserman},
title = {Exploiting temporal context for 3D human pose estimation in the wild},
booktitle = {Computer Vision and Pattern Recognition (CVPR)},
year = {2019}
}
Dual Graph Convolutional Network for Semantic Segmentation
Li Zhang*, Xiangtai Li*, Anurag Arnab, Kuiyuan Yang, Yunhai Tong, Philip H.S. Torr
British Machine Vision Conference (BMVC), 2019
@inproceedings{zhang_bmvc_2019,
author = {Li Zhang and Xiangtai Li and Anurag Arnab and Kuiyuan Yang and Yunhai Tong and Philip H.S. Torr},
title = {Dual Graph Convolutional Network for Semantic Segmentation},
booktitle = {British Machine Vision Conference (BMVC)},
year = {2019}
}
2018
Weakly- and Semi-Supervised Panoptic Segmentation
Qizhu Li*, Anurag Arnab*, Philip H.S. Torr
European Conference on Computer Vision (ECCV), 2018
@inproceedings{li_eccv_2018,
author = {Qizhu Li and Anurag Arnab and Philip H. S. Torr},
title = {Weakly- and Semi-Supervised Panoptic Segmentation},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2018}
}
On the Robustness of Semantic Segmentation Models to Adversarial Attacks
Anurag Arnab, Ondrej Miksik, Philip H.S. Torr
Computer Vision and Pattern Recognition (CVPR), 2018
Pattern Analysis and Machine Intelligence (PAMI), 2019
@inproceedings{arnab_cvpr_2018,
author = {Anurag Arnab and Ondrej Miksik and Philip H. S. Torr},
title = {On the Robustness of Semantic Segmentation Models to Adversarial Attacks},
booktitle = {Computer Vision and Pattern Recognition (CVPR)},
year = {2018}
}
Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation
Anurag Arnab, Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Måns Larsson, Alexander Kirillov, Bogdan Savchynskyy, Carsten Rother, Fredrik Kahl, Philip H.S. Torr
IEEE Signal Processing Magazine,
2018
@article{Arnab_IEEESPM_2018,
author={A. Arnab and S. Zheng and S. Jayasumana and B. Romera-Paredes and M. Larsson and A. Kirillov and B. Savchynskyy and C. Rother and F. Kahl and P. H. S. Torr},
journal={IEEE Signal Processing Magazine},
title={Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation: Combining Probabilistic Graphical Models with Deep Learning for Structured Prediction},
year={2018},
volume={35},
number={1},
pages={37-52},
doi={10.1109/MSP.2017.2762355},
ISSN={1053-5888},
month={Jan}
}
Revisiting Deep Structured Models for Pixel-Level Labeling with Gradient-Based Inference
Måns Larsson, Anurag Arnab, Shuai Zheng, Philip H.S. Torr, Fredrik Kahl.
SIAM Journal on Imaging Sciences, 2018
@article{larsson_siam_2018,
author = {Mans Larsson and Anurag Arnab and Shuai Zheng and Philip H.S. Torr and Fredrik Kahl},
title = {Revisiting Deep Structured Models for Pixel-Level Labeling with Gradient-Based Inference},
journal = {SIAM Journal on Imaging Sciences},
year = {2018},
volume = {11},
number = {4},
pages = {2610-2628},
doi = {10.1137/18M1167267},
URL = {https://doi.org/10.1137/18M1167267}
}
2017
Pixelwise Instance Segmentation with a Dynamically Instantiated Network
Anurag Arnab, Philip H.S. Torr
Computer Vision and Pattern Recognition (CVPR), 2017
@inproceedings{arnab_cvpr_2017,
author = {Anurag Arnab and Philip H. S. Torr},
title = {Pixelwise Instance Segmentation with a Dynamically Instantiated Network},
booktitle = {Computer Vision and Pattern Recognition (CVPR)},
year = {2017}
}
Holistic, Instance-level Human Parsing
Qizhu Li*, Anurag Arnab*, Philip H.S. Torr
British Machine Vision Conference (BMVC), 2017
@inproceedings{li_bmvc_2017,
author = {Qizhu Li and Anurag Arnab and Philip H. S. Torr},
title = {Holistic, Instance-level Human Parsing},
booktitle = {British Machine Vision Conference (BMVC)},
year = {2017}
}
A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials
Måns Larsson, Anurag Arnab, Fredrik Kahl, Shuai Zheng, Philip H.S. Torr
Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), 2017
@inproceedings{larsson_emmcvpr_2017,
author = {Mans Larsson and Anurag Arnab and Fredrik Kahl and Shuai Zheng and Philip H.S. Torr},
title = {A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials},
booktitle = {Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR)},
year = {2017}
}
2016
Higher Order Conditional Random Fields in Deep Neural Networks
Anurag Arnab, Sadeep Jayasumana, Shuai Zheng, Philip H.S. Torr
European Conference on Computer Vision (ECCV), 2016
@inproceedings{arnab_eccv_2016,
author = {Anurag Arnab and Sadeep Jayasumana and Shuai Zheng and Philip H. S. Torr},
title = {Higher Order Conditional Random Fields in Deep Neural Networks},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2016}
}
Bottom-up Instance Segmentation using Deep Higher-Order CRFs
Anurag Arnab, Philip H.S. Torr
British Machine Vision Conference (BMVC), 2016
@inproceedings{arnab_bmvc_2016,
author = {Anurag Arnab and Philip H. S. Torr},
title = {Bottom-up Instance Segmentation using Deep Higher-Order CRFs},
booktitle = {British Machine Vision Conference (BMVC)},
year = {2016}
}
2015
Joint Object-Material Category Segmentation from Audio-Visual Cues
Anurag Arnab, Michael Sapienza, Stuart Golodetz, Julien Valentin, Ondrej Miksik, Shahram Izadi, Philip H.S. Torr.
British Machine Vision Conference (BMVC), 2015
@inproceedings{arnab_bmvc_2015,
author = {Anurag Arnab and Michael Sapienza and Stuart Golodetz and Julien Valentin and Ondrej Miksik and Shahram Izadi and Philip H.S. Torr},
title = {Joint Object-Material Category Segmentation from Audio-Visual Cues},
booktitle = {British Machine Vision Conference (BMVC)},
year = {2015}
}
SemanticPaint: A Framework for the Interactive Segmentation of 3D Scenes
Stuart Golodetz, Michael Sapienza, Julien Valentin, Vibhav Vineet, Ming-Ming Cheng, Anurag Arnab, Victor Adrian Prisacariu, Olaf Kaehler, Carl Yuheng Ren, David W. Murray, Shahram Izadi, Philip H.S. Torr
ACM SIGGRAPH 2015 Emerging Technologies, 2015 (live demo)
arXiv 1510.03727, 2015
@article{golodetz_arxiv_2015,
title={SemanticPaint: A Framework for the Interactive Segmentation of 3D Scenes},
author={Golodetz, Stuart and Sapienza, Michael and Valentin, Julien PC and Vineet, Vibhav and Cheng, Ming-Ming and Arnab, Anurag and Prisacariu, Victor A and K{\"a}hler, Olaf and Ren, Carl Yuheng and Murray, David W and Izadi, Shahram and Torr, Philip},
journal={arXiv preprint arXiv:1510.03727},
year={2015}
}
Talks
Advanced Architectures for Vision
Invited talk at the African Computer Vision Summer School (ACVSS) in Nairobi, Kenya. July 2024. [Slides]
Large-Scale Video Understanding with Transformers
Invited talk at the GIST Workshop for Accelerating Intelligence, GIST, South Korea, and at Google Visits POSTECH, South Korea. December 2022. [Slides]
Large-Scale Video Understanding with Transformers
Invited talk at Holistic Video Understanding Workshop at CVPR. June 2022. [Slides]
Winning entry to the Epic Kitchens Action Recognition Challenge
Invited talk at Epic Kitchens Workshop at CVPR. June 2022. [Slides]
Video Understanding with Imperfect Data
Invited talk at Learning from Limited and Imperfect Data (L2ID) workshop at CVPR. June 2021. [Slides]
Transformers: A Review, and Recent Developments in Vision
Invited lecture at Deep Learning Indaba X Tanzania. June 2021. [Slides]
Structured Models for Video Understanding
Invited talk at Ulsan National Institute of Science and Technology (UNIST), South Korea. June 2021. [Slides]
Video Understanding in the Wild with Incomplete Supervision
Invited talk at 1st Visual Intelligence Seminar at Fudan University, China. January 2021. [Slides]
Scene Understanding with Deep Structured Models
Invited talk at University of Warsaw. January 2020. [Slides]
Learning from Weak Supervision: Panoptic Segmentation and 3D Human Pose Estimation
Invited talk at Learning from Imperfect Data Workshop at CVPR. June 2019. [Slides]
Pixelwise Instance Segmentation with a Dynamically Instantiated Network
ETH Zurich, August 2017 [Slides]
Holistic Scene Understanding with Deep Learning and Dense Random Fields
Invited tutorial at Deep Learning Meets Model Optimization and Statistical Inference at the European Conference on Computer Vision (ECCV), October 2016. [Slides]
Joint Object-Material Category Segmentation from Audio-Visual Cues
Vision and Learning Seminar (Online), February 2016 [Video]
Joint Object-Material Category Segmentation from Audio-Visual Cues
CVSSP Seminar, University of Surrey, November 2015 [Slides]


