Aljosa Osep, Ph.D.
Hi, I'm Aljosa! I am a Senior Research Scientist at NVIDIA, working on learning to understand the dynamic world from raw, unlabeled streams of sensory data.
I come from the Alpine side of Slovenia. I earned my Ph.D. from RWTH Aachen University under the supervision of Prof. Bastian Leibe. I was a postdoctoral fellow at the Technical University of Munich and the Robotics Institute, Carnegie Mellon University.
Research Statement (2019) Twitter Scholar
Research
My research focuses on enabling AI systems to robustly understand the dynamic, 3D world from raw sensor streams, such as video and LiDAR. Key areas include learning directly from raw data, tracking and segmenting objects, understanding complex spatiotemporal scenes, and predicting future events in open-world environments.
Relevant Papers: Learning From Raw Sensory Data
- Large-Scale Object Mining for Object Discovery from Unlabeled Video (ICRA 2019)
- 4D Generic Video Object Proposals (ICRA 2020)
- Learning to Discover and Detect Objects (NeurIPS 2022)
- Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images (CVPR 2023)
- What Moves Together Belongs Together (CVPR 2024)
- Better Call SAL: Towards Learning to Segment Anything in Lidar (ECCV 2024)
- Towards Learning to Complete Anything in Lidar (ICML 2025)
Relevant Papers: Tracking and Panoptic Perception
- Combined Image- and World-Space Tracking in Traffic Scenes (ICRA 2017)
- Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking (ICRA 2018)
- MOTS: Multi-Object Tracking and Segmentation (CVPR 2019)
- HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking (IJCV 2020)
- STEP: Segmenting and Tracking Every Pixel (NeurIPS Datasets 2021)
- Opening up Open-World Tracking (CVPR 2022)
- PolarMOT: How far can geometric relations take us in 3D multi-object tracking? (ECCV 2022)
- EagerMOT: 3D Multi-Object Tracking via Sensor Fusion (ICRA 2021)
Relevant Papers: Forecasting & Behavioral AI
News
- October 2025: Giving a talk on recent research at the legendary GRASP lab at UPenn, on segmenting what we cannot (directly) see 👻: link.
- June 2024: Giving a talk at the CVPR Area Chair meeting, titled Learning To Understand The World From Video.
- June 2024: I joined NVIDIA as a Senior Research Scientist!
- March 2024: Our paper on Learning to segment anything in Lidar (SAL) was featured at GTC2024! Check out the NVIDIA AI Tools for Autonomous Vehicle Developers.
- September 2022: Two papers accepted to NeurIPS 2022! Excited to be back in NOLA soon!
- June 2022: I was featured in the TWIMLAI podcast! Listen to the episode.
- August 2021: I am one of three people listed as outstanding reviewers at all top-tier computer vision conferences in 2020/21. See the informal analysis by Simon Niklaus! Thanks to the ACs for the recognition and to Simon for pointing this out.
- August 2021: I was awarded the Borchers Plaquette at RWTH Aachen University for an outstanding doctoral dissertation!
- June 2021: I am spending a year at Carnegie Mellon University (The Robotics Institute, CMU Argo AI Center for Autonomous Vehicle Research) in Pittsburgh! Thanks to Deva Ramanan for hosting me!
- April 2020: Learned how to make pancakes! Check out the evidence.
Students Supervised
- Ayça Takmaz (NVIDIA, intern) ETH Zurich
- Yushan Zhang (NVIDIA, intern) Linköping University
- Neehar Peri (NVIDIA, intern) Carnegie Mellon University
- Xindi Wu (CMU, 2022) → Princeton
- Vladimir Fomenko (TUM, 2022) → Microsoft, OpenAI
- Anirudh S Chakravarthy (CMU, 2022) → Cruise AI
- Meghana Reddy Ganesina (CMU, 2022) → Zoox
- Abhinav Agarwalla (CMU, 2022) → previously Argo AI, now Neural Magic
- Vladimir Yugay (TUM, 2022) → University of Amsterdam
- Alexandr Kim (TUM, 2019-2021) → Meta
- Yang Liu (TUM, 2020) → Huawei
- Manuel Kolmet (TUM, 2021) → GLASS Imaging
- Anselm Coogan (TUM, 2021) → Scandit
- Maximilian Kempa (TUM, 2020) → Bosch
- Mehmet Aygün (TUM, 2020) → University of Edinburgh
- Johannes Gross (RWTH Aachen, 2019)
- Deyvid Kochanov (RWTH Aachen, 2016)
- Dirk Klostermann (RWTH Aachen, 2015) → BMW
Talks
- June 2024: CVPR 2024 Area Chair Panel, invited talk: Learning To Understand The World From Video, Slides
- June 2023: CVPR 2023, Visual Perception via Learning in an Open World, invited talk: Learning To Understand The World From Video, Slides
- June 2023: University of Ljubljana, invited talk: Learning To Understand The World From Video, Slides
- February 2023: University of Ljubljana, invited talk: Learning To Understand The World From Video, Slides
- October 2022: ECCV’22 Workshop on 3D Perception in Autonomous Driving Details, Slides
- October 2022: ECCV’22 Workshop on Cross-Modal Human-Robot Interaction Details, Slides
- June 2022: Slovenian data-science meetup talk (in Slovenian): Slides
- April 2022: UT Austin AI colloquium, Unifying Segmentation, Tracking, and Forecasting, Slides
- September 2021: ICCV’21 Workshop on 3D Object Detection from Images, 4D Panoptic LiDAR Segmentation, Slides
- July 2021: RSS 2021 Workshop on Behavioral Inference of Remotely Sensed Multi-agent Systems, invited talk, Tracking Every Object and Pixel, Slides
- July 2021: RSS 2021 Workshop on Perception and Control for Autonomous Navigation in Crowded, Dynamic Environments, invited talk, Tracking Every Object and Pixel, Slides, Talk
- June 2021: CVPR'21 JackRabbot Dataset and Benchmark (JRDB) workshop talk, Tracking Every Object and Pixel, Slides
- April 2021: Cornell Robotics Seminar, Slides
- September 2020: University of Bonn - Research talk, Slides
- June 2019: RWTH Aachen University - Thesis Defense, Slides
- June 2019: Georgia Tech - Research Talk, Slides
- March 2019: Carnegie Mellon University VASC Seminar, Slides
Teaching
- Summer 2022: Lecturer for IN2375: Computer Vision III: Detection, Segmentation and Tracking (CV3DST) (TU Munich)
- Summer 2022: Lecturer for IN2346: Introduction to Deep Learning (I2DL) (TU Munich)
- Winter 2020/21: IN2157: Fundamental Algorithms (TU Munich, lecturer)
- Summer 2019/20: IN2375: Computer Vision 3: Detection, Segmentation and Tracking (TU Munich, guest lecturer), Lectures available on YouTube
- Winter 2019/20: IN2157: Fundamental Algorithms (TU Munich, lecturer), TUM Moodle, IN2375: Computer Vision 3: Detection, Segmentation and Tracking (TU Munich, guest lecturer)
- Summer 2016/17: Machine Learning (RWTH Aachen, exercise class)
- Winter 2014/15: Computer Vision (RWTH Aachen, exercise class)
- Winter 2013/14: Machine Learning (RWTH Aachen, exercise class)
Service
- Area Chair (AC) for ICLR, CVPR, ECCV, ICCV, WACV, ACCV.
- I am the main organizer of the 6th BMTT MOTChallenge Workshop: Segmenting and Tracking Every Point and Pixel at ICCV'21, and a co-organizer of the 7th Workshop on Benchmarking Multi-Target Tracking: How Far Can Synthetic Data Take Us? at CVPR'22, the Tracking and Its Many Guises Workshop at ECCV'2020, and the Multi-Object Tracking and Segmentation Workshop at CVPR'2020.
- Reviewer for (machine learning, vision conferences) CVPR, ECCV, ICCV, BMVC, NeurIPS, ICML, ICLR; (robotics conferences) ICRA, IROS, RSS; (journals) IJCV, RAL, TPAMI.
- I am on the RSS Pioneers 2021 program committee and on the ECCV'24 organizing team!
Publications
P. Dendorfer, A. Ošep, A. Milan, K. Schindler, D. Cremers, I. Reid, S. Leal-Taixé: MOTChallenge: A Benchmark for Single-camera Multiple Target Tracking, International Journal of Computer Vision (IJCV), 2020.
paper
A. Ošep, P. Voigtlaender, J. Luiten, S. Breuers, B. Leibe: Towards Large-Scale Video Object Mining, ECCV 2018 Workshop on Interactive and Adaptive Learning in an Open World, 2018.
paper
A. Ošep, A. Hermans, F. Engelmann, D. Klostermann, M. Mathias, B. Leibe: Multi-Scale Object Candidates for Generic Object Tracking in Street Scenes, International Conference on Robotics and Automation (ICRA), 2016.
paper
D. Mitzel, J. Diesel, A. Ošep, U. Rafi, B. Leibe: A Fixed-Dimensional 3D Shape Representation for Matching Partially Observed Objects in Street Scenes, International Conference on Robotics and Automation (ICRA), 2015.
paper
M. Weinmann, A. Ošep, R. Ruiters, R. Klein: Multi-View Normal Field Integration for 3D Reconstruction of Mirroring Objects, International Conference on Computer Vision (ICCV), 2013.
paper
M. Weinmann, R. Ruiters, A. Ošep, C. Schwartz, R. Klein: Fusing Structured Light Consistency and Helmholtz Normals for 3D Reconstruction, British Machine Vision Conference (BMVC), 2012.
paper