1st Workshop on Long Multi-Scene Video Foundations
Generation, Understanding and Evaluation
20th of October - Room 303B
Cutting-edge video modeling techniques have achieved impressive results in computer vision, especially in understanding and generating video content. However, these techniques are usually limited to short, single-scene videos and face challenges when applied to real-world scenarios involving complex, long-form narratives with multiple dynamic scenes. This workshop aims to bring together experts in long, multi-scene video modeling to discuss generation, understanding, evaluation, and ethical considerations. The workshop will establish a collaborative platform for exchanging recent breakthroughs and deliberating on the future direction of visual computing models capable of handling extended video content. Through this exchange of ideas and insights, we hope to overcome the challenges of creating and understanding long video narratives and contribute to their practical applications in various fields, including entertainment, education, and health.
This workshop will be held in Honolulu, Hawaii, as part of ICCV 2025.
News & Updates
- Non-Proceedings Track: Open. Submission deadline: August 30, 2025.
- Proceedings Track: Deadlines postponed. New submission deadline: July 1, 2025 (previously June 8, 2025).
Call for Papers
We invite contributions that explore technical, methodological, and societal aspects of working with complex video data that spans extended temporal durations and diverse content. Topics of interest include, but are not limited to:
- Multi-scene video generation, including text-to-video (T2V) approaches
- Vision-language models designed for long-form video understanding and generation
- Efficient training and inference strategies for large-scale video models
- Techniques for editing long-form or multi-scene video content
- Representation learning tailored to long videos
- Ensuring long-term temporal consistency in generated or analyzed video sequences
- Long-range reasoning and semantic understanding across scenes
- Factuality and grounding in video generation and comprehension tasks
- Development of evaluation metrics and benchmarks for long-form video
- Analysis of ethical and societal implications of large-scale long video models
Submission Tracks
For the Proceedings Track, submitted papers must present original, unpublished work and will undergo a double-blind review process. To preserve anonymity, authors must not include any identifying information in the paper. Submissions must follow the formatting guidelines in the ICCV 2025 Author Kit and should be between 4 and 8 pages long, with additional pages allowed only for references. Accepted papers will be published in the official ICCV 2025 Workshop Proceedings, and their authors will have the opportunity to present their work in person at the workshop.
Submit to Proceedings Track via: OpenReview
The Non-Proceedings Track offers a more flexible avenue for presenting a wider range of contributions. This track is well-suited for:
- Extended abstracts and short papers: We welcome submissions of up to 4 pages that describe work in progress, report negative findings valuable to the community, or present position papers on relevant topics.
- Previously published work: We also accept submissions of work that has already been published elsewhere. This includes papers that may have been accepted at the main ICCV 2025 conference itself.
Submit to Non-Proceedings Track via: OpenReview
Important Dates
Proceedings
- Submission Deadline: July 1, 2025 (extended from June 8, 2025)
- Preliminary Author Notification: July 10, 2025 (previously June 26, 2025)
- Camera-ready Deadline: July 31, 2025 (previously June 12, 2025)
Non-Proceedings
- Submission Deadline: August 30, 2025
- Preliminary Author Notification: September 14, 2025
All dates are in GMT.
Event Schedule
01:00 PM - 01:10 PM
Opening Remarks
01:10 PM - 01:35 PM
Sayak Paul - Controllable Video Diffusion for Length
01:45 PM - 01:55 PM
Mask²DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
01:55 PM - 02:15 PM
Short Break
02:15 PM - 02:40 PM
Katerina Fragkiadaki - 3D VLMs for Long Video Understanding
02:40 PM - 02:50 PM
Re-thinking Temporal Search for Long-Form Video Understanding
03:00 PM - 03:45 PM
Poster Session
04:00 PM - 04:25 PM
Jiajun Wu - Five Benchmarks for Multi-Modal, Long-Context Video Foundation Models
04:25 PM - 04:55 PM
Panel Discussion
04:55 PM - 05:00 PM
Final Remarks