CPL

Continual Predictive Learning from Videos

Geng Chen* Wendong Zhang* Han Lu Siyu Gao Yunbo Wang Mingsheng Long Xiaokang Yang

Figure 1: The new problem of continual predictive learning and the general framework of our approach at test time.

Abstract

Predictive learning ideally builds the world model of physical processes in one or more given environments. Typical setups assume that we can collect data from all environments at all times. In practice, however, different prediction tasks may arrive sequentially so that the environments may change persistently throughout the training procedure. Can we develop predictive learning algorithms that can deal with more realistic, non-stationary physical environments? In this paper, we study a new continual learning problem in the context of video prediction, and observe that most existing methods suffer from severe catastrophic forgetting in this setup. To tackle this problem, we propose the continual predictive learning (CPL) approach, which learns a mixture world model via predictive experience replay and performs test-time adaptation with non-parametric task inference. We construct two new benchmarks based on RoboNet and KTH, in which different tasks correspond to different physical robotic environments or human actions. Our approach is shown to effectively mitigate forgetting and remarkably outperform the naïve combinations of previous art in video prediction and continual learning.

Method

Figure 2: The overall network architecture of the mixture world model and the predictive experience replay training scheme in the proposed CPL method.
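This page does not spell out how the non-parametric task inference mentioned in the abstract is computed. The toy sketch below shows one plausible reading, offered as an assumption rather than as the authors' implementation: at test time, each task hypothesis is scored by how well its predictor explains the already observed frames, and the task with the lowest error is selected. The names `infer_task`, `mse`, `shift_right`, `double`, and `toy_predictors` are hypothetical, and the hand-written predictors stand in for a learned, task-conditioned world model.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[float]  # a toy "frame": a flat list of pixel values
Predictor = Callable[[Sequence[Frame]], Frame]  # context frames -> next frame


def mse(a: Frame, b: Frame) -> float:
    """Mean squared error between two frames of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def infer_task(observed: Sequence[Frame], predictors: Dict[int, Predictor]) -> int:
    """Return the task id whose predictor best explains the observed frames.

    Assumption: no task classifier is trained. Each candidate task is
    scored by its average one-step prediction error on the observed clip.
    """
    if len(observed) < 2:
        raise ValueError("need at least two observed frames to score a task")
    errors: Dict[int, float] = {}
    for task_id, predict in predictors.items():
        total = 0.0
        for t in range(1, len(observed)):
            # Predict frame t from frames 0..t-1 and compare with the truth.
            total += mse(predict(observed[:t]), observed[t])
        errors[task_id] = total / (len(observed) - 1)
    # Pick the task hypothesis with the smallest average error.
    return min(errors, key=lambda task_id: errors[task_id])


# Hypothetical stand-ins for per-task dynamics.
def shift_right(context: Sequence[Frame]) -> Frame:
    return [v + 1.0 for v in context[-1]]


def double(context: Sequence[Frame]) -> Frame:
    return [v * 2.0 for v in context[-1]]


toy_predictors: Dict[int, Predictor] = {0: shift_right, 1: double}
doubling_clip: List[Frame] = [[1.0, 2.0], [2.0, 4.0], [4.0, 8.0]]
additive_clip: List[Frame] = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
```

Calling `infer_task(doubling_clip, toy_predictors)` selects task 1, and `infer_task(additive_clip, toy_predictors)` selects task 0. Under this reading, the selected task would then condition the world model for the actual future prediction.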

Results on the RoboNet Benchmark

Figure 3: Examples of action-conditioned video prediction in the first environment of RoboNet (i.e., Berkeley) after training the models in the last environment (i.e., Stanford). We compare our method (CPL-full) with the naïve combinations of existing world models and continual learning algorithms.