Object Insertion

We demonstrate GEM's capability to insert objects into scenes and precisely control their motion. In the following examples, we insert a new car into the scene and also control the movement of cars already present.

Unconditional Generation

Insertion Control

Human Pose Control

GEM can use human poses to control pedestrian motion within the scene. In these examples, pedestrians either cross the street or stop according to the provided controls.

Moving poses control

Static poses control

Long Generation

We compare our long generation with Vista, the only world model trained on OpenDV capable of generating long sequences. We observe that our generations have higher temporal consistency in ego-motion and more realistic dynamics.

GEM's Long Generation

Vista's Long Generation

Interesting Observations

We show interesting behaviors observed in the generated videos. These behaviors do not necessarily exist in the ground truth videos, but emerge from the model's learned dynamics.

Brake lights go off before moving

Smooth takeover dynamics on a long generation

Multimodal

GEM generates two modalities simultaneously: RGB and depth. We show examples of multimodal generations.

Multidomain

We fine-tune GEM on two other egocentric domains and observe that it adapts quickly to both.

1. Drone Flights

2. Human Egocentric

Pseudo-labeling

Below, we present visualizations demonstrating our pseudo-labeling pipeline’s capability to generate skeleton poses, depth maps, and ego-motion trajectories.