Holo-World

Unified Camera, Object and Weather Control for Video World Model

Xiangchen Yin¹,³ Wenzhang Sun² Jiahui Yuan¹ Zijie Liu¹
Yinda Chen¹ Wei Li¹ Dachun Kai¹ Chunfeng Wang² Xiaoyan Sun¹,³,†

¹University of Science and Technology of China ²Li Auto
³Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
†Corresponding author.
yinxc2000@gmail.com


📋 ABSTRACT

HoloStateData

Holo-World

Core Ideas

Unified World Control

A unified video world model interface

Holo-World jointly controls camera motion, object dynamics, and weather state from a single image.

World Preservation

Source-side controls define the scene scaffold to preserve

Camera trajectory, rendered geometry buffers, and object controls anchor the background structure and dynamic entities, keeping the generated world consistent with the observed source.

Weather-State Transfer

Target weather specifies the state to render

The target-weather prompt guides the preserved scene scaffold into a new weather state, allowing weather-dependent appearance and particle effects to emerge within the same controlled world.

Demo Video

🔬 Method

Holo-World jointly controls camera motion, object dynamics, and scene weather state within the same observed world. The model must change weather from a single image while still following explicit camera and object controls, rather than relying on a complete source video as in video-to-video weather editing.
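
The control interface described above can be pictured as one bundle of inputs per generation request. The following is a minimal sketch, not the released API: the `ControlBundle` class, its field names, the 4x4 pose format, and the convention that `target_weather=None` keeps the source weather are all assumptions made for illustration. Only the four weather states are taken from the demos on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

# Weather states shown in the demos on this page.
WEATHER_STATES = ("snow", "cloud", "fog", "rain")

# Assumption: one 4x4 camera pose matrix per output frame.
Matrix4 = Sequence[Sequence[float]]


@dataclass
class ControlBundle:
    """Hypothetical container for the inputs of one generation request."""

    first_frame: str                      # path to the single source image
    camera_trajectory: List[Matrix4]      # one pose per output frame
    object_controls: List[dict] = field(default_factory=list)  # format is assumed
    target_weather: Optional[str] = None  # assumed: None keeps the source weather

    def validate(self) -> None:
        """Raise ValueError if the bundle is malformed."""
        if not self.camera_trajectory:
            raise ValueError("camera_trajectory must contain at least one pose")
        for pose in self.camera_trajectory:
            if len(pose) != 4 or any(len(row) != 4 for row in pose):
                raise ValueError("each camera pose must be a 4x4 matrix")
        if (self.target_weather is not None
                and self.target_weather not in WEATHER_STATES):
            raise ValueError(f"unknown weather state: {self.target_weather!r}")


# A static camera: the identity pose repeated for 8 frames.
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

bundle = ControlBundle(
    first_frame="first_frame.png",
    camera_trajectory=[IDENTITY] * 8,
    target_weather="snow",
)
bundle.validate()
```

Keeping the source-side controls (image, trajectory, objects) and the target weather in one structure mirrors the split between what should be preserved and what should change.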

Dataset & Benchmark

15K

HoloStateData

15,000+ training samples across the Real, Simulation, and V2V subsets, with paired controls for camera, object, and weather supervision.
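
As a rough illustration of how such a dataset might be inspected, the sketch below counts samples per subset and flags records that lack one of the three control types. The record layout (`id`, `subset`, `controls`) and the toy entries are hypothetical and do not reflect the actual HoloStateData format or contents.

```python
from collections import Counter

SUBSETS = ("real", "simulation", "v2v")
CONTROL_KEYS = ("camera", "object", "weather")


def summarize(samples):
    """Count samples per subset and list the ids missing any control type."""
    counts = Counter()
    incomplete = []
    for sample in samples:
        if sample["subset"] not in SUBSETS:
            raise ValueError(f"unknown subset: {sample['subset']!r}")
        counts[sample["subset"]] += 1
        if any(key not in sample["controls"] for key in CONTROL_KEYS):
            incomplete.append(sample["id"])
    return counts, incomplete


# Toy records for illustration only (hypothetical layout and values).
toy_samples = [
    {"id": "a", "subset": "real",
     "controls": {"camera": [], "object": [], "weather": "rain"}},
    {"id": "b", "subset": "simulation",
     "controls": {"camera": [], "object": [], "weather": "fog"}},
    {"id": "c", "subset": "v2v",
     "controls": {"camera": [], "weather": "snow"}},  # no object control
]

counts, incomplete = summarize(toy_samples)
```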

150

Benchmark

150 mutually exclusive evaluation samples for the world-preservation and weather-transfer tracks.

🌦️ Weather-State Transfer

Given the same inputs (camera trajectory and rendered controls), Holo-World renders the controlled scene under different target weather states. Our model learns weather-state transfer within the same scene scaffold, rather than regenerating a different world.

Sample 1 Simulation subset

Camera Trajectory

Rendered RGB

Origin

First Frame

Snow

Cloud

Fog

Rain

Sample 2 V2V subset

Camera Trajectory

Rendered RGB

Origin

First Frame

Snow

Cloud

Fog

Rain

🧭 World Preservation

Source-side controls define the scene scaffold Holo-World should preserve. Camera trajectory, rendered world controls, and object controls anchor background structure and dynamic entities, enabling temporally consistent video synthesis.

Sample 1

Camera Trajectory

First Frame

Rendered RGB

Rendered Normal

Rendered Depth

Holo-World Output (Ours)

Sample 2

Camera Trajectory

First Frame

Rendered RGB

Rendered Normal

Rendered Depth

Holo-World Output (Ours)

🎯 Camera & Object Control Comparison

Controllable video generation under camera trajectory and object manipulation. We compare Holo-World with VerseCrafter, Gen3C, and Uni3C. The top rows show the camera trajectory, the ground truth, and each method's rendered RGB control; the bottom row shows the corresponding generated results.

Sample 1

Camera Trajectory

Ground Truth

Holo-World Rendered RGB

VerseCrafter Rendered RGB

Gen3C Rendered RGB

Uni3C Rendered RGB

Holo-World (Ours)

VerseCrafter

Gen3C

Uni3C

Sample 2

Camera Trajectory

Ground Truth

Holo-World Rendered RGB

VerseCrafter Rendered RGB

Gen3C Rendered RGB

Uni3C Rendered RGB

Holo-World (Ours)

VerseCrafter

Gen3C

Uni3C

⛈️ Weather Transfer Comparison

Note: Our I2V Advantage over V2V Methods

Holo-World performs Image-to-Video (I2V) generation: it synthesizes temporal dynamics and weather effects from a single input frame plus a camera trajectory. In contrast, the compared methods (Wan2.7-Edit and Cosmos-Transfer-2.5) are Video-to-Video (V2V) approaches that receive the full origin video as input and edit or transfer it. This is a less challenging task, because they can leverage the temporal information already present in the origin video.

Sample 1 Snow Scene

Camera Trajectory

Origin Video

First Frame

Cosmos-Transfer-2.5 (V2V)

Wan2.7-Edit (V2V)

Holo-World (I2V)

Sample 2 Fog Scene

Camera Trajectory

Origin Video

First Frame

Cosmos-Transfer-2.5 (V2V)

Wan2.7-Edit (V2V)

Holo-World (I2V)

Sample 3 Rain Scene

Camera Trajectory

Origin Video

First Frame

Cosmos-Transfer-2.5 (V2V)

Wan2.7-Edit (V2V)

Holo-World (I2V)

📚 Citation

If you find Holo-World useful in your research, please cite us:

@article{yin2026holoworld,
    title={Holo-World: Unified Camera, Object and Weather Control for Video World Model}, 
    author={Xiangchen Yin and Wenzhang Sun and Jiahui Yuan and Zijie Liu and Yinda Chen and Wei Li and Dachun Kai and Chunfeng Wang and Xiaoyan Sun},
    year={2026},
    eprint={2606.20083},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2606.20083}, 
}