2024 GenieGenerativeInteractiveEnvir

From GM-RKB

Subject Headings:

Notes

  • It introduces Genie, a groundbreaking generative AI framework that creates interactive, controllable video environments from text, images, or sketches, without requiring action or text annotations during training.
  • It employs a comprehensive model architecture consisting of three primary components: a video tokenizer, a latent action model, and a dynamics model, each utilizing memory-efficient spatiotemporal transformers for effective temporal dynamics capture.
  • It showcases the ability to generate high-quality, controllable videos across various domains from different image prompts, including text-to-image outputs, sketches, and photos, and to model complex physical phenomena and object interactions accurately.
  • It reports notable quantitative results: an 11-billion-parameter model attains an FVD score of 40.1 on a filtered 30k-hour gaming video dataset, demonstrating Genie's proficiency in creating realistic and dynamic virtual environments.
  • It highlights the latent action space as a key feature that enables imitation learning in unseen environments, pushing the frontiers of generative video modeling and world simulation without requiring costly action annotations.
  • It discusses the societal impact of Genie, emphasizing its potential to augment human creativity and find applications in gaming and simulation industries, while also stressing the importance of responsible and ethical usage.
  • It opts not to release model weights or training data at this time, advocating for further research into the safe and ethical deployment of generative interactive environments.
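The three-component architecture described above (video tokenizer, latent action model, dynamics model) can be illustrated with a minimal sketch. This is an illustrative assumption, not the paper's implementation: the class names, shapes, and stand-in computations (a hash in place of a learned VQ encoder, a token-difference heuristic in place of a learned action encoder, and a trivial update in place of an autoregressive transformer) are all hypothetical, and serve only to show how the pieces fit together in a frame-by-frame interactive rollout.

```python
import numpy as np

# Hypothetical, simplified sketch of Genie's three components.
# Names, shapes, and internals are illustrative assumptions.

class VideoTokenizer:
    """Maps each video frame to a grid of discrete tokens
    (in the paper, a learned spatiotemporal VQ tokenizer)."""
    def __init__(self, codebook_size=1024, tokens_per_frame=16):
        self.codebook_size = codebook_size
        self.tokens_per_frame = tokens_per_frame

    def encode(self, frames):
        # frames: (T, H, W, C) -> tokens: (T, tokens_per_frame)
        t = frames.shape[0]
        # Stand-in for a learned encoder: deterministic stat of pixels.
        base = frames.reshape(t, -1).sum(axis=1, keepdims=True).astype(int)
        return (base + np.arange(self.tokens_per_frame)) % self.codebook_size

class LatentActionModel:
    """Infers a discrete latent action for each frame transition,
    learned without ground-truth action labels."""
    def __init__(self, num_actions=8):
        self.num_actions = num_actions

    def infer(self, tokens):
        # tokens: (T, N) -> actions: (T-1,), one latent action per transition
        diffs = np.abs(np.diff(tokens, axis=0)).sum(axis=1)
        return diffs % self.num_actions

class DynamicsModel:
    """Predicts the next frame's tokens from past tokens and a latent
    action (in the paper, an autoregressive spatiotemporal transformer)."""
    def predict(self, tokens, action):
        # Stand-in for one autoregressive generation step.
        return (tokens[-1] + action + 1) % 1024

# Frame-by-frame interactive rollout: encode past frames, inspect the
# inferred latent actions, then let a user-chosen action drive the next frame.
rng = np.random.default_rng(0)
frames = rng.random((4, 8, 8, 3))          # four small RGB frames

tok, lam, dyn = VideoTokenizer(), LatentActionModel(), DynamicsModel()
tokens = tok.encode(frames)                 # (4, 16) discrete tokens
actions = lam.infer(tokens)                 # (3,) inferred latent actions
next_tokens = dyn.predict(tokens, action=3) # (16,) tokens for the next frame
print(tokens.shape, actions.shape, next_tokens.shape)
```

The key design point this sketch mirrors is that the latent action space is discrete and small, so a user (or an imitation-learning agent) can steer generation one action per frame without any labeled action data.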

Cited By

Quotes

Abstract

We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.

References

Simon Osindero, Jeff Clune, Scott Reed, Nando de Freitas, Sherjil Ozair, Matthew Lai, Tim Rocktäschel, Nicolas Heess, Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Lucy Gonzalez, Jingwei Zhang, Konrad Zolna, Satinder Singh. (2024). "Genie: Generative Interactive Environments."