Generative Foundation Models
The Lumina series models are flow-based diffusion transformers designed to transform text into any modality with enhanced scalability and efficiency. The series is expected to efficiently generate high-fidelity data points at arbitrary resolution, serving as a diffusion-based foundation model. The models are also envisioned as a family of multimodal autoregressive models that can perform both text-centric and image-centric tasks, such as image captioning, multi-turn dialog, and any-resolution text-to-image generation.
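To make the "flow-based" part concrete, below is a minimal sketch of the rectified-flow / flow-matching training objective that this family of models builds on. It is not the Lumina implementation: the model here is a hypothetical placeholder standing in for the diffusion transformer, and all names (`toy_velocity_model`, `flow_matching_loss`) are illustrative assumptions. The key idea shown is that the network is trained to regress the constant velocity along a straight path from noise to data.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_velocity_model(x_t, t):
    # Hypothetical stand-in for the diffusion transformer backbone;
    # a real model would condition on text and the timestep t.
    return -x_t

def flow_matching_loss(x0, x1, t, model):
    # Linear interpolation path between noise x0 and data x1
    # (rectified-flow formulation): x_t = (1 - t) * x0 + t * x1.
    x_t = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    # Along this straight path the velocity is constant: x1 - x0.
    target = x1 - x0
    pred = model(x_t, t)
    # Mean-squared error between predicted and true velocity.
    return float(np.mean((pred - target) ** 2))

noise = rng.standard_normal((4, 8))   # x0 ~ N(0, I)
data = rng.standard_normal((4, 8))    # stand-in "data points"
t = rng.uniform(size=4)               # per-sample timesteps in [0, 1]
loss = flow_matching_loss(noise, data, t, toy_velocity_model)
print(loss)
```

At sampling time, the learned velocity field is integrated from t = 0 (noise) to t = 1 (data), which is what lets such models trade off step count against fidelity.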
By adopting a unified architecture for images and text, the Lumina models benefit from the well-studied scalability properties of large language models (LLMs) and can seamlessly integrate infrastructure and techniques developed for LLMs, optimizing both training and inference.