A universal image generation framework powered by visual in-context learning for versatile task execution and generalization.
Last updated: 2025-04-15 09:51:13
VisualCloze is an image generation framework designed to handle a wide variety of visual tasks. Unlike traditional task-specific models, it uses visual in-context learning to identify and perform tasks directly from visual demonstrations, which lets it generalize to tasks it was never explicitly trained on. By incorporating a graph-structured dataset (Graph200K), it increases task density and enables knowledge transfer between related tasks (a sketch of this idea follows below). VisualCloze moves beyond language-based instructions toward a more intuitive, visual way of specifying tasks.
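To make the graph-structured idea concrete, here is a minimal, hypothetical sketch of how such a dataset might be organized. Every name in it (ImageNode, candidate_tasks, the annotation keys) is an illustrative assumption, not the actual Graph200K schema. The point it demonstrates: when one image carries annotations for several tasks, every ordered pair of annotations defines a condition-to-target task, so related tasks overlap on shared images and knowledge can transfer between them.

```python
from dataclasses import dataclass, field
from itertools import permutations

# Illustrative sketch only: this is NOT the real Graph200K schema,
# just a plausible shape for a graph-structured multi-task dataset.

@dataclass
class ImageNode:
    image_id: str
    # task name -> path to that annotation rendered as an image
    annotations: dict[str, str] = field(default_factory=dict)

def candidate_tasks(node: ImageNode):
    """Enumerate condition -> target task pairs supported by one node.

    Dense annotations mean a single image contributes to many tasks
    (e.g. depth -> photo, photo -> mask), which is the "task density"
    that supports transfer between related tasks.
    """
    for cond, target in permutations(node.annotations, 2):
        yield cond, target

node = ImageNode(
    image_id="img_0001",
    annotations={
        "photo": "img_0001.png",
        "depth": "img_0001_depth.png",
        "canny": "img_0001_canny.png",
        "mask": "img_0001_mask.png",
    },
)
for cond, target in candidate_tasks(node):
    print(f"{cond} -> {target}")
```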
To use VisualCloze, users provide a small set of visual demonstrations that define the task to be performed. The model interprets these visual cues to carry out tasks such as image generation, restoration, or editing, and through in-context learning it adapts to new tasks without task-specific training. Training on the Graph200K dataset strengthens its handling of complex, multi-task problems, and the framework integrates with advanced infilling models such as FLUX, delivering high-quality results without additional architectural changes. A hedged usage sketch follows below.
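As a usage illustration, here is a minimal sketch of the in-context prompt format the description implies: demonstrations laid out as rows of (condition, result) image pairs, with the query row leaving its result cell blank for the infilling model to complete. The helper and loader names below (build_incontext_grid, load_visualcloze, pipe.generate) are hypothetical; consult the official demo and repository for the real entry points.

```python
from PIL import Image

def build_incontext_grid(demonstrations, query_condition):
    """Arrange in-context examples and the query into one image grid.

    demonstrations: list of (condition, result) PIL image pairs that
        define the task purely by example.
    query_condition: the new condition image; its result cell is left
        as None, marking the region for the infilling model to fill.
    """
    rows = [[cond, result] for cond, result in demonstrations]
    rows.append([query_condition, None])  # blank cell to be infilled
    return rows

# Placeholder images stand in for real demonstration files.
demo_pairs = [
    (Image.new("RGB", (512, 512)), Image.new("RGB", (512, 512))),
    (Image.new("RGB", (512, 512)), Image.new("RGB", (512, 512))),
]
query = Image.new("RGB", (512, 512))
grid = build_incontext_grid(demo_pairs, query)

# Hypothetical inference calls -- check the official repo for the
# real API before relying on these names:
# pipe = load_visualcloze("VisualCloze")
# result = pipe.generate(grid)
```

Framing every task as filling the blank cell of an image grid is what allows a FLUX-style infilling model to be reused as-is, which matches the claim above that no architectural changes are needed.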
VisualCloze is available through an online demo and open-source code. The framework leverages advanced models and datasets, such as FLUX for image infilling and Graph200K for multi-task learning. No pricing is listed; the framework appears to be open source and freely accessible through platforms such as Hugging Face.
VisualCloze was developed by a team of researchers from Nankai University, Beijing University of Posts and Telecommunications, Tsinghua University, Shanghai AI Laboratory, and The Chinese University of Hong Kong. Key contributors include Zhong-Yu Li, Ruoyi Du, Juncheng Yan, Le Zhuo, Zhen Li, Peng Gao, Zhanyu Ma, and Ming-Ming Cheng.
For inquiries, you can contact the team through the following email addresses: Zhen Li ([email protected]) and Ming-Ming Cheng ([email protected]).
Follow VisualCloze on social media:
- Twitter: @VisualCloze
- Instagram: @visualcloze