Few-shot Text-to-Image Generation

By Y. Zhou et al.

Table of Contents

1. Introduction
2. Preliminaries: Probing Multimodal Feature Space
3. Proposed Method: Retrieval-then-Optimization

Summary

This paper presents a method for pre-training text-to-image generation models on image-only datasets. The core of the approach is a retrieval-then-optimization procedure that synthesizes pseudo text features aligned with the training images, so that the model can be pre-trained without paired captions. The proposed method, Lafite2, demonstrates strong transferability across few-shot, semi-supervised, and fully-supervised text-to-image generation settings. Extensive experiments demonstrate the effectiveness of the approach, with state-of-the-art results on fully-supervised text-to-image generation tasks.
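To make the retrieval-then-optimization idea concrete, the sketch below illustrates one plausible form of the procedure, assuming a CLIP-style joint image-text embedding space. The function name, the text-feature bank, and all hyperparameters are hypothetical stand-ins for illustration only, not the paper's actual implementation.

```python
# Illustrative sketch: retrieve candidate text features for an image, then
# optimize their combination into a pseudo text feature. Assumes features
# already live in a shared (CLIP-like) embedding space.
import torch
import torch.nn.functional as F

def synthesize_pseudo_text_feature(image_feature, text_feature_bank,
                                    k=8, steps=100, lr=0.1):
    """Retrieve the k text features closest to the image feature, then
    optimize their weighted combination to better align with the image."""
    image_feature = F.normalize(image_feature, dim=-1)   # (d,)
    bank = F.normalize(text_feature_bank, dim=-1)        # (N, d)

    # Retrieval: pick the k most similar text features by cosine similarity.
    sims = bank @ image_feature
    candidates = bank[sims.topk(k).indices]              # (k, d)

    # Optimization: learn mixing weights so the pseudo feature moves
    # toward the image feature in the shared embedding space.
    weights = torch.zeros(k, requires_grad=True)
    opt = torch.optim.Adam([weights], lr=lr)
    for _ in range(steps):
        pseudo = F.normalize(torch.softmax(weights, dim=0) @ candidates, dim=-1)
        loss = 1.0 - (pseudo * image_feature).sum()      # cosine distance
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        return F.normalize(torch.softmax(weights, dim=0) @ candidates, dim=-1)
```

The resulting pseudo text feature can then stand in for a real caption embedding when pre-training the text-to-image generator on an image-only dataset.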