FiLM: Visual Reasoning with a General Conditioning Layer
By E. Perez et al.
Table of Contents
Abstract
1 Introduction
2 Method
2.1 Feature-wise Linear Modulation
2.2 Model
3 Related Work
4 Experiments
4.1 CLEVR Task
Summary
This paper introduces FiLM (Feature-wise Linear Modulation), a general-purpose conditioning method for neural networks. A FiLM layer influences a network's computation by applying a simple feature-wise affine transformation, a scale and a shift per feature, whose parameters are derived from conditioning information. The authors show that FiLM layers are highly effective for visual reasoning, that is, answering questions about images that require a multi-step, high-level process. The paper presents the method, compares it with existing approaches, and reports experiments on the CLEVR dataset, where the FiLM model achieves state-of-the-art performance.
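To make the feature-wise affine transformation concrete, here is a minimal NumPy sketch. It assumes the conditioning information is already encoded as a fixed-size vector and that a single linear map produces the per-channel scale (gamma) and shift (beta). The names `film`, `film_generator`, `W`, and `b` are illustrative, and all shapes are arbitrary. This is not the authors' implementation.

```python
import numpy as np


def film(features, gamma, beta):
    """Feature-wise affine transformation: gamma * features + beta.

    features: array of shape (batch, channels, height, width)
    gamma, beta: arrays of shape (batch, channels), one scale and one
    shift per feature map.
    """
    # Add trailing axes so the per-channel parameters broadcast over
    # the spatial dimensions.
    return gamma[:, :, None, None] * features + beta[:, :, None, None]


def film_generator(cond, W, b):
    """Hypothetical generator: a single linear map from the conditioning
    vector to the concatenated (gamma, beta) parameters."""
    out = cond @ W + b  # shape (batch, 2 * channels)
    gamma, beta = np.split(out, 2, axis=1)
    return gamma, beta


# Toy example with random data (assumed shapes, for illustration only).
rng = np.random.default_rng(0)
batch, channels, height, width, cond_dim = 2, 3, 4, 4, 5

features = rng.normal(size=(batch, channels, height, width))
cond = rng.normal(size=(batch, cond_dim))  # e.g. an encoded question
W = rng.normal(size=(cond_dim, 2 * channels))
b = np.zeros(2 * channels)

gamma, beta = film_generator(cond, W, b)
modulated = film(features, gamma, beta)
print(modulated.shape)  # (2, 3, 4, 4)
```

With gamma fixed to ones and beta to zeros, the layer reduces to the identity, so the conditioning input only changes the computation through how far it moves these two parameters from that baseline.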