Towards Understanding Mixture of Experts in Deep Learning

By Z. Chen et al.
Published on Aug. 4, 2022

Table of Contents

Abstract
1 Introduction
2 Related Work
3 Problem Setting and Preliminaries
3.1 Data distribution
3.2 Structure of the MoE layer
3.3 Training Algorithm

Summary

The paper presents a formal study of the Mixture-of-Experts (MoE) layer in deep learning, focusing on how it improves the learning performance of neural networks. The authors identify cluster structure in the data and non-linearity of the experts as the ingredients crucial to MoE's success. They study a classification problem and demonstrate the effectiveness of an MoE layer built from nonlinear CNN experts on it. Both the theoretical analysis and the empirical results show that the experts specialize to different parts of the data and that the router dispatches examples to the appropriate experts. Notable contributions include a negative result on the performance achievable by a single expert and a demonstration that nonlinear experts outperform linear ones within the MoE model. The study trains with gradient descent and emphasizes the practical significance of MoE with nonlinear experts.
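To make the architecture described above concrete, the following is a minimal PyTorch-style sketch of an MoE layer with nonlinear CNN experts and a learned router using top-1 routing. The class names (CNNExpert, MoELayer), the cubic activation, the number of experts, and the routing scheme are illustrative assumptions for this sketch, not the exact construction used in the paper.

```python
# Minimal sketch of an MoE layer with nonlinear CNN experts and a linear router.
# All sizes and the top-1 routing scheme are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNNExpert(nn.Module):
    """A small nonlinear CNN expert: conv -> nonlinearity -> pooled linear head."""
    def __init__(self, in_channels: int, hidden: int, num_classes: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x) ** 3            # cubic activation (the nonlinearity)
        h = h.mean(dim=(2, 3))           # global average pooling
        return self.head(h)


class MoELayer(nn.Module):
    """Mixture of experts with a linear router and top-1 routing."""
    def __init__(self, num_experts: int, in_channels: int, hidden: int, num_classes: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [CNNExpert(in_channels, hidden, num_classes) for _ in range(num_experts)]
        )
        self.router = nn.Linear(in_channels, num_experts)  # gates on channel means

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate_logits = self.router(x.mean(dim=(2, 3)))      # (batch, num_experts)
        gates = F.softmax(gate_logits, dim=-1)
        top_gate, top_idx = gates.max(dim=-1)              # pick one expert per example
        out = torch.zeros(x.size(0), self.experts[0].head.out_features, device=x.device)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Scale by the gate value so the router also receives gradients.
                out[mask] = top_gate[mask].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    moe = MoELayer(num_experts=4, in_channels=3, hidden=16, num_classes=2)
    logits = moe(torch.randn(8, 3, 32, 32))
    print(logits.shape)  # torch.Size([8, 2])
```

In this sketch, training the whole layer end to end with gradient descent would, in the spirit of the paper's findings, encourage each expert to specialize on a subset (cluster) of the inputs while the router learns to send each example to its matching expert.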