Distilling The Knowledge In A Neural Network

By Geoffrey Hinton et al.
Published on March 9, 2015

Table of Contents

Abstract
1 Introduction
2 Distillation
3 Preliminary experiments on MNIST
4 Experiments on speech recognition
5 Training ensembles of specialists on very big datasets

Summary

The document discusses distilling the knowledge in a neural network: a simple way to improve performance is to train an ensemble of models, and distillation then transfers the ensemble's knowledge into a single smaller model that is cheaper to deploy. It covers the distillation method itself, preliminary experiments on MNIST, experiments on speech recognition, and training ensembles of specialist models on the very large JFT dataset, where adding specialist models to the baseline generalist system yields significant improvements in test accuracy.
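The central technique is training the small model on "soft targets": the large model's output probabilities computed at a raised softmax temperature, usually combined with the ordinary cross-entropy on the true labels. The sketch below is a minimal illustration of such a distillation loss in PyTorch; the function name, hyperparameter values (T, alpha), and tensor shapes are illustrative assumptions rather than the authors' code.

```python
# Minimal sketch of a distillation loss, assuming PyTorch and hypothetical
# teacher/student logit tensors; not the paper's original implementation.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Combine a soft-target loss (teacher at temperature T) with a hard-label loss.

    The soft-target term is scaled by T**2 so its gradient magnitude stays
    comparable to the hard-label term, as noted in the paper.
    """
    # Softened teacher probabilities and student log-probabilities at temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)

    # KL divergence between the softened distributions (equivalent, up to a
    # constant, to cross-entropy on the soft targets).
    soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)

    # Standard cross-entropy on the true hard labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example usage with random logits for a batch of 4 examples and 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In practice the weighting between the two terms and the temperature are tuned per task; the paper reports that giving a relatively low weight to the hard-label term tends to work best.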