Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics

By Greg Yang et al.
Published on June 10, 2021

Table of Contents

1. Introduction
2. Background
3. Motivating Examples
4. Tensor Programs

Summary

The document discusses the architectural universality of Neural Tangent Kernel (NTK) behavior in training dynamics: for a broad class of architectures, sufficiently wide neural networks trained by SGD follow kernel gradient descent dynamics in function space, governed by the NTK. The Tensor Programs technique is applied to analyze these SGD dynamics, and the paper introduces a new graphical notation for Tensor Programs to demonstrate the universality of NTK theory across architectures. The analysis is built up through motivating examples with 1-hidden-layer and 2-hidden-layer MLPs, and the results are obtained by unrolling the SGD updates and tracking how the weight matrices interact over the course of training.
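To make the kernel-gradient-descent claim concrete, here is a minimal NumPy sketch, not taken from the paper, that checks it numerically for a 1-hidden-layer MLP in the NTK parametrization: it computes the empirical NTK on a small training set, takes one SGD step on squared loss, and compares the resulting change in the network outputs to the kernel gradient descent prediction. The width `n`, learning rate, activation, and all variable names are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's code): one SGD step on a wide
# 1-hidden-layer MLP approximately matches kernel gradient descent with the
# empirical NTK on the training set.
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 10, 4096, 8          # input dim, hidden width, number of training points
lr = 0.1
phi, dphi = np.tanh, lambda z: 1.0 / np.cosh(z) ** 2

# NTK parametrization: f(x) = v . phi(W x / sqrt(d)) / sqrt(n)
W = rng.standard_normal((n, d))
v = rng.standard_normal(n)
X = rng.standard_normal((k, d))
y = rng.standard_normal(k)

def forward(W, v, X):
    H = X @ W.T / np.sqrt(d)            # (k, n) preactivations
    return phi(H) @ v / np.sqrt(n)      # (k,) network outputs

def param_grads(W, v, X):
    """Per-example gradients of f(x_i) w.r.t. (W, v), flattened to rows."""
    H = X @ W.T / np.sqrt(d)
    dv = phi(H) / np.sqrt(n)                          # df/dv, shape (k, n)
    # df/dW[j, :] = v_j * phi'(h_j) * x / (sqrt(n) * sqrt(d))
    S = (dphi(H) * v) / (np.sqrt(n) * np.sqrt(d))     # (k, n)
    dW = S[:, :, None] * X[:, None, :]                # (k, n, d)
    return np.concatenate([dW.reshape(k, -1), dv], axis=1)

f0 = forward(W, v, X)
G = param_grads(W, v, X)          # (k, num_params) Jacobian of outputs
Theta = G @ G.T                   # empirical NTK on the training set

# One SGD step on squared loss 0.5 * ||f - y||^2
err = f0 - y
grad_flat = G.T @ err             # chain rule: sum_i err_i * df_i/dtheta
dW_step = grad_flat[: n * d].reshape(n, d)
dv_step = grad_flat[n * d :]
f1 = forward(W - lr * dW_step, v - lr * dv_step, X)

# Kernel gradient descent prediction of the same step: f1 ~ f0 - lr * Theta @ err
pred = f0 - lr * Theta @ err
print("max deviation from NTK prediction:", np.max(np.abs(f1 - pred)))
```

At large width the printed deviation is small and shrinks as `n` grows, which is the finite-width version of the statement that, in the infinite-width limit, training reduces to kernel gradient descent with a fixed kernel; the paper's contribution is showing that this holds for essentially any architecture expressible as a Tensor Program, not just MLPs.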