Scaling Limits Of Wide Neural Networks With Weight Sharing: Gaussian Process Behavior, Gradient Independence, And Neural Tangent Kernel Derivation
By Greg Yang et al.
Table of Contents
Abstract
1. Introduction
2. Related Works and Our Corollaries
2.1. Gaussian Behavior of Wide Neural Networks
2.2. Signal Propagation in Neural Networks
2.3. Neural Tangent Kernel
2.4. Other Works
3. Tensor Programs
Summary
This paper studies scaling limits of wide random neural networks with weight sharing, focusing on Gaussian process behavior, gradient independence, and the derivation of the Neural Tangent Kernel. The author introduces a tensor program framework that can express the computations of a broad range of neural network architectures and analyzes the scaling limits of networks expressed in this framework. The paper establishes the convergence of wide random neural networks to Gaussian processes, gives conditions under which gradients at initialization can be treated as independent of the forward weights, and proves convergence of the Neural Tangent Kernel. These results inform the design of stronger Gaussian process models, the choice of initialization schemes, and the understanding of stochastic gradient descent dynamics in wide networks.
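As a rough illustration of the Gaussian process behavior summarized above, the sketch below (not from the paper; the width, the tanh nonlinearity, the sample count, and the 1/sqrt(fan-in) scaling are illustrative assumptions) samples many random initializations of a wide one-hidden-layer network and checks that the output at a fixed input is approximately a centered Gaussian whose variance approximates the corresponding NNGP kernel value.

```python
import numpy as np

rng = np.random.default_rng(0)
width = 4096          # hidden width; the GP limit emerges as this grows
n_samples = 2000      # number of independent random initializations
x = np.array([1.0, -0.5, 0.3])  # a fixed input
d = x.shape[0]

outputs = np.empty(n_samples)
for i in range(n_samples):
    W = rng.standard_normal((width, d)) / np.sqrt(d)   # first-layer weights, variance 1/d
    v = rng.standard_normal(width) / np.sqrt(width)    # readout weights, variance 1/width
    h = np.tanh(W @ x)                                  # hidden activations
    outputs[i] = v @ h                                  # scalar network output

# Over random initializations, the output distribution is approximately a centered
# Gaussian as width -> infinity; its variance approximates the NNGP kernel K(x, x).
print("empirical mean (should be near 0):", outputs.mean())
print("empirical variance (approximates K(x, x)):", outputs.var())
```

The same experiment with several inputs would show that the joint distribution of outputs is approximately multivariate Gaussian, which is the sense in which the wide network behaves like a Gaussian process.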