Learning by Turning: Neural Architecture Aware Optimisation
By Yang Liu et al.
Published on Sept. 18, 2021
Table of Contents
Abstract
1. Introduction
2. Related Work
2.1. Neural Architecture Design
2.2. Descent Methods in Deep Learning
2.3. Homeostatic Control in Neuroscience
3. Background Theory
3.1. Balanced Network Architectures
3.2. Stable Descent Steps
4. Nero: the Neuronal Rotator
5. Experiments
5.1. Constraints Help Nero
5.2. Per-Neuron Updates are a Good Middle Ground
Summary
Descent methods for deep networks are notoriously capricious, requiring careful tuning of hyperparameters to work well. This paper introduces Nero, the neuronal rotator, an optimiser that performs well without extensive tuning. Nero combines projected gradient descent with per-neuron updates: each neuron's weight vector is kept balanced (zero mean and fixed norm), so an update rotates the vector rather than rescaling it. The paper also discusses what such an architecture-aware optimiser implies for theories of generalisation in deep learning. Background sections cover neural architecture design, stable descent steps, and the concept of balanced networks. By building architectural information into the optimiser, Nero aims to reduce the burden of hyperparameter tuning. Experiments show that Nero performs best with its constraints switched on, and that per-neuron updates strike a good middle ground in the granularity of normalisation. Overall, the paper presents a novel, architecture-aware approach to optimisation in deep learning.
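To make the combination of projected gradient descent and per-neuron updates concrete, here is a minimal sketch in PyTorch. The names (`project_balanced`, `nero_like_step`), the choice of unit norm, and the omission of running gradient statistics and bias correction are assumptions made for illustration; this is not the paper's exact algorithm, for which see the original publication.

```python
# Minimal sketch of a per-neuron, constraint-projected update in the spirit of Nero.
# Assumptions (not from the summary above): PyTorch, 2-D weight matrices where
# dim 0 indexes neurons and dim 1 the fan-in; the real Nero update also tracks
# running per-neuron gradient statistics, which are simplified away here.
import torch


def project_balanced(w: torch.Tensor) -> torch.Tensor:
    """Project each neuron's weight row onto a balanced set:
    zero mean and unit norm per neuron (per row)."""
    w = w - w.mean(dim=1, keepdim=True)                    # zero mean per neuron
    w = w / w.norm(dim=1, keepdim=True).clamp_min(1e-12)   # unit norm per neuron
    return w


@torch.no_grad()
def nero_like_step(w: torch.Tensor, lr: float = 0.01) -> None:
    """One projected-gradient step with a per-neuron normalised update."""
    g = w.grad
    if g is None:
        return
    # Normalise the gradient per neuron so every neuron takes a step of
    # comparable relative size, regardless of its fan-in or gradient scale.
    g_norm = g.norm(dim=1, keepdim=True).clamp_min(1e-12)
    w -= lr * g / g_norm
    # Project back onto the constraint set, so the net effect is (approximately)
    # a rotation of each neuron's weight vector.
    w.copy_(project_balanced(w))


# Usage example with a single linear layer (hypothetical shapes).
layer = torch.nn.Linear(128, 64, bias=False)
layer.weight.data.copy_(project_balanced(layer.weight.data))  # start on the constraint set
x, y = torch.randn(32, 128), torch.randn(32, 64)
loss = ((layer(x) - y) ** 2).mean()
loss.backward()
nero_like_step(layer.weight, lr=0.01)
```

The per-neuron granularity is the point of the sketch: normalising and projecting row by row sits between per-parameter updates (as in Adam) and per-layer updates, which is the trade-off examined in Section 5.2.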