On the Implicit Bias in Deep-Learning Algorithms

By Gal Vardi et al.

Table of Contents

Abstract
1 Introduction
2 The double-descent phenomenon
3 Implicit bias in classification
3.1 Logistic regression
3.2 Linear networks
3.3 Homogeneous neural networks
3.4 Extensions
4 Implicit bias in regression

Summary

Gradient-based deep-learning algorithms exhibit remarkable performance in practice, but it is not well understood why they are able to generalize despite having more parameters than training examples. Implicit bias is believed to be a key factor in their ability to generalize, and hence it has been widely studied in recent years. In this short survey, we explain the notion of implicit bias, review the main results, and discuss their implications.

Deep learning has been highly successful in recent years and has led to dramatic improvements in multiple domains. Deep-learning algorithms often generalize well in practice: given access to labeled training data, they return neural networks that correctly label unobserved test data. However, despite much research, our theoretical understanding of generalization in deep learning is still limited. Neural networks used in practice often have far more learnable parameters than training examples. In such overparameterized settings one might expect overfitting, that is, the learned network might perform well on the training dataset but poorly on test data. Surprisingly, gradient-based deep-learning algorithms seem to prefer solutions that generalize well.

Decades of research in learning theory suggest that to avoid overfitting one should use a model that is 'not more expressive than necessary': the model should be able to perform well on the training data while being as 'simple' as possible. This idea goes back to Occam's razor, the philosophical principle that we should prefer simple explanations over complicated ones.

The article discusses the implicit bias that arises when training neural networks with gradient-based methods, the double-descent phenomenon, and various aspects of implicit bias in classification and regression tasks.
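The notion of implicit bias can be illustrated in the simplest overparameterized setting. The sketch below is an illustrative example, not taken from the survey: for linear regression with more parameters than examples, gradient descent initialized at zero is known to converge to the minimum-L2-norm interpolating solution, because the iterates never leave the row space of the data matrix. The problem sizes and learning rate are arbitrary choices for the demonstration.

```python
import numpy as np

# Illustrative sketch (assumptions: toy sizes, Gaussian data).
# Overparameterized linear regression: 5 examples, 20 parameters,
# so infinitely many weight vectors fit the data exactly.
rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent on 0.5 * ||Xw - y||^2, initialized at zero.
# Zero initialization keeps every iterate in the row space of X.
w = np.zeros(d)
lr = 0.01
for _ in range(20000):
    w -= lr * X.T @ (X @ w - y)

# The minimum-norm interpolator, computed via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

# Gradient descent interpolates the training data...
assert np.allclose(X @ w, y, atol=1e-4)
# ...and, among all interpolators, finds the minimum-norm one:
# the implicit bias of the algorithm, not of the loss function.
assert np.allclose(w, w_min_norm, atol=1e-4)
```

The loss function alone does not single out this solution; the preference for small norm comes entirely from the choice of algorithm and initialization, which is exactly what "implicit bias" refers to.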