
Overparameterized Nonlinear Optimization with Applications to Neural Nets


Abstract


Occam’s razor is a fundamental problem-solving principle which states that one should seek the simplest possible explanation. Indeed, classical machine learning models such as (sparse) linear regression aim to find simple explanations of the data by using as few parameters as possible. On the other hand, modern techniques such as deep networks are often trained in the overparameterized regime, where the model size exceeds the size of the training dataset. While this increases the risk of overfitting and the complexity of the explanation, deep networks are known to have good generalization properties. In this talk, we take a step towards resolving this paradox: we show that the solution found by first-order methods, such as gradient descent, is nearly the closest solution to the algorithm's initialization among all possible solutions. We also argue that this shortest-distance property can serve as a good proxy for the simplest explanation. We discuss the implications of these results for neural network training and highlight some outstanding challenges.
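
The shortest-distance phenomenon described above is easiest to see in the linear, overparameterized least-squares setting: gradient descent started from an initialization x0 converges to the interpolating solution closest to x0. The following minimal NumPy sketch illustrates this; the problem sizes, step size, and iteration count are illustrative choices and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized linear regression: more parameters (d) than samples (n).
n, d = 20, 200
A = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Random initialization.
x0 = rng.standard_normal(d)

# Plain gradient descent on the least-squares loss 0.5 * ||A x - y||^2.
x = x0.copy()
step = 1.0 / np.linalg.norm(A, 2) ** 2  # conservative step size for stability
for _ in range(50_000):
    x = x - step * A.T @ (A @ x - y)

# Among all interpolating solutions {x : A x = y}, the one closest to x0
# is the projection of x0 onto that affine solution set.
closest = x0 + np.linalg.pinv(A) @ (y - A @ x0)

print("fit error (GD)          :", np.linalg.norm(A @ x - y))
print("distance to x0 (GD)     :", np.linalg.norm(x - x0))
print("distance to x0 (closest):", np.linalg.norm(closest - x0))
print("gap between solutions   :", np.linalg.norm(x - closest))
```

The two reported distances essentially coincide: because each gradient step moves x only within the row space of A, the iterates stay in the affine set x0 + rowspace(A), so the interpolating solution they converge to is exactly the one nearest the initialization.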

Pages 1-4
DOI 10.1109/SampTA45681.2019.9030880
Language English
Journal 2019 13th International conference on Sampling Theory and Applications (SampTA)
