Publications


Featured research published by Nitish Shirish Keskar.


Optimization Methods & Software | 2016

A second-order method for convex ℓ1-regularized optimization with active-set prediction

Nitish Shirish Keskar; Jorge Nocedal; Figen Öztoprak; Andreas Wächter

We describe an active-set method for the minimization of an objective function φ that is the sum of a smooth convex function f and an ℓ1-regularization term. A distinctive feature of the method is the way in which active-set identification and second-order subspace minimization steps are integrated to combine the predictive power of the two approaches. At every iteration, the algorithm selects a candidate set of free and fixed variables, performs an (inexact) subspace phase, and then assesses the quality of the new active set. If it is not judged to be acceptable, then the set of free variables is restricted and a new active-set prediction is made. We establish global convergence for our approach under the assumptions of Lipschitz continuity and strong convexity of f, and compare the new method against state-of-the-art codes.
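
To make the idea concrete, here is a minimal sketch in the spirit of the abstract, not the paper's actual algorithm: an active set is predicted from one proximal-gradient step, then a second-order (Newton) step is taken on the predicted free variables. The ℓ1-regularized least-squares test objective, the function name, and the orthant safeguard are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's method): active-set prediction
# via one proximal-gradient step, then a Newton step on the free subspace,
# for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
import numpy as np

def active_set_l1_sketch(A, b, lam, max_iter=100, tol=1e-8):
    n = A.shape[1]
    x = np.zeros(n)
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad f
    for _ in range(max_iter):
        g = A.T @ (A @ x - b)                  # gradient of the smooth part f
        # Active-set prediction: one proximal-gradient (ISTA) step.
        z = x - g / L
        x_pred = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
        free = x_pred != 0                     # predicted free variables
        x_new = np.zeros(n)
        if free.any():
            # Second-order subspace step: Newton on the free variables,
            # treating the l1 term as linear on the predicted orthant.
            Af = A[:, free]
            rhs = Af.T @ b - lam * np.sign(x_pred[free])
            x_new[free] = np.linalg.solve(Af.T @ Af, rhs)
            # Crude safeguard standing in for the paper's active-set quality
            # test: zero out any variable that left its predicted orthant.
            flipped = np.sign(x_new[free]) != np.sign(x_pred[free])
            x_new[free] = np.where(flipped, 0.0, x_new[free])
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```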


International Conference on Acoustics, Speech, and Signal Processing | 2015

A nonmonotone learning rate strategy for SGD training of deep neural networks

Nitish Shirish Keskar; George Saon

The algorithm of choice for cross-entropy training of deep neural network (DNN) acoustic models is mini-batch stochastic gradient descent (SGD). One of the important decisions for this algorithm is the learning rate strategy (also called stepsize selection). We investigate several existing schemes and propose a new learning rate strategy inspired by nonmonotone line-search techniques in nonlinear optimization and the NewBob algorithm. This strategy was found to be relatively insensitive to poorly tuned parameters and resulted in lower word error rates than NewBob on two different LVCSR tasks (English broadcast news transcription, 50 hours; Switchboard telephone conversations, 300 hours). Further, we discuss some justifications for the method by briefly linking it to results in optimization theory.
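
As a rough illustration of the nonmonotone idea, the sketch below implements a NewBob-style halving rule that shrinks the learning rate only when the held-out loss fails to beat the worst loss in a recent window, rather than the immediately preceding epoch. The window size, decay factor, and class interface are illustrative assumptions, not the paper's specification.

```python
# Assumed sketch of a nonmonotone learning-rate rule: accept any epoch
# that improves on the worst of the last `window` held-out losses, and
# decay the rate only when even that weak test fails.
from collections import deque

class NonmonotoneLRSchedule:
    def __init__(self, lr=0.08, window=4, decay=0.5):
        self.lr = lr
        self.decay = decay
        self.history = deque(maxlen=window)    # recent held-out losses

    def step(self, val_loss):
        """Call once per epoch with the held-out loss; returns the new lr."""
        if self.history and val_loss >= max(self.history):
            # Loss failed to beat the worst of the recent window:
            # only then shrink the step size (the nonmonotone test).
            self.lr *= self.decay
        self.history.append(val_loss)
        return self.lr

# Usage: lr = sched.step(validation_loss) after each training epoch.
```

A monotone NewBob-style rule would compare against the previous epoch only; widening the comparison window is what makes the schedule tolerant of occasional noisy epochs without an immediate rate cut.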


Optimization Methods & Software | 2017

A limited-memory quasi-Newton algorithm for bound-constrained non-smooth optimization

Nitish Shirish Keskar; Andreas Wächter

We consider the problem of minimizing a continuous function that may be non-smooth and non-convex, subject to bound constraints. We propose an algorithm that uses the L-BFGS quasi-Newton approximation of the problem's curvature together with a variant of the weak Wolfe line search. The key ingredient of the method is an active-set selection strategy that defines the subspace in which search directions are computed. To overcome the inherent shortsightedness of the gradient for a non-smooth function, we propose two strategies. The first relies on an approximation of the ε-minimum norm subgradient, and the second uses an iterative corrective loop that augments the active set based on the resulting search directions. While theoretical convergence guarantees have been elusive even for the unconstrained case, we present numerical results on a set of standard test problems to illustrate the efficacy of our approach, using an open-source Python implementation of the proposed algorithm.
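
The first strategy the abstract mentions resembles gradient-sampling methods; the hedged sketch below approximates an ε-minimum-norm subgradient by sampling gradients in an ε-ball around the current iterate and finding the shortest vector in their convex hull (the L-BFGS, line-search, and bound-constraint machinery is omitted). The sampling scheme, solver, and parameter values are illustrative assumptions, not the paper's implementation.

```python
# Assumed sketch: approximate the minimum-norm element of the
# eps-subdifferential by sampling gradients near x and taking the
# smallest convex combination, as in gradient-sampling methods.
import numpy as np

def eps_min_norm_subgradient(grad, x, eps=1e-4, samples=10, iters=200, rng=None):
    """grad: callable returning the gradient of the objective at a point."""
    rng = np.random.default_rng(rng)
    pts = x + eps * rng.uniform(-1.0, 1.0, size=(samples, x.size))
    G = np.vstack([grad(x)] + [grad(p) for p in pts])   # sampled gradients
    # Minimize 0.5*||G.T @ w||^2 over the simplex (gradient is H @ w)
    # with projected-gradient steps.
    w = np.full(G.shape[0], 1.0 / G.shape[0])
    H = G @ G.T
    step = 1.0 / (np.linalg.norm(H, 2) + 1e-12)
    for _ in range(iters):
        w = _project_simplex(w - step * (H @ w))
    return G.T @ w      # negate this for a descent direction

def _project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)
```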


International Conference on Learning Representations | 2017

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

Nitish Shirish Keskar; Dheevatsa Mudigere; Jorge Nocedal; Mikhail Smelyanskiy; Ping Tak Peter Tang


International Conference on Learning Representations | 2018

Regularizing and Optimizing LSTM Language Models

Stephen Merity; Nitish Shirish Keskar; Richard Socher


arXiv: Computation and Language | 2018

An Analysis of Neural Language Modeling at Multiple Scales

Stephen Merity; Nitish Shirish Keskar; Richard Socher


arXiv: Learning | 2017

Improving Generalization Performance by Switching from Adam to SGD

Nitish Shirish Keskar; Richard Socher


European Conference on Machine Learning | 2016

adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs

Nitish Shirish Keskar; Albert S. Berahas


arXiv: Artificial Intelligence | 2018

Weighted Transformer Network for Machine Translation

Karim Ahmed; Nitish Shirish Keskar; Richard Socher


arXiv: Learning | 2018

Using Mode Connectivity for Loss Landscape Analysis

Akhilesh Gotmare; Nitish Shirish Keskar; Caiming Xiong; Richard Socher
