Publications


Featured research published by Nitish Shirish Keskar.


Optimization Methods & Software | 2016

A second-order method for convex ℓ1-regularized optimization with active-set prediction

Nitish Shirish Keskar; Jorge Nocedal; Figen Öztoprak; Andreas Wächter

We describe an active-set method for the minimization of an objective function φ that is the sum of a smooth convex function f and an ℓ1-regularization term. A distinctive feature of the method is the way in which active-set identification and second-order subspace minimization steps are integrated to combine the predictive power of the two approaches. At every iteration, the algorithm selects a candidate set of free and fixed variables, performs an (inexact) subspace phase, and then assesses the quality of the new active set. If it is not judged to be acceptable, then the set of free variables is restricted and a new active-set prediction is made. We establish global convergence for our approach under the assumptions of Lipschitz continuity and strong convexity of f, and compare the new method against state-of-the-art codes.
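
To make the idea concrete, here is a minimal sketch in the spirit of the abstract, not the paper's actual algorithm: an active set is predicted from one proximal-gradient step, then a second-order (Newton) step is taken on the predicted free variables. The ℓ1-regularized least-squares test objective, the function name, and the orthant safeguard are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's method): active-set prediction
# via one proximal-gradient step, then a Newton step on the free subspace,
# for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
import numpy as np

def active_set_l1_sketch(A, b, lam, max_iter=100, tol=1e-8):
    n = A.shape[1]
    x = np.zeros(n)
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad f
    for _ in range(max_iter):
        g = A.T @ (A @ x - b)                  # gradient of the smooth part f
        # Active-set prediction: one proximal-gradient (ISTA) step.
        z = x - g / L
        x_pred = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
        free = x_pred != 0                     # predicted free variables
        x_new = np.zeros(n)
        if free.any():
            # Second-order subspace step: Newton on the free variables,
            # treating the l1 term as linear on the predicted orthant.
            Af = A[:, free]
            rhs = Af.T @ b - lam * np.sign(x_pred[free])
            x_new[free] = np.linalg.solve(Af.T @ Af, rhs)
            # Crude safeguard standing in for the paper's active-set quality
            # test: zero out any variable that left its predicted orthant.
            flipped = np.sign(x_new[free]) != np.sign(x_pred[free])
            x_new[free] = np.where(flipped, 0.0, x_new[free])
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```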


International Conference on Acoustics, Speech, and Signal Processing | 2015

A nonmonotone learning rate strategy for SGD training of deep neural networks

Nitish Shirish Keskar; George Saon

The algorithm of choice for cross-entropy training of deep neural network (DNN) acoustic models is mini-batch stochastic gradient descent (SGD). One of the important decisions for this algorithm is the learning rate strategy (also called stepsize selection). We investigate several existing schemes and propose a new learning rate strategy inspired by nonmonotone line-search techniques in nonlinear optimization and the NewBob algorithm. This strategy was found to be relatively insensitive to poorly tuned parameters and resulted in lower word error rates than NewBob on two different LVCSR tasks (English broadcast news transcription, 50 hours; Switchboard telephone conversations, 300 hours). Further, we discuss some justifications for the method by briefly linking it to results in optimization theory.
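
As a rough illustration of the nonmonotone idea, the sketch below implements a NewBob-style halving rule that shrinks the learning rate only when the held-out loss fails to beat the worst loss in a recent window, rather than the immediately preceding epoch. The window size, decay factor, and class interface are illustrative assumptions, not the paper's specification.

```python
# Assumed sketch of a nonmonotone learning-rate rule: accept any epoch
# that improves on the worst of the last `window` held-out losses, and
# decay the rate only when even that weak test fails.
from collections import deque

class NonmonotoneLRSchedule:
    def __init__(self, lr=0.08, window=4, decay=0.5):
        self.lr = lr
        self.decay = decay
        self.history = deque(maxlen=window)    # recent held-out losses

    def step(self, val_loss):
        """Call once per epoch with the held-out loss; returns the new lr."""
        if self.history and val_loss >= max(self.history):
            # Loss failed to beat the worst of the recent window:
            # only then shrink the step size (the nonmonotone test).
            self.lr *= self.decay
        self.history.append(val_loss)
        return self.lr

# Usage: lr = sched.step(validation_loss) after each training epoch.
```

A monotone NewBob-style rule would compare against the previous epoch only; widening the comparison window is what makes the schedule tolerant of occasional noisy epochs without an immediate rate cut.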


Optimization Methods & Software | 2017

A limited-memory quasi-Newton algorithm for bound-constrained non-smooth optimization

Nitish Shirish Keskar; Andreas Wächter

We consider the problem of minimizing a continuous function that may be non-smooth and non-convex, subject to bound constraints. We propose an algorithm that uses the L-BFGS quasi-Newton approximation of the problem's curvature together with a variant of the weak Wolfe line search. The key ingredient of the method is an active-set selection strategy that defines the subspace in which search directions are computed. To overcome the inherent shortsightedness of the gradient for a non-smooth function, we propose two strategies. The first relies on an approximation of the ε-minimum norm subgradient, and the second uses an iterative corrective loop that augments the active set based on the resulting search directions. While theoretical convergence guarantees have been elusive even for the unconstrained case, we present numerical results on a set of standard test problems to illustrate the efficacy of our approach, using an open-source Python implementation of the proposed algorithm.
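
The first strategy the abstract mentions resembles gradient-sampling methods; the hedged sketch below approximates an ε-minimum-norm subgradient by sampling gradients in an ε-ball around the current iterate and finding the shortest vector in their convex hull (the L-BFGS, line-search, and bound-constraint machinery is omitted). The sampling scheme, solver, and parameter values are illustrative assumptions, not the paper's implementation.

```python
# Assumed sketch: approximate the minimum-norm element of the
# eps-subdifferential by sampling gradients near x and taking the
# smallest convex combination, as in gradient-sampling methods.
import numpy as np

def eps_min_norm_subgradient(grad, x, eps=1e-4, samples=10, iters=200, rng=None):
    """grad: callable returning the gradient of the objective at a point."""
    rng = np.random.default_rng(rng)
    pts = x + eps * rng.uniform(-1.0, 1.0, size=(samples, x.size))
    G = np.vstack([grad(x)] + [grad(p) for p in pts])   # sampled gradients
    # Minimize 0.5*||G.T @ w||^2 over the simplex (gradient is H @ w)
    # with projected-gradient steps.
    w = np.full(G.shape[0], 1.0 / G.shape[0])
    H = G @ G.T
    step = 1.0 / (np.linalg.norm(H, 2) + 1e-12)
    for _ in range(iters):
        w = _project_simplex(w - step * (H @ w))
    return G.T @ w      # negate this for a descent direction

def _project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)
```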


International Conference on Learning Representations | 2017

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

Nitish Shirish Keskar; Dheevatsa Mudigere; Jorge Nocedal; Mikhail Smelyanskiy; Ping Tak Peter Tang


International Conference on Learning Representations | 2018

Regularizing and Optimizing LSTM Language Models

Stephen Merity; Nitish Shirish Keskar; Richard Socher


arXiv: Computation and Language | 2018

An Analysis of Neural Language Modeling at Multiple Scales

Stephen Merity; Nitish Shirish Keskar; Richard Socher


arXiv: Learning | 2017

Improving Generalization Performance by Switching from Adam to SGD

Nitish Shirish Keskar; Richard Socher


European Conference on Machine Learning | 2016

adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs

Nitish Shirish Keskar; Albert S. Berahas


arXiv: Artificial Intelligence | 2018

Weighted Transformer Network for Machine Translation

Karim Ahmed; Nitish Shirish Keskar; Richard Socher


arXiv: Learning | 2018

Using Mode Connectivity for Loss Landscape Analysis

Akhilesh Gotmare; Nitish Shirish Keskar; Caiming Xiong; Richard Socher
