Publication


Featured research published by Andrew W. Senior.


IEEE Signal Processing Magazine | 2012

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

Geoffrey E. Hinton; Li Deng; Dong Yu; George E. Dahl; Abdel-rahman Mohamed; Navdeep Jaitly; Andrew W. Senior; Vincent Vanhoucke; Patrick Nguyen; Tara N. Sainath; Brian Kingsbury

Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
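As a rough, illustrative sketch of the hybrid setup this abstract describes (a feed-forward network over a context window of frames producing HMM-state posteriors, which are divided by state priors to give scaled likelihoods for decoding); the layer sizes and names below are assumptions, not the configuration of any of the four groups' systems:

```python
# Minimal sketch of the hybrid DNN-HMM idea: a feed-forward network consumes a
# context window of acoustic frames and outputs posteriors over HMM states;
# dividing by the state priors gives scaled likelihoods that the HMM decoder
# can use in place of GMM scores. All sizes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n_mel, context, n_states = 40, 11, 2000      # 11-frame window of 40-d features
sizes = [n_mel * context, 1024, 1024, n_states]
weights = [rng.standard_normal((i, o)) * 0.01 for i, o in zip(sizes, sizes[1:])]
biases = [np.zeros(o) for o in sizes[1:]]

def dnn_state_posteriors(window):
    """window: (context, n_mel) acoustic frames -> (n_states,) posteriors."""
    h = window.reshape(-1)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return softmax(h @ weights[-1] + biases[-1])

# Scaled likelihood: p(frame | state) is proportional to p(state | frame) / p(state),
# which is what replaces the GMM score during HMM decoding.
state_priors = np.full(n_states, 1.0 / n_states)
posteriors = dnn_state_posteriors(rng.standard_normal((context, n_mel)))
scaled_likelihoods = posteriors / state_priors
```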


Image and Vision Computing | 2006

Appearance models for occlusion handling

Andrew W. Senior; Arun Hampapur; Yingli Tian; Lisa M. Brown; Sharath Pankanti; Ruud M. Bolle

Objects in the world exhibit complex interactions. When captured in a video sequence, some interactions manifest themselves as occlusions. A visual tracking system must be able to track objects that are partially or even fully occluded. In this paper we present a method of tracking objects through occlusions using appearance models. These models are used to localize objects during partial occlusions, detect complete occlusions and resolve depth ordering of objects during occlusions. This paper presents a tracking system which successfully deals with complex real-world interactions, as demonstrated on the PETS 2001 dataset.
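The toy sketch below illustrates the general idea of a per-pixel appearance model (an RGB template plus a per-pixel ownership probability, updated over time and used to assign contested pixels during an occlusion); it is a simplified stand-in, not the paper's exact formulation:

```python
# Toy per-pixel appearance model: each tracked object keeps an RGB template and
# a per-pixel "ownership" probability, blended in with a learning rate; during
# an occlusion, contested pixels are assigned to whichever object's model
# explains them best, which also suggests the depth ordering.
import numpy as np

class AppearanceModel:
    def __init__(self, patch, alpha=0.05):
        self.template = patch.astype(float)          # (H, W, 3) mean colour
        self.mask = np.ones(patch.shape[:2])         # per-pixel ownership prob.
        self.alpha = alpha

    def update(self, patch, visible):
        """Blend in the new observation only where the object was visible."""
        v = visible[..., None].astype(float)
        self.template = (1 - self.alpha * v) * self.template + self.alpha * v * patch
        self.mask = (1 - self.alpha) * self.mask + self.alpha * visible

    def pixel_likelihood(self, patch, sigma=20.0):
        """How well each pixel of the observed patch matches the model."""
        d2 = ((patch - self.template) ** 2).sum(axis=-1)
        return self.mask * np.exp(-d2 / (2 * sigma ** 2))

def resolve_occlusion(patch, models):
    """Assign each contested pixel to the model that explains it best."""
    scores = np.stack([m.pixel_likelihood(patch) for m in models])
    return scores.argmax(axis=0)                     # (H, W) map of object indices
```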


International Conference on Acoustics, Speech, and Signal Processing | 2013

Statistical parametric speech synthesis using deep neural networks

Heiga Zen; Andrew W. Senior; Mike Schuster

Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech parameters given texts. Speech parameters are generated from the probability densities to maximize their output probabilities, then a speech waveform is reconstructed from the generated parameters. This approach is reasonably effective but has limitations: for example, decision trees are inefficient at modeling complex context dependencies. This paper examines an alternative scheme based on a deep neural network (DNN). The relationship between input texts and their acoustic realizations is modeled by a DNN. The use of the DNN can address some limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.
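A minimal sketch of this scheme, with hypothetical feature and layer sizes: a feed-forward network regresses frame-level linguistic context features onto acoustic parameters, from which a vocoder would then reconstruct the waveform:

```python
# Illustrative DNN-based parametric synthesis: map frame-level linguistic
# context features directly to acoustic parameters (e.g. spectral coefficients
# and F0), trained as a regression, in place of decision-tree-clustered
# context-dependent HMMs. Dimensions below are made up.
import numpy as np

rng = np.random.default_rng(0)
n_linguistic, n_hidden, n_acoustic = 400, 1024, 60   # hypothetical dimensions

W1 = rng.standard_normal((n_linguistic, n_hidden)) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_acoustic)) * 0.01
b2 = np.zeros(n_acoustic)

def predict_acoustics(linguistic_frames):
    """(T, n_linguistic) context features -> (T, n_acoustic) speech parameters."""
    h = np.tanh(linguistic_frames @ W1 + b1)
    return h @ W2 + b2                # linear output for a regression target

def mse_loss(pred, target):
    """Training minimises the squared error between predicted and extracted
    acoustic parameters; a vocoder then reconstructs the waveform."""
    return ((pred - target) ** 2).mean()
```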


International Conference on Acoustics, Speech, and Signal Processing | 2015

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks

Tara N. Sainath; Oriol Vinyals; Andrew W. Senior; Hasim Sak

Both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) have shown improvements over Deep Neural Networks (DNNs) across a wide variety of speech recognition tasks. CNNs, LSTMs and DNNs are complementary in their modeling capabilities, as CNNs are good at reducing frequency variations, LSTMs are good at temporal modeling, and DNNs are appropriate for mapping features to a more separable space. In this paper, we take advantage of the complementarity of CNNs, LSTMs and DNNs by combining them into one unified architecture. We explore the proposed architecture, which we call CLDNN, on a variety of large vocabulary tasks, varying from 200 to 2,000 hours. We find that the CLDNN provides a 4-6% relative improvement in WER over an LSTM, the strongest of the three individual models.
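A compact, illustrative rendering of the CLDNN stacking (convolution over frequency, LSTM layers for temporal modelling, then fully connected layers), written with PyTorch modules; the layer counts and sizes are assumptions, not the paper's configuration:

```python
# Sketch of a CNN -> LSTM -> DNN (CLDNN) acoustic model over log-mel frames.
import torch
import torch.nn as nn

class CLDNN(nn.Module):
    def __init__(self, n_freq=40, n_states=2000):
        super().__init__()
        self.conv = nn.Conv2d(1, 32, kernel_size=(1, 8))      # reduce frequency variation
        self.pool = nn.MaxPool2d(kernel_size=(1, 3))
        conv_out = 32 * ((n_freq - 8 + 1) // 3)                # features per time step
        self.lstm = nn.LSTM(conv_out, 512, num_layers=2, batch_first=True)
        self.dnn = nn.Sequential(
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, n_states),                         # frame-level state scores
        )

    def forward(self, x):                      # x: (batch, time, n_freq)
        h = self.conv(x.unsqueeze(1))          # (batch, 32, time, freq')
        h = self.pool(h)
        h = h.permute(0, 2, 1, 3).flatten(2)   # (batch, time, conv_out)
        h, _ = self.lstm(h)                    # temporal modelling
        return self.dnn(h)                     # (batch, time, n_states) logits

frames = torch.randn(4, 100, 40)               # 4 utterances, 100 frames, 40 mel bins
logits = CLDNN()(frames)
```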


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1998

An off-line cursive handwriting recognition system

Andrew W. Senior; Anthony J. Robinson

This paper describes a complete system for the recognition of off-line handwriting. Preprocessing techniques are described, including segmentation and normalization of word images to give invariance to scale, slant, slope and stroke thickness. Representation of the image is discussed and the skeleton and stroke features used are described. A recurrent neural network is used to estimate probabilities for the characters represented in the skeleton. The operation of the hidden Markov model that calculates the best word in the lexicon is also described. Issues of vocabulary choice, rejection, and out-of-vocabulary word recognition are discussed.
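As a toy stand-in for the lexicon-decoding step (the system itself uses a hidden Markov model), the sketch below scores each lexicon word by the best monotonic alignment of per-frame character posteriors to its letters; the alphabet, lexicon and posteriors are invented:

```python
# Score each lexicon word by Viterbi alignment of T frames of character
# posteriors to the word's letters, each letter consuming at least one frame,
# then pick the best-scoring word.
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
CHAR_IDX = {c: i for i, c in enumerate(ALPHABET)}

def word_log_score(log_posteriors, word):
    """Best left-to-right alignment of frames to the word (a linear model)."""
    T = log_posteriors.shape[0]
    chars = [CHAR_IDX[c] for c in word]
    dp = np.full((T, len(chars)), -np.inf)
    dp[0, 0] = log_posteriors[0, chars[0]]
    for t in range(1, T):
        for j in range(len(chars)):
            stay = dp[t - 1, j]
            advance = dp[t - 1, j - 1] if j > 0 else -np.inf
            dp[t, j] = max(stay, advance) + log_posteriors[t, chars[j]]
    return dp[-1, -1]

def recognise(log_posteriors, lexicon):
    return max(lexicon, key=lambda w: word_log_score(log_posteriors, w))

rng = np.random.default_rng(0)
fake_log_post = np.log(rng.dirichlet(np.ones(len(ALPHABET)), size=20))  # 20 frames
print(recognise(fake_log_post, ["cat", "cart", "chart"]))
```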


International Conference on Acoustics, Speech, and Signal Processing | 2013

On rectified linear units for speech processing

Matthew D. Zeiler; Marc'Aurelio Ranzato; Rajat Monga; Mark Mao; Ke Yang; Quoc V. Le; Patrick Nguyen; Andrew W. Senior; Vincent Vanhoucke; Jeffrey Dean; Geoffrey E. Hinton

Deep neural networks have recently become the gold standard for acoustic modeling in speech recognition systems. The key computational unit of a deep network is a linear projection followed by a point-wise non-linearity, which is typically a logistic function. In this work, we show that we can improve generalization and make training of deep networks faster and simpler by substituting the logistic units with rectified linear units. These units are linear when their input is positive and zero otherwise. In a supervised setting, we can successfully train very deep nets from random initialization on a large vocabulary speech recognition task achieving lower word error rates than using a logistic network with the same topology. Similarly in an unsupervised setting, we show how we can learn sparse features that can be useful for discriminative tasks. All our experiments are executed in a distributed environment using several hundred machines and several hundred hours of speech data.
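The two point-wise non-linearities compared in this abstract, with their derivatives; the comparison illustrates the well-known point that rectified linear units avoid the saturation of logistic units:

```python
# Logistic vs. rectified linear units: the logistic gradient is at most 0.25
# and vanishes for large |x|, while an active ReLU passes gradient 1.
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def logistic_grad(x):
    s = logistic(x)
    return s * (1.0 - s)            # at most 0.25, tiny for large |x|

def relu(x):
    return np.maximum(x, 0.0)       # linear when positive, zero otherwise

def relu_grad(x):
    return (x > 0).astype(float)    # exactly 1 wherever the unit is active

x = np.array([-5.0, -1.0, 0.5, 5.0])
print(logistic_grad(x))             # approx. [0.007, 0.197, 0.235, 0.007]
print(relu_grad(x))                 # [0., 0., 1., 1.]
```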


International Conference on Acoustics, Speech, and Signal Processing | 2013

Multilingual acoustic models using distributed deep neural networks

Georg Heigold; Vincent Vanhoucke; Andrew W. Senior; Patrick Nguyen; Marc'Aurelio Ranzato; Matthieu Devin; Jeffrey Dean

Today's speech recognition technology is mature enough to be useful for many practical applications. In this context, it is of paramount importance to train accurate acoustic models for many languages within given resource constraints such as data, processing power, and time. Multilingual training has the potential to solve the data issue and close the performance gap between resource-rich and resource-scarce languages. Neural networks lend themselves naturally to parameter sharing across languages, and distributed implementations have made it feasible to train large networks. In this paper, we present experimental results for cross- and multi-lingual network training of eleven Romance languages on 10k hours of data in total. The average relative gains over the monolingual baselines are 4%/2% (data-scarce/data-rich languages) for cross- and 7%/2% for multi-lingual training. However, the additional gain from jointly training the languages on all data comes at an increased training time of roughly four weeks, compared to two weeks (monolingual) and one week (crosslingual).
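An illustrative sketch of the kind of cross-lingual parameter sharing the abstract alludes to: hidden layers shared across languages, with a separate output layer per language. The language codes and sizes below are made up, and this is not the paper's distributed training setup:

```python
# Shared hidden layers trained on all languages; one softmax output layer per
# language over that language's context-dependent states.
import torch
import torch.nn as nn

class MultilingualDNN(nn.Module):
    def __init__(self, n_input=440, n_states_per_lang=None):
        super().__init__()
        n_states_per_lang = n_states_per_lang or {"fr": 2000, "it": 1800, "ro": 1500}
        self.shared = nn.Sequential(            # updated with data from all languages
            nn.Linear(n_input, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
        )
        self.heads = nn.ModuleDict(             # one output layer per language
            {lang: nn.Linear(1024, n) for lang, n in n_states_per_lang.items()}
        )

    def forward(self, frames, lang):
        return self.heads[lang](self.shared(frames))

model = MultilingualDNN()
logits_fr = model(torch.randn(8, 440), "fr")    # (8, 2000) state scores for "fr"
```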


Pacific Rim Conference on Multimedia | 2003

Smart surveillance: applications, technologies and implications

Arun Hampapur; Lisa M. Brown; Jonathan H. Connell; Sharath Pankanti; Andrew W. Senior; Yingli Tian

Smart surveillance is the use of automatic video analysis technologies in video surveillance applications. This paper attempts to answer a number of questions about smart surveillance: What are the applications of smart surveillance? What are the system architectures for smart surveillance? What are the key technologies? What are some of the key technical challenges? And what are the implications of smart surveillance for both security and privacy?


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2001

A combination fingerprint classifier

Andrew W. Senior

Fingerprint classification is an important indexing method for any large-scale fingerprint recognition system or database, since it reduces the number of fingerprints that need to be searched when looking for a matching print. Fingerprints are generally classified into broad categories based on global characteristics. This paper describes novel methods of classification using hidden Markov models and decision trees to recognize the ridge structure of the print, without needing to detect singular points. The methods are compared and combined with a standard fingerprint classification algorithm, and results for the combination are presented using a standard database of fingerprint images. The paper also describes a method for achieving any level of accuracy required of the system by sacrificing the efficiency of the classifier. The accuracy of the combination classifier is shown to be higher than that of the two state-of-the-art systems tested under the same conditions.
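As a toy illustration of the two ideas in this abstract, combining per-classifier scores and trading efficiency for accuracy by searching more than one class when confidence is low (this is not the paper's exact algorithm, and the scores are invented):

```python
# Fuse posterior scores from several classifiers, then return the smallest set
# of classes whose combined mass reaches a required confidence: a higher
# threshold means searching more of the database but missing the true class
# less often.
import numpy as np

CLASSES = ["arch", "tented arch", "left loop", "right loop", "whorl"]

def combine(scores, weights=None):
    """Weighted average of per-classifier posterior vectors."""
    scores = np.asarray(scores)                    # (n_classifiers, n_classes)
    w = np.ones(len(scores)) if weights is None else np.asarray(weights)
    fused = (w[:, None] * scores).sum(axis=0) / w.sum()
    return fused / fused.sum()

def classes_to_search(fused, confidence=0.9):
    """Smallest set of classes whose combined probability reaches the target."""
    order = np.argsort(fused)[::-1]
    total, keep = 0.0, []
    for i in order:
        keep.append(CLASSES[i])
        total += fused[i]
        if total >= confidence:
            break
    return keep

hmm_scores  = [0.05, 0.05, 0.55, 0.25, 0.10]       # hypothetical classifier outputs
tree_scores = [0.10, 0.05, 0.40, 0.35, 0.10]
print(classes_to_search(combine([hmm_scores, tree_scores]), confidence=0.9))
```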


Machine Vision and Applications | 2008

IBM smart surveillance system (S3): event based video surveillance system with an open and extensible framework

Yingli Tian; Lisa M. Brown; Arun Hampapur; Max Lu; Andrew W. Senior; Chiao-fe Shu

The increasing need for sophisticated surveillance systems and the move to a digital infrastructure have transformed surveillance into a large-scale data analysis and management challenge. Smart surveillance systems use automatic image understanding techniques to extract information from the surveillance data. While the majority of research and commercial systems have focused on the information extraction aspect of the challenge, very few systems have explored the use of extracted information in the search, retrieval, data management and investigation context. The IBM smart surveillance system (S3) is one of the few advanced surveillance systems that provides not only the capability to automatically monitor a scene but also the capability to manage the surveillance data, perform event-based retrieval, receive real-time event alerts through standard web infrastructure and extract long-term statistical patterns of activity. The IBM S3 is easily customized to fit the requirements of different applications by using an open-standards-based architecture for surveillance.
