Enrico Bocchieri
AT&T
Publication
Featured research published by Enrico Bocchieri.
international conference on acoustics, speech, and signal processing | 2010
Enrico Bocchieri; Diamantino Antonio Caseiro
The query distribution in the speech recognition applications of directory assistance (DA) and voice-search depends on the customer's location. This motivates research on query models conditioned on the user location, here denoted as local models. We describe and test our methods for the estimation of local models with various degrees of spatial “granularity”, for the recognition of city-state (a sub-task of DA) and for the recognition of business listings spoken over iPhones in a nation-wide business-listing voice-search service. Our local language models improve the accuracy of city-state recognition by 2.4% absolute (32% relative error reduction), and of voice-search by 2.2% absolute (7% relative).
international conference on acoustics, speech, and signal processing | 2013
Enrico Bocchieri; Dimitrios Dimitriadis
Micro-modulation components such as the formant frequencies are very important characteristics of speech that have allowed great performance improvements in small-vocabulary ASR tasks. Yet they have seen limited use in large-vocabulary ASR applications. To enable the successful application of these frequency measures in real-life tasks, we investigate their combination with traditional features (MFCCs and PLPs) by linear (e.g. HDA) and non-linear (bottleneck MLP) feature transforms. Our experiments show that such integration of micro-modulation and cepstral features, using non-linear MLP-based transforms, greatly improves ASR performance with respect to the cepstral features alone. We have applied this novel feature extraction scheme to two very different tasks, i.e. a clean speech task (DARPA-WSJ) and a real-life, open-vocabulary, mobile search task (Speak4it℠), and it improved performance on both. We report a relative error rate reduction of 15% for the Speak4it℠ task, and similar improvements, up to 21%, for the WSJ task.
international conference on acoustics, speech, and signal processing | 2011
Enrico Bocchieri; Diamantino Antonio Caseiro; Dimitrios Dimitriadis
This paper reports on the development of, and advances in, automatic speech recognition for the AT&T Speak4it® voice-search application. With Speak4it as a real-life example, we show the effectiveness of acoustic model (AM) and language model (LM) estimation (adaptation and training) on relatively small amounts of application field-data. We then introduce algorithmic improvements concerning the use of sentence length in the LM, of non-contextual features in AM decision-trees, and of the Teager energy in the acoustic front-end. The combination of these algorithms, integrated into the AT&T Watson recognizer, yields substantial accuracy improvements. LM and AM estimation on field-data samples increases the word accuracy from 66.4% to 77.1%, a relative word error reduction of 32%. The algorithmic improvements increase the accuracy to 79.7%, an additional 11.3% relative error reduction.
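The relative error reductions quoted in this abstract follow from the word accuracies by simple arithmetic, with the word error taken as 100% minus the word accuracy. A minimal Python sketch of that calculation (the function name is illustrative and does not come from the paper):

```python
def relative_error_reduction(acc_before, acc_after):
    """Relative word error reduction (in percent) between two word accuracies (in percent)."""
    err_before = 100.0 - acc_before   # word error before the change
    err_after = 100.0 - acc_after     # word error after the change
    return 100.0 * (err_before - err_after) / err_before


# Field-data LM and AM estimation: 66.4% -> 77.1% word accuracy
field_data_gain = relative_error_reduction(66.4, 77.1)

# Algorithmic improvements: 77.1% -> 79.7% word accuracy
algorithmic_gain = relative_error_reduction(77.1, 79.7)

print(round(field_data_gain, 1), round(algorithmic_gain, 1))
```

From the rounded accuracies this gives about 31.8% and 11.4%, which agrees with the reported 32% and 11.3% up to rounding of the published figures.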
international conference on acoustics, speech, and signal processing | 2011
Dimitrios Dimitriadis; Enrico Bocchieri; Diamantino Antonio Caseiro
In previously published work, we have proposed a novel feature extraction algorithm, based on the Teager-Kaiser energy estimates, that approximates human auditory characteristics and that is more robust to sub-band noise than the mean-square estimates of standard MFCCs. We refer to the novel features as Teager energy cepstrum coefficients (TECC). Herein, we study the TECC performance under additive noise and suggest how to predict the noisy TECC deviations by estimating the sub-band SNR values. Then, we report on the effectiveness of the TECCs when they are used in the acoustic front-end of the state-of-the-art AT&T WATSON large-vocabulary recognizer. The TECC front-end is tested in the real-life voice-search Speak4it application for mobile devices. It provides a 6% relative word error rate reduction w.r.t. the MFCC front-end, using the same high-performance language model, lexicon and acoustic model training.
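For orientation, the standard discrete-time Teager-Kaiser energy operator is Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below implements only that textbook operator in Python. It is not the TECC front-end described in the abstract, whose filterbank and cepstral steps are not given here, and the function name is an illustrative assumption.

```python
import math


def teager_kaiser_energy(x):
    """Standard discrete Teager-Kaiser operator: x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure sinusoid A*cos(omega*n), the operator output is the constant
# A^2 * sin(omega)^2, so it tracks both amplitude and frequency.
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(50)]
energy = teager_kaiser_energy(signal)
expected = (amplitude * math.sin(omega)) ** 2
```

Because the operator uses only three neighbouring samples, it responds to instantaneous changes in the signal, in contrast to the mean-square energy averaged over a frame that the abstract mentions for standard MFCCs.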
Archive | 1993
Enrico Bocchieri; Sedat Ibrahim Gokcen; Rajendra Prasad Mikkilineni; David Bjorn Roe; Jay Gordon Wilpon
Archive | 2009
Enrico Bocchieri; Diamantino Antonio Caseiro
Archive | 2006
Charles Douglas Blewett; Enrico Bocchieri
Archive | 2010
Enrico Bocchieri; Dimitrios Dimitriadis; Horst J. Schroeter
Archive | 2013
Benjamin J. Stern; Enrico Bocchieri; Alistair D. Conkie; Danilo Giulianelli
Archive | 2011
Enrico Bocchieri; Diamantino Antonio Caseiro; Dimitrios Dimitriadis