Enrico Bocchieri | Researchain

Publication

Featured research published by Enrico Bocchieri.

international conference on acoustics, speech, and signal processing | 2010

Use of geographical meta-data in ASR language and acoustic models

Enrico Bocchieri; Diamantino Antonio Caseiro

The query distribution in the speech recognition applications of directory assistance (DA) and voice-search depends on the customer's location. This motivates research on query models conditioned on the user location, here denoted as local models. We describe and test our methods for the estimation of local models with various degrees of spatial "granularity", for the recognition of city-state (a sub-task of DA) and for the recognition of business listings, spoken over iPhones in a nation-wide business-listing voice-search service. Our local language models improve the accuracy of city-state by 2.4% absolute (32% relative error reduction), and of voice-search by 2.2% absolute (7% relative).
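The abstract does not say how the local models are estimated. As a purely illustrative sketch, the snippet below assumes one common approach: linear interpolation of a location-specific unigram model with a global one. The function names, the interpolation weight, and the query counts are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def unigram(counts):
    """Maximum-likelihood unigram probabilities from raw query counts."""
    total = sum(counts.values())
    return {query: c / total for query, c in counts.items()}

def interpolate(local, global_, weight):
    """Mix a local and a global model; `weight` is the share given to the local model."""
    vocab = set(local) | set(global_)
    return {
        q: weight * local.get(q, 0.0) + (1.0 - weight) * global_.get(q, 0.0)
        for q in vocab
    }

# Toy, made-up query counts (assumption, for illustration only).
global_counts = Counter({"pizza": 50, "coffee": 40, "ferry terminal": 10})
local_counts = Counter({"pizza": 5, "ferry terminal": 5})

p_global = unigram(global_counts)
p_local = unigram(local_counts)
p_mixed = interpolate(p_local, p_global, weight=0.5)

for query in sorted(p_mixed):
    print(f"{query}: global={p_global.get(query, 0.0):.2f} mixed={p_mixed[query]:.2f}")
```

In this toy setup a query that is frequent near the user gains probability mass in the mixed model while the distribution still sums to one. A coarser or finer spatial "granularity" would correspond to pooling the local counts over larger or smaller regions.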

international conference on acoustics, speech, and signal processing | 2013

Investigating deep neural network based transforms of robust audio features for LVCSR

Enrico Bocchieri; Dimitrios Dimitriadis

Micro-modulation components, such as the formant frequencies, are very important characteristics of speech that have allowed great performance improvements in small-vocabulary ASR tasks. Yet they have seen limited use in large-vocabulary ASR applications. To enable the successful application of these frequency measures in real-life tasks, we investigate their combination with traditional features (MFCCs and PLPs) by linear (e.g. HDA) and non-linear (bottleneck MLP) feature transforms. Our experiments show that such integration of micro-modulation and cepstral features, using non-linear MLP-based transforms, greatly improves ASR accuracy with respect to the cepstral features alone. We have applied this novel feature extraction scheme to two very different tasks, i.e. a clean speech task (DARPA-WSJ) and a real-life, open-vocabulary, mobile search task (Speak4it℠), always reporting improved performance. We report a relative error rate reduction of 15% for the Speak4it℠ task, and similar improvements, up to 21%, for the WSJ task.

international conference on acoustics, speech, and signal processing | 2011

Speech recognition modeling advances for mobile voice search

Enrico Bocchieri; Diamantino Antonio Caseiro; Dimitrios Dimitriadis

This paper reports on the development and advances in automatic speech recognition for the AT&T Speak4it® voice-search application. With Speak4it as a real-life example, we show the effectiveness of acoustic model (AM) and language model (LM) estimation (adaptation and training) on relatively small amounts of application field-data. We then introduce algorithmic improvements concerning the use of sentence length in the LM, of non-contextual features in AM decision-trees, and of the Teager energy in the acoustic front-end. The combination of these algorithms, integrated into the AT&T Watson recognizer, yields substantial accuracy improvements. LM and AM estimation on field-data samples increases the word accuracy from 66.4% to 77.1%, a relative word error reduction of 32%. The algorithmic improvements increase the accuracy to 79.7%, an additional 11.3% relative error reduction.
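The relative error reductions quoted above can be checked from the reported accuracies. The sketch below assumes that word error is simply 100% minus word accuracy; the helper name is hypothetical.

```python
def relative_error_reduction(acc_before, acc_after):
    """Relative word error reduction (in percent) from two word accuracies (in percent).

    Assumes word error = 100 - word accuracy.
    """
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return 100.0 * (err_before - err_after) / err_before

# Accuracies as reported in the abstract.
field_data = relative_error_reduction(66.4, 77.1)   # LM and AM estimation on field data
algorithms = relative_error_reduction(77.1, 79.7)   # additional algorithmic improvements

print(f"field-data estimation: {field_data:.1f}% relative error reduction")
print(f"algorithmic improvements: {algorithms:.1f}% relative error reduction")
```

Computed from the rounded accuracies, the first figure comes out near 32% and the second a little above 11%, consistent with the values in the abstract up to rounding of the published accuracies.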

international conference on acoustics, speech, and signal processing | 2011

An alternative front-end for the AT&T WATSON LV-CSR system

Dimitrios Dimitriadis; Enrico Bocchieri; Diamantino Antonio Caseiro

In previously published work, we have proposed a novel feature extraction algorithm, based on the Teager-Kaiser energy estimates, that approximates human auditory characteristics and that is more robust to sub-band noise than the mean-square estimates of standard MFCCs. We refer to the novel features as Teager energy cepstrum coefficients (TECC). Herein, we study the TECC performance under additive noise and suggest how to predict the noisy TECC deviations by estimating the subband SNR values. Then, we report on the effectiveness of the TECCs when they are used in the acoustic front-end of the state-of-the-art AT&T WATSON large-vocabulary recognizer. The TECC front-end is tested in the real-life voice-search Speak4it application for mobile devices. It provides a 6% relative word error rate reduction w.r.t. the MFCC front-end, using the same high performance language model, lexicon and acoustic model training.
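For orientation, the standard discrete-time Teager-Kaiser energy operator is x[n]² − x[n−1]·x[n+1]. The sketch below implements only that operator on a toy sinusoid. It is not the TECC front-end itself, whose filterbank and cepstral stages are not described in the abstract; the function name and the test signal are illustrative assumptions.

```python
import math

def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, since the first and last samples lack a neighbor.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Toy signal (assumption): a pure tone with amplitude A and digital frequency omega.
amplitude, omega, phase = 2.0, 0.3, 0.5
tone = [amplitude * math.cos(omega * n + phase) for n in range(200)]

psi = teager_kaiser_energy(tone)

# For a pure tone the operator is constant and equals A^2 * sin(omega)^2.
expected = amplitude ** 2 * math.sin(omega) ** 2
print(max(abs(p - expected) for p in psi))
```

Unlike a mean-square energy estimate, this operator depends on both the amplitude and the frequency of the tone, which is the property that energy-operator-based features exploit.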

Archive | 1993