Petar Aleksic
Publication
Featured research published by Petar Aleksic.
international conference on acoustics, speech, and signal processing | 2015
Petar Aleksic; Cyril Allauzen; David K. Elson; Aleksandar Kracun; Diego Melendo Casado; Pedro J. Moreno
The recognition of contact names in mobile-device voice commands is a challenging problem. Some of the difficulties include potentially infinite vocabularies, low probability of contact tokens in the language model (LM), increased false triggering of contact voice commands when none are spoken, and very large and noisy contact name lists. In this paper we suggest solutions for each of these difficulties. We address the low prior probability and out-of-vocabulary contact name problems by using class-based language models and creating on-the-fly, user-dependent small language models containing only relevant names. These models are compiled dynamically based on analysis of the mobile device state. Since these solutions can increase biasing towards contact names during recognition, it is crucial to monitor false triggering. To properly balance this bias we introduce the concept of a contacts insertion reward. This reward is tuned using both positive and negative test sets. We show significant recognition performance improvements on data sets in three languages, without negatively impacting overall system performance. The improvements are obtained both in offline evaluations and in live-traffic experiments.
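The approach described above can be illustrated with a minimal sketch. The paper's actual system uses dynamically compiled WFST-based models; the class names, the `$CONTACTS` token, and all probability values below are purely illustrative assumptions, not the authors' implementation.

```python
import math

# Illustrative sketch of a class-based LM in which the $CONTACTS class token
# is expanded on the fly with a small, user-specific model over the device's
# contact list, and a tunable "contacts insertion reward" controls how
# strongly recognition is biased towards contact names.

class ClassBasedLM:
    def __init__(self, ngram_logprobs, contact_reward=0.0):
        # ngram_logprobs maps (history, token) -> log10 probability;
        # contact names are represented by the class token "$CONTACTS".
        self.ngram_logprobs = ngram_logprobs
        self.contact_reward = contact_reward  # tuned on positive/negative sets
        self.contact_model = {}               # per-user, compiled on the fly

    def compile_contact_model(self, contact_names):
        # On-the-fly, user-dependent model: uniform over the user's contacts.
        logp = -math.log10(len(contact_names))
        self.contact_model = {name: logp for name in contact_names}

    def score(self, history, token):
        if token in self.contact_model:
            # log P(class | history) + log P(name | class) + insertion reward
            class_lp = self.ngram_logprobs.get((history, "$CONTACTS"), -99.0)
            return class_lp + self.contact_model[token] + self.contact_reward
        return self.ngram_logprobs.get((history, token), -99.0)

lm = ClassBasedLM({("call", "$CONTACTS"): -0.5, ("call", "home"): -1.2},
                  contact_reward=0.3)
lm.compile_contact_model(["alice", "bob"])
```

Raising `contact_reward` increases biasing towards contacts; tuning it on a negative test set (utterances with no contact names) keeps false triggering in check.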
spoken language technology workshop | 2016
Assaf Hurwitz Michaely; Mohammadreza Ghodsi; Zelin Wu; Justin Scheiner; Petar Aleksic
It has been shown in the literature that automatic speech recognition systems can greatly benefit from contextual information [1, 2, 3, 4, 5]. Contextual information can be used to simplify the beam search and improve recognition accuracy. Types of useful contextual information can include the name of the application the user is in, the contents of the user's phone screen, the user's location, a certain dialog state, etc. Building a separate language model for each of these types of context is not feasible due to limited resources or limited amounts of training data. In this paper we describe an approach for unsupervised learning of contextual information and automatic building of contextual biasing models. Our approach can be used to build a large number of small contextual models from a limited amount of available unsupervised training data. We describe how n-grams relevant for a particular context are automatically selected, as well as how an optimal size of a final contextual model is chosen. Our experimental results show substantial accuracy improvements for several types of context.
spoken language technology workshop | 2016
Justin Scheiner; Ian Williams; Petar Aleksic
It has been shown that automatic speech recognition (ASR) system quality can be improved by augmenting n-gram language models with contextual information [1][2]. In the voice search domain, there are a large number of useful contextual signals for a given query. Some of these signals are speaker location, speaker identity, time of the query, etc. Each of these signals comes with relevant contextual information (e.g. location-specific entities, favorite queries, recent popular queries) that is not included in the language model's training data. We show that these contextual signals can be used to improve ASR system quality. This is achieved by adjusting n-gram language model probabilities on-the-fly based on the contextual information relevant for the current voice search request. We analyze three example sources of context: location context, previously typed queries, and previous spoken queries. We present a set of approaches we have used to improve ASR quality using these sources of context. Our main objective is to automatically, in real time, take advantage of all available sources of contextual information. In addition, we investigate challenges that come with applying our approach to a number of languages (unsegmented languages, languages with diacritics) and present the solutions used.
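The per-request probability adjustment can be sketched as follows. This is a simplified dictionary-based illustration under assumed values; the additive log-probability boost, the OOV floor, and all n-grams below are hypothetical, and a production system would apply the equivalent operation inside the decoder rather than on a score table.

```python
# Hypothetical sketch of on-the-fly biasing: before decoding a voice search
# request, log10 probabilities of n-grams tied to the request's context
# (e.g. entities near the speaker's location) are boosted by a fixed amount.

def bias_lm_for_request(base_logprobs, context_ngrams, boost=1.0):
    # Return a per-request copy of the LM scores; the base model is shared
    # across requests and must not be mutated.
    biased = dict(base_logprobs)
    for ngram in context_ngrams:
        if ngram in biased:
            biased[ngram] = biased[ngram] + boost
        else:
            # Context n-gram absent from the base LM: admit it with a
            # floor log-probability plus the boost.
            biased[ngram] = -8.0 + boost
    return biased

base = {"golden gate bridge": -6.5, "weather today": -3.0}
location_entities = {"golden gate bridge", "fisherman's wharf"}
scores = bias_lm_for_request(base, location_entities, boost=2.0)
```

Because the adjustment is computed per request, each user can receive different biasing (location, query history) without retraining or storing a separate language model per context.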
Archive | 2013
Alexander H. Gruenstein; Petar Aleksic
conference of the international speech communication association | 2015
Petar Aleksic; Mohammadreza Ghodsi; Assaf Hurwitz Michaely; Cyril Allauzen; Keith B. Hall; Brian Roark; David Rybach; Pedro J. Moreno
Archive | 2013
Petar Aleksic; Xin Lei
Archive | 2014
Petar Aleksic; Xin Lei
Archive | 2013
Petar Aleksic; Pedro J. Moreno Mengibar; Fadi Biadsy
Radiology | 1988
Petar Aleksic; Pedro J. Moreno Mengibar
Archive | 2013
Xin Lei; Petar Aleksic