Ian R. Lane
Carnegie Mellon University
Publications
Featured research published by Ian R. Lane.
Machine Translation | 2007
Yik-Cheung Tam; Ian R. Lane; Tanja Schultz
We propose a novel approach to cross-lingual language model and translation lexicon adaptation for statistical machine translation (SMT) based on bilingual latent semantic analysis (LSA). Bilingual LSA enables latent topic distributions to be efficiently transferred across languages by enforcing a one-to-one topic correspondence during training. Using the proposed bilingual LSA framework, model adaptation is performed by first inferring the topic posterior distribution of the source text and then applying the inferred distribution, via marginal adaptation, to an n-gram language model of the target language and to the translation lexicon. The background phrase table is enhanced with additional phrase scores computed using the adapted translation lexicon. The proposed framework also features rapid bootstrapping of LSA models for new languages based on a source LSA model of another language. Our approach is evaluated on the Chinese–English MT06 test set using a medium-scale SMT system and the GALE SMT system, measured in BLEU and NIST scores. Both scores improve on both systems when the adapted language model and the adapted translation lexicon are applied individually; when they are applied simultaneously, the gains are additive. Relative to the unadapted baseline, the gains in both scores are statistically significant at the 95% confidence level on the medium-scale SMT system, while on the GALE SMT system the gain in the NIST score is statistically significant.
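The adaptation step above can be illustrated with the common unigram-scaling form of marginal adaptation. In this form, P_adapt(w | h) ∝ (P_lsa(w) / P_bg(w))^β · P_bg(w | h), where P_lsa(w) mixes per-topic unigram distributions by the inferred topic posterior. This minimal sketch covers only the language model side. The exponent `beta`, the toy vocabulary and the function names are assumptions for illustration. It is not the paper's implementation.

```python
def lsa_unigram(topic_posterior, topic_word_probs):
    """Mix per-topic unigram models by the inferred topic posterior:
    P_lsa(w) = sum_k theta_k * P(w | k)."""
    vocab = topic_word_probs[0].keys()
    return {
        w: sum(theta * topic[w] for theta, topic in zip(topic_posterior, topic_word_probs))
        for w in vocab
    }

def marginal_adapt(background_cond, background_unigram, adapted_unigram, beta=0.5):
    """Unigram-scaling marginal adaptation for one history h:
    P_adapt(w | h) is proportional to (P_lsa(w) / P_bg(w)) ** beta * P_bg(w | h).
    beta is an assumed tuning exponent."""
    scaled = {
        w: p * (adapted_unigram[w] / background_unigram[w]) ** beta
        for w, p in background_cond.items()
    }
    norm = sum(scaled.values())  # renormalize over the vocabulary for this history
    return {w: s / norm for w, s in scaled.items()}

# Toy example: two hypothetical topics over a three-word vocabulary.
topics = [
    {"bank": 0.4, "river": 0.1, "loan": 0.5},  # a finance-like topic
    {"bank": 0.4, "river": 0.5, "loan": 0.1},  # a geography-like topic
]
p_lsa = lsa_unigram([0.9, 0.1], topics)  # source text inferred as mostly topic 0
p_bg_uni = {"bank": 0.4, "river": 0.3, "loan": 0.3}
p_bg_cond = {"bank": 0.5, "river": 0.25, "loan": 0.25}  # P_bg(w | h) for some history h
print(marginal_adapt(p_bg_cond, p_bg_uni, p_lsa))
```

The normalizer depends on the history. A full n-gram system would therefore have to compute or approximate it per history, which is the main cost of this form of adaptation.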
Conference of the International Speech Communication Association | 2016
Bing Liu; Ian R. Lane
Attention-based encoder-decoder neural network models have recently shown promising results in machine translation and speech recognition. In this work, we propose an attention-based neural network model for joint intent detection and slot filling, both of which are critical steps in many speech understanding and dialog systems. Unlike in machine translation and speech recognition, alignment is explicit in slot filling. We explore different strategies for incorporating this alignment information into the encoder-decoder framework. Building on the attention mechanism in encoder-decoder models, we further propose introducing attention to alignment-based RNN models. Such attention provides additional information for intent classification and slot label prediction. Our independent task models achieve state-of-the-art intent detection error rate and slot filling F1 score on the benchmark ATIS task. Our joint training model further obtains a 0.56% absolute (23.8% relative) error reduction on intent detection and a 0.23% absolute gain on slot filling over the independent task models.
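This pure-Python sketch illustrates the attention computation referred to above. Each encoder hidden state is scored against a query vector, and the scores are normalized with a softmax. The context vector is the weighted sum of the encoder states. For the alignment-based slot-filling setting, the sketch pairs each aligned encoder state h_i with an attention context that uses h_i itself as the query. That query choice, dot-product scoring and all function names are assumptions. They are not the paper's exact architecture.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(query, encoder_states):
    """Dot-product attention (an assumed scoring function).
    Returns (weights, context) where context = sum_i a_i * h_i."""
    scores = [sum(q * h for q, h in zip(query, h_i)) for h_i in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(a * h_i[d] for a, h_i in zip(weights, encoder_states)) for d in range(dim)]
    return weights, context

def aligned_slot_inputs(encoder_states):
    """Hypothetical per-word input for slot labeling: the aligned state h_i
    concatenated with an attention context that uses h_i as the query."""
    inputs = []
    for h_i in encoder_states:
        _, c_i = attention_context(h_i, encoder_states)
        inputs.append(h_i + c_i)  # list concatenation: length is 2 * hidden size
    return inputs

states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attention_context([0.0, 0.0], states)  # equal scores give uniform weights
print(weights, context)
print(aligned_slot_inputs(states))
```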
International Conference on Acoustics, Speech, and Signal Processing | 2008
Matthias Paulik; Sharath Rao; Ian R. Lane; Stephan Vogel; Tanja Schultz
Sentence segmentation and punctuation recovery are critical components of effective spoken language translation (SLT). In this paper, we describe our recent work on sentence segmentation and punctuation recovery for three different language pairs: English-to-Spanish, Arabic-to-English, and Chinese-to-English. We show that the proposed approach works equally well across these very different language pairs. Furthermore, we introduce two features, computed from the translation beam-search lattice, that indicate whether phrasal context or target language model context is jeopardized when segmenting at a given word boundary. These features enable us to introduce short intra-sentence segments without degrading translation performance.
Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2016
Bing Liu; Ian R. Lane
Speaker intent detection and semantic slot filling are two critical tasks in spoken language understanding (SLU) for dialogue systems. In this paper, we describe a recurrent neural network (RNN) model that jointly performs intent detection, slot filling, and language modeling. The model continually updates its intent estimate as each word of the transcribed utterance arrives and uses that estimate as a contextual feature in the joint model. The language model and the online SLU model are evaluated on the ATIS benchmark data set. On the language modeling task, our joint model achieves an 11.8% relative reduction in perplexity compared to an independently trained language model. On the SLU tasks, our joint model outperforms the independently trained task models by 22.3% on intent detection error rate, with a slight degradation in slot filling F1 score. The joint model also shows an advantage in realistic ASR settings with noisy speech input.
Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2014
Teruhisa Misu; Antoine Raux; Rakesh Gupta; Ian R. Lane
In this paper, we address issues in situated language understanding in a rapidly changing environment: a moving car. Specifically, we propose methods for understanding user queries about specific target buildings in the user's surroundings. Unlike previous studies on physically situated interactions, such as interaction with mobile robots, this task is highly sensitive to timing because the spatial relation between the car and the target changes while the user is speaking. We collected situated utterances from drivers using our research system, Townsurfer, which is embedded in a real vehicle. Based on this data, we analyze the timing of user queries, the spatial relationships between the car and targets, the user's head pose, and linguistic cues. Optimized on this data, our algorithms improved the target identification rate by 24.1% absolute.
IEEE Automatic Speech Recognition and Understanding Workshop | 2009
Hassan Al-Haj; Roger Hsiao; Ian R. Lane; Alan W. Black; Alex Waibel
Short vowels in Arabic are normally omitted in written text, which leads to ambiguity in pronunciation. This is even more pronounced for dialectal Arabic, where a single word can be pronounced quite differently depending on the speaker's nationality, level of education, social class, and religion. In this paper, we focus on pronunciation modeling for Iraqi Arabic speech. We introduce multiple pronunciations into the Iraqi speech recognition lexicon and compare performance when weights computed via forced alignment are assigned to the different pronunciations of a word. Incorporating multiple pronunciations improved recognition accuracy compared to a single-pronunciation baseline, and introducing pronunciation weights further improved performance. Using these techniques, we obtained an absolute word error rate reduction of 2.4% compared to the baseline system.
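The weighting step above can be sketched as relative-frequency estimation. The sketch counts how often forced alignment selects each pronunciation variant of a word and then normalizes the counts per word. The add-k `floor` smoothing, the data layout and the placeholder word and pronunciation strings are assumptions for illustration. The paper may compute or apply its weights differently.

```python
from collections import Counter, defaultdict

def pronunciation_weights(aligned_tokens, lexicon, floor=1.0):
    """Estimate P(pronunciation | word) from forced-alignment output.

    aligned_tokens: iterable of (word, pronunciation) pairs chosen by forced alignment.
    lexicon: dict mapping each word to its list of candidate pronunciations.
    floor: add-k pseudo-count (an assumption); must be > 0 so that words never
           seen in the alignments still get a valid (uniform) distribution.
    """
    counts = defaultdict(Counter)
    for word, pron in aligned_tokens:
        counts[word][pron] += 1
    weights = {}
    for word, prons in lexicon.items():
        total = sum(counts[word][p] + floor for p in prons)
        weights[word] = {p: (counts[word][p] + floor) / total for p in prons}
    return weights

# Hypothetical word "w1" with two pronunciation variants "p1" and "p2".
lexicon = {"w1": ["p1", "p2"]}
alignments = [("w1", "p1")] * 3 + [("w1", "p2")]
print(pronunciation_weights(alignments, lexicon))
```

In a decoder, weights like these would commonly enter as a log-probability added to the score of each pronunciation variant.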
North American Chapter of the Association for Computational Linguistics | 2009
Nguyen Bach; Roger Hsiao; Matthias Eck; Paisarn Charoenpornsawat; Stephan Vogel; Tanja Schultz; Ian R. Lane; Alex Waibel; Alan W. Black
When building practical two-way speech-to-speech translation systems, one must expect end users to operate the system in environments that differ from the original training data. As with all speech systems, it is important to allow the system to adapt to actual usage situations. This paper investigates how a speech-to-speech translation system can adapt day to day, using data collected on day one to improve performance on day two. The platform is the CMU Iraqi-English portable two-way speech-to-speech system developed under the DARPA TransTac program. We show how machine translation, speech recognition, and overall system performance can be improved on day two after adapting on day-one data, in both a supervised and an unsupervised way.
Conference of the International Speech Communication Association | 2016
William Chan; Ian R. Lane
In this paper, we explore the use of attention-based models for online speech recognition without language models or search. Our model is based on an attention-based neural network that directly emits English or Mandarin characters as outputs and jointly learns the pronunciation, acoustic, and language models. We evaluate the model for online speech recognition on English and Mandarin. On English, we achieve 33.0% WER on the WSJ task, a 5.4% absolute reduction in WER compared to an online CTC-based system. We also introduce a new training method and show how to learn joint Mandarin Character-Pinyin models. Our character-only Mandarin model achieves 72% CER on the GALE Phase 2 evaluation, while our joint Mandarin Character-Pinyin model achieves 59.3% CER, a 12.7% absolute improvement over the character-only model.
International Conference on Acoustics, Speech, and Signal Processing | 2012
Paul Maergner; Alex Waibel; Ian R. Lane
In this work, we propose a novel method for vocabulary selection to automatically adapt automatic speech recognition systems to the diverse topics that occur in educational and scientific lectures. Using materials that are available before the lecture begins, such as lecture slides, our proposed framework iteratively searches the web for related documents and generates a lecture-specific vocabulary from the resulting documents. Specifically, we first collect documents similar to an initial seed document and then rank the resulting vocabulary by a score computed from a combination of word features. Vocabulary selection is a critical component of adaptation that has typically been overlooked in prior work. On the InterACT German-English simultaneous lecture translation system, our proposed approach significantly improved vocabulary coverage, reducing the out-of-vocabulary rate by 57.0% on average and by up to 84.9% compared to a lecture-independent baseline. Furthermore, our approach reduced the word error rate by 12.5% on average and by up to 25.3% compared to the same baseline.
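The ranking step above can be sketched as follows. Every word in the retrieved documents is scored with a weighted sum of features, and the top entries are kept. The out-of-vocabulary rate is then the fraction of reference tokens that fall outside the selected vocabulary. The two features, their weights, the whitespace tokenization and all function names are hypothetical. The paper combines its own set of word features.

```python
import math
from collections import Counter

def rank_vocabulary(documents, feature_weights, size):
    """Rank words from retrieved documents by a weighted combination of features.

    The features used here (log term frequency and log document frequency)
    are illustrative assumptions, not the paper's feature set.
    """
    tf, df = Counter(), Counter()
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenization (assumption)
        tf.update(tokens)
        df.update(set(tokens))

    def score(word):
        return (feature_weights["tf"] * math.log1p(tf[word])
                + feature_weights["df"] * math.log1p(df[word]))

    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(tf, key=lambda w: (-score(w), w))[:size]

def oov_rate(reference_tokens, vocabulary):
    """Fraction of reference tokens not covered by the vocabulary."""
    vocab = set(vocabulary)
    return sum(1 for t in reference_tokens if t not in vocab) / len(reference_tokens)

def relative_reduction(baseline, adapted):
    """Relative reduction of a rate (e.g. OOV rate or WER) with respect to a baseline."""
    return (baseline - adapted) / baseline

docs = ["the lecture covers neural networks", "neural networks learn weights"]
top = rank_vocabulary(docs, {"tf": 1.0, "df": 1.0}, size=2)
print(top, oov_rate(["neural", "networks", "are", "fun"], top))
```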
International Conference on Acoustics, Speech, and Signal Processing | 2003
Ian R. Lane; Tatsuya Kawahara; Tomoko Matsui
An efficient, scalable speech recognition architecture is proposed for multidomain dialog systems by combining topic detection and topic-dependent language modeling. The domain is automatically inferred from the user's utterance, and speech recognition is then performed with an appropriate domain-dependent language model. The architecture improves accuracy and efficiency over current approaches and scales to a large number of domains. In this paper, a novel framework using a multilayer hierarchy of language models is introduced to improve robustness against topic detection errors. The proposed system provides a relative WER reduction of 10.5% over a single language model system. Furthermore, it achieves accuracy comparable to using multiple language models in parallel at only a fraction of the computational cost.
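One way a multilayer hierarchy can add robustness to topic detection errors is by backing off to a more general language model when the detector is uncertain. This sketch descends from a general root LM toward domain-specific LMs. It continues only while the detector's probability mass under the chosen branch exceeds a threshold. The tree, the threshold rule and the domain names are hypothetical, and the paper's actual selection criterion may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LMNode:
    """A node in a hypothetical language model hierarchy; leaves are single domains."""
    name: str
    children: List["LMNode"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]

def branch_mass(node: LMNode, domain_posterior: Dict[str, float]) -> float:
    """Total detector probability assigned to the domains under `node`."""
    return sum(domain_posterior.get(leaf, 0.0) for leaf in node.leaves())

def select_lm(root: LMNode, domain_posterior: Dict[str, float], threshold: float = 0.8) -> str:
    """Walk down from the general root while the best child branch is confidently detected."""
    node = root
    while node.children:
        best = max(node.children, key=lambda child: branch_mass(child, domain_posterior))
        if branch_mass(best, domain_posterior) < threshold:
            break  # detector too uncertain: keep the more general LM
        node = best
    return node.name

# Hypothetical hierarchy: general -> travel -> {hotel, restaurant}; general -> weather.
tree = LMNode("general", [
    LMNode("travel", [LMNode("hotel"), LMNode("restaurant")]),
    LMNode("weather"),
])
print(select_lm(tree, {"hotel": 0.95, "restaurant": 0.03, "weather": 0.02}))  # → hotel
print(select_lm(tree, {"hotel": 0.5, "restaurant": 0.4, "weather": 0.1}))     # → travel
```

Because the fallback is a broader model rather than a wrong specific one, a misdetected topic costs less accuracy.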
