
Publications


Featured research published by Yun-Cheng Ju.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008

Live search for mobile: Web services by voice on the cellphone

Alex Acero; Neal Bernstein; Robert L. Chambers; Yun-Cheng Ju; Xinggang Li; Julian J. Odell; Patrick Nguyen; Oliver Scholz; Geoffrey Zweig

Live Search for Mobile is a cellphone application that allows users to interact with Web-based information portals. Currently, the implementation focuses on information related to local businesses: their phone numbers and addresses, directions, reviews, maps of the surrounding area, and traffic. This paper describes a speech-recognition interface recently developed for the application that allows users to interact by voice. The paper presents the overall architecture, the user interface, the design and implementation of the speech recognition grammars, and initial performance results indicating that, for sentence-level utterance recognition, the system achieves 60 to 65% of human capability.
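
As a rough illustration of the grammar-driven recognition the paper describes, here is a minimal sketch: a toy context-free grammar over local-business queries, used to generate the kinds of sentences a recognizer grammar would accept. All rule names and vocabulary below are hypothetical placeholders, not the paper's actual grammars.

```python
import random

# Toy context-free grammar for local-business queries, loosely in the spirit
# of the voice-search grammars the paper describes. All rules and vocabulary
# here are illustrative placeholders, not the published grammars.
GRAMMAR = {
    "QUERY": [["BUSINESS"], ["BUSINESS", "LOCATION"]],
    "BUSINESS": [["starbucks"], ["pizza", "hut"], ["joe's", "auto", "repair"]],
    "LOCATION": [["in", "CITY"], ["near", "CITY"]],
    "CITY": [["seattle"], ["redmond"], ["bellevue"]],
}

def expand(symbol):
    """Recursively expand a nonterminal into one random word sequence."""
    if symbol not in GRAMMAR:  # terminal word
        return [symbol]
    production = random.choice(GRAMMAR[symbol])
    return [word for part in production for word in expand(part)]

random.seed(0)
for _ in range(5):
    print(" ".join(expand("QUERY")))  # sample utterances the grammar covers
```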


IEEE Signal Processing Magazine | 2008

An introduction to voice search

Ye-Yi Wang; Dong Yu; Yun-Cheng Ju; Alex Acero

Voice search is the technology underlying many spoken dialog systems (SDSs) that provide users with the information they request via a spoken query. The information normally resides in a large database, and the query has to be compared with a field in the database to obtain the relevant information. The contents of the field, such as business or product names, are often unstructured text. This article categorizes spoken dialog technology into form filling, call routing, and voice search, and reviews voice search technology. The categorization is made from a technological perspective; it is important to note that a single SDS may apply technology from multiple categories. Robustness is the central issue in voice search. Work on acoustic modeling aims at improved robustness to environment noise, channel conditions, and speaker variance; pronunciation research addresses unseen word pronunciations and pronunciation variance; language model research focuses on linguistic variance; studies in search give rise to improved robustness to linguistic variance and ASR errors; dialog management research enables graceful recovery from confusions and understanding errors; and learning in the feedback loop speeds up system tuning for more robust performance. While tremendous progress has been made on voice search in the past decade, large challenges remain: many voice search dialog systems have automation rates around or below 50% in field trials.
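
To make the core matching step concrete: the recognized query has to be compared against unstructured name fields such as business names. Below is a minimal sketch, assuming a character-bigram Dice similarity as the comparison, which is one simple robustness heuristic among many; the article itself surveys far richer models.

```python
from collections import Counter

# Toy listing database with an unstructured name field, as in the article's
# voice-search setting. Entries are invented for illustration.
LISTINGS = ["kung ho cuisine of china", "mario's pizzeria", "home depot"]

def bigrams(text):
    """Character bigrams of a whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def similarity(a, b):
    """Dice coefficient over character bigrams; tolerant of small ASR errors."""
    ca, cb = bigrams(a), bigrams(b)
    overlap = sum((ca & cb).values())
    return 2.0 * overlap / (sum(ca.values()) + sum(cb.values()))

def search(query):
    """Rank listings by similarity to the recognized query."""
    return sorted(LISTINGS, key=lambda name: similarity(query, name), reverse=True)

# A misrecognized query ("ho" -> "foo") still ranks the right listing first.
print(search("kung foo cuisine of china")[0])
```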


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 1997

Recent improvements on Microsoft's trainable text-to-speech system - Whistler

Xuedong Huang; Alex Acero; Hsiao-Wuen Hon; Yun-Cheng Ju; Jingsong Liu; Scott Meredith; Mike Plumpe

The Whistler text-to-speech engine was designed so that the model parameters can be constructed automatically from training data. This paper focuses on improvements in prosody and acoustic modeling, which are all derived through probabilistic learning methods. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and prosodic characteristics of the original speaker. The underlying technologies used in Whistler can significantly facilitate the process of creating generic TTS systems for a new language, a new voice, or a new speech style. The Whistler TTS engine supports the Microsoft Speech API and requires less than 3 MB of working memory.


Human Factors in Computing Systems (CHI) | 2010

Cars, calls, and cognition: investigating driving and divided attention

Shamsi T. Iqbal; Yun-Cheng Ju; Eric Horvitz

Conversing on cell phones while driving an automobile is a common practice. We examine the interference of the cognitive load of conversational dialog with driving tasks, with the goal of identifying better and worse times for conversations during driving. We present results from a controlled study involving 18 users using a driving simulator. The driving complexity and conversation type were manipulated in the study, and performance was measured for factors related to both the primary driving task and secondary conversation task. Results showed significant interactions between the primary and secondary tasks, where certain combinations of complexity and conversations were found especially detrimental to driving. We present the studies and analyses and relate the findings to prior work on multiple resource models of cognition. We discuss how the results can frame thinking about policies and technologies aimed at enhancing driving safety.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2009

Voice search of structured media data

Young-In Song; Ye-Yi Wang; Yun-Cheng Ju; Michael L. Seltzer; Ivan Tashev; Alex Acero

This paper addresses the problem of using unstructured queries to search a structured database in voice search applications. By incorporating structural information from music metadata, the end-to-end search error was reduced by 15% on text queries and up to 11% on spoken queries. On top of that, an HMM sequential rescoring model reduced the error rate by 28% on text queries and up to 23% on spoken queries compared to the baseline system. Furthermore, a phonetic similarity model was introduced to compensate for speech recognition errors, which improved end-to-end search accuracy consistently across different levels of speech recognition accuracy.
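
The phonetic-similarity idea can be sketched as follows: rescore candidate metadata entries by how close their words sound to the recognized words, rather than by spelling alone. This toy version uses a crude consonant-collapsing code plus edit distance; the paper's actual model is built from real recognition confusions, so treat this purely as an illustration of the idea.

```python
def sound_code(word):
    """Crude phonetic code: keep the first letter, drop later vowels, and
    collapse a few commonly confused consonants. A toy stand-in for a
    real phone-level confusion model."""
    groups = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}
    w = word.lower()
    code = w[0]
    for ch in w[1:]:
        if ch in "aeiou":
            continue
        ch = groups.get(ch, ch)
        if code[-1] != ch:  # also collapse repeats
            code += ch
    return code

def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def phonetic_score(hypothesis, field):
    """Lower is better: summed code-level distance over word-aligned pairs."""
    pairs = zip(hypothesis.split(), field.split())
    return sum(edit_distance(sound_code(h), sound_code(f)) for h, f in pairs)

# "side" misrecognized as "site" costs nothing at the sound-code level.
print(phonetic_score("dark site of the moon", "dark side of the moon"))  # -> 0
```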


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008

Language modeling for voice search: A machine translation approach

Xiao Li; Yun-Cheng Ju; Geoffrey Zweig; Alex Acero

This paper presents a novel approach to language modeling for voice search based on ideas and methods from statistical machine translation. We propose an n-gram-based translation model that can be used for listing-to-query translation. We then leverage the query forms translated from listings to improve language modeling. The translation model is trained in an unsupervised manner using a set of transcribed voice search queries. Experiments show that the translation approach yielded drastic perplexity reductions compared with a baseline language model with no translation applied.
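
A toy rendering of the listing-to-query idea: expand each listing into plausible spoken query forms, then train an n-gram language model on the expanded forms. The hand-written expansion rules below stand in for the paper's statistically learned n-gram translation model, and the add-alpha smoothing is just a convenient choice for this sketch.

```python
import math
from collections import Counter

# Toy listings; real systems would use millions of directory entries.
LISTINGS = ["kung ho cuisine of china", "joe's auto repair shop"]

def query_forms(listing):
    """Hand-written expansion rules standing in for a learned
    listing-to-query translation model."""
    words = listing.split()
    yield listing                      # full name
    yield " ".join(words[:2])          # leading-words short form
    yield words[0] + " " + words[-1]   # head word + category-like tail

def train_bigram(sentences):
    """Maximum-likelihood bigram counts with sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def perplexity(sentence, unigrams, bigrams, vocab_size=1000, alpha=0.1):
    """Perplexity under the bigram model with add-alpha smoothing."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    logp = 0.0
    for prev, cur in zip(toks, toks[1:]):
        p = (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab_size)
        logp += math.log(p)
    return math.exp(-logp / (len(toks) - 1))

forms = [q for listing in LISTINGS for q in query_forms(listing)]
uni, bi = train_bigram(forms)
# A short spoken query form is now well covered by the model.
print(perplexity("kung ho", uni, bi))
```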


User Interface Software and Technology (UIST) | 2008

Search Vox: leveraging multimodal refinement and partial knowledge for mobile voice search

Tim Paek; Bo Thiesson; Yun-Cheng Ju; Bongshin Lee

Internet usage on mobile devices continues to grow as users seek anytime, anywhere access to information. Because users frequently search for businesses, directory assistance has been the focus of many voice search applications utilizing speech as the primary input modality. Unfortunately, mobile settings often contain noise which degrades performance. As such, we present Search Vox, a mobile search interface that not only facilitates touch and text refinement whenever speech fails, but also allows users to assist the recognizer via text hints. Search Vox can also take advantage of any partial knowledge users may have about the business listing by letting them express their uncertainty in an intuitive way using verbal wildcards. In simulation experiments conducted on real voice search data, leveraging multimodal refinement resulted in a 28% relative reduction in error rate. Providing text hints along with the spoken utterance resulted in even greater relative reduction, with dramatic gains in recovery for each additional character.
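
The verbal-wildcard mechanism can be sketched simply: the user says a filler word for the part of a listing they cannot remember, and the system compiles the rest into a pattern. The filler vocabulary and the regex matching below are assumptions made for illustration, not Search Vox's implementation.

```python
import re

# Toy directory of business listings; entries are invented for illustration.
LISTINGS = ["cafe pour vous", "bellevue grill", "vous salon and spa"]

# Words the user can say in place of parts they don't remember. The exact
# filler vocabulary here is an assumption, not Search Vox's.
WILDCARDS = {"something", "blah", "whatever"}

def wildcard_match(spoken_query, listings):
    """Turn a spoken query with verbal wildcards into a regex and filter."""
    parts = [
        r"\S+(?:\s\S+)*" if w in WILDCARDS else re.escape(w)
        for w in spoken_query.lower().split()
    ]
    pattern = re.compile(r"^" + r"\s".join(parts) + r"$")
    return [listing for listing in listings if pattern.match(listing)]

# The caller remembers only that the name starts with "cafe":
print(wildcard_match("cafe something", LISTINGS))  # -> ['cafe pour vous']
```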


Ubiquitous Computing (UbiComp) | 2013

NLify: lightweight spoken natural language interfaces via exhaustive paraphrasing

Seungyeop Han; Matthai Philipose; Yun-Cheng Ju

This paper presents the design and implementation of a programming system that enables third-party developers to add spoken natural language (SNL) interfaces to standalone mobile applications. The central challenge is to create statistical recognition models that are accurate and resource-efficient in the face of the variety of natural language, while requiring little specialized knowledge from developers. We show that given a few examples from the developer, it is possible to elicit comprehensive sets of paraphrases of the examples using internet crowds. The exhaustive nature of these paraphrases allows us to use relatively simple, automatically derived statistical models for speech and language understanding that perform well without per-application tuning. We have realized our design fully as an extension to the Visual Studio IDE. Based on a new benchmark dataset with 3500 spoken instances of 27 commands from 20 subjects and a small developer study, we establish the promise of our approach and the impact of various design choices.
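
As a sketch of the "relatively simple, automatically derived statistical models" the approach enables: once crowds have produced paraphrase sets per command, even a naive Bayes bag-of-words classifier can map an utterance to a command. The commands and paraphrases below are invented for illustration; the paper's models, features, and data differ.

```python
import math
from collections import Counter

# Toy crowd-collected paraphrase sets, one list per command. Invented for
# illustration; NLify's benchmark uses 27 commands and 3500 spoken instances.
PARAPHRASES = {
    "set_alarm": ["wake me up at seven", "set an alarm for seven",
                  "alarm at seven please"],
    "check_weather": ["what's the weather like", "is it going to rain",
                      "weather forecast please"],
}

def train(data):
    """Per-command word counts for a multinomial naive Bayes classifier."""
    return {cmd: Counter(w for s in sents for w in s.split())
            for cmd, sents in data.items()}

def classify(utterance, counts, alpha=1.0):
    """Pick the command with the highest smoothed log-likelihood."""
    vocab = {w for c in counts.values() for w in c}
    def score(cmd):
        total = sum(counts[cmd].values())
        return sum(
            math.log((counts[cmd][w] + alpha) / (total + alpha * len(vocab)))
            for w in utterance.split()
        )
    return max(counts, key=score)

counts = train(PARAPHRASES)
print(classify("please set my alarm", counts))  # -> set_alarm
```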


Journal of Experimental Psychology: Applied | 2014

Sharing a driver's context with a caller via continuous audio cues to increase awareness about driver state

Christian P. Janssen; Shamsi T. Iqbal; Yun-Cheng Ju

In an experiment using a driving simulator, we investigated whether sharing information about a driver's context with a remote caller via continuous audio cues can make callers more aware of the driving situation. Increased awareness could potentially help in making the conversation less distracting. Prior research has shown that although sharing context using video can create such beneficial effects, it also has some practical disadvantages. It is an open question whether other modalities might also provide sufficient context for a caller. In particular, the effects of sharing audio, a cheaper, more salient, and perhaps more practical alternative than video, are not well understood. We investigated sharing context using direct cues in the form of realistic driving sounds (e.g., car honks, sirens) and indirect cues in the form of elevated heartbeats. Sound sharing affected the caller's perception of the driver's busyness. However, this had at most a modest effect on conversation and driving performance. An implication of these results is that although sharing sounds can increase a caller's awareness of changes in the driver's busyness, callers need more training or information on how to leverage such context to reduce disruption to driving. Limitations and implications are discussed.


North American Chapter of the Association for Computational Linguistics (NAACL) | 2007

Voice-Rate: A Dialog System for Consumer Ratings

Geoffrey Zweig; Yun-Cheng Ju; Patrick Nguyen; Dong Yu; Ye-Yi Wang; Alex Acero

Voice-Rate is an automated dialog system which provides access to over one million ratings of products and businesses. By calling a toll-free number, consumers can access ratings for products, national businesses such as airlines, and local businesses such as restaurants. Voice-Rate also has a facility for recording and analyzing ratings that are given over the phone. The service has been primed with ratings taken from a variety of web sources, and we are augmenting these with user ratings. Voice-Rate can be accessed by dialing 1-877-456-DATA.
