
Publication


Featured research published by Helen M. Meng.


International Conference on Spoken Language Processing | 1996

WHEELS: a conversational system in the automobile classifieds domain

Helen M. Meng; Senis Busayapongchai; J. Glass; David Goddeau; L. Hetherington; Edward Hurley; Christine Pao; Joseph Polifroni; Stephanie Seneff; Victor W. Zue

WHEELS is a conversational system that provides access to a database of electronic automobile classified advertisements. It leverages the existing spoken language technologies from our GALAXY system and enables users to search through a database of 5,000 automobile classifieds. The current end-to-end system responds to spoken or typed inputs and produces a short list of entries meeting the constraints specified by the user. The system operates in mixed-initiative mode, asking for specific information but not requiring compliance. The output information is conveyed to the user with visual tables and synthesized speech. The system incorporates a new type of category bigram, created through innovative use of the natural language component. Future plans include operating in a displayless mode and porting the system to Spanish.
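
The mixed-initiative behavior described above (the system prompts for specific attributes but accepts whatever constraints the user volunteers) can be illustrated with a toy sketch. This is not the GALAXY/WHEELS implementation; the attribute names, ads, parsing heuristics, and simulated user turns below are invented for illustration.

```python
# Toy sketch of a mixed-initiative constraint loop in the spirit of WHEELS.
# The real system used GALAXY's spoken language components, not keyword matching.

ADS = [
    {"make": "honda", "year": 1994, "price": 6500},
    {"make": "honda", "year": 1991, "price": 3200},
    {"make": "ford",  "year": 1995, "price": 8900},
]
SLOTS = ["make", "max_price", "min_year"]
turns = iter(["a honda please", "under 7000"])   # simulated user responses

def parse(utterance, constraints):
    """Fill any slot the user mentions, not just the one prompted for."""
    for token in utterance.lower().split():
        if token in {"honda", "ford", "toyota"}:
            constraints["make"] = token
        elif token.isdigit():
            value = int(token)
            # Crude heuristic: numbers in 1900-1999 look like years, else prices.
            constraints["min_year" if 1900 <= value < 2000 else "max_price"] = value
    return constraints

def matches(ad, c):
    return ((c.get("make") is None or ad["make"] == c["make"]) and
            (c.get("max_price") is None or ad["price"] <= c["max_price"]) and
            (c.get("min_year") is None or ad["year"] >= c["min_year"]))

constraints, hits = {}, ADS
for slot in SLOTS:                    # system initiative: prompt slot by slot
    if slot not in constraints:
        print(f"Any preference for {slot}?")
        constraints = parse(next(turns, ""), constraints)
    hits = [ad for ad in ADS if matches(ad, constraints)]
    if len(hits) <= 1:                # short list reached; stop asking
        break
print(hits)
```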


IEEE Automatic Speech Recognition and Understanding Workshop | 2001

Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval

Helen M. Meng; Wai Kit Lo; Berlin Chen; Karen P. Tang

We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, which are mapped into Mandarin syllables by pronunciation dictionary lookup. Mandarin radio news broadcasts form spoken documents that are indexed by word and syllable recognition. The information retrieval engine performs matching at both the word and syllable levels. The English queries contain many named entities that tend to be out-of-vocabulary words for machine translation and speech recognition, and are therefore omitted in retrieval. Names are often transliterated across languages and are generally important for retrieval. We present a technique that takes in a name spelling and automatically generates a phonetic cognate in terms of Chinese syllables to be used in retrieval. Experiments show consistent improvements in retrieval performance when named entities are included in this way.
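
As a rough illustration of mapping a name spelling to Chinese syllables, the sketch below greedily segments the spelling into sound units and looks each up in a syllable table. The table and segmentation heuristic are hypothetical stand-ins; the paper's technique derives the mapping from phoneme sequences and data-driven cross-lingual rules.

```python
# Minimal sketch of generating a Chinese-syllable "phonetic cognate" for an
# English name. The sound-to-syllable table below is hand-written and
# illustrative only.

SOUND_TO_SYLLABLE = {
    "clin": ["ke", "lin"], "ton": ["dun"],
    "lon": ["lun"], "don": ["dun"],
}

def phonetic_cognate(name):
    """Greedy longest-match segmentation of the spelling into sound units."""
    name, out, i = name.lower(), [], 0
    while i < len(name):
        for j in range(len(name), i, -1):        # longest match first
            if name[i:j] in SOUND_TO_SYLLABLE:
                out += SOUND_TO_SYLLABLE[name[i:j]]
                i = j
                break
        else:
            i += 1                                # skip unmapped letters
    return out

print(phonetic_cognate("Clinton"))   # ['ke', 'lin', 'dun'] ~ 克林頓
```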


Speech Communication | 2002

Spoken language resources for Cantonese speech processing

Tan Lee; Wai Kit Lo; P. C. Ching; Helen M. Meng

This paper describes the development of CU Corpora, a series of large-scale speech corpora for Cantonese. Cantonese is the most commonly spoken Chinese dialect in Southern China and Hong Kong. CU Corpora are the first of their kind and are intended to serve as an important infrastructure for the advancement of speech recognition and synthesis technologies for this widely used Chinese dialect. They contain a large amount of speech data covering various linguistic units of spoken Cantonese, including isolated syllables, polysyllabic words and continuous sentences. While some of the corpora are created for specific applications of common interest, others are designed with emphasis on the coverage and distributions of different phonetic units, including contextual ones. The speech data are annotated manually so as to provide sufficient orthographic and phonetic information for the development of different applications. Statistical analysis of the annotated data shows that CU Corpora contain rich and balanced phonetic content. The usefulness of the corpora is also demonstrated with a number of speech recognition and speech synthesis applications.
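
The kind of coverage and balance analysis mentioned above can be sketched in a few lines: count how many target phonetic units the prompts cover, and use normalized entropy as a balance score. The syllable inventory and transcriptions below are toy inventions, not CU Corpora data.

```python
# Sketch of a phonetic coverage/balance check for corpus design.
from collections import Counter
import math

TARGET_SYLLABLES = {"ngo", "nei", "hou", "sik", "faan", "m"}   # toy inventory
transcriptions = [["nei", "hou"], ["ngo", "sik", "faan"], ["nei", "sik", "m", "sik"]]

counts = Counter(s for utt in transcriptions for s in utt)
coverage = len(set(counts) & TARGET_SYLLABLES) / len(TARGET_SYLLABLES)

# Entropy relative to the uniform distribution measures balance (1.0 = even).
total = sum(counts.values())
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
balance = entropy / math.log2(len(counts))

print(f"coverage={coverage:.2f} balance={balance:.2f}")
```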


IEEE Signal Processing Magazine | 2015

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends

Zhen-Hua Ling; Shiyin Kang; Heiga Zen; Andrew W. Senior; Mike Schuster; Xiaojun Qian; Helen M. Meng; Li Deng

Hidden Markov models (HMMs) and Gaussian mixture models (GMMs) are the two most common types of acoustic models used in statistical parametric approaches for generating low-level speech waveforms from high-level symbolic inputs via intermediate acoustic feature sequences. However, these models have their limitations in representing complex, nonlinear relationships between the speech generation inputs and the acoustic features. Inspired by the intrinsically hierarchical process of human speech production and by the successful application of deep neural networks (DNNs) to automatic speech recognition (ASR), deep learning techniques have also been applied successfully to speech generation, as reported in recent literature. This article systematically reviews these emerging speech generation approaches, with the dual goal of helping readers gain a better understanding of the existing techniques as well as stimulating new work in the burgeoning area of deep learning for parametric speech generation.
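
As a toy illustration of the substitution the review surveys, replacing the HMM/GMM acoustic model with a neural-network regression from input features to acoustic features, here is a minimal numpy sketch. The feature sizes, synthetic data, and training setup are invented and not from the article.

```python
# One-hidden-layer network regressing "linguistic" features to "acoustic"
# features, trained with plain gradient descent on squared error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 20))          # per-frame symbolic/context features
W_true = rng.standard_normal((20, 5))
Y = np.tanh(X @ W_true) + 0.1 * rng.standard_normal((256, 5))  # acoustic targets

W1 = 0.1 * rng.standard_normal((20, 64))
W2 = 0.1 * rng.standard_normal((64, 5))
for step in range(1000):
    H = np.tanh(X @ W1)                     # hidden activations
    pred = H @ W2
    err = pred - Y
    gW2 = H.T @ err / len(X)                # gradient of mean squared error
    gW1 = X.T @ ((err @ W2.T) * (1 - H**2)) / len(X)
    W1 -= 0.2 * gW1
    W2 -= 0.2 * gW2
print("final MSE:", float((err**2).mean()))
```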


IEEE Transactions on Speech and Audio Processing | 2002

A system for spoken query information retrieval on mobile devices

Eric Chang; Frank Seide; Helen M. Meng; Zhuoran Chen; Yu Shi; Yuk-Chi Li

With the proliferation of handheld devices, information access on mobile devices is a topic of growing relevance. This paper presents a system that allows the user to search for information on mobile devices using spoken natural-language queries. We explore several issues related to the creation of this system, which combines state-of-the-art speech-recognition and information-retrieval technologies. This is the first work that we are aware of to evaluate spoken-query information retrieval on a commonly available and well-researched text database, the Chinese news corpus used in the National Institute of Standards and Technology (NIST) TREC-5 and TREC-6 benchmarks. To compare spoken-query retrieval performance across different usage scenarios and recognition accuracies, the benchmark queries, read verbatim by 20 speakers, were recorded simultaneously through three channels: headset microphone, PDA microphone, and cellular phone. Our results show that for mobile devices with high-quality microphones, spoken-query retrieval based on existing technologies yields retrieval precision that comes close to that of perfect text input (mean average precision 0.459 and 0.489, respectively, on TREC-6).
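
The retrieval figures quoted above are mean average precision (mAP). A minimal sketch of the metric: average precision rewards relevant documents ranked early, and the mean is then taken over queries. The rankings and relevance sets below are invented for illustration.

```python
# Mean average precision (mAP) over a set of queries.
def average_precision(ranked_ids, relevant_ids):
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            hits += 1
            precisions.append(hits / rank)    # precision at each relevant hit
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

queries = [
    (["d3", "d1", "d7"], {"d1", "d7"}),       # (system ranking, relevant set)
    (["d2", "d9", "d4"], {"d2"}),
]
mean_ap = sum(average_precision(r, rel) for r, rel in queries) / len(queries)
print(f"mAP = {mean_ap:.3f}")
```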


International Conference on Acoustics, Speech, and Signal Processing | 2013

Multi-distribution deep belief network for speech synthesis

Shiyin Kang; Xiaojun Qian; Helen M. Meng

Deep belief network (DBN) has been shown to be a good generative model in tasks such as hand-written digit image generation. Previous work on DBN in the speech community mainly focuses on using the generatively pre-trained DBN to initialize a discriminative model for better acoustic modeling in speech recognition (SR). To fully utilize its generative nature, we propose to model the speech parameters including spectrum and F0 simultaneously and generate these parameters from DBN for speech synthesis. Compared with the predominant HMM-based approach, objective evaluation shows that the spectrum generated from DBN has less distortion. Subjective results also confirm the advantage of the spectrum from DBN, and the overall quality is comparable to that of context-independent HMM.
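
A building block behind this kind of generative model is a restricted Boltzmann machine whose visible units are Gaussian, so it can model continuous parameters such as spectra. Below is a minimal CD-1 training sketch under that assumption; the sizes and data are invented, and the paper's multi-distribution DBN additionally handles F0 and voicing with other unit types.

```python
# Gaussian-Bernoulli RBM trained with one step of contrastive divergence (CD-1).
# Visible units are assumed zero-mean, unit-variance continuous values.
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((500, 8))             # toy "spectral" frames

n_vis, n_hid, lr = 8, 16, 0.01
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(50):
    # Positive phase: hidden probabilities given the data, then a sample.
    ph = sigmoid(V @ W + b_h)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase: mean-field Gaussian reconstruction, then hiddens again.
    v_neg = h @ W.T + b_v
    ph_neg = sigmoid(v_neg @ W + b_h)
    # CD-1 update from the difference of the two correlations.
    W += lr * (V.T @ ph - v_neg.T @ ph_neg) / len(V)
    b_v += lr * (V - v_neg).mean(axis=0)
    b_h += lr * (ph - ph_neg).mean(axis=0)

print("reconstruction error:", float(((V - v_neg) ** 2).mean()))
```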


Empirical Methods in Natural Language Processing | 2015

Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings

Pengfei Liu; Shafiq R. Joty; Helen M. Meng

The tasks in fine-grained opinion mining can be regarded as either a token-level sequence labeling problem or as a semantic compositional task. We propose a general class of discriminative models based on recurrent neural networks (RNNs) and word embeddings that can be successfully applied to such tasks without any task-specific feature engineering effort. Our experimental results on the task of opinion target identification show that RNNs, without using any hand-crafted features, outperform feature-rich CRF-based models. Our framework is flexible, allows us to incorporate other linguistic features, and achieves results that rival the top-performing systems in SemEval-2014.
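
The token-level sequence labeling framing can be sketched as an Elman RNN that reads word embeddings and emits a tag distribution per token. The vocabulary, BIO tags, and sentence below are toy inventions; the sketch shows the forward pass only, with training by backpropagation through time omitted.

```python
# Elman RNN tagger over word embeddings (forward pass only, untrained).
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "battery": 1, "life": 2, "is": 3, "great": 4}
tags = {"O": 0, "B-TARGET": 1, "I-TARGET": 2}
x_ids = [0, 1, 2, 3, 4]        # "the battery life is great"
y_ids = [0, 1, 2, 0, 0]        # gold BIO tags for the opinion target

d_emb, d_hid, n_tag = 8, 12, len(tags)
E = 0.1 * rng.standard_normal((len(vocab), d_emb))   # word embeddings
Wx = 0.1 * rng.standard_normal((d_emb, d_hid))
Wh = 0.1 * rng.standard_normal((d_hid, d_hid))
Wo = 0.1 * rng.standard_normal((d_hid, n_tag))

def forward(x_ids):
    h, ps = np.zeros(d_hid), []
    for t in x_ids:
        h = np.tanh(E[t] @ Wx + h @ Wh)              # recurrent state
        z = h @ Wo
        p = np.exp(z - z.max()); p /= p.sum()        # softmax over tags
        ps.append(p)
    return ps

ps = forward(x_ids)
pred = [int(np.argmax(p)) for p in ps]
print("predicted tag ids (untrained):", pred)
# Training would backpropagate cross-entropy through time (BPTT) to update
# E, Wx, Wh, Wo; omitted to keep the sketch short.
```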


International Conference on Acoustics, Speech, and Signal Processing | 2006

A Comparative Study of Discriminative Methods for Reranking LVCSR N-Best Hypotheses in Domain Adaptation and Generalization

Zhengyu Zhou; Jianfeng Gao; Frank K. Soong; Helen M. Meng

This paper is an empirical study of the performance of different discriminative approaches to reranking the N-best hypotheses output by a large vocabulary continuous speech recognizer (LVCSR). Four algorithms, namely perceptron, boosting, ranking support vector machine (SVM), and minimum sample risk (MSR), are compared in terms of domain adaptation, generalization and time efficiency. In our experiments on Mandarin dictation speech, we found that perceptron performs best for domain adaptation and boosting performs best for generalization. The best result on a domain-specific test set, a relative character error rate (CER) reduction of 11% over the baseline, is achieved by the perceptron algorithm. The best result on a general test set, a 3.4% relative CER reduction over the baseline, is achieved by the boosting algorithm.
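
Of the four algorithms, the perceptron reranker is the simplest to sketch: for each N-best list, score hypotheses with a linear model, and when the top-scoring hypothesis differs from the lowest-error (oracle) one, move the weights toward the oracle's features. The feature vectors and error values below are synthetic stand-ins, not the paper's LVCSR features.

```python
# Perceptron reranking of N-best lists (sketch with synthetic data).
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5, 0.0])     # hidden "quality" direction

def make_nbest(n=5, d=4):
    """An N-best list: (features, error rate) for each hypothesis."""
    feats = rng.standard_normal((n, d))
    errors = -feats @ w_true + 0.1 * rng.standard_normal(n)  # lower is better
    return feats, errors

train = [make_nbest() for _ in range(200)]
w = np.zeros(4)
for epoch in range(10):
    for feats, errors in train:
        oracle = int(np.argmin(errors))      # lowest-error hypothesis
        guess = int(np.argmax(feats @ w))    # current top-ranked hypothesis
        if guess != oracle:
            w += feats[oracle] - feats[guess]   # move weights toward the oracle

feats, errors = make_nbest()                 # rerank an unseen N-best list
print("reranked pick:", int(np.argmax(feats @ w)),
      "oracle:", int(np.argmin(errors)))
```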


International Journal of Intelligent Systems | 2001

Using contextual analysis for news event detection

Wai Lam; Helen M. Meng; Ka-Ho Wong; J. C. H. Yen

The rapidly growing amount of newswire stories stored in electronic devices raises new challenges for information retrieval technology. Traditional query-driven retrieval is not suitable for generic queries. It is desirable to have an intelligent system that automatically locates topically related events or topics in a continuous stream of newswire stories. This is the goal of automatic event detection. We propose a new approach to performing event detection from multilingual newswire stories. Unlike traditional methods, which employ simple keyword matching, our method makes use of concept terms and named entities such as person, location, and organization names. Concept terms of a story are derived from statistical context analysis between sentences in the news story and stories in the concept database. We have conducted a set of experiments to study the effectiveness of our approach. The results show that detection using concept terms together with story keywords outperforms traditional methods that use only keyword representation.
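
A common skeleton for this kind of detection is similarity-based: represent each story as a term vector, give named entities extra weight, and declare a new event when the best cosine match against earlier stories falls below a threshold. The named-entity lexicon, weight, and threshold below are invented for illustration; the paper's concept terms come from statistical context analysis rather than a fixed list.

```python
# New-event detection by thresholded cosine similarity with NE upweighting.
from collections import Counter
import math

NAMED_ENTITIES = {"clinton", "beijing", "nato"}   # toy NE lexicon
NE_WEIGHT, THRESHOLD = 3.0, 0.3

def vectorize(story):
    counts = Counter(story.lower().split())
    return {t: c * (NE_WEIGHT if t in NAMED_ENTITIES else 1.0)
            for t, c in counts.items()}

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

seen = []
for story in ["clinton visits beijing for trade talks",
              "trade talks continue in beijing with clinton",
              "nato announces new members"]:
    v = vectorize(story)
    best = max((cosine(v, s) for s in seen), default=0.0)
    print("NEW EVENT" if best < THRESHOLD else "follow-up", "|", story)
    seen.append(v)
```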


Oriental COCOSDA International Conference on Speech Database and Assessments | 2009

Phonetic aspects of content design in AESOP (Asian English Speech cOrpus Project)

Tanya Visceglia; Chiu-yu Tseng; Mariko Kondo; Helen M. Meng; Yoshinori Sagisaka

This research is part of the ongoing multinational collaboration “Asian English Speech cOrpus Project” (AESOP), whose aim is to build up an Asian English speech corpus representing the varieties of English spoken in Asia. AESOP is an international consortium of linguists, speech scientists, psychologists and educators from Japan, Taiwan, Hong Kong, China, Thailand, Indonesia and Mongolia. Its primary aim is to collect and compare Asian English speech corpora from the countries listed above in order to derive a set of core properties common to all varieties of Asian English, as well as to discover features that are particular to individual varieties. Each research team will use a common recording setup and share an experimental task set, and will develop a common, open-ended annotation system. Moreover, AESOP-collected corpora will be an open resource, available to the research community at large. The initial stage of the phonetics aspect of this project will be devoted to designing spoken-language tasks which will elicit production of a large range of English segmental and suprasegmental characteristics. These data will be used to generate a catalogue of acoustic characteristics particular to individual varieties of Asian English, which will then be compared with the data collected by other AESOP members in order to determine areas of overlap between L1 and L2 English as well as differences among varieties of Asian English.

Collaboration


Dive into Helen M. Meng's collaborations.

Top Co-Authors

Wai Kit Lo (The Chinese University of Hong Kong)
P. C. Ching (The Chinese University of Hong Kong)
Xiaojun Qian (The Chinese University of Hong Kong)
Victor W. Zue (Massachusetts Institute of Technology)
Kun Li (The Chinese University of Hong Kong)
Stephanie Seneff (Massachusetts Institute of Technology)
Shiyin Kang (The Chinese University of Hong Kong)