Publications


Featured research published by Ka-Ho Wong.


International Journal of Intelligent Systems | 2001

Using contextual analysis for news event detection

Wai Lam; Helen M. Meng; Ka-Ho Wong; J. C. H. Yen

The rapidly growing amount of newswire stories stored in electronic devices raises new challenges for information retrieval technology. Traditional query‐driven retrieval is not suitable for generic queries. It is desirable to have an intelligent system to automatically locate topically related events or topics in a continuous stream of newswire stories. This is the goal of automatic event detection. We propose a new approach to performing event detection from multilingual newswire stories. Unlike traditional methods which employ simple keyword matching, our method makes use of concept terms and named entities such as person, location, and organization names. Concept terms of a story are derived from statistical context analysis between sentences in the news story and stories in the concept database. We have conducted a set of experiments to study the effectiveness of our approach. The results show that the performance of detection using concept terms together with story keywords is better than traditional methods which only use keyword representation. © 2001 John Wiley & Sons, Inc.
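As a rough illustration of the idea of combining story keywords with concept terms and named entities, the sketch below scores story similarity with a simple cosine measure and flags a story as a new event when it is not close enough to any earlier story. The vector weighting, threshold, and data layout are illustrative assumptions, not the formulation used in the paper.

```python
# Hedged sketch of concept-term-based new event detection.
# Weights, threshold, and story format are assumptions for illustration only.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def story_vector(keywords, concept_terms, named_entities,
                 concept_weight=1.5, entity_weight=2.0) -> Counter:
    """Combine story keywords with (up-weighted) concept terms and named entities."""
    vec = Counter(keywords)
    for t in concept_terms:
        vec[t] += concept_weight
    for e in named_entities:
        vec[e] += entity_weight
    return vec

def detect_new_events(stories, threshold=0.25):
    """Mark a story as a new event if it is not similar enough to any earlier story."""
    seen, labels = [], []
    for s in stories:
        v = story_vector(s["keywords"], s["concepts"], s["entities"])
        labels.append(all(cosine(v, old) < threshold for old in seen))
        seen.append(v)
    return labels
```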


International Conference on Acoustics, Speech, and Signal Processing | 2004

A real-time Cantonese text-to-audiovisual speech synthesizer

Jianqing Wang; Ka-Ho Wong; Pheng-Ann Heng; Helen Meng; Tien-Tsin Wong

This paper describes the design and development of a Cantonese TTVS synthesizer, which can generate highly natural synthetic speech that is precisely time-synchronized with a real-time 3D face rendering. Our Cantonese TTVS synthesizer utilizes a homegrown Cantonese syllable-based concatenative text-to-speech system named CU VOCAL. This paper describes the extension of CU VOCAL to output syllable labels and durations that correspond to the output acoustic wave file. The syllables are decomposed and their initials/finals are mapped to the nearest IPA symbols that correspond to static viseme models. We have authored sixteen static viseme models together with two emotion-based face models. In order to achieve 3D face rendering, we have designed and implemented a blending technique that computes the linear combinations of the static face models to effect smooth transitions in between models. We demonstrate that this design and implementation of a TTVS synthesizer can achieve real-time performance in generation.
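The blending step described above can be pictured as a weighted sum of static face models. The sketch below assumes each model is an array of vertex coordinates sharing one vertex order; the linear ramp, frame count, and mesh layout are assumptions for illustration, not the synthesizer's actual implementation.

```python
# Minimal sketch of blending static face models by linear combination.
import numpy as np

def blend_faces(models, weights):
    """Linear combination of static face models (same vertex order assumed)."""
    assert abs(sum(weights) - 1.0) < 1e-6, "blend weights should sum to 1"
    return sum(w * m for w, m in zip(weights, models))

def viseme_transition(src, dst, n_frames):
    """Yield intermediate meshes that move smoothly from one viseme to the next."""
    for i in range(n_frames + 1):
        t = i / n_frames            # linear ramp; a real system may ease in/out
        yield blend_faces([src, dst], [1.0 - t, t])

# Usage: two hypothetical viseme meshes, each an (N, 3) array of vertices.
neutral = np.zeros((100, 3))
open_jaw = np.random.default_rng(0).normal(size=(100, 3)) * 0.01
frames = list(viseme_transition(neutral, open_jaw, n_frames=10))
```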


International Conference on Acoustics, Speech, and Signal Processing | 2011

Allophonic variations in visual speech synthesis for corrective feedback in CAPT

Ka-Ho Wong; Wai Kit Lo; Helen M. Meng

This paper presents a visual speech synthesizer providing midsagittal and front views of the vocal tract to help language learners correct their mispronunciations. We adopt a set of allophonic rules to determine the visualization of allophonic variations. We also implement coarticulation by decomposing a viseme (visualization of all articulators) into viseme components (visualization of the tongue, lips, jaw, and velum separately). Viseme components are morphed independently while the temporally adjacent articulations are considered. Subjective evaluation involving six subjects with linguistics backgrounds shows that 54% of their responses preferred having allophonic variations incorporated.
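A minimal sketch of the per-component morphing idea follows, assuming each articulator (tongue, lips, jaw, velum) is parameterized by a small set of 2D control points and that coarticulation is approximated by letting a component lead or lag the phone boundary; these details are assumptions, not the paper's exact scheme.

```python
# Hedged sketch of per-component viseme morphing: tongue, lips, jaw and velum
# are interpolated independently rather than as one whole-face keyframe.
import numpy as np

COMPONENTS = ("tongue", "lips", "jaw", "velum")

def morph_components(prev, nxt, t, offsets=None):
    """Interpolate each articulator separately; `offsets` lets a component
    lead or lag the phone boundary to approximate coarticulation."""
    offsets = offsets or {}
    frame = {}
    for c in COMPONENTS:
        tc = min(max(t + offsets.get(c, 0.0), 0.0), 1.0)   # shifted, clamped time
        frame[c] = (1.0 - tc) * prev[c] + tc * nxt[c]
    return frame

# Usage: hypothetical 2D control-point sets for two phones' articulations.
phone_a = {c: np.zeros((8, 2)) for c in COMPONENTS}
phone_b = {c: np.ones((8, 2)) for c in COMPONENTS}
mid_frame = morph_components(phone_a, phone_b, t=0.5, offsets={"velum": 0.2})
```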


International Conference on Speech Database and Assessments (Oriental COCOSDA) | 2011

Enunciate: An internet-accessible computer-aided pronunciation training system and related user evaluations

Ka-Wa Yuen; Wai-Kim Leung; Pengfei Liu; Ka-Ho Wong; Xiaojun Qian; Wai Kit Lo; Helen M. Meng

This paper presents our group's latest progress in developing Enunciate, an online computer-aided pronunciation training (CAPT) system for Chinese learners of English. Presently, the system targets segmental pronunciation errors. It consists of an audio-enabled web interface, a speech recognizer for mispronunciation detection and diagnosis, a speech synthesizer and a viseme animator. We present a summary of the system's architecture and major interactive features. We also present statistics from evaluations by English teachers and university students who participated in pilot trials. We are also extending the system to cover suprasegmental training and mobile access.
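The pipeline described above (web front end, recognizer-based mispronunciation detection and diagnosis, synthesizer, viseme animator) might be wired together along the lines of the placeholder sketch below; all names and interfaces are hypothetical and are not the actual Enunciate APIs.

```python
# Placeholder sketch of a CAPT feedback loop: diagnose segmental errors,
# then point the learner at corrective audio and animation. Hypothetical only.
from dataclasses import dataclass

@dataclass
class Diagnosis:
    phone: str           # phone the learner produced
    expected: str        # phone the prompt called for
    mispronounced: bool

def detect_mispronunciations(audio: bytes, prompt: str) -> list:
    """Stand-in for the recognizer-based detection/diagnosis step."""
    return [Diagnosis(phone="s", expected="th", mispronounced=True)]

def capt_feedback(audio: bytes, prompt: str) -> dict:
    """End-to-end sketch: diagnose, then queue corrective playback and animation."""
    errors = [d for d in detect_mispronunciations(audio, prompt) if d.mispronounced]
    return {
        "errors": errors,
        # In the real system these would be synthesized audio and viseme animation.
        "playback": [f"synthesize({d.expected})" for d in errors],
        "animation": [f"animate_viseme({d.expected})" for d in errors],
    }
```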


Conference of the International Speech Communication Association | 2015

Analysis of Dysarthric Speech using Distinctive Feature Recognition

Ka-Ho Wong; Yu Ting Yeung; Patrick C. M. Wong; Gina-Anne Levow; Helen M. Meng

Imprecise articulation is one of the characteristic breakdowns of dysarthric speech. This work attempts to develop a framework to automatically identify problematic articulatory patterns of dysarthric speakers in terms of distinctive features (DFs), which are effective for describing speech production. The identification of problematic articulatory patterns aims to assist speech therapists in developing intervention strategies. A multilayer perceptron (MLP) system is trained with non-dysarthric speech data for DF recognition. Agreement rates between the recognized DF values and the canonical values based on phonetic transcriptions are computed. For non-dysarthric speech, our system achieves an average agreement rate of 85.7%. Compared with non-dysarthric speech, the agreement rate for dysarthric speech declines by 1% to 3% in mild cases, 4% to 7% in moderate cases, and 7% to 12% in severe cases. We observe that the DF disagreement patterns are consistent with the analysis of a speech therapist.
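The agreement rate used above can be read as the fraction of aligned segments whose recognized distinctive-feature value matches the canonical value from the transcription. The sketch below assumes pre-aligned segments and example feature names; it is not the paper's exact scoring code.

```python
# Sketch of a per-feature agreement rate between recognized and canonical
# distinctive-feature values, assuming the two sequences are already aligned.
def agreement_rate(recognized, canonical, features=("voicing", "nasal", "continuant")):
    """Return the fraction of segments where recognized and canonical values match."""
    rates = {}
    for f in features:
        pairs = [(r[f], c[f]) for r, c in zip(recognized, canonical)
                 if f in r and f in c]
        rates[f] = sum(r == c for r, c in pairs) / len(pairs) if pairs else float("nan")
    return rates

# Usage with toy aligned segments (feature values are illustrative).
rec = [{"voicing": 1, "nasal": 0}, {"voicing": 0, "nasal": 0}]
ref = [{"voicing": 1, "nasal": 1}, {"voicing": 0, "nasal": 0}]
print(agreement_rate(rec, ref, features=("voicing", "nasal")))
```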


International Symposium on Chinese Spoken Language Processing | 2010

Development of an articulatory visual-speech synthesizer to support language learning

Ka-Ho Wong; Wai-Kim Leung; Wai Kit Lo; Helen M. Meng

This paper presents a two-dimensional (2D) visual-speech synthesizer to support language learning. A visual-speech synthesizer animates the human articulators in synchronization with speech signals, e.g., output from a text-to-speech synthesizer. A visual-speech animation can offer a concrete illustration to language learners on how to move and where to place the articulators when pronouncing a phoneme. We adopt 2D vector-based viseme models and compile a collection of visemes to cover the articulation of all English phonemes (42 visemes for the 44 English phonemes). Morphing between properly selected vector-based articulation images achieves articulatory animations. In this way, we have developed an articulatory visual-speech synthesizer that can accept free-text input and synthesize articulatory dynamics in real time. Evaluation involving 32 subjects based on “lip-reading” shows that they can identify the appropriate word(s) based on articulation animation alone approximately 80% of the time.
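One way to picture the synthesis step is as mapping the phoneme and duration stream from a text-to-speech engine onto a schedule of viseme keyframes, which are then morphed as described. The phoneme-to-viseme table below is a toy example, not the 42-viseme inventory used in the paper.

```python
# Hedged sketch: turn a (phoneme, duration) stream into viseme keyframes.
# The mapping table and fallback pose are illustrative assumptions.
PHONEME_TO_VISEME = {"p": "bilabial_closed", "b": "bilabial_closed",
                     "aa": "open_mid", "iy": "spread", "f": "labiodental"}

def schedule_visemes(phonemes):
    """Map (phoneme, duration_seconds) pairs to (viseme, start, end) keyframes."""
    t, keyframes = 0.0, []
    for ph, dur in phonemes:
        viseme = PHONEME_TO_VISEME.get(ph, "neutral")   # fall back to a rest pose
        keyframes.append((viseme, t, t + dur))
        t += dur
    return keyframes

# Usage: a toy phoneme sequence with durations, as a TTS engine might emit.
print(schedule_visemes([("p", 0.08), ("aa", 0.20), ("f", 0.10)]))
```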


International Conference on Acoustics, Speech, and Signal Processing | 2016

Exploring articulatory characteristics of Cantonese dysarthric speech using distinctive features

Ka-Ho Wong; Wing Sum Yeung; Yu Ting Yeung; Helen M. Meng

Dysarthria is a motor speech disorder caused by neurological deficits. Understanding the articulatory problems of dysarthric speakers may help in designing suitable intervention strategies to improve their speech intelligibility. We have developed an automatic articulatory characteristics analysis framework based on distinctive feature (DF) recognition. We recruited 16 Cantonese dysarthric subjects with spinocerebellar ataxia (SCA) or cerebral palsy (CP) to support our research. To the best of our knowledge, this is among the first efforts in collecting and automatically analyzing Cantonese dysarthric speech. The framework's outputs show a close Pearson correlation with the subjects' manual annotations for most DFs and for the average DF error rates. This indicates a potential way to describe and automatically assess the articulatory characteristics of dysarthric speech.
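The reported correlation can be computed with a standard Pearson coefficient between the framework's per-subject DF error rates and the manually annotated ones. The sketch below uses placeholder numbers, not data from the paper.

```python
# Minimal sketch of a Pearson correlation between automatic and manual
# per-subject DF error rates. The score lists are placeholders.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")

# Hypothetical per-subject error rates: framework output vs. manual annotation.
automatic = [0.12, 0.25, 0.31, 0.08, 0.40]
manual = [0.10, 0.28, 0.35, 0.05, 0.42]
print(f"Pearson r = {pearson(automatic, manual):.3f}")
```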


Conference of the International Speech Communication Association | 2015

Development of a Cantonese dysarthric speech corpus

Ka-Ho Wong; Yu Ting Yeung; Edwin Ho-yin Chan; Patrick C. M. Wong; Gina-Anne Levow; Helen M. Meng


IEEE International Conference on Cognitive Infocommunications | 2013

Development of text-to-audiovisual speech synthesis to support interactive language learning on a mobile device

Wai-Kim Leung; Ka-Wa Yuen; Ka-Ho Wong; Helen M. Meng


Conference of the International Speech Communication Association | 2015

Improving Automatic Forced Alignment for Dysarthric Speech Transcription

Yu Ting Yeung; Ka-Ho Wong; Helen M. Meng

Collaboration


Dive into Ka-Ho Wong's collaborations.

Top Co-Authors

Helen M. Meng, The Chinese University of Hong Kong
Wai-Kim Leung, The Chinese University of Hong Kong
Yu Ting Yeung, The Chinese University of Hong Kong
Wai Kit Lo, The Chinese University of Hong Kong
Ka-Wa Yuen, The Chinese University of Hong Kong
Patrick C. M. Wong, The Chinese University of Hong Kong
Edwin Ho-yin Chan, The Chinese University of Hong Kong
J. C. H. Yen, The Chinese University of Hong Kong