Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Hongcui Wang is active.

Publication


Featured research published by Hongcui Wang.


Speech Communication | 2014

Detection of speaker individual information using a phoneme effect suppression method

Songgun Hyon; Jianwu Dang; Hui Feng; Hongcui Wang; Kiyoshi Honda

Feature extraction of speaker information from speech signals is a key procedure for exploring individual speaker characteristics and the most critical part of a speaker recognition system, which needs to preserve individual information while attenuating linguistic information. However, it is difficult to separate individual from linguistic information in a given utterance. For this reason, we investigated a number of potential effects on speaker individual information that arise from differences in articulation due to speaker-specific morphology of the speech organs, comparing English, Chinese and Korean. We found that voiced and unvoiced phonemes have different frequency distributions of speaker information and that these effects are consistent across the three languages, while the effect of nasal sounds on speaker individuality is language dependent. Because these differences are confounded with speaker individual information, feature extraction is negatively affected. Accordingly, a new feature extraction method is proposed to detect speaker individual information more accurately by suppressing phoneme-related effects: phoneme alignment is required only once, when constructing the filter bank for phoneme effect suppression, and is not needed during feature extraction itself. The proposed method was evaluated by implementing it in GMM speaker models for speaker identification experiments. The proposed approach outperformed both Mel Frequency Cepstrum Coefficients (MFCC) and the traditional F-ratio feature (FFCC), reducing recognition errors by 32.1-67.3% for the three languages compared with MFCC, and by 6.6-31% compared with FFCC. When the proposed method was combined with an automatic phoneme aligner, it detected speaker individuality with about the same accuracy as with manual phoneme alignment.
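
A minimal sketch of the F-ratio idea behind this line of work, assuming per-speaker collections of log filter-bank energy frames; the function names and the simple variance-ratio weighting are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np
from scipy.fft import dct

def f_ratio(features_by_speaker):
    """F-ratio per feature dimension: variance of the speaker means
    (between-speaker) divided by the mean within-speaker variance.
    features_by_speaker: list of (n_frames_i, n_dims) arrays, one per speaker."""
    means = np.stack([f.mean(axis=0) for f in features_by_speaker])
    between = ((means - means.mean(axis=0)) ** 2).mean(axis=0)
    within = np.stack([f.var(axis=0) for f in features_by_speaker]).mean(axis=0)
    return between / (within + 1e-12)

def weighted_cepstra(log_fbank, weights, n_ceps=13):
    """Weight log filter-bank energies by their speaker-discriminative
    score before the DCT, emphasizing speaker-informative bands."""
    return dct(log_fbank * weights, type=2, norm='ortho', axis=-1)[..., :n_ceps]
```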


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2013

Visualization of Mandarin articulation by using a physiological articulatory model

Dian Huang; Xiyu Wu; Jianguo Wei; Hongcui Wang; Chan Song; Qingzhi Hou; Jianwu Dang

It is difficult for language learners to produce unfamiliar speech sounds accurately because they cannot manipulate articulatory movements precisely by auditory feedback alone. Visual feedback can help identify errors and promote learning progress, especially in language learning and speech rehabilitation. In this paper, we propose a visualization method for Mandarin phoneme pronunciation using a three-dimensional (3D) physiological articulatory model driven by Chinese Electromagnetic Articulographic (EMA) data. A mapping from EMA data to the physiological articulatory model was constructed using three points on the mid-sagittal plane of the tongue. To do so, we analyzed configurations of 30 Chinese phonemes based on an EMA database. At the same time, we designed nearly 150,000 muscle activation patterns and applied them to the physiological model to generate model-based articulatory movements. As a result, we developed a visualized articulation system with 2.5D and 3D views. The mapping was evaluated using MRI data; the mean deviation was about 0.21 cm for seven vowels.
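
The abstract does not state the functional form of the EMA-to-model mapping; the following is a minimal sketch under the assumption of an affine fit between paired mid-sagittal points (the function names and sample coordinates are illustrative):

```python
import numpy as np

def fit_affine_map(ema_pts, model_pts):
    """Least-squares affine map from EMA coil positions to model
    tongue-surface points on the mid-sagittal plane.
    ema_pts, model_pts: (n, 2) arrays of paired observations, n >= 3."""
    X = np.hstack([ema_pts, np.ones((len(ema_pts), 1))])  # homogeneous coords
    W, *_ = np.linalg.lstsq(X, model_pts, rcond=None)
    return W                                              # (3, 2) matrix

def apply_affine_map(W, pts):
    return np.hstack([pts, np.ones((len(pts), 1))]) @ W

# usage: three tongue points measured in both coordinate systems
ema = np.array([[0.0, 0.0], [1.2, 0.3], [2.5, -0.4]])
model = np.array([[10.0, 5.0], [11.3, 5.2], [12.6, 4.4]])
W = fit_affine_map(ema, model)
print(apply_affine_map(W, ema))   # reproduces model points (exact for n = 3)
```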


International Symposium on Chinese Spoken Language Processing | 2012

Efficient feature extraction of speaker identification using phoneme mean F-ratio for Chinese

Chen Zhao; Hongcui Wang; Songgun Hyon; Jianguo Wei; Jianwu Dang

The features used for speaker recognition should carry more speaker individual information while attenuating the linguistic information. To discard the linguistic information effectively, in this paper we employed the phoneme mean F-ratio method to investigate the contributions of different frequency regions from the point of view of Chinese phonemes, and applied it to speaker identification. We found that the speaker individual information carried by the phonemes is distributed over different frequency regions of the speech sound. Based on the contribution rates, we extracted new features and combined them with a GMM model. Speaker identification experiments were conducted on a King-ASR Chinese database. Compared with the MFCC feature, the identification error rate with the proposed feature was reduced by 32.94%. The results confirmed the efficiency of the phoneme mean F-ratio method for improving speaker recognition performance for Chinese.
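
A minimal sketch of the "phoneme mean" idea, computing the F-ratio separately within each phoneme class and then averaging so that no single phoneme's distribution dominates the frequency weighting; the grouping structure is an assumption based on the abstract, not the paper's exact procedure:

```python
import numpy as np

def f_ratio(per_speaker):                # as in the earlier sketch
    means = np.stack([f.mean(axis=0) for f in per_speaker])
    between = ((means - means.mean(axis=0)) ** 2).mean(axis=0)
    within = np.stack([f.var(axis=0) for f in per_speaker]).mean(axis=0)
    return between / (within + 1e-12)

def phoneme_mean_f_ratio(by_phoneme):
    """by_phoneme: dict mapping phoneme label -> list of per-speaker
    (n_frames, n_dims) feature arrays. Returns the F-ratio averaged
    over phoneme classes."""
    return np.stack([f_ratio(v) for v in by_phoneme.values()]).mean(axis=0)
```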


International Symposium on Chinese Spoken Language Processing | 2016

Comparison of DCT and autoencoder-based features for DNN-HMM multimodal silent speech recognition

Licheng Liu; Yan Ji; Hongcui Wang; Bruce Denby

Hidden Markov Model (HMM) and Deep Neural Network-Hidden Markov Model (DNN-HMM) speech recognition performance for a portable ultrasound + video multimodal silent speech interface is investigated using Discrete Cosine Transform (DCT) and deep autoencoder-based features over a range of dimensionalities. Experimental results show that the two feature types achieve similar Word Error Rates, but that the autoencoder features maintain good performance even for very low-dimensional feature vectors, demonstrating their potential as a very compact representation of the information in multimodal silent speech data. It is also shown for the first time that the Deep Network/Markov approach, which has proved beneficial for acoustic speech recognition and for articulatory sensor-based silent speech, improves recognition performance for video-based silent speech as well.
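
A minimal sketch of the DCT side of the comparison, assuming grayscale image frames and keeping a low-order square of coefficients as the feature vector; the frame size and coefficient count are illustrative assumptions:

```python
import numpy as np
from scipy.fft import dctn

def dct_features(frame, k=8):
    """2-D DCT of a grayscale image frame; keep the k x k lowest-order
    coefficients (top-left block), where most image energy concentrates."""
    coeffs = dctn(frame.astype(float), norm='ortho')
    return coeffs[:k, :k].ravel()

# usage: a 64x64 ultrasound frame -> 64-dimensional feature vector
frame = np.random.rand(64, 64)        # placeholder for a real frame
feat = dct_features(frame, k=8)
```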


International Symposium on Chinese Spoken Language Processing | 2012

Detailed morphological analysis of Mandarin sustained steady vowels

Yuguang Wang; Hongcui Wang; Jiaqi Gao; Jianguo Wei; Jianwu Dang

One of the important issues in speech production research is the relation between acoustic features and the fine morphological structure of the vocal tract. This study examines morphological characteristics of Mandarin sustained vowels using MRI data of a female vocal tract. To do so, image preprocessing, teeth superimposition, segmentation, and volume reconstruction are carried out on the MRI volumetric images to extract 3D vocal tract shapes. Area functions are then extracted from the vocal tract shapes by re-slicing the vocal tract with a set of grid planes. Nine Mandarin vowels are divided into three groups based on the ratio of pharyngeal to oral cavity size, and a detailed analysis of the area functions is performed within each group. The morphological characteristics of the laryngeal cavity and the side branches (namely the bilateral piriform fossae, epiglottic valleculae, and inter-dental spaces) are also discussed. To evaluate the morphological measurements, formants measured from real speech sounds are compared with those calculated from the area functions. Results suggest that the calculated formants are consistent with natural speech, with a mean error of 4.6%.
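
The abstract does not state how formants were computed from the area functions; one common approach, sketched below under the assumption of concatenated lossless uniform tubes with an ideal open-lip termination, chains acoustic transfer matrices and reads formants off the peaks of the volume-velocity transfer function (the toy area values are illustrative):

```python
import numpy as np
from scipy.signal import find_peaks

def tract_transfer(areas, lengths, freqs, c=35000.0, rho=0.00114):
    """|U_lips/U_glottis| for concatenated lossless tubes.
    areas in cm^2, lengths in cm, c in cm/s, rho in g/cm^3."""
    H = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        k = 2 * np.pi * f / c
        M = np.eye(2, dtype=complex)
        for A, L in zip(areas, lengths):
            Z = rho * c / A                       # characteristic impedance
            M = M @ np.array([[np.cos(k * L), 1j * Z * np.sin(k * L)],
                              [1j * np.sin(k * L) / Z, np.cos(k * L)]])
        H[i] = 1.0 / abs(M[1, 1])                 # open lips: P_lips ~ 0
    return H

# formants = peak frequencies of the transfer function
freqs = np.arange(50.0, 5000.0, 10.0)
areas = np.array([2.6, 0.7, 0.5, 0.6, 1.5, 4.0, 6.0, 5.0])  # toy /a/-like shape
H = tract_transfer(areas, np.full(8, 2.2), freqs)
peaks, _ = find_peaks(20 * np.log10(H))
print(freqs[peaks][:4])                           # first few formants, Hz
```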


Speech Communication | 2018

Updating the Silent Speech Challenge benchmark with deep learning

Yan Ji; Licheng Liu; Hongcui Wang; Zhilei Liu; Zhibin Niu; Bruce Denby

The 2010 Silent Speech Challenge benchmark is updated with new results obtained in a Deep Learning strategy, using the same input features and decoding strategy as in the original article. A Word Error Rate of 6.4% is obtained, compared to the published value of 17.4%. Additional results comparing new auto-encoder-based features with the original features at reduced dimensionality, as well as decoding scenarios on two different language models, are also presented. The Silent Speech Challenge archive has been updated to contain both the original and the new auto-encoder features, in addition to the original raw data.
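
A minimal sketch of a deep autoencoder of the kind used for such features, assuming flattened image frames as input; the layer sizes and bottleneck dimension are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class DeepAutoencoder(nn.Module):
    """Symmetric fully connected autoencoder; the bottleneck activation
    serves as a compact feature vector for the recognizer."""
    def __init__(self, n_in=4096, bottleneck=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, bottleneck),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 128), nn.ReLU(),
            nn.Linear(128, 512), nn.ReLU(),
            nn.Linear(512, n_in),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

# training minimizes reconstruction error; z is then used as the feature
model = DeepAutoencoder()
x = torch.rand(16, 4096)                 # batch of flattened 64x64 frames
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)
```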


International Symposium on Chinese Spoken Language Processing | 2016

Exploring tonal information for Lhasa dialect acoustic modeling

Jian Li; Hongcui Wang; Longbiao Wang; Jianwu Dang; Kuntharrgyal Khuru; Gyaltsen Lobsang

A detailed analysis of tonal features for the Tibetan Lhasa dialect is an important task for Tibetan automatic speech recognition (ASR) applications. However, it is difficult to utilize tonal information because it remains controversial how many tonal patterns the Lhasa dialect has; as a result, few studies have modeled the tonal information of the Lhasa dialect for speech recognition purposes. For this reason, we investigated the influence of tonal information on the performance of Lhasa Tibetan speech recognition. Since Lhasa Tibetan has no conclusive tonal pattern yet, in this study we used a four-tone pattern and designed a phone set based on the four contour-contrast scheme. Speech recognition performance was examined using acoustic models with and without pitch-related features. The experimental results showed that the character error rate (CER) improved by 11% after applying the tone-based phone set and pitch-related features to DNN-HMM based speech recognition, compared to the system without tonal information. This preliminary study revealed that tonal information plays an important role in speech recognition of the Tibetan Lhasa dialect.
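
The abstract does not specify the pitch extractor; a minimal sketch of one common choice, framewise autocorrelation-based F0 estimation whose output could be appended to the spectral features, is shown below (the window length, search range, and voicing threshold are illustrative assumptions):

```python
import numpy as np

def f0_autocorr(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate F0 of one analysis frame from the peak of its
    autocorrelation within the plausible pitch-lag range; returns 0.0
    for (near-)unvoiced frames."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag if ac[lag] > 0.3 * ac[0] else 0.0

# usage: a 25 ms frame at a 16 kHz sampling rate
sr = 16000
t = np.arange(400) / sr
frame = np.sin(2 * np.pi * 120 * t)      # synthetic 120 Hz voiced frame
print(f0_autocorr(frame, sr))            # ~120 Hz
```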


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2016

Investigation on acoustic modeling with different phoneme set for continuous Lhasa Tibetan recognition based on DNN method

Hongcui Wang; Kuntharrgyal Khyuru; Jian Li; Guanyu Li; Jianwu Dang; Lixia Huang

Deep neural network (DNN) acoustic models have advanced significantly in recent years, outperforming the traditional Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) in large-vocabulary continuous speech recognition tasks. We aim to develop a practical Lhasa Tibetan ASR system. To obtain higher speech recognition accuracy, in this paper we investigate the performance of Tibetan acoustic modeling using the DNN method with several different phoneme sets, defined on the basis of linguistic and phonological knowledge of the Tibetan Lhasa dialect. Experiments are conducted on a Tibetan corpus recorded by 20 speakers, using a bigram language model over phones. The phone error rate (PER) results show that the acoustic model with the CTL set performs best, with a relative accuracy improvement of 10.43% over the basic phoneme set. Moreover, our results confirm that for Lhasa Tibetan acoustic modeling, the DNN-HMM paradigm outperforms the conventional GMM-HMM.
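
Phone error rate is the Levenshtein distance between the reference and hypothesized phone sequences, normalized by the reference length; a minimal sketch (the phone labels in the usage line are placeholders):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions,
    and deletions turning hyp into ref (single-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev_diag, d[j] = d[j], min(
                d[j] + 1,                 # deletion
                d[j - 1] + 1,             # insertion
                prev_diag + (r != h),     # substitution / match
            )
    return d[-1]

def phone_error_rate(ref, hyp):
    return edit_distance(ref, hyp) / len(ref)

print(phone_error_rate("k a t a".split(), "k o t a".split()))  # 0.25
```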


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2015

Automatic tongue contour tracking in ultrasound sequences without manual initialization

Hongcui Wang; Siyu Wang; Bruce Denby; Jianwu Dang

Tracking the movement of the tongue is important for understanding how tongue shape change contributes to speech production and control. Ultrasound imaging is widely used to record real-time information on the tongue surface; however, noise, artefacts, and the presence of spurious edges make automatic detection of tongue contours without manual initialization difficult. In this paper, we propose a method to extract the ultrasound tongue surface contour fully automatically using a three-step procedure: 1) noise reduction with a non-local means filter; 2) a rough quadratic fit to the surface contour based on points obtained with a Roberts cross operator; and 3) automatic refinement based on gradient shift and the relative distance of candidate points to the initial rough contour. Experiments are conducted on isolated vowels and on a continuous vowel-sequence utterance. By the Mean Sum of Distances criterion, the proposed method performs on a par with the popular EdgeTrak algorithm on these two data sets, as compared with hand-scanned contours, but without any manual initialization.
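
A rough sketch of the first two steps of this pipeline using off-the-shelf building blocks; the parameter values and the strongest-edge-per-column heuristic are illustrative assumptions, and the paper's gradient-shift refinement step is omitted:

```python
import numpy as np
from skimage.restoration import denoise_nl_means
from skimage.filters import roberts

def rough_tongue_contour(image):
    """Steps 1-2: non-local means denoising, Roberts cross edge
    detection, then a quadratic fit through the strongest edge point
    in each image column. image: 2-D float array in [0, 1]."""
    den = denoise_nl_means(image, patch_size=5, patch_distance=7, h=0.08)
    edges = roberts(den)
    cols = np.arange(image.shape[1])
    rows = edges.argmax(axis=0)                     # strongest edge per column
    keep = edges[rows, cols] > 0.5 * edges.max()    # drop weak columns
    coeffs = np.polyfit(cols[keep], rows[keep], deg=2)
    return cols, np.polyval(coeffs, cols)           # rough contour y(x)
```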


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2015

Investigation of relation between speech perception and production based on EEG source reconstruction

Guancheng Li; Jianwu Dang; Gaoyan Zhang; Zhilei Liu; Hongcui Wang

The mirror neuron system has been investigated using functional magnetic resonance imaging (fMRI). Activation of Broca's area and the premotor cortex (PMC), which are related to speech production, has been observed during speech perception, suggesting a mirroring function. However, it is not clear how the mirror neurons function between speech production and perception. This study investigates the function of mirror neurons by exploiting the high temporal resolution of electroencephalography (EEG). Participants read Chinese material on a screen, then heard the material read through an earphone, and finally judged the consistency of the two stimuli. Source reconstruction of the high-density EEG signals revealed that Wernicke's area activated before Broca's area and the PMC during the speech perception tasks. The results are also consistent with the mirror neuron system: the speech-production-related regions are active during speech perception tasks.

Collaboration


Dive into Hongcui Wang's collaboration.
