Publication


Featured research published by Jianwu Dang.


Journal of the Acoustical Society of America | 1997

ACOUSTIC CHARACTERISTICS OF THE PIRIFORM FOSSA IN MODELS AND HUMANS

Jianwu Dang; Kiyoshi Honda

The piriform fossa forms the bottom part of the pharynx and acts as a pair of side branches of the vocal tract. Because of its obscure form and function, the piriform fossa has usually been neglected in current speech production models. This study examines the geometric and acoustic characteristics of the piriform fossa by means of MRI-based mechanical modeling, in-vivo experiments and numerical computations. Volumetric MRI data showed that the piriform fossa is 2.1 to 2.9 cm³ in volume and 1.6 to 2.0 cm in depth for four Japanese subjects (three males and one female). The results obtained from mechanical models showed that the piriform fossa contributes strong troughs, i.e., spectral minima, to speech spectra in a region of 4 to 5 kHz. The antiresonances were identified with increasing frequency when water was injected into the piriform fossa of human subjects in in-vivo experiments. Antiresonances obtained from the experiments and simulations were confirmed to be consistent with those in natural speech within 5%. Acoustic measurements and simulations showed that the influence of the piriform fossa extends to the lower vowel formants in addition to the local troughs. This global effect can be explained by the location of the fossa near the glottal end of the vocal tract.
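As a rough plausibility check on the reported 4 to 5 kHz troughs, a side branch can be idealized as a uniform tube closed at one end, whose first antiresonance falls at the quarter-wavelength frequency c/(4L). This idealization, the sound speed of 350 m/s, and the function name are assumptions made for illustration only; it is not the paper's MRI-based modeling, and it ignores end corrections and the real shape of the fossa.

```python
def quarter_wave_antiresonance_hz(depth_cm, sound_speed_cm_s=35000.0):
    """First antiresonance of an idealized closed side branch: f = c / (4 L).

    depth_cm         -- branch depth L in centimetres
    sound_speed_cm_s -- assumed speed of sound (350 m/s), in cm/s
    """
    if depth_cm <= 0:
        raise ValueError("depth must be positive")
    return sound_speed_cm_s / (4.0 * depth_cm)


# Depth range reported in the abstract: 1.6 to 2.0 cm.
for depth in (1.6, 2.0):
    freq = quarter_wave_antiresonance_hz(depth)
    print(f"depth {depth} cm -> {freq:.0f} Hz")
```

Under these assumptions the reported depths give antiresonances of roughly 4.4 to 5.5 kHz, the same order as the measured troughs; the water-injection experiment corresponds to shortening L, which raises the frequency.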


Journal of the Acoustical Society of America | 1994

MORPHOLOGICAL AND ACOUSTICAL ANALYSIS OF THE NASAL AND THE PARANASAL CAVITIES

Jianwu Dang; Kiyoshi Honda; Hisayoshi Suzuki

Morphological measurements of the nasal and paranasal cavities were conducted to investigate their relevance to the acoustic properties of the human nasal tract. The magnetic resonance imaging (MRI) technique was used to measure the three-dimensional geometry of the vocal tract. The area function of the nasal tract was calculated for seven subjects based on data obtained during natural breathing. The entire vocal tract was measured for five subjects during sustained production of nasal consonants. A marked morphological difference was observed between our data and previously published data [A. S. House and K. N. Stevens, J. Speech Hear. Disord. 21, 218-232 (1956); G. Fant, Acoustic Theory of Speech Production (Mouton, The Hague, 1970), 2nd ed., p. 139] particularly in the middle portion of the nasal tract. Previous data derived from cadaver specimens showed a large cavity in the middle portion possibly due to an absent or dehydrated mucous membrane, while our data showed narrow passages due to thickly layered mucosa. It has been confirmed by an additional experiment that the wide cavity is reproducible by applying an adrenergic agent to the nasal mucosa. Transfer functions of the vocal tract and the nasal tract were calculated from measured data, and compared to spectra of real speech signals recorded subsequent to the MRI experiment. The results indicate that asymmetry between the two nasal passages can cause extra pole-zero pairs, and suggest that the paranasal cavities play an important role in shaping the spectral characteristics of human nasal sounds.


Speech Communication | 2008

An investigation of dependencies between frequency components and speaker characteristics for text-independent speaker identification

Xugang Lu; Jianwu Dang

The features used for speech recognition are expected to emphasize linguistic information while suppressing individual differences. For speaker recognition, in contrast, features should preserve individual information and attenuate the linguistic information at the same time. In most studies, however, identical acoustic features are used for the different tasks of speaker and speech recognition. In this paper, we first investigated the relationships between the frequency components and the vocal tract based on speech production. We found that the individual information is encoded non-uniformly in different frequency bands of speech sound. Then we adopted the statistical Fisher's F-ratio and information-theoretic mutual information measurements to measure the dependencies between frequency components and individual characteristics based on a speaker recognition database (NTT-VR). From the analysis, we not only confirmed the finding of non-uniform distribution of individual information in different frequency bands from the speech production point of view, but also quantified their dependencies. Based on the quantification results, we proposed a new physiological feature which emphasizes individual information for text-independent speaker identification by using a non-uniform subband processing strategy to emphasize the physiological information involved in speech production. The new feature was combined with GMM speaker models and applied to the NTT-VR speaker recognition database. Speaker identification using the proposed feature reduced the identification error rate by 20.1% compared with that using the MFCC feature. The experimental results confirmed that emphasizing the features from highly individual-dependent frequency bands is valid for improving speaker recognition performance.
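The F-ratio idea can be sketched for a single frequency band as the variance of the per-speaker means divided by the average within-speaker variance, so that a band with a high ratio separates speakers well. This is one common textbook form of the measure and may differ in detail from the paper's computation; the function name and the toy energy values are hypothetical.

```python
from statistics import fmean, pvariance


def f_ratio(values_by_speaker):
    """Between-speaker variance over mean within-speaker variance.

    values_by_speaker -- dict mapping a speaker id to a list of
                         measurements (e.g. log energies) in one band
    """
    groups = list(values_by_speaker.values())
    if len(groups) < 2:
        raise ValueError("need at least two speakers")
    speaker_means = [fmean(g) for g in groups]
    between = pvariance(speaker_means)            # spread of speaker means
    within = fmean([pvariance(g) for g in groups])  # average spread inside a speaker
    if within == 0:
        raise ValueError("within-speaker variance is zero")
    return between / within


# Toy data (made up): band A separates the two speakers, band B does not.
band_a = {"s1": [1.0, 1.2, 0.8], "s2": [3.0, 3.2, 2.8]}
band_b = {"s1": [1.0, 3.0, 2.0], "s2": [2.2, 1.2, 3.2]}
print(f_ratio(band_a), f_ratio(band_b))
```

Computing this ratio band by band gives a profile of where speaker-dependent information concentrates, which is the kind of quantity a non-uniform subband weighting could be built on.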


Journal of the Acoustical Society of America | 1996

Acoustic characteristics of the human paranasal sinuses derived from transmission characteristic measurement and morphological observation

Jianwu Dang; Kiyoshi Honda

This paper reports on the acoustic characteristics of the paranasal sinuses as determined from transmission characteristic measurements and morphological examinations. A new experimental approach was developed to explore the correspondence between antiresonance frequencies and the causal resonators [J. Dang and K. Honda, J. Acoust. Soc. Jpn. (E) 17, 93-99 (1996)], and it was adopted to determine the antiresonance frequency of each sinus cavity. In this study, the antiresonance frequencies and the locations of the sinus openings were estimated from transmission characteristics of the nasal tract for three subjects, and then MRI-based morphological data for the subjects were used to relate each antiresonance frequency to its causal sinus cavity. The results indicate that each of the three major sinuses, i.e., the sphenoidal, maxillary, and frontal sinuses, contributes its own antiresonances to the transmission characteristics of the nasal tract. The estimated antiresonance frequencies were compared with computed natural frequencies of Helmholtz resonators, and the differences were within 10% for the sinuses. On the basis of the frequency distribution of the sinus antiresonance, the acoustic characteristics of the paranasal sinuses were modeled by four Helmholtz resonators. The simulation with the four-zero model showed that the paranasal sinuses not only introduce antiresonances in the transfer function, but also change the spectral shape of the nasal formants.
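The comparison with Helmholtz resonators can be illustrated with the standard lumped-element formula f = (c / 2π) · sqrt(A / (V · L)), where A and L are the cross-sectional area and effective length of the neck (here, the sinus opening) and V is the cavity volume. The dimensions used below are hypothetical placeholders, not measurements from the paper, and the sound speed of 350 m/s is an assumption.

```python
import math


def helmholtz_frequency_hz(neck_area_cm2, neck_length_cm, volume_cm3,
                           sound_speed_cm_s=35000.0):
    """Natural frequency of a Helmholtz resonator: (c / 2*pi) * sqrt(A / (V * L))."""
    if min(neck_area_cm2, neck_length_cm, volume_cm3) <= 0:
        raise ValueError("all dimensions must be positive")
    return (sound_speed_cm_s / (2.0 * math.pi)) * math.sqrt(
        neck_area_cm2 / (volume_cm3 * neck_length_cm))


def relative_difference(measured_hz, computed_hz):
    """Relative deviation of a measured antiresonance from the computed value."""
    return abs(measured_hz - computed_hz) / computed_hz


# Hypothetical sinus: 0.1 cm^2 opening, 0.5 cm neck, 15 cm^3 cavity.
f_model = helmholtz_frequency_hz(0.1, 0.5, 15.0)
print(f"model frequency: {f_model:.0f} Hz")
# A hypothetical measured antiresonance of 680 Hz would be within 10%.
print(relative_difference(680.0, f_model) < 0.10)
```

The formula makes the qualitative behavior explicit: a larger cavity or a longer, narrower opening lowers the antiresonance frequency.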


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Voice Activity Detection Based on an Unsupervised Learning Framework

Dongwen Ying; Yonghong Yan; Jianwu Dang; Frank K. Soong

How to construct models for speech/nonspeech discrimination is a crucial point for voice activity detectors (VADs). Semi-supervised learning is the most popular way for model construction in conventional VADs. In this correspondence, we propose an unsupervised learning framework to construct statistical models for VAD. This framework is realized by a sequential Gaussian mixture model (GMM). It comprises an initialization process and an updating process. At each subband, the GMM is first initialized using the EM algorithm, and then sequentially updated frame by frame. From the GMM, a self-regulatory threshold for discrimination is derived at each subband. Some constraints are introduced to this GMM for the sake of reliability. Because the learning is unsupervised, the proposed VAD does not rely on the assumption that the first several frames of an utterance are nonspeech, which is widely used in most VADs. Moreover, the speech presence probability in the time-frequency domain is a byproduct of this VAD. We tested it on speech from the TIMIT database and noise from the NOISEX-92 database. The evaluations showed its promising performance in comparison with VADs such as ITU G.729B, GSM AMR, and a typical semi-supervised VAD.


Speech Communication | 2006

Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework

Konstantin Markov; Jianwu Dang; Satoshi Nakamura

Most of the current state-of-the-art speech recognition systems are based on speech signal parametrizations that crudely model the behavior of the human auditory system. However, little or no use is usually made of the knowledge on the human speech production system. A data-driven statistical approach to incorporate this knowledge into ASR would require a substantial amount of data, which are not widely available since their acquisition is difficult and expensive. Furthermore, during recognition, it is nearly impossible to obtain observations of the articulators' movement. Thus, research on speech production mechanisms in ASR has largely focused on modeling the hidden articulatory trajectories and using prior phonetic and phonological knowledge. Nevertheless, it has been shown that combining the acoustic and articulatory information can lead to improved speech recognition performance. The approach taken in this study is to integrate features extracted from actual articulatory data with acoustic MFCC features in a way that allows recognition using MFCC only. Rather than trying to map articulatory features to the corresponding acoustic features, we use the probabilistic dependency between them. Bayesian Networks (BN) are ideally suited for this purpose. They can model complex joint probability distributions with many discrete and continuous variables and have great flexibility in representing their dependencies. Our speech recognition system is based on the hybrid HMM/BN acoustic model where the BN is used to describe the HMM state probability distributions. HMM transitions, on the other hand, model the temporal speech characteristics. Articulatory and acoustic features are represented by different variables of the BN. Dependencies are learned from the observable articulatory and acoustic training data. During recognition, when only the acoustic observations are available, articulatory variables are assumed hidden.
We have evaluated our ASR system by using a small database consisting of articulatory and acoustic data recorded from three speakers. The articulatory data are actual measurements of the articulators' positions at several points. In all experiments involving both speaker-dependent and multi-speaker acoustic models, the HMM/BN system outperformed the baseline HMM system trained on acoustic data only. In experimenting with different BN topologies, we found that integrating the velocity and


Journal of Phonetics | 2002

Estimation of vocal tract shapes from speech sounds with a physiological articulatory model

Jianwu Dang; Kiyoshi Honda

A 3D physiological articulatory model based on volumetric MRI data from a male speaker was used to estimate vocal tract shapes from speech sounds. The advantages of using the model for the inverse estimation are that the model is equipped with the morphological and dynamic constraints that are commonly used for such estimation and possesses the physiological constraints that are involved in human articulation. In this study, a dynamic muscle workspace was introduced to account for temporal variations of the muscle orientation with articulatory movements, and a multipoint control strategy was proposed for flexible control of the tongue tip and tongue dorsum. The control points were used as articulatory parameters, and formants were chosen as acoustic parameters. An articulatory constraint between the F1−F2 difference and tongue dorsum position was introduced in mapping formant patterns to control point positions, where the constraint was obtained based on X-ray microbeam data recorded from the target speaker and five other male speakers. The proposed estimation method was evaluated using vowel-to-vowel sequences. For the target speaker of the model, the average estimation error was 0.16 cm for the vocal tract shapes, and 1.8% for the four lower formants. This implies that our physiological articulatory model can be a valuable tool for the inverse estimation.


Oral Science International | 2007

A Computational Tongue Model and its Clinical Application

Satoru Fujita; Jianwu Dang; Noriko Suzuki; Kiyoshi Honda

The tongue possesses a complex muscular structure, and its motor functions are also intricate. Therefore, it would be beneficial to use a computational physiological model of the tongue to examine its vital functions in normal and pathological conditions. Thus far, the studies of tongue models have focused on symmetric movements for normal speech. For clinical purposes, it is necessary to develop a physiological model to deal with daily vital activities such as mastication and swallowing. To do so, we constructed a full 3D physiological model of the tongue based on MRI data from a normal subject, and verified the basic functions of the model based on anatomic and physiological knowledge. In this study, the model was applied to clinical issues: prediction and verification of the changes in movements of the tongue with a tumor before and after partial glossectomy, respectively. Tongue protrusion and lateral bending motion were examined for the prediction and verification. The simulation results were consistent with the observations for a patient with a tumor in the tongue. Comparisons of the simulation and observation in the clinical case showed that the model could predict potential effects of the glossectomy on the tongue movements. It is suggested that the model is a useful tool for pre-operative planning of glossectomy.


Information Sciences | 2014

Fuzzy rough regression with application to wind speed prediction

Shuang An; Hong Shi; Qinghua Hu; Xiaoqi Li; Jianwu Dang

Accurate wind speed prediction is a prerequisite of large-scale wind power generation. There are several uncertain factors which degrade the performance of the current wind speed prediction systems. Fuzzy rough sets are considered a powerful tool to deal with uncertainty, and have been widely discussed and applied in classification learning. In this work we describe a regression algorithm based on fuzzy rough sets, consisting of fuzzy partition, fuzzy approximation and estimation of regression values. In this algorithm, the training set is first divided into k fuzzy classes with fuzzy partition. The predicted values of test samples are then located in finite intervals with fuzzy rough approximation, and finally they are estimated from the lower and upper limits of the intervals. Numerical experiments on UCI data sets and wind speed prediction show the effectiveness of the proposed algorithm.


Scientific Reports | 2015

Combined node and link partitions method for finding overlapping communities in complex networks

Di Jin; Bogdan Gabrys; Jianwu Dang

Community detection in complex networks is a fundamental data analysis task in various domains, and how to effectively find overlapping communities in real applications is still a challenge. In this work, we propose a new unified model and method for finding the best overlapping communities on the basis of the associated node and link partitions derived from the same framework. Specifically, we first describe a unified model that accommodates node and link communities (partitions) together, and then present a nonnegative matrix factorization method to learn the parameters of the model. Thereafter, we infer the overlapping communities based on the derived node and link communities, i.e., determine each overlapped community between the corresponding node and link community with a greedy optimization of a local community function, conductance. Finally, we introduce a model selection method based on consensus clustering to determine the number of communities. We have evaluated our method on both synthetic and real-world networks with ground truth, and compared it with seven state-of-the-art methods. The experimental results demonstrate the superior performance of our method over the competing ones in detecting overlapping communities for all analysed data sets. Improved performance is particularly pronounced in cases of more complicated networked community structures.
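The conductance mentioned above has a standard definition for an undirected graph: the number of edges leaving a node set S divided by the smaller of the total degree inside S and the total degree outside it. The sketch below shows only that score, not the paper's greedy optimization or its matrix factorization step; the adjacency-dict representation and the toy graph are assumptions for illustration.

```python
def conductance(adjacency, community):
    """Conductance of a node set in an undirected graph.

    adjacency -- dict mapping each node to the set of its neighbours
    community -- iterable of nodes forming the candidate community S
    """
    inside = set(community)
    outside = set(adjacency) - inside
    if not inside or not outside:
        raise ValueError("community must be a proper, non-empty subset")
    # Edges with exactly one endpoint in S.
    cut = sum(1 for u in inside for v in adjacency[u] if v not in inside)
    vol_in = sum(len(adjacency[u]) for u in inside)
    vol_out = sum(len(adjacency[u]) for u in outside)
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("one side of the cut has no edges")
    return cut / denom


# Toy graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(conductance(graph, {0, 1, 2}))  # one cut edge over volume 7
print(conductance(graph, {0, 1}))     # a worse (higher) score
```

Lower values indicate a better separated community, so a greedy procedure would add or remove nodes whenever doing so decreases this score.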

Collaboration


Dive into Jianwu Dang's collaborations.

Top Co-Authors

Jianguo Wei

Japan Advanced Institute of Science and Technology

Qiang Fang

Chinese Academy of Social Sciences

Xugang Lu

National Institute of Information and Communications Technology

Aijun Li

Chinese Academy of Social Sciences
