Wen-Whei Chang
National Chiao Tung University
Publication
Featured research published by Wen-Whei Chang.
Speech Communication | 2002
Wuei-He Tsai; Wen-Whei Chang
This study focuses on the parametric stochastic modeling of characteristic sound features that distinguish languages from one another. A new stochastic model, the so-called Gaussian mixture bigram model (GMBM), that allows exploitation of the acoustic feature bigram statistics without requiring transcribed training data is introduced. For greater efficiency, a minimum classification error (MCE) algorithm is employed to accomplish discriminative training of a GMBM-based Chinese dialect identification system. Simulation results demonstrate the effectiveness of the GMBM for dialect-specific acoustic modeling, and use of this model allows the proposed system to distinguish between the three major Chinese dialects spoken in Taiwan with 94.4% accuracy.
IEEE Transactions on Speech and Audio Processing | 1996
Wen-Whei Chang; Chin-Tun Wang
Most LPC-based audio coders improve the reproduction quality by using predictor coefficients to embody perceptual masking in noise spectral shaping. Since the predictor coefficients were originally derived to characterize sound production models, they cannot precisely describe the human ear's nonlinear responses to frequency and loudness. We report on new approaches to exploiting the masking threshold in the design of a perceptual noise-weighting filter for excitation searches. To track the nonstationary evolution of a masking threshold, an autoregressive spectral analysis with finite order has been shown to be capable of providing sufficient accuracy. In seeking a faster response, an artificial neural network was also trained to extract autoregressive modeling parameters of the masking threshold from typical audio signals via mapping. Furthermore, we propose the concept of sinusoidal excitation representation to better track the intrinsic characteristics of prediction error signals. Simulation results indicate that the combined use of a multisinusoid excitation model and a masking-threshold-adapted weighting filter allows the implementation of an LPC-based audio coder that delivers near-transparent quality at the rate of 96 kb/s.
IEEE Journal on Selected Areas in Communications | 2001
Wen-Whei Chang; Tan-Hsu Tan; De-Yu Wang
This study focuses on two issues: parametric modeling of the channel and index assignment of codevectors, to design a vector quantizer that achieves high robustness against channel errors. We first formulate the design of a robust zero-redundancy vector quantizer as a combinatorial optimization problem leading to a genetic search for a minimum-distortion index assignment. The performance is further enhanced by the use of the Fritchman (1967) channel model that more closely characterizes the statistical dependencies between error sequences. This study also presents an index assignment algorithm based on the Fritchman model with parameter values estimated using a real-coded genetic algorithm. Simulation results indicate that the global explorative properties of genetic algorithms make them very effective in estimating Fritchman model parameters, and use of this model can match index assignment to expected channel conditions.
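The index-assignment idea in the abstract above can be illustrated with a minimal sketch. It assumes a memoryless binary symmetric channel and an exhaustive search over permutations, which stand in for the Fritchman channel model and the genetic search described in the abstract. The function names and the toy scalar codebook are hypothetical.

```python
import itertools


def transition_prob(i, j, bits, ber):
    """P(receive index j | send index i) on a memoryless binary symmetric channel."""
    flipped = bin(i ^ j).count("1")  # Hamming distance between the two indices
    return (ber ** flipped) * ((1.0 - ber) ** (bits - flipped))


def channel_distortion(codebook, assignment, ber):
    """Expected squared error caused by index errors, assuming equiprobable codevectors.

    assignment[k] is the channel index given to codevector k.
    The codebook size is assumed to be a power of two.
    """
    n = len(codebook)
    bits = (n - 1).bit_length()
    total = 0.0
    for sent in range(n):
        for received in range(n):
            p = transition_prob(assignment[sent], assignment[received], bits, ber)
            total += p * (codebook[sent] - codebook[received]) ** 2
    return total / n


def best_assignment(codebook, ber):
    """Exhaustive search; only feasible for very small codebooks."""
    n = len(codebook)
    return min(
        itertools.permutations(range(n)),
        key=lambda perm: channel_distortion(codebook, perm, ber),
    )


if __name__ == "__main__":
    toy_codebook = [0.0, 1.0, 2.0, 3.0]  # hypothetical scalar codevectors
    perm = best_assignment(toy_codebook, ber=0.05)
    print(perm, channel_distortion(toy_codebook, perm, ber=0.05))
```

Exhaustive search grows factorially with codebook size, which is why a heuristic such as a genetic search is needed for codebooks of practical size.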
IEEE Transactions on Communications | 1991
Jerry D. Gibson; Wen-Whei Chang
The authors present both forward and backward adaptive speech coders that operate at 9.6, 12, and 16 kb/s using integer and fractional rate trees, weighted squared error distortion measures, the (M,L) tree search algorithm, and incremental path map symbol release. They introduce the concept of multitree source codes and illustrate how the multitree structure allows scalar quantizer-based codes and scalar adaptation rules to be used for fractional rate tree coding. With a frequency weighted distortion measure, the forward and backward adaptive multitree coders produce near toll quality speech at 16 kb/s, while the backward adaptive 9.6 kb/s multitree coder substantially outperforms adaptive predictive coding and has an encoding delay of less than 2 ms. Performance results are presented in terms of unweighted and weighted signal-to-noise ratio and segmental signal-to-noise ratio, sound spectrograms, and subjective listening tests.
Global Communications Conference | 1989
Jerry D. Gibson; Wen-Whei Chang
The authors present both forward and backward adaptive speech coders that operate at 9.6, 12, and 16 kb/s using integer and fractional rate trees, unweighted and weighted squared error distortion measures, the (M,L) tree search algorithm, and incremental path map symbol release. They introduce the concept of multitree source codes and illustrate their advantage over classical, multiple-symbol-per-branch, fractional-rate trees for speech coding with deterministic code generators. Performance results are presented in terms of unweighted and weighted signal-to-noise ratio and segmental signal-to-noise ratio, sound spectrograms, and subjective listening tests.
Speech Communication | 2006
Cheng-Lung Lee; Wen-Whei Chang; Yuan-Chuan Chiang
This paper studies the combined use of spectral and prosodic conversions to enhance hearing-impaired Mandarin speech. The analysis-synthesis system is based on a sinusoidal representation of the speech production mechanism. By taking advantage of the tone structure in Mandarin speech, pitch contours are orthogonally transformed and applied within the sinusoidal framework to perform pitch modification. Also proposed is a time-scale modification algorithm that finds accurate alignments between hearing-impaired and normal utterances. Using the alignments, spectral conversion is performed on subsyllabic acoustic units by a continuous probabilistic transform based on a Gaussian mixture model. Results of perceptual evaluation indicate that the proposed system greatly improves the intelligibility and the naturalness of hearing-impaired Mandarin speech.
Archive | 1991
Jerry D. Gibson; Yoon Chae Cheong; Hong Chae Woo; Wen-Whei Chang
To achieve low delay in speech coders, the redundancy removal must be accomplished in a backward adaptive fashion so that both long- and short-term predictors use only the decoder output for parameter adaptation. Numerous backward adaptive algorithms for updating the short-term predictor coefficients have been studied [1–4], and several look promising. In [5] comparative simulation results were presented for a fixed-tap predictor, three gradient-adapted transversal predictors, and two least squares lattice predictors when used in a differential pulse code modulation (DPCM) based code generator for tree coding of speech at 16 kilobits/s (kbits/s).
Global Communications Conference | 1989
Jerry D. Gibson; Wen-Whei Chang
Minimum mean squared error (MMSE) fixed-lag smoothing is used in conjunction with DPCM (differential pulse code modulation) to develop a code generator employing delayed decoding. This smoothed DPCM (SDPCM) code generator is compared to DPCM and interpolative DPCM (IDPCM) code generators at rates 1 and 2 for tree coding several synthetic sources, as well as to a DPCM code generator at rate 2 for speech sources. The (M,L) algorithm is used for tree searching, and SDPCM outperforms IDPCM and DPCM at rate 2 for synthetic sources with M = 1, 4, 8, and 12, and at rate 1 with M ≥ 4. For speech, SDPCM provides a slight improvement in MSE (mean squared error) over DPCM codes, which is also evident in sound spectrograms and listening tests.
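For readers unfamiliar with the baseline, the following is a minimal sketch of a plain first-order DPCM code generator. It does not implement the fixed-lag smoothing or the (M,L) tree search from the abstract. The predictor coefficient `a`, the quantizer step `step`, and the function names are assumptions chosen for illustration.

```python
def dpcm_encode(samples, a=0.9, step=0.1):
    """Quantize the prediction residual with a uniform quantizer (no clipping).

    The predictor uses only the previous reconstructed sample, so the decoder
    can form the same prediction from the transmitted codes alone.
    """
    codes, recon = [], []
    prev = 0.0
    for x in samples:
        pred = a * prev
        q = round((x - pred) / step)  # integer code for the residual
        prev = pred + q * step        # reconstruction shared with the decoder
        codes.append(q)
        recon.append(prev)
    return codes, recon


def dpcm_decode(codes, a=0.9, step=0.1):
    """Rebuild the signal from the residual codes using the same predictor."""
    out = []
    prev = 0.0
    for q in codes:
        prev = a * prev + q * step
        out.append(prev)
    return out


if __name__ == "__main__":
    signal = [0.0, 0.3, 0.55, 0.7, 0.6, 0.2, -0.1]  # hypothetical input samples
    codes, recon = dpcm_encode(signal)
    print(codes)
    print(dpcm_decode(codes))
```

Because the quantizer is not clipped, each reconstructed sample differs from the input by at most half a step. A smoothed variant would additionally use later samples to refine earlier outputs, at the cost of decoding delay.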
International Conference on Communications | 2006
Chun-Feng Wu; Wen-Whei Chang
Packet delay and loss are two essential problems for real-time voice transmission over IP networks. In the proposed system, the playout delay is adaptively adjusted based on a simplified version of the conversational-quality E-model. Perceptual-based buffer design is formulated as an unconstrained optimization problem leading to a better balance between end-to-end delay and packet loss. Experimental results show that the proposed playout buffer algorithm can achieve the optimum perceived speech quality under various network conditions.
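The trade-off between delay and late-packet loss can be sketched as follows. The quality score uses a commonly cited simplification of the E-model R-factor with G.711-style coefficients. Those coefficients, the grid search, and the function names are assumptions for illustration and are not taken from the paper. The playout delay is treated as the whole end-to-end delay for simplicity.

```python
import math


def r_factor(delay_ms, loss_rate):
    """Simplified E-model score (assumed coefficients, not the paper's)."""
    delay_impairment = 0.024 * delay_ms
    if delay_ms > 177.3:
        # Extra penalty once delay starts to disrupt conversation
        delay_impairment += 0.11 * (delay_ms - 177.3)
    loss_impairment = 30.0 * math.log(1.0 + 15.0 * loss_rate)
    return 94.2 - delay_impairment - loss_impairment


def choose_playout_delay(network_delays_ms, candidates_ms):
    """Pick the candidate delay with the highest score.

    A packet counts as lost if its network delay exceeds the playout delay.
    Returns a (delay, score) tuple.
    """
    n = len(network_delays_ms)
    best = None
    for d in candidates_ms:
        late = sum(1 for x in network_delays_ms if x > d) / n
        score = r_factor(d, late)
        if best is None or score > best[1]:
            best = (d, score)
    return best


if __name__ == "__main__":
    observed = [40, 45, 50, 55, 60, 65, 70, 200]  # hypothetical delays in ms
    print(choose_playout_delay(observed, range(40, 301, 5)))
```

A short playout delay drops more late packets, while a long one raises the delay impairment, so the score peaks somewhere in between.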
International Conference on Acoustics, Speech, and Signal Processing | 1997
Wen-Whei Chang; De-Yu Wang; Li-Wei Wang
Most LPC-based audio coders employ simplistic noise-shaping operations to perform psychoacoustic control of quantization noise. In this paper, we report on new approaches to exploiting perceptual masking in the design of adaptive quantization of LPC excitation parameters. Due to its localized spectral sensitivity, sinusoidal excitation representation is preferred to spectrally flat signals for use in excitation modeling. Simulation results indicate that the proposed multisinusoid excited coder can deliver high quality audio reproduction at the rate of 72 kb/s.