Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Xiang Xie is active.

Publication


Featured research published by Xiang Xie.


International Conference on Multimedia and Expo | 2014

Low latency parameter generation for real-time speech synthesis system

Xingyu Na; Xiang Xie; Jingming Kuang

Speech synthesizers are commonly used in human-computer interaction. In many application scenarios, computing resources are limited while real-time synthesis is demanded. The HMM-based speech synthesis technique can create natural voice quality with a small footprint, but current synthesizers require the concatenation of sentence-level acoustic units, which is not applicable in real-time mode. In this paper, we propose a blocked parameter generation algorithm for low-latency speech synthesis that can work in real time in resource-limited applications. Phonetic units at various time spans are used as blocks. Objective and subjective evaluations suggest that the proposed system produces promising voice quality with a low demand on computing resources.
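
As a rough illustration of the low-latency idea, here is a minimal Python sketch in which generation proceeds block by block so playback can start before the whole sentence is processed. A simple moving-average smoother stands in for the actual HMM parameter generation step; the block size and data are illustrative assumptions, not the paper's algorithm.

import numpy as np

def generate_block(means):
    # Stand-in for per-block parameter generation: smooth the
    # state-mean trajectory with a short moving average.
    kernel = np.ones(3) / 3.0
    return np.convolve(means, kernel, mode="same")

def stream_synthesis(state_means, block_size=10):
    # Yield parameter frames block by block, so vocoding and
    # playback can begin before the full sentence is available.
    for start in range(0, len(state_means), block_size):
        yield generate_block(state_means[start:start + block_size])

means = np.random.randn(35)  # fake state-mean trajectory (assumption)
for i, frames in enumerate(stream_synthesis(means)):
    print("block", i, "->", len(frames), "frames ready")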


International Conference on Acoustics, Speech, and Signal Processing | 2014

Improving voice quality of HMM-based speech synthesis using voice conversion method

Yishan Jiao; Xiang Xie; Xingyu Na; Ming Tu

HMM-based speech synthesis systems (HTS) often generate buzzy and muffled speech. Such degradation of voice quality makes synthetic speech sound robotic rather than natural. From this viewpoint, we assume that synthetic speech lies in a speaker space different from that of the original. We propose to use a voice conversion method to transform synthetic speech toward the original so as to improve its quality. Local linear transformation (LLT) combined with temporal decomposition (TD) is proposed as the conversion method. It not only ensures smooth spectral conversion but also avoids the over-smoothing problem. Moreover, we design a robust spectral selection and modification strategy to keep the modified spectra stable. A preference test shows that the proposed method can improve the quality of HMM-based speech synthesis.
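
A minimal sketch of the local linear transformation idea, assuming k-means clusters stand in for the temporally decomposed local regions (the cluster count, data, and the omission of TD are assumptions for illustration):

import numpy as np
from sklearn.cluster import KMeans

def fit_llt(src, tgt, n_clusters=4):
    # src, tgt: (frames, dims) paired synthetic/natural spectra.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(src)
    maps = {}
    for c in range(n_clusters):
        idx = km.labels_ == c
        X = np.hstack([src[idx], np.ones((idx.sum(), 1))])  # add bias column
        W, *_ = np.linalg.lstsq(X, tgt[idx], rcond=None)    # local linear map
        maps[c] = W
    return km, maps

def convert(src, km, maps):
    labels = km.predict(src)
    X = np.hstack([src, np.ones((len(src), 1))])
    return np.vstack([X[i] @ maps[labels[i]] for i in range(len(src))])

src = np.random.randn(300, 24)                             # fake synthetic spectra
tgt = src @ (np.eye(24) + 0.1 * np.random.randn(24, 24))   # fake natural targets
km, maps = fit_llt(src, tgt)
print(convert(src, km, maps).shape)                        # (300, 24)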


EURASIP Journal on Audio, Speech, and Music Processing | 2013

Context-based adaptive arithmetic coding in time and frequency domain for the lossless compression of audio coding parameters at variable rate

Jing Wang; Xuan Ji; Shenghui Zhao; Xiang Xie; Jingming Kuang

This paper presents a novel lossless compression technique, context-based adaptive arithmetic coding, which can further compress the quantized parameters in an audio codec. The key feature of the new technique is the combination of context models in the time domain and the frequency domain, called the time-frequency context model. It is used for the lossless compression of audio coding parameters such as the quantized modified discrete cosine transform (MDCT) coefficients and the frequency band gains in the ITU-T G.719 audio codec. With the proposed adaptive arithmetic coding, a high degree of adaptation and redundancy reduction can be achieved. In addition, an efficient variable-rate algorithm is employed, designed on the basis of both the baseline entropy coding method of G.719 and the proposed adaptive arithmetic coding technique. Experiments show that the proposed technique is more efficient than conventional Huffman coding and common adaptive arithmetic coding when used for the lossless compression of audio coding parameters. For a set of audio samples used in the G.719 application, the proposed technique achieves an average bit-rate saving of 7.2% in the low-bit-rate coding mode while producing audio quality equal to that of the original G.719.
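
The arithmetic coder itself is standard; the abstract's key feature is the context model. Below is a minimal sketch of a time-frequency context model in which the probability of a quantized coefficient is conditioned on its already-coded neighbours in time and in frequency, with Laplace-smoothed counts. The grid layout and alphabet are illustrative assumptions, not G.719 specifics.

from collections import defaultdict

class TFContextModel:
    # Adaptive counts conditioned on the previous symbol in time and
    # the previous symbol in frequency (both already decoded, so the
    # decoder can rebuild the same context).
    def __init__(self, alphabet_size):
        self.counts = defaultdict(lambda: [1] * alphabet_size)  # Laplace prior

    def _context(self, grid, t, f):
        prev_t = grid[t - 1][f] if t > 0 else -1   # neighbour in time
        prev_f = grid[t][f - 1] if f > 0 else -1   # neighbour in frequency
        return (prev_t, prev_f)

    def prob(self, grid, t, f, symbol):
        c = self.counts[self._context(grid, t, f)]
        return c[symbol] / sum(c)                  # fed to the arithmetic coder

    def update(self, grid, t, f, symbol):
        self.counts[self._context(grid, t, f)][symbol] += 1

model = TFContextModel(alphabet_size=4)
grid = [[0, 1, 2], [1, 1, 3]]                      # quantized parameters (fake)
for t in range(2):
    for f in range(3):
        p = model.prob(grid, t, f, grid[t][f])
        model.update(grid, t, f, grid[t][f])
print(round(p, 3))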


International Symposium on Chinese Spoken Language Processing | 2008

Order Adaptation of the Fractional Fourier Transform Using the Intraframe Pitch Change Rate for Speech Recognition

Hui Yin; Climent Nadeu; Volker Hohmann; Xiang Xie; Jingming Kuang

We propose an acoustic feature for speech recognition based on the combination of MFCC and the fractional Fourier transform (FrFT). The transform orders for the FrFT are set adaptively according to the intraframe pitch change rate. This method is motivated by the fact that speech is not stationary even over a short period of time, and the idea is illustrated using an AM-FM speech model and spectrograms of an artificial periodic signal. Experiments were conducted on the intervocalic English consonants provided by the Interspeech 2008 Consonant Challenge and on a Mandarin connected-digits corpus. The performance of the proposed method is compared with an MFCC baseline system. Experimental results show that the proposed features achieve a slightly better recognition rate than MFCCs, presumably because they better track the dynamic characteristics of the speech harmonics.
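
For a linear chirp, the optimal FrFT rotation angle is tied to the chirp rate (cot of the angle is proportional to the rate), so a pitch change rate measured within a frame can be mapped to a transform order. The normalization below is an assumption for illustration; the paper's exact mapping may differ.

import numpy as np

def frft_order(f0_start, f0_end, frame_dur, fs, n_fft):
    # Return the FrFT order p, where the rotation angle is alpha = p*pi/2.
    mu = (f0_end - f0_start) / frame_dur   # intraframe pitch change rate, Hz/s
    mu_norm = mu * n_fft / fs ** 2         # normalized chirp rate (assumption)
    alpha = np.arctan2(1.0, -mu_norm)      # solves cot(alpha) = -mu_norm
    return 2.0 * alpha / np.pi

# A mild pitch rise over a 25 ms frame gives an order just above 1
# (order 1 is the plain Fourier transform).
print(frft_order(200.0, 210.0, 0.025, 16000, 512))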


International Conference on Acoustics, Speech, and Signal Processing | 2011

A novel algorithm of seeking FrFT order for speech processing

Duo-jia Ma; Xiang Xie; Jingming Kuang

The determination of the optimal fractional Fourier transform (FrFT) order is a crucial issue for the FrFT. This paper proposes a novel algorithm for estimating the FrFT order. We use the information on pitches, harmonics, and formants in the correlogram of a Gammatone filterbank to obtain a few candidates for the transform order. The proposed method reduces the computational complexity of the search for the optimal transform order. We apply this method to speech processing tasks such as Mel-frequency cepstral coefficient (MFCC) extraction and speech enhancement. The MFCC extraction experiment shows that the proposed method is superior to the traditional method based on the Fourier transform in terms of Fisher distance improvement. The speech enhancement results also show improvements in SNR and in the Itakura-Saito distance of the LPC coefficients.
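
A minimal sketch of the candidate-generation step: instead of scanning the whole order range, a handful of pitch candidates are read off correlogram peaks and only the corresponding orders are evaluated. A plain autocorrelation stands in for the Gammatone correlogram here (an assumption).

import numpy as np

def pitch_candidates(frame, fs, n=3):
    # Strongest autocorrelation peaks in the 60-400 Hz pitch band.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / 400), int(fs / 60)
    lags = lo + np.argsort(ac[lo:hi])[::-1][:n]
    return fs / lags                       # candidate F0s in Hz

fs = 16000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)
print(pitch_candidates(frame, fs))         # candidates cluster near 120 Hz

# Each candidate would then be mapped to a transform order (e.g. via a
# chirp-rate relation), so only a few orders are evaluated instead of
# scanning the whole order range.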


International Conference on Acoustics, Speech, and Signal Processing | 2013

Multichannel audio signal compression based on tensor decomposition

Jing Wang; Chundong Xu; Xiang Xie; Jingming Kuang

This paper proposes a novel multichannel audio signal compression method based on tensor decomposition. The multichannel audio tensor space is established with three factors (channel, time, and frequency) and is decomposed into a core tensor and three factor matrices based on the Tucker model. Only the truncated core tensor is transmitted to the decoder, where it is multiplied by factor matrices trained before processing. The performance of the proposed method is evaluated with approximation errors, compression degree, and listening tests. The smaller the core tensor, the higher the compression degree. A very noticeable compression capability is achieved with acceptable retrieved quality. The novelty of the proposed method is that it enables both high compression capability and backward compatibility with little audible signal distortion.
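
A minimal sketch of the encode/decode path under the Tucker model, using a plain-numpy truncated HOSVD: only the small core would be transmitted, and the decoder multiplies it back by pre-trained factor matrices. Tensor sizes, ranks, and data are illustrative assumptions.

import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_dot(T, M, mode):
    # Mode-n product: multiply matrix M into axis `mode` of tensor T.
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd(T, ranks):
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = T
    for m, U in enumerate(factors):
        core = mode_dot(core, U.T, m)      # project onto the leading subspaces
    return core, factors

def reconstruct(core, factors):
    T = core
    for m, U in enumerate(factors):
        T = mode_dot(T, U, m)
    return T

rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((6, 3)), rng.standard_normal((400, 8)), rng.standard_normal((128, 5))
G = rng.standard_normal((3, 8, 5))
X = np.einsum("ia,jb,kc,abc->ijk", A, B, C, G)     # channel x time x frequency
X += 0.01 * rng.standard_normal(X.shape)
core, factors = hosvd(X, ranks=(3, 8, 5))          # only `core` is "transmitted"
err = np.linalg.norm(X - reconstruct(core, factors)) / np.linalg.norm(X)
print(core.size / X.size, round(err, 4))           # compression ratio vs. error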


International Symposium on Chinese Spoken Language Processing | 2010

A novel algorithm of seeking FrFT order for speech enhancement

Duo-jia Ma; Xiang Xie; Jingming Kuang

This article introduces the fractional Fourier transform (FrFT) to speech enhancement. The determination of the optimal FrFT order is a crucial issue for the FrFT, and a novel algorithm is proposed for estimating it. We use the information on pitches, harmonics, and formants in the correlogram of a Gammatone filterbank to obtain a few candidates for the transform order. The proposed method reduces the computational complexity of the search for the optimal transform order. Experimental results on speech enhancement show that the proposed method is superior to conventional spectral subtraction in terms of the SNR improvement of the enhanced speech and the Itakura-Saito distance of the LPC coefficients.
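
For context, the conventional spectral subtraction used as the baseline above can be written with a pluggable transform pair; the paper's method would substitute an FrFT at the estimated optimal order for the FFT. No FrFT implementation is included in this sketch, and the noise estimate and spectral floor are illustrative assumptions.

import numpy as np

def spectral_subtract(frame, noise_mag, fwd=np.fft.fft, inv=np.fft.ifft):
    spec = fwd(frame)
    mag = np.abs(spec)
    clean_mag = np.maximum(mag - noise_mag, 0.05 * mag)   # spectral floor
    return np.real(inv(clean_mag * np.exp(1j * np.angle(spec))))

fs = 16000
t = np.arange(400) / fs
clean = np.sin(2 * np.pi * 220 * t)
noisy = clean + 0.3 * np.random.randn(len(t))
noise_mag = np.abs(np.fft.fft(0.3 * np.random.randn(len(t))))  # noise estimate
enhanced = spectral_subtract(noisy, noise_mag)
print(np.std(noisy - clean), np.std(enhanced - clean))  # residual error drops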


International Conference on Machine Learning | 2017

Robust Recognition of Mandarin Vowels by Articulatory Manners

Jin Hu; Jing Liu; Yingnan Zhang; Zhuanling Zha; Xiang Xie; Shilei Huang

This paper proposes a robust classifier for Mandarin vowels based on articulatory manners (AMs), which include the height of the body of the tongue, the front-back position of the tongue, and the degree of lip rounding. First, the articulatory manners of each vowel are encoded as a 3-dimensional vector pattern. Then, acoustic features are extracted and mapped to the articulatory-manner vector by an extreme learning machine (ELM). Finally, the vowel nearest to the articulatory-manner vector is chosen as the recognized result. A comparison between our method and a direct method that does not consider articulatory manners shows that the proposed method yields an improvement of 7.1 percentage points. Tests with three kinds of noisy data from Aurora-4 show that it also outperforms the normal method, with a gain of about 4 percentage points.
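
A minimal sketch of the pipeline: an extreme learning machine (random hidden layer plus least-squares output weights) regresses acoustic features onto a 3-dimensional AM vector, and the nearest vowel pattern is returned. The AM patterns, feature dimensionality, and training data below are synthetic placeholders, not the paper's values.

import numpy as np

AM_PATTERNS = {                    # (height, front-back, rounding) -- illustrative
    "a": (0.0, 0.5, 0.0),
    "i": (1.0, 1.0, 0.0),
    "u": (1.0, 0.0, 1.0),
}

class ELM:
    def __init__(self, n_in, n_hidden=200, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_hidden))   # fixed random weights
        self.b = rng.standard_normal(n_hidden)
        self.beta = None

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, Y, ridge=1e-3):
        H = self._hidden(X)
        # Output weights by ridge-regularized least squares.
        self.beta = np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ Y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

def decode(am_vec):
    # Nearest articulatory-manner pattern wins.
    return min(AM_PATTERNS, key=lambda v: np.linalg.norm(am_vec - np.array(AM_PATTERNS[v])))

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 13))                       # fake acoustic features
Y = np.array(list(AM_PATTERNS.values()))[rng.integers(0, 3, 300)]
elm = ELM(13).fit(X, Y)
print(decode(elm.predict(X[:1])[0]))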


China Communications | 2017

A novel two-layer model for overall quality assessment of multichannel audio

Jiyue Liu; Jing Wang; Min Liu; Xiang Xie; Jingming Kuang

With the development of multichannel audio systems, corresponding audio quality assessment techniques, especially objective prediction models, have received increasing attention. Existing methods, such as PEAQ (Perceptual Evaluation of Audio Quality) recommended by the ITU, usually yield poor results when assessing multichannel audio, correlating weakly with subjective scores. In this paper, a novel two-layer model based on Multiple Linear Regression (MLR) and a Neural Network (NN) is proposed. The first layer derives two indicators of multichannel audio, the Audio Quality Score (AQS) and the Spatial Perception Score (SPS), and the second layer outputs the overall score. The final results show that this model not only improves the correlation with the subjective test score by 30.7% and decreases the Root Mean Square Error (RMSE) by 44.6%, but also adds two new indicators, AQS and SPS, which help reflect multichannel audio quality more clearly.
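
A minimal sketch of the two-layer structure with scikit-learn: two multiple linear regressions produce AQS and SPS from objective features, and a small neural network maps the pair to the overall score. The features, targets, and network size are synthetic placeholders.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))                 # objective features (fake)
aqs = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(200)
sps = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(200)
overall = 0.6 * aqs + 0.4 * sps                   # fake subjective scores

# Layer 1: multiple linear regression for each indicator.
m_aqs = LinearRegression().fit(X, aqs)
m_sps = LinearRegression().fit(X, sps)

# Layer 2: neural network from the two indicators to the overall score.
Z = np.column_stack([m_aqs.predict(X), m_sps.predict(X)])
nn = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(Z, overall)
print("RMSE:", np.sqrt(np.mean((nn.predict(Z) - overall) ** 2)))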


International Symposium on Chinese Spoken Language Processing | 2016

Microphone array speech denoising modeled by tensor filtering

Jing Wang; Yahui Shan; Shequan Jiang; Xiang Xie

This paper proposes a novel speech denoising method based on tensor filtering, in which the microphone array speech signal is constructed as tensor data and processed by a tensor filtering model. The multi-microphone signal is represented as a third-order tensor over channel, time, and frequency. Noise is reduced by finding a lower-rank approximation of the third-order tensor with the Tucker model. The MDL (Minimum Description Length) criterion is used to estimate the optimal tensor rank. The performance of the proposed approach is evaluated with objective indexes and a listening quality test. The experimental results indicate that the proposed approach is capable of retrieving the target signal from noisy microphone array signals.
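
A minimal sketch of the rank-selection step: a Wax-Kailath-style MDL score over the singular values of one mode unfolding picks the truncation rank. Which MDL variant the paper uses is an assumption here; the data are synthetic.

import numpy as np

def mdl_rank(sv, n_obs):
    # sv: singular values of a mode unfolding (descending);
    # n_obs: number of observation columns in that unfolding.
    lam = (sv ** 2) / n_obs                           # eigenvalue estimates
    p = len(lam)
    scores = []
    for k in range(p - 1):
        tail = lam[k:]
        geo = np.exp(np.mean(np.log(tail)))           # geometric mean
        ari = np.mean(tail)                           # arithmetic mean
        ll = n_obs * (p - k) * np.log(ari / geo)      # fit term
        pen = 0.5 * k * (2 * p - k) * np.log(n_obs)   # model-cost term
        scores.append(ll + pen)
    return int(np.argmin(scores))

rng = np.random.default_rng(1)
S = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 1000))  # rank-2 signal
sv = np.linalg.svd(S + 0.05 * rng.standard_normal((8, 1000)), compute_uv=False)
print(mdl_rank(sv, 1000))                             # expected: 2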

Collaboration


Dive into Xiang Xie's collaborations.

Top Co-Authors

Jingming Kuang, Beijing Institute of Technology
Jing Wang, Beijing Institute of Technology
Xingyu Na, Beijing Institute of Technology
Jin Hu, Beijing Institute of Technology
Duo-jia Ma, Beijing Institute of Technology
Jing Liu, Beijing Institute of Technology
Ming Tu, Beijing Institute of Technology
Shenghui Zhao, Beijing Institute of Technology
Shilei Huang, Beijing Institute of Technology
Yahui Shan, Beijing Institute of Technology