Publication


Featured research published by Zhaozhang Jin.


Computer Speech & Language | 2010

A computational auditory scene analysis system for speech segregation and robust speech recognition

Yang Shao; Soundararajan Srinivasan; Zhaozhang Jin; DeLiang Wang

A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time-frequency (T-F) mask which retains the mixture in a local T-F unit if and only if the target is stronger than the interference within the unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portions are segmented based on an onset/offset analysis. In the second stage, speaker characteristics are used to group the T-F units across time frames. The resulting masks are used in an uncertainty decoding framework for automatic speech recognition. We evaluate our system on a speech separation challenge and show that our system yields substantial improvement over the baseline performance.
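
The ideal binary mask defined in this abstract is simple to state precisely. The Python sketch below computes the oracle version from separately available target and interference signals, an assumption that holds only in evaluation settings; in practice the system estimates the mask from the mixture, and all names and data here are illustrative.

```python
# Oracle ideal binary mask (IBM): keep a T-F unit iff the target energy
# exceeds the interference energy in that unit (local SNR > 0 dB).
import numpy as np

def ideal_binary_mask(target_tf, interference_tf):
    return (np.abs(target_tf) ** 2 > np.abs(interference_tf) ** 2).astype(float)

# Hypothetical cochleagram-like magnitudes (channels x frames):
rng = np.random.default_rng(0)
target = rng.rayleigh(1.0, size=(64, 100))
interference = rng.rayleigh(0.8, size=(64, 100))
mask = ideal_binary_mask(target, interference)
segregated = mask * (target + interference)  # mask applied to the mixture
```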


International Conference on Acoustics, Speech, and Signal Processing | 2009

An auditory-based feature for robust speech recognition

Yang Shao; Zhaozhang Jin; DeLiang Wang; Soundararajan Srinivasan

A conventional automatic speech recognizer does not perform well in the presence of noise, while human listeners are able to segregate and recognize speech in noisy conditions. We study a novel feature based on an auditory periphery model for robust speech recognition. Specifically, gammatone frequency cepstral coefficients are derived by applying a cepstral analysis on gammatone filterbank responses. Our evaluations show that the proposed feature performs considerably better than conventional acoustic features. We further demonstrate that integrating the proposed feature with a computational auditory scene analysis system yields promising recognition performance.
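
As a rough illustration of the feature described here, the sketch below applies cubic-root amplitude compression and a DCT across channels to precomputed gammatone filterbank responses; these are common GFCC choices, and the exact parameters used in the paper may differ.

```python
import numpy as np
from scipy.fft import dct

def gfcc(gammatone_responses, num_coeffs=23):
    """Cepstral analysis on gammatone filterbank responses (channels x frames)."""
    compressed = np.cbrt(np.abs(gammatone_responses))        # amplitude compression
    cepstra = dct(compressed, type=2, norm='ortho', axis=0)  # decorrelate channels
    return cepstra[:num_coeffs, :]                           # keep low-order terms

features = gfcc(np.random.default_rng(1).random((64, 200)))
print(features.shape)  # (23, 200)
```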


IEEE Transactions on Audio, Speech, and Language Processing | 2011

HMM-Based Multipitch Tracking for Noisy and Reverberant Speech

Zhaozhang Jin; DeLiang Wang

Multipitch tracking in real environments is critical for speech signal processing. Determining pitch in reverberant and noisy speech is a particularly challenging task. In this paper, we propose a robust algorithm for multipitch tracking in the presence of both background noise and room reverberation. An auditory front-end and a new channel selection method are utilized to extract periodicity features. We derive pitch scores for each pitch state, which estimate the likelihoods of the observed periodicity features given pitch candidates. A hidden Markov model integrates these pitch scores and searches for the best pitch state sequence. Our algorithm can reliably detect single and double pitch contours in noisy and reverberant conditions. Quantitative evaluations show that our approach outperforms existing ones, particularly in reverberant conditions.
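
The HMM integration step can be pictured as a standard Viterbi search over pitch states; in the sketch below, the pitch scores and transition model are simplified placeholders rather than the paper's actual formulation.

```python
import numpy as np

def viterbi_pitch_track(log_scores, log_trans):
    """log_scores: frames x states (per-frame pitch scores);
    log_trans: states x states (pitch-continuity transition model)."""
    n_frames, n_states = log_scores.shape
    delta = np.empty((n_frames, n_states))
    back = np.zeros((n_frames, n_states), dtype=int)
    delta[0] = log_scores[0]
    for t in range(1, n_frames):
        cand = delta[t - 1][:, None] + log_trans      # previous state -> state
        back[t] = np.argmax(cand, axis=0)
        delta[t] = cand[back[t], np.arange(n_states)] + log_scores[t]
    path = [int(np.argmax(delta[-1]))]                # best final state
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]                                 # best pitch state sequence

rng = np.random.default_rng(2)
scores = np.log(rng.random((50, 20)) + 1e-9)  # hypothetical per-frame pitch scores
trans = np.log(np.full((20, 20), 1 / 20))     # uniform transitions, illustration only
track = viterbi_pitch_track(scores, trans)
```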


International Conference on Acoustics, Speech, and Signal Processing | 2007

A Supervised Learning Approach to Monaural Segregation of Reverberant Speech

Zhaozhang Jin; DeLiang Wang

A major source of signal degradation in real environments is room reverberation. Monaural speech segregation in reverberant environments is a particularly challenging problem. Although inverse filtering has been proposed to partially restore the harmonicity of reverberant speech before segregation, this approach is sensitive to specific source/receiver and room configurations. This paper proposes a supervised learning approach to monaural segregation of reverberant voiced speech, which learns to map from a set of pitch-based auditory features to a grouping cue encoding the posterior probability of a time-frequency (T-F) unit being target dominant given observed features. We devise a novel objective function for the learning process, which directly relates to the goal of maximizing signal-to-noise ratio. The models trained using this objective function yield significantly better T-F unit labeling. A segmentation and grouping framework is utilized to form reliable segments under reverberant conditions and organize them into streams. Systematic evaluations show that our approach produces very promising results under various reverberant conditions and generalizes well to new utterances and new speakers.
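
The mapping stage can be pictured as a small classifier over per-unit pitch-based features. The sketch below uses a generic MLP trained with a standard loss on synthetic placeholder data; the paper's SNR-related objective is illustrated separately under the 2009 ICASSP entry further down.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 6))           # hypothetical pitch-based features per T-F unit
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # placeholder "target dominant" labels

model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
model.fit(X, y)
posterior = model.predict_proba(X)[:, 1]  # grouping cue in [0, 1]
labels = posterior > 0.5                  # thresholding yields a binary T-F labeling
```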


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Reverberant Speech Segregation Based on Multipitch Tracking and Classification

Zhaozhang Jin; DeLiang Wang

Room reverberation creates a major challenge to speech segregation. We propose a computational auditory scene analysis approach to monaural segregation of reverberant voiced speech, which performs multipitch tracking of reverberant mixtures and supervised classification. Speech and nonspeech models are separately trained, and each learns to map from a set of pitch-based features to a grouping cue which encodes the posterior probability of a time-frequency (T-F) unit being dominated by the source with the given pitch estimate. Because interference may be either speech or nonspeech, a likelihood ratio test selects the correct model for labeling corresponding T-F units. Experimental results show that the proposed system performs robustly in different types of interference and various reverberant conditions, and has a significant advantage over existing systems.
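
The model-selection step is a standard likelihood ratio test. In the schematic below, both interference models are placeholder Gaussians rather than the trained speech and nonspeech models from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def prefer_speech_model(features, speech_model, nonspeech_model, threshold=0.0):
    """True if the speech model better explains the observed features
    (total log likelihood ratio above the threshold)."""
    llr = speech_model.logpdf(features).sum() - nonspeech_model.logpdf(features).sum()
    return llr > threshold

speech_model = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))
nonspeech_model = multivariate_normal(mean=[1.0, 1.0], cov=2 * np.eye(2))
obs = np.random.default_rng(4).normal(size=(100, 2))  # hypothetical unit features
use_speech_model = prefer_speech_model(obs, speech_model, nonspeech_model)
```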


International Conference on Acoustics, Speech, and Signal Processing | 2010

A multipitch tracking algorithm for noisy and reverberant speech

Zhaozhang Jin; DeLiang Wang

Determining multiple pitches in noisy and reverberant speech is an important and challenging task. We propose a robust multipitch tracking algorithm in the presence of both background noise and room reverberation. A new channel selection method is utilized in conjunction with an auditory front-end to extract periodicity features in the time-frequency space. These features are combined to formulate frame level conditional probabilities given each pitch state. A hidden Markov model is then applied to integrate these probabilities and search for the most likely pitch state sequences. The proposed approach can reliably detect up to two simultaneous pitch contours in noisy and reverberant conditions. Quantitative evaluations show that our system significantly outperforms existing ones, particularly in reverberant environments.
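
Periodicity features of this kind are commonly obtained from the normalized autocorrelation of individual filterbank channels, whose peaks mark pitch-period candidates; the sketch below is an illustrative placeholder that omits the paper's channel selection method and auditory front-end.

```python
import numpy as np

def normalized_autocorrelation(channel, max_lag):
    """Autocorrelation of one filter channel, normalized to 1 at lag 0."""
    x = channel - channel.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:len(x) - 1 + max_lag]
    return ac / (ac[0] + 1e-12)

# Hypothetical channel response with a 100-sample period:
t = np.arange(2000)
chan = np.sin(2 * np.pi * t / 100)
chan += 0.3 * np.random.default_rng(5).normal(size=t.size)
acf = normalized_autocorrelation(chan, max_lag=400)
pitch_lag = 50 + int(np.argmax(acf[50:]))  # skip very short lags; ~100 here
```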


International Conference on Acoustics, Speech, and Signal Processing | 2009

Learning to maximize signal-to-noise ratio for reverberant speech segregation

Zhaozhang Jin; DeLiang Wang

Monaural speech segregation in reverberant environments is a very difficult problem. We develop a supervised learning approach by proposing an objective function that directly relates to the computational goal of maximizing signal-to-noise ratio. The model trained using this new objective function yields significantly better results for time-frequency unit labeling. In our segregation system, a segmentation and grouping framework is utilized to form reliable segments under reverberant conditions and organize them into streams. Systematic evaluations show very promising results.
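
One way for a unit-labeling objective to relate directly to SNR is to weight each T-F unit by its energy, so that the loss tracks the SNR of the signal resynthesized from the mask; the function below is an illustration of that idea with assumed details, not the paper's exact objective.

```python
import numpy as np

def masked_snr_db(mask, target_energy, noise_energy):
    """SNR (dB) of a binary-masked mixture: retained target energy versus
    noise let through plus target energy removed by the mask."""
    retained = np.sum(mask * target_energy)
    error = np.sum(mask * noise_energy) + np.sum((1 - mask) * target_energy)
    return 10 * np.log10((retained + 1e-12) / (error + 1e-12))

rng = np.random.default_rng(6)
tgt = rng.rayleigh(1.0, (64, 100)) ** 2
nse = rng.rayleigh(0.8, (64, 100)) ** 2
ibm = (tgt > nse).astype(float)  # the oracle labeling maximizes this score
print(masked_snr_db(ibm, tgt, nse))
```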


Journal of the Acoustical Society of America | 2008

Monaural segregation of reverberant speech

Zhaozhang Jin; DeLiang Wang

A major source of signal degradation in realistic environments is room reverberation. Monaural speech segregation in reverberant environments is a particularly challenging problem. Although inverse filtering has been proposed to partially restore the harmonicity of reverberant speech before segregation, this approach is sensitive to different room configurations. In this study, we investigate monaural segregation of reverberant speech by employing a supervised learning approach to map from a set of pitch-based auditory features to a grouping cue, which encodes the posterior probability of a time-frequency unit being target dominant given observed features. We devise a novel objective function for the learning process, which relates directly to the goal of maximizing SNR. The models trained using this new objective function yield significantly better results for unit labeling. In our segregation system, a segmentation and grouping framework is utilized in order to capture segments reliably under reverberant conditions.


Conference of the International Speech Communication Association | 2006

A computational auditory scene analysis system for robust speech recognition

Soundararajan Srinivasan; Yang Shao; Zhaozhang Jin; DeLiang Wang


Conference of the International Speech Communication Association | 2008

Preliminary Intelligibility Tests of a Monaural Speech Segregation System

Ke Hu; Pierre L. Divenyi; Daniel P. W. Ellis; Zhaozhang Jin; Barbara G. Shinn-Cunningham; DeLiang Wang

Collaboration


Dive into Zhaozhang Jin's collaborations.

Top Co-Authors

Yang Shao
Ohio State University

Ke Hu
Ohio State University