Publication


Featured research published by Guoning Hu.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

A Tandem Algorithm for Pitch Estimation and Voiced Speech Segregation

Guoning Hu; DeLiang Wang

A lot of effort has been made in computational auditory scene analysis (CASA) to segregate speech from monaural mixtures. The performance of current CASA systems on voiced speech segregation is limited by the lack of a robust algorithm for pitch estimation. We propose a tandem algorithm that performs pitch estimation of a target utterance and segregation of the voiced portions of target speech jointly and iteratively. The algorithm first obtains a rough estimate of the target pitch, then uses this estimate to segregate target speech using harmonicity and temporal continuity, and then improves both pitch estimation and voiced speech segregation iteratively. Novel methods are proposed for performing segregation with a given pitch estimate and pitch determination with a given segregation. Systematic evaluation shows that the tandem algorithm extracts a majority of target speech without including much interference, and it performs substantially better than previous systems for either pitch extraction or voiced speech segregation.
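
As a rough illustration of the tandem idea, the sketch below alternates a summary-autocorrelation pitch estimate over the currently selected time-frequency units with a periodicity-based re-selection of those units. It is only a minimal stand-in under assumed inputs (gammatone-style filterbank output) and assumed thresholds, not the published algorithm's harmonicity and temporal-continuity rules.

```python
import numpy as np

def unit_acf(channels, frame_len, hop, max_lag):
    """Normalized autocorrelation of every time-frequency unit.

    `channels`: (n_channels, n_samples) bandpass-filtered signals, e.g. the
    output of a gammatone filterbank. Returns (n_channels, n_frames, max_lag + 1).
    """
    n_ch, n_samp = channels.shape
    n_frames = 1 + (n_samp - frame_len) // hop
    acf = np.zeros((n_ch, n_frames, max_lag + 1))
    for t in range(n_frames):
        seg = channels[:, t * hop: t * hop + frame_len]
        for lag in range(max_lag + 1):
            a, b = seg[:, :frame_len - lag], seg[:, lag:]
            num = np.sum(a * b, axis=1)
            den = np.sqrt(np.sum(a * a, axis=1) * np.sum(b * b, axis=1)) + 1e-12
            acf[:, t, lag] = num / den
    return acf

def tandem(channels, fs, frame_len=320, hop=160, n_iters=5,
           f0_range=(80.0, 400.0), theta=0.85):
    """Alternate pitch determination and voiced-speech masking.

    Mimics only the iterative structure of a tandem algorithm: pitch is
    estimated from the units currently assigned to the target, then units
    periodic at that pitch are re-selected, and the two steps repeat.
    Frame sizes and the threshold `theta` are assumed values.
    """
    lag_lo, lag_hi = int(fs / f0_range[1]), int(fs / f0_range[0])
    acf = unit_acf(channels, frame_len, hop, lag_hi)
    n_ch, n_frames, _ = acf.shape

    mask = np.ones((n_ch, n_frames), dtype=bool)      # start by keeping every unit
    pitch_lag = np.full(n_frames, lag_lo, dtype=int)

    for _ in range(n_iters):
        # Pitch determination given the current segregation (mask).
        summary = np.einsum('ct,ctl->tl', mask.astype(float), acf)
        pitch_lag = lag_lo + np.argmax(summary[:, lag_lo:lag_hi + 1], axis=1)

        # Segregation given the current pitch: keep units whose
        # autocorrelation at the pitch lag is close to their maximum.
        at_pitch = acf[:, np.arange(n_frames), pitch_lag]
        new_mask = at_pitch > theta * acf[:, :, lag_lo:lag_hi + 1].max(axis=2)
        if np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return fs / pitch_lag.astype(float), mask          # pitch track (Hz), binary mask
```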


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Auditory Segmentation Based on Onset and Offset Analysis

Guoning Hu; DeLiang Wang

A typical auditory scene in a natural environment contains multiple sources. Auditory scene analysis (ASA) is the process in which the auditory system segregates a scene into streams corresponding to different sources. Segmentation is a major stage of ASA, in which an auditory scene is decomposed into segments, each containing signal mainly from one source. We propose a system for auditory segmentation that analyzes the onsets and offsets of auditory events. The proposed system first detects onsets and offsets, and then generates segments by matching corresponding onset and offset fronts. This is achieved through a multiscale approach. A quantitative measure is suggested for segmentation evaluation. Systematic evaluation shows that most of the target speech, including unvoiced speech, is correctly segmented, and that target speech and interference are well separated into different segments.
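
A minimal sketch of the multiscale onset/offset detection step is given below: per-channel log-energy envelopes are smoothed at several Gaussian scales, and peaks and troughs of the temporal derivative are marked as onset and offset candidates. The scales and threshold are assumed values, and the subsequent matching of onset and offset fronts into segments is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def detect_onsets_offsets(envelopes, scales=(4.0, 8.0, 16.0), thresh=0.05):
    """Multiscale onset/offset candidates from per-channel intensity envelopes.

    `envelopes`: (n_channels, n_frames) smoothed energy per time-frequency
    unit. For each smoothing scale, onsets are local maxima of the temporal
    derivative above `thresh`, offsets are local minima below `-thresh`.
    Returns a dict mapping each scale to (onset_map, offset_map) boolean arrays.
    """
    loge = np.log(envelopes + 1e-8)
    results = {}
    for s in scales:
        # Smooth along time at this scale, then take the temporal derivative.
        d = np.gradient(gaussian_filter1d(loge, sigma=s, axis=1), axis=1)
        onsets = (d[:, 1:-1] > d[:, :-2]) & (d[:, 1:-1] >= d[:, 2:]) & (d[:, 1:-1] > thresh)
        offsets = (d[:, 1:-1] < d[:, :-2]) & (d[:, 1:-1] <= d[:, 2:]) & (d[:, 1:-1] < -thresh)
        # Pad the first and last frames back so maps match the input length.
        results[s] = (np.pad(onsets, ((0, 0), (1, 1))), np.pad(offsets, ((0, 0), (1, 1))))
    return results
```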


Journal of the Acoustical Society of America | 2008

Segregation of unvoiced speech from nonspeech interference

Guoning Hu; DeLiang Wang

Monaural speech segregation has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure, has weaker energy, and is hence more susceptible to interference. This study proposes a new approach to the problem of segregating unvoiced speech from nonspeech interference. The study first addresses the question of how much speech is unvoiced. The segregation process then occurs in two stages: segmentation and grouping. In segmentation, the proposed model decomposes an input mixture into contiguous time-frequency segments by a multiscale analysis of event onsets and offsets. Grouping of unvoiced segments is based on Bayesian classification of acoustic-phonetic features. The proposed model for unvoiced speech segregation joins an existing model for voiced speech segregation to produce an overall system that can deal with both voiced and unvoiced speech. Systematic evaluation shows that the proposed system extracts a majority of unvoiced speech without including much interference, and that it performs substantially better than spectral subtraction.
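
The grouping stage can be pictured as a Bayesian decision per segment, sketched below with assumed class-conditional Gaussian models for speech and interference features; the paper's actual acoustic-phonetic features and trained classifier are not reproduced here.

```python
import numpy as np
from scipy.stats import multivariate_normal

def label_unvoiced_segments(segment_features, speech_model, noise_model,
                            prior_speech=0.5):
    """Bayesian labeling of unvoiced segments (rough stand-in).

    `segment_features`: (n_segments, n_features) array, e.g. spectral-shape
    features of each segment. `speech_model` and `noise_model` are dicts with
    'mean' and 'cov' of class-conditional Gaussians; in practice these would
    be trained on labeled data, here they are simply assumed to be given.
    Returns a boolean array: True where the segment is grouped with speech.
    """
    log_ps = multivariate_normal.logpdf(
        segment_features, speech_model['mean'], speech_model['cov'])
    log_pn = multivariate_normal.logpdf(
        segment_features, noise_model['mean'], noise_model['cov'])
    # Compare posteriors: keep the segment when speech is more probable.
    return (log_ps + np.log(prior_speech)) > (log_pn + np.log(1.0 - prior_speech))
```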


Workshop on Applications of Signal Processing to Audio and Acoustics | 2001

Speech segregation based on pitch tracking and amplitude modulation

Guoning Hu; DeLiang Wang

Speech segregation is an important task of auditory scene analysis (ASA), in which the speech of a certain speaker is separated from other interfering signals. D.L. Wang and G.J. Brown (see IEEE Trans. Neural Networks, vol. 10, pp. 684-697, 1999) proposed a multistage neural model for speech segregation, the core of which is a two-layer oscillator network. We extend their model by adding further processes, based on psychoacoustic evidence, to improve its performance. These processes include pitch tracking and grouping based on amplitude modulation (AM). Our model is systematically evaluated and compared with the Wang-Brown model, and it yields significantly better performance.
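
One of the added processes, pitch tracking, can be illustrated by the small sketch below, which simply imposes a temporal-continuity constraint on raw per-frame pitch candidates. The thresholds and interpolation strategy are assumptions for illustration, not the model's actual tracking procedure.

```python
import numpy as np

def track_pitch(candidates, max_jump=0.15, max_gap=3):
    """Impose temporal continuity on per-frame pitch candidates.

    `candidates`: 1-D array of raw per-frame pitch estimates in Hz
    (np.nan where no estimate). Frame-to-frame changes larger than `max_jump`
    (relative) are treated as errors, and gaps of up to `max_gap` frames are
    filled by linear interpolation between reliable frames.
    """
    pitch = np.asarray(candidates, dtype=float).copy()
    # Mark implausible jumps as missing.
    for t in range(1, pitch.size):
        if np.isfinite(pitch[t]) and np.isfinite(pitch[t - 1]):
            if abs(pitch[t] - pitch[t - 1]) > max_jump * pitch[t - 1]:
                pitch[t] = np.nan
    # Fill short gaps between reliable frames.
    good = np.flatnonzero(np.isfinite(pitch))
    for a, b in zip(good[:-1], good[1:]):
        if 1 < b - a <= max_gap + 1:
            pitch[a:b + 1] = np.linspace(pitch[a], pitch[b], b - a + 1)
    return pitch
```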


International Conference on Acoustics, Speech, and Signal Processing | 2003

Separation of stop consonants

Guoning Hu; DeLiang Wang

Extracting speech from acoustic interference is a challenging problem. Previous systems based on auditory scene analysis principles deal with voiced speech but cannot separate unvoiced speech. We propose a novel method to separate stop consonants, which contain significant unvoiced signals, based on their acoustic properties. The method employs onset as the major grouping cue; it first detects stops through onset detection and feature-based Bayesian classification, and then groups detected onsets based on onset coincidence. This method is tested with utterances mixed with various types of interference.
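
Onset-coincidence grouping can be illustrated by the short sketch below, which merges per-channel onsets that occur close together in time into candidate stop events; the gap and channel-count thresholds are assumed values, and the feature-based Bayesian classification step is omitted.

```python
def group_onsets_by_coincidence(onset_frames, max_gap=2, min_channels=8):
    """Group per-channel onsets that coincide in time into candidate events.

    `onset_frames`: list of (channel, frame) pairs for detected onsets.
    Onsets whose frames fall within `max_gap` of the previous onset are merged
    into one event; events spanning fewer than `min_channels` distinct
    channels are discarded.
    """
    if not onset_frames:
        return []
    onsets = sorted(onset_frames, key=lambda cf: cf[1])   # sort by time
    events, current = [], [onsets[0]]
    for ch, fr in onsets[1:]:
        if fr - current[-1][1] <= max_gap:
            current.append((ch, fr))                       # same coincident event
        else:
            events.append(current)
            current = [(ch, fr)]
    events.append(current)
    return [e for e in events if len({ch for ch, _ in e}) >= min_channels]
```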


International Conference on Acoustics, Speech, and Signal Processing | 2005

Separation of fricatives and affricates

Guoning Hu; DeLiang Wang

Separating speech from acoustic interference is a very challenging task. In particular, no system successfully addresses the separation of unvoiced speech. Fricatives and affricates are two main categories of consonants that contain a significant amount of unvoiced signal. We propose a novel system that separates fricatives and affricates from non-speech interference. The system first decomposes the input mixture into segments, each of which contains signal mainly from one source. Then it detects segments dominated by unvoiced portions of fricatives and affricates with a feature-based Bayesian classifier, and groups these segments with voiced speech separated by a previous system. The proposed system is evaluated with various types of interference and produces promising results.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Unvoiced Speech Segregation

DeLiang Wang; Guoning Hu

Speech segregation, or the cocktail party problem, has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure, has weaker energy, and is hence more susceptible to interference. We describe a novel approach to address this problem. The segregation process occurs in two stages: segmentation and grouping. In segmentation, our model decomposes the input mixture into contiguous time-frequency segments by analyzing sound onsets and offsets. Grouping of unvoiced segments is based on Bayesian classification of acoustic-phonetic features. The proposed model yields very promising results.


International Conference on Acoustics, Speech, and Signal Processing | 2002

Monaural speech segregation based on pitch tracking and amplitude modulation

Guoning Hu; DeLiang Wang

Monaural speech segregation remains a computational challenge for auditory scene analysis (ASA). A major problem for existing computational auditory scene analysis (CASA) systems is their inability to deal with signals in the high-frequency range. Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. We propose a system for speech segregation that treats low-frequency and high-frequency signals differently. For low-frequency signals, our model generates segments based on temporal continuity and cross-channel correlation, and groups them according to periodicity. For high-frequency signals, the model generates segments based on common amplitude modulation (AM) in addition to temporal continuity, and groups them according to AM repetition rates. Underlying the grouping process is a pitch contour that is first estimated from segregated speech based on global pitch and then verified by psychoacoustic constraints. Our system is systematically evaluated, and it yields substantially better performance than previous CASA systems, especially in the high-frequency range.
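
The AM grouping cue for high-frequency (unresolved-harmonic) channels can be pictured as in the sketch below: the envelope of a filter channel is extracted and its repetition rate compared against the estimated target pitch. The Hilbert-envelope and autocorrelation choices here are assumptions for illustration, not the system's actual AM estimation.

```python
import numpy as np
from scipy.signal import hilbert

def am_rate(channel, fs, f0_range=(80.0, 400.0)):
    """Estimate the amplitude-modulation repetition rate of one filter channel.

    Takes the Hilbert envelope of a high-frequency channel (where harmonics
    are unresolved) and returns the frequency of the strongest
    envelope-autocorrelation peak within a plausible pitch range.
    """
    env = np.abs(hilbert(channel))
    env = env - env.mean()
    acf = np.correlate(env, env, mode='full')[env.size - 1:]
    lag_lo, lag_hi = int(fs / f0_range[1]), int(fs / f0_range[0])
    best_lag = lag_lo + int(np.argmax(acf[lag_lo:lag_hi + 1]))
    return fs / best_lag

def matches_pitch(channel, fs, pitch_hz, tol=0.1):
    """Group a high-frequency channel with the target when its AM repetition
    rate is within `tol` (relative) of the estimated target pitch."""
    return abs(am_rate(channel, fs) - pitch_hz) <= tol * pitch_hz
```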


International Symposium on Neural Networks | 2002

On amplitude modulation for monaural speech segregation

Guoning Hu; DeLiang Wang

We propose a computational auditory scene analysis (CASA) model for monaural speech segregation. It deals with low-frequency and high-frequency signals differently. For high-frequency signals, it generates segments based on common amplitude modulation (AM) and groups them according to AM repetition rates. This model performs substantially better than previous CASA systems.


International Symposium on Neural Networks | 2001

An extended model for speech segregation

Guoning Hu; DeLiang Wang

Speech segregation is an important task of auditory scene analysis (ASA), in which the speech of a certain speaker is separated from other interfering signals. Wang and Brown (1999) proposed a multistage neural model for speech segregation, the core of which is a two-layer oscillator network. We extend their model by adding further processes based on psychoacoustic evidence to improve the performance. These processes include estimation of the pitch of target speech and refined generation of a target speech stream with the estimated pitch. Our model is systematically evaluated and compared with the Wang-Brown model, and it yields significantly better performance.
