
Publication


Featured research published by Feipeng Li.


Journal of the Acoustical Society of America | 2010

A psychoacoustic method to find the perceptual cues of stop consonants in natural speech

Feipeng Li; Anjali Menon; Jont B. Allen

Synthetic speech has been widely used in the study of speech cues. A serious disadvantage of this method is that it requires prior knowledge about the cues to be identified in order to synthesize the speech. Incomplete or inaccurate hypotheses about the cues often lead to speech sounds of low quality. In this research a psychoacoustic method, named three-dimensional deep search (3DDS), is developed to explore the perceptual cues of stop consonants in naturally produced speech. For a given sound, it measures the contribution of each subcomponent to perception by time-truncating, high-pass/low-pass filtering, or masking the speech with white noise. The AI-gram, a visualization tool that simulates auditory peripheral processing, is used to predict the audible components of the speech sound. The results are generally in agreement with the classical finding that stops are characterized by a short-duration burst followed by an F2 transition, supporting the effectiveness of the 3DDS method. However, it is also shown that /ba/ and /pa/ may have a wideband click as the dominant cue, and the F2 transition is not necessary for the perception of /ta/ and /ka/. Moreover, many stop consonants contain conflicting cues that are characteristic of competing sounds. The robustness of a consonant sound to noise is determined by the intensity of the dominant cue.
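
A minimal sketch of the three kinds of manipulation the 3DDS method applies to a recorded token (time truncation, high-/low-pass filtering, and white-noise masking). The function names, cutoff, and SNR values are illustrative assumptions, not the authors' implementation.

import numpy as np
from scipy.signal import butter, sosfiltfilt

def truncate(x, fs, start_s, stop_s):
    # Keep only the segment between start_s and stop_s (seconds).
    return x[int(start_s * fs):int(stop_s * fs)]

def bandlimit(x, fs, cutoff_hz, kind="lowpass", order=6):
    # High-pass or low-pass filter to probe which frequency region carries the cue.
    sos = butter(order, cutoff_hz, btype=kind, fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def add_white_noise(x, snr_db):
    # Mask the token with white noise at the requested SNR (dB).
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return x + np.sqrt(p_noise) * np.random.randn(len(x))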


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Manipulation of Consonants in Natural Speech

Feipeng Li; Jont B. Allen

Natural speech often contains conflicting cues that are characteristic of confusable sounds. For example, the /k/, defined by a mid-frequency burst within 1-2 kHz, may also contain a high-frequency burst above 4 kHz indicative of /ta/, or vice versa. Conflicting cues can cause people to confuse the two sounds in a noisy environment. An efficient way of reducing confusion and improving speech intelligibility in noise is to modify these speech cues. This paper describes a method to manipulate consonant sounds in natural speech, based on our a priori knowledge of perceptual cues of consonants. We demonstrate that: 1) the percept of consonants in natural speech can be controlled through the manipulation of perceptual cues; 2) speech sounds can be made much more robust to noise by removing the conflicting cue and enhancing the target cue.
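
The abstract does not give the processing details, but the manipulation it describes amounts to scaling selected time-frequency regions of the token before resynthesis. Below is a hedged sketch using a plain STFT; the region boundaries, gains, and example times/bands are made-up placeholders.

import numpy as np
from scipy.signal import stft, istft

def modify_cue_region(x, fs, t_range, f_range, gain, nperseg=512):
    # Scale the STFT inside a rectangular time-frequency region, then resynthesize.
    f, t, Z = stft(x, fs=fs, nperseg=nperseg)
    t_mask = (t >= t_range[0]) & (t <= t_range[1])
    f_mask = (f >= f_range[0]) & (f <= f_range[1])
    Z[np.ix_(f_mask, t_mask)] *= gain
    _, y = istft(Z, fs=fs, nperseg=nperseg)
    return y

# Hypothetical usage: silence a /t/-like high-frequency burst, then boost the
# mid-frequency /k/ burst (times in seconds and bands in Hz are placeholders).
# y = modify_cue_region(x, fs, (0.10, 0.13), (4000, 8000), gain=0.0)
# y = modify_cue_region(y, fs, (0.10, 0.13), (1000, 2000), gain=2.0)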


Journal of the Acoustical Society of America | 2009

Multiband product rule and consonant identification

Feipeng Li; Jont B. Allen

The multiband product rule, also known as band independence, is a basic assumption of the articulation index and its extension, the speech intelligibility index. Previously Fletcher showed its validity for a balanced mix of 20% consonant-vowel (CV), 20% vowel-consonant (VC), and 60% consonant-vowel-consonant (CVC) sounds. This study repeats Miller and Nicely's version of the hi-/lo-pass experiment with minor changes to study band independence for the 16 Miller-Nicely consonants. The cut-off frequencies are chosen such that the basilar membrane is evenly divided into 12 segments from 250 to 8000 Hz, with the high-pass and low-pass filters sharing the same six cut-off frequencies in the middle. Results show that the multiband product rule is statistically valid for consonants on average. It also applies to subgroups of consonants, such as stops and fricatives, which are characterized by a flat distribution of speech cues along the frequency axis. It fails for individual consonants.
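
For reference, the band-independence assumption tested here is usually stated as "the band errors multiply." The notation below follows common articulation-index usage and may differ from the paper's: e_L and e_H are the error probabilities for low- and high-pass filtered speech sharing the cutoff f_c, and e is the wideband error.

\[
  e \,=\, e_L(f_c)\, e_H(f_c)
  \qquad\text{and, for } K \text{ independent bands,}\qquad
  e \,=\, \prod_{k=1}^{K} e_k .
\]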


Proceedings of the 10th International Workshop on the Mechanics of Hearing | 2009

NONLINEAR COCHLEAR SIGNAL PROCESSING AND PHONEME PERCEPTION

Jont B. Allen; Marion Regnier; Sandeep A. Phatak; Feipeng Li

The most important communication signal is human speech. It is helpful to think of speech communication in terms of Claude Shannon's information-theory channel model. When thus viewed, it immediately becomes clear that the most complex part of the speech communication channel is the auditory system (the receiver). In my opinion, even after years of work, relatively little is known about how the human auditory system decodes speech. Given cochlear damage, speech scores are greatly reduced, even with tiny amounts of noise. The exact reasons for this SNR-loss remain unclear, but I speculate that its source must be cochlear outer hair cell temporal processing, not central processing. Specifically, "temporal edge enhancement" of the speech signal and forward masking could easily be modified in such ears, leading to SNR-loss. Whatever the reason, SNR-loss is the key problem that needs to be fully researched.


Archive | 2010

Identification of Perceptual Cues for Consonant Sounds and the Influence of Sensorineural Hearing Loss on Speech Perception

Feipeng Li; Jont B. Allen

A common problem for people with hearing loss is that, even with the assistance of hearing aids, they can hear noisy speech but still cannot understand it. To explain why, the following two questions need to be addressed: (1) What are the perceptual cues making up speech sounds? (2) What is the impact of different types of hearing loss on speech perception? For the first question, a systematic psychoacoustic method is developed to explore the perceptual cues of consonant sounds. Without making any assumptions about the cues to be identified, it measures the contribution of each subcomponent to speech perception by time-truncating, high-/low-pass filtering, or masking the speech with white noise. In addition, the AI-gram, a tool that simulates auditory peripheral processing, is developed to show the audible components of a speech sound on the basilar membrane. For the second question, speech perception experiments are used to determine the difficult sounds for hearing-impaired listeners. In a case study, an elderly subject (AS) with moderate-to-severe sloping hearing loss, trained in linguistics, volunteered for the pilot study. Results show that AS cannot hear /ka/ and /ga/ with her left ear because of a cochlear dead region from 2 to 3.5 kHz, where we show the perceptual cues for /ka/ and /ga/ are located. In contrast, her right ear can hear these two sounds, though with low accuracy. NAL-R improves the average score by 10%, but it has no effect on the two inaudible consonants.
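
As a rough illustration of what an audibility display in the AI-gram spirit conveys, the sketch below marks the time-frequency cells where the short-time speech spectrum exceeds the noise floor. This is an assumed simplification; the actual AI-gram uses an auditory (cochlear) front end rather than a plain STFT.

import numpy as np
from scipy.signal import stft

def audibility_map(speech, noise, fs, nperseg=512, margin_db=0.0):
    # Boolean time-frequency map of cells where the local speech-to-noise
    # ratio exceeds margin_db, i.e. where the speech component is audible.
    _, _, S = stft(speech, fs=fs, nperseg=nperseg)
    _, _, N = stft(noise, fs=fs, nperseg=nperseg)
    snr_db = 20 * np.log10(np.abs(S) + 1e-12) - 20 * np.log10(np.abs(N) + 1e-12)
    return snr_db > margin_db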


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Manipulation of consonants in natural speech

Jont B. Allen; Feipeng Li

Starting in the 1920s, researchers at AT&T Research characterized speech perception. Until 1950, this work was done by a large group under Harvey Fletcher, and it resulted in the articulation index, an important tool able to predict average speech scores. In the 1950s a dedicated group of researchers at Haskins Labs in NYC attempted to extend these ideas, and further work on identifying the reliable speech cues was then done at MIT under the direction of Ken Stevens. Most of this work after 1950 was not successful in finding speech cues, so many today consider the task impossible; that is, many believe there is no direct, unique mapping from the time-frequency plane to consonant and vowel recognition. For example, it has been claimed that context is necessary to successfully identify nonsense consonant-vowels. In fact, this is not the case. The post-1950 work mostly used synthetic speech, which was a major flaw in all these studies. Also, only average results were studied, again a major flaw.


Journal of the Acoustical Society of America | 2009

Consonant identification for hearing impaired listeners.

Feipeng Li; Jont B. Allen

This research investigates the impact of sensorineural hearing loss on consonant identification. Two elderly subjects, AS with a moderate-to-severe sloping hearing loss and DC with a severe sloping high-frequency hearing loss, volunteered for the pilot study. Speech perception tests reveal that AS has no problem with /ta/ and has little difficulty with /pa, sa, da, za/, but she never reports /ka/ and /ga/, due to a cochlear dead region from 2 to 3.5 kHz in the left ear that blocks the perceptual cues for /ka/ and /ga/. In contrast, her right ear hears these sounds. Although NAL-R improves the average perception score from 0.42 to 0.53 in quiet conditions, it provides little help for noisy speech (a 3% increase at 12 dB signal-to-noise ratio); /ka/ and /ga/ remain unintelligible with NAL-R. The other subject, DC, can hear all 16 consonants tested in both ears with the assistance of NAL-R. However, it only improves the recognition scores of low- and mid-frequency sounds such as /pa, ba/ and /ka, ga/. It degra...


Journal of the Acoustical Society of America | 2011

What causes the confusion patterns in speech perception

Feipeng Li; Jont B. Allen

It is widely believed that speech contains redundant cues that make it robust in noise. We have found that natural speech often contains conflicting cues that cause significant confusion patterns in speech perception. To demonstrate the impact of conflicting cues on the identification of consonant sounds, we selected four plosives /ta, ka, da, ga/ and four fricatives /si, Si, zi, Zi/ that show significant confusion with other sounds, produced by a high-performance female talker in the LDC database, and derived a set of test stimuli by removing the dominant cue and manipulating the conflicting cue. It was found that the consonants always morph into their competing sounds once the dominant cue is unavailable. The confusion can be strengthened by enhancing the conflicting cue. These results demonstrate that the identification of a consonant depends on a careful balance between the dominant cue and the conflicting cues. Such prior knowledge about perceptual cues can be used to manipulate the perception of co...


Journal of the Acoustical Society of America | 2010

Predict the intelligibility of individual consonants in noise for hearing‐impaired listeners.

Feipeng Li; Woojae Han; Jont B. Allen

The ability of hearing-impaired (HI) listeners to understand speech often deteriorates drastically in cocktail-party environments. To understand why a particular HI listener can hear certain sounds and not others, it is critical to take prior knowledge about speech cues into consideration and investigate the effect of different types of cochlear hearing loss on speech perception. Over the last two years we have tested over 50 hearing-impaired ears on consonant identification in noise. To evaluate the impact of hearing-threshold shift on the intelligibility of individual consonants in masking noise, a consonant intelligibility index (CII) is developed to predict the perceptual scores of individual consonants. Results of a preliminary study indicate that the CII makes an accurate prediction for flat mild-to-moderate hearing loss, but fails in cases of cochlear dead regions and extremely unbalanced (e.g., severe high-frequency) loss, suggesting that audibility alone does not fully account for t...


Journal of the Acoustical Society of America | 2008

Perceptual cues for consonant identification.

Feipeng Li; Jont B. Allen

This research quantitatively explores the perceptual cues of initial consonants using psychoacoustic methods. Speech sounds are encoded by across-frequency temporal onsets called events. To determine the time-frequency importance function of the consonant sounds, speech stimuli (16 nonsense CVs from the LDC-2005S22 database) are high-pass or low-pass filtered and time-truncated before being presented to normal-hearing listeners. Databases of speech perception under various signal-to-noise ratio (SNR) conditions are constructed to investigate the effect of noise on speech recognition. A visualization tool that simulates auditory peripheral processing, called the AI-gram, is used for the analysis of the speech events under various SNR conditions. To verify the nature of the events, special software has been developed to convert one sound into another, starting from real speech sounds, by removing primary noise-robust cues. In pilot studies with a hearing-impaired subject, it is shown that feature...

