Tianshu Qu
Peking University
Publications
Featured research published by Tianshu Qu.
IEEE Transactions on Audio, Speech, and Language Processing | 2009
Tianshu Qu; Zheng Xiao; Mei Gong; Ying Huang; Xiaodong Li; Xihong Wu
A measurement of head-related transfer functions (HRTFs) with high spatial resolution was carried out in this study. HRTF measurement is difficult in the proximal region because of the lack of an appropriate acoustic point source. In this paper, a modified spark gap was used as the acoustic sound source. Our evaluation experiments showed that the spark gap was closer to an acoustic point source than sources previously used, in terms of frequency response, directivity, power attenuation, and stability. Using this spark gap, high-spatial-resolution HRTFs were measured at 6344 spatial points, with distances from 20 to 160 cm, elevations from -40° to 90°, and azimuths from 0° to 360°. Based on these measurements, an HRTF database was obtained and its reliability was confirmed by both objective and subjective evaluations.
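As an illustrative aside (not taken from the paper), the core of HRTF estimation can be sketched as a frequency-domain deconvolution of an in-ear recording against a free-field reference recording of the same source. All signals and parameters below are hypothetical stand-ins:

```python
# Sketch of frequency-domain HRTF estimation from paired recordings.
# x_ref stands in for a free-field reference recording of the spark-gap
# impulse and x_ear for the recording at the ear; both are hypothetical.
import numpy as np

n = 1024                                          # FFT length

rng = np.random.default_rng(0)
x_ref = rng.standard_normal(n)
x_ear = np.convolve(x_ref, [0.6, 0.3, 0.1])[:n]   # toy ear response

# The HRTF is the ratio of the ear spectrum to the reference spectrum;
# a small regularizer keeps near-zero reference bins from blowing up.
X_ref = np.fft.rfft(x_ref, n)
X_ear = np.fft.rfft(x_ear, n)
hrtf = X_ear * np.conj(X_ref) / (np.abs(X_ref) ** 2 + 1e-12)

# The head-related impulse response (HRIR) is the inverse transform;
# its first taps approximately recover the toy kernel [0.6, 0.3, 0.1].
hrir = np.fft.irfft(hrtf, n)
print(np.round(hrir[:4], 2))
```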
Hearing Research | 2008
Ying Huang; Qiang Huang; Xun Chen; Tianshu Qu; Xihong Wu; Liang Li
This study evaluated unmasking functions of perceptual integration of target speech and simulated target-speech reflection, which were presented by two spatially separated loudspeakers. In both younger-adult listeners with normal hearing and older-adult listeners in the early stages of presbycusis, reducing the time interval between target speech and the target-reflection simulation (inter-target interval, ITI) from 64 to 0 ms not only progressively enhanced perceptual integration of target-speech signals, but also progressively released target speech from either speech masking or noise masking. When the signal-to-noise ratio was low, the release from speech masking was significantly larger than the release from noise masking in both younger listeners and older listeners, but the longest ITI at which a significant release from speech masking occurred was significantly shorter in older listeners than in younger listeners. These results suggest that in reverberant environments with multi-talker speech, perceptual integration between the direct sound wave and correlated reflections, which facilitates perceptual segregation of various sources, is critical for unmasking attended speech. The age-related reduction of the ITI range for releasing speech from speech masking may be one of the causes of the speech-recognition difficulties experienced by older listeners in such adverse environments.
Journal of Cognitive Neuroscience | 2011
Ying Huang; Jingyu Li; Xuefei Zou; Tianshu Qu; Xihong Wu; Lihua Mao; Yanhong Wu; Liang Li
To discriminate and to recognize sound sources in a noisy, reverberant environment, listeners need to perceptually integrate the direct wave with the reflections of each sound source. It has been confirmed that perceptual fusion between direct and reflected waves of a speech sound helps listeners recognize this speech sound in a simulated reverberant environment with disrupting sound sources. When the delay between a direct sound wave and its reflected wave is sufficiently short, the two waves are perceptually fused into a single sound image as coming from the source location. Interestingly, compared with nonspeech sounds such as clicks and noise bursts, speech sounds have a much larger perceptual fusion tendency. This study investigated why the fusion tendency for speech sounds is so large. Here we show that when the temporal amplitude fluctuation of speech was artificially time reversed, a large perceptual fusion tendency of speech sounds disappeared, regardless of whether the speech acoustic carrier was in normal or reversed temporal order. Moreover, perceptual fusion of normal-order speech, but not that of time-reversed speech, was accompanied by increased coactivation of the attention-control-related, spatial-processing-related, and speech-processing-related cortical areas. Thus, speech-like acoustic carriers modulated by speech amplitude fluctuation selectively activate a cortical network for top–down modulations of speech processing, leading to an enhancement of perceptual fusion of speech sounds. This mechanism represents a perceptual-grouping strategy for unmasking speech under adverse conditions.
International Conference on Audio, Language and Image Processing | 2008
Tianshu Qu; Zheng Xiao; Mei Gong; Ying Huang; Xiaodong Li; Xihong Wu
The measurement and structure of a database of distance-dependent head-related transfer functions (HRTFs) are introduced in this paper. The database was set up by measuring high-spatial-resolution HRTFs at a total of 6344 spatial points, with distances from 20 to 160 cm, elevations from -40° to 90°, and azimuths from 0° to 360°. The database's reliability was confirmed by both objective and subjective evaluations.
Behavioural Brain Research | 2014
Ming Lei; Lu Luo; Tianshu Qu; Hongxiao Jia; Liang Li
Prepulse inhibition (PPI) is the suppression of the startle reflex when the startling stimulus is shortly preceded by a non-startling stimulus (the prepulse). Previous studies have shown that both fear conditioning of a prepulse and precedence-effect-induced perceptual separation between the conditioned prepulse and a noise masker facilitate selective attention to the prepulse and consequently enhance PPI with a remarkable prepulse-feature specificity. This study investigated whether these two types of attentional enhancement of PPI in rats also exhibit prepulse-location specificity. The results showed that when a prepulse was delivered by each of two spatially separated loudspeakers, fear conditioning of the prepulse at a particular perceived location (left or right of the tested rat) enhanced PPI without any perceived-location specificity. However, when a noise masker was presented, the precedence-effect-induced perceptual separation between the conditioned prepulse and the noise masker further enhanced PPI when the prepulse was perceived as coming from the conditioned location, but not from the unconditioned location. Moreover, both the conditioning-induced and the perceptual-separation-induced PPI enhancements were eliminated by extinction learning, whose effect could be blocked by systemic injection of 2-methyl-6-(phenylethynyl)-pyridine (MPEP), a selective antagonist of metabotropic glutamate receptor subtype 5 (mGluR5). Thus, fear conditioning of a prepulse perceived at a particular location not only facilitates selective attention to the conditioned prepulse but also induces a learning-based spatial gating effect on the spatial unmasking of the conditioned prepulse, making the perceptual-separation-induced PPI enhancement perceived-location specific.
Frontiers in Human Neuroscience | 2017
Yayue Gao; Qian Wang; Yu Ding; Changming Wang; Haifeng Li; Xihong Wu; Tianshu Qu; Liang Li
Human listeners are able to selectively attend to target speech in a noisy environment with multiple people talking. Using recordings of the scalp electroencephalogram (EEG), this study investigated how selective attention facilitates the cortical representation of target speech under a simulated “cocktail-party” listening condition with speech-on-speech masking. The results show that the cortical representation of target-speech signals under the multiple-people-talking condition was specifically improved by selective attention relative to the non-selective-attention listening condition, and that beta-band activity was most strongly modulated by selective attention. Moreover, measured with the Granger causality value, selective attention to the single target speech in the mixed-speech complex enhanced four causal connectivities for the beta-band oscillation: (1) from site FT7 to the right motor area, (2) from the left frontal area to the right motor area, (3) from the central frontal area to the right motor area, and (4) from the central frontal area to the right frontal area. However, only the selective-attention-induced change in beta-band causal connectivity from the central frontal area to the right motor area was significantly correlated with the selective-attention-induced change in the cortical beta-band representation of target speech. These findings suggest that under the “cocktail-party” listening condition, the beta-band EEG response to target speech is specifically facilitated by selective attention to the target speech embedded in the mixed-speech complex. The selective-attention-induced unmasking of target speech may be associated with the improved beta-band functional connectivity from the central frontal area to the right motor area, suggesting top-down attentional modulation of the speech-motor process.
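As a hedged illustration of one processing step implied here (not the study's actual pipeline), beta-band (13-30 Hz) activity can be extracted from a single EEG channel with a zero-phase band-pass filter and a Hilbert envelope; the sampling rate and signal below are assumed:

```python
# Sketch of beta-band (13-30 Hz) extraction from one EEG channel using a
# zero-phase Butterworth band-pass and the Hilbert envelope. Synthetic
# signal and assumed sampling rate; not the study's preprocessing.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 250.0                                     # assumed EEG sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 20 * t) + 0.5 * rng.standard_normal(t.size)

# Band-pass for the beta band (cutoffs normalized to the Nyquist frequency).
b, a = butter(4, [13 / (fs / 2), 30 / (fs / 2)], btype="bandpass")
beta = filtfilt(b, a, eeg)                     # zero-phase filtering

# Instantaneous beta-band amplitude via the analytic signal.
envelope = np.abs(hilbert(beta))
print(f"mean beta envelope: {envelope.mean():.3f}")
```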
PLOS ONE | 2015
Lingzhi Kong; Zilong Xie; Lingxi Lu; Tianshu Qu; Xihong Wu; Jun Yan; Liang Li
The subjective representation of the sounds delivered to the two ears of a human listener is closely associated with the interaural delay and correlation of these two-ear sounds. When the two-ear sounds, e.g., arbitrary noises, arrive simultaneously, the single auditory image of the binaurally identical noises becomes increasingly diffuse, and eventually separates into two auditory images, as the interaural correlation decreases. When the interaural delay increases from zero to several milliseconds, the auditory image of the binaurally identical noises also changes from a single image into two distinct images. However, the effects of these two factors had not previously been measured in the same group of participants. This study examined the impacts of interaural correlation and delay on detecting a binaurally uncorrelated fragment (interaural correlation = 0) embedded in binaurally correlated noises (i.e., a binaural gap, or break in interaural correlation). We found that the minimum duration of the binaural gap required for its detection (i.e., the duration threshold) increased exponentially as the interaural delay between the binaurally identical noises increased linearly from 0 to 8 ms. When no interaural delay was introduced, the duration threshold also increased exponentially as the interaural correlation of the binaurally correlated noises decreased linearly from 1 to 0.4. A linear relationship between the effect of interaural delay and that of interaural correlation was observed for the listeners in this study: a 1 ms increase in interaural delay corresponded to a 0.07 decrease in interaural correlation with respect to raising the duration threshold. Our results imply that a tradeoff may exist between the impacts of interaural correlation and interaural delay on the subjective representation of sounds delivered to the two human ears.
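For illustration only (the parameters are invented, not the study's stimuli), a binaural gap can be simulated by embedding an independent noise fragment in one ear's copy of an otherwise identical two-ear noise; it then shows up as a dip in short-time normalized interaural correlation:

```python
# Sketch of a "binaural gap": a 50 ms interaurally uncorrelated fragment
# embedded in otherwise identical two-ear noises, detected here as a dip
# in short-time interaural correlation (illustrative parameters only).
import numpy as np

fs = 16000
rng = np.random.default_rng(1)
noise = rng.standard_normal(fs)                # 1 s of noise for both ears
left, right = noise.copy(), noise.copy()

# Replace a 50 ms fragment in the right ear with independent noise,
# so the interaural correlation is 0 inside the gap.
start, dur = int(0.5 * fs), int(0.05 * fs)
right[start:start + dur] = rng.standard_normal(dur)

# Normalized interaural correlation over consecutive 20 ms windows.
win = int(0.02 * fs)
for i in range(0, fs - win + 1, win):
    l, r = left[i:i + win], right[i:i + win]
    rho = np.dot(l, r) / np.sqrt(np.dot(l, l) * np.dot(r, r))
    if rho < 0.9:
        print(f"correlation dip near {i / fs:.2f} s (rho = {rho:.2f})")
```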
PsyCh Journal | 2014
Yayue Gao; Shuyang Cao; Tianshu Qu; Xihong Wu; Haifeng Li; Jinsheng Zhang; Liang Li
In noisy environments with multiple people talking, such as a cocktail party, listeners can use various perceptual and/or cognitive cues to improve recognition of target speech against masking, particularly informational masking. Previous studies have shown that temporally pre-presented voice cues (voice primes) improve recognition of target speech against speech masking but not noise masking. This study investigated whether static face-image primes that have become target-voice associated (i.e., facial images linked through associative learning with voices reciting the target speech) can be used by listeners to unmask speech. The results showed that in 32 normal-hearing younger adults, temporally pre-presenting a voice-priming sentence with the same voice reciting the target sentence significantly improved recognition of target speech that was masked by irrelevant two-talker speech. When a person's face photograph became associated with the voice reciting the target speech through learning, temporally pre-presenting the target-voice-associated face image significantly improved recognition of target speech against speech masking, particularly for the last two keywords in the target sentence. Moreover, speech-recognition performance under the voice-priming condition was significantly correlated with that under the face-priming condition. The results suggest that learned facial information on talker identity plays an important role in identifying the target talker's voice and facilitating selective attention to the target-speech stream against the masking-speech stream.
Ear and Hearing | 2013
Tianshu Qu; Shuyang Cao; Xun Chen; Ying Huang; Liang Li; Xihong Wu; Bruce A. Schneider
Objectives: Previous studies have shown that both younger adults and older adults with clinically normal hearing are able to detect a break in correlation (BIC) between interaurally correlated sounds presented over headphones. This ability to detect a BIC improved when the correlated sounds were presented over left and right loudspeakers rather than over left and right headphones, suggesting that additional spectral cues provided by comb filtering (caused by interference between the two channels) facilitate detection of the BIC. However, older adults receive significantly less benefit than younger adults from a switch to loudspeaker presentation. It is hypothesized that this results from an age-related reduction in sensitivity to the monaural spectral cues provided by comb filtering.

Design: Two experiments were conducted in this study. Correlated white noises with a BIC in the temporal middle were presented from two spatially separated loudspeakers (positioned at ±45° azimuth) and recorded at the right ear of a Knowles Electronics Manikin for Acoustic Research (KEMAR). In Experiment 1, the waveforms recorded at the KEMAR's right ear were presented to the participant's right ear over a headphone in 14 younger adults and 24 older adults with clinically normal hearing. In Experiment 2, 8 of the 14 younger participants took part. Under the monaural-cueing condition, the waveforms recorded at the KEMAR's right ear were presented to the participant's right ear, as in Experiment 1. Under the binaural-cueing condition, the waveforms delivered from the left and right loudspeakers were recorded at the KEMAR's left and right ears, respectively, thereby eliminating the spectral ripple cue, and were presented to the participant's left and right ears, respectively. In each experiment, the break-duration threshold for detecting the BIC was examined when the inter-loudspeaker interval (ILI) was 0, 1, 2, or 4 msec (left loudspeaker leading).

Results: In Experiment 1, both younger and older participants detected the BIC in the waveforms recorded at the KEMAR's right ear, but older participants had higher detection thresholds than younger participants when the ILI was 0, 2, or 4 msec, with no effect of SPL shifts between 59 and 71 dB. In Experiment 2, each of the eight younger participants was able to detect the BIC under both the monaural-cueing and binaural-cueing conditions. In addition, the detection threshold under the monaural-cueing condition was substantially the same as that under the binaural-cueing condition at each of the four ILIs.

Conclusions: Younger adults and older adults with clinically normal hearing are able to detect the monaural spectral changes arising from comb filtering when a sudden drop in inter-sound correlation is introduced, but younger adults are more sensitive than older adults at detecting the BIC. The findings suggest that older adults are less able than younger adults to detect a periodic ripple in the sound spectrum. This age-related reduction may contribute to older adults' difficulties in hearing under noisy, reverberant conditions.
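As a sketch of the monaural comb-filtering cue discussed above (illustrative signals, not the study's stimuli), summing a noise with a copy of itself delayed by the ILI produces spectral notches at odd multiples of 1/(2·ILI), spaced 1/ILI apart:

```python
# Sketch of the comb-filter ripple at one ear when the right-loudspeaker
# signal sums with a delayed copy from the left loudspeaker (illustrative
# white-noise signal; ILI values follow the study).
import numpy as np

fs = 48000
ili_ms = 2.0                                   # inter-loudspeaker interval
delay = int(ili_ms * 1e-3 * fs)                # delay in samples (96)

rng = np.random.default_rng(2)
x = rng.standard_normal(fs)                    # 1 s of white noise
mix = x.copy()
mix[delay:] += x[:-delay]                      # add the delayed copy

# The summed spectrum has notches at odd multiples of 1/(2*ILI) Hz,
# spaced 1/ILI apart -- the monaural spectral ripple cue.
spec = np.abs(np.fft.rfft(mix))
freqs = np.fft.rfftfreq(mix.size, 1 / fs)
band = (freqs > 100) & (freqs < 1000)
notch = freqs[band][np.argmin(spec[band])]
print(f"deepest dip in 0.1-1 kHz: {notch:.0f} Hz "
      f"(notches expected at odd multiples of {1000 / (2 * ili_ms):.0f} Hz)")
```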
Attention, Perception, & Psychophysics | 2018
Lingxi Lu; Xiaohan Bao; Jing Chen; Tianshu Qu; Xihong Wu; Liang Li
Under a noisy “cocktail-party” listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners to enhance target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound with a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, electrodermal (skin-conductance) responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase in listening effort when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking the target speech.