Nai Ding
Zhejiang University
Publications
Featured research published by Nai Ding.
Proceedings of the National Academy of Sciences of the United States of America | 2012
Nai Ding; Jonathan Z. Simon
A visual scene is perceived in terms of visual objects. Similar ideas have been proposed for the analogous case of auditory scene analysis, although their hypothesized neural underpinnings have not yet been established. Here, we address this question by recording from subjects selectively listening to one of two competing speakers, either of different or the same sex, using magnetoencephalography. Individual neural representations are seen for the speech of the two speakers: each is selectively phase-locked to the rhythm of the corresponding speech stream, and the temporal envelope of that stream can be exclusively reconstructed from it. The neural representation of the attended speech dominates responses (with latency near 100 ms) in posterior auditory cortex. Furthermore, when the intensity of the attended and background speakers is separately varied over an 8-dB range, the neural representation of the attended speech adapts only to the intensity of that speaker but not to the intensity of the background speaker, suggesting an object-level intensity gain control. In summary, these results indicate that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation.
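For readers who want a concrete sense of how a speech envelope can be reconstructed from multichannel neural recordings, the sketch below fits a lagged ridge-regression decoder on synthetic data. This is only a minimal illustration of the general decoding approach; the channel count, lag range, regularization value, and all variable names are assumptions, not the authors' pipeline.

```python
# Minimal sketch of envelope reconstruction (backward/decoding model) on
# synthetic data. Channel count, lags, and the ridge value are illustrative
# assumptions, not the parameters used in the study.
import numpy as np

def build_lagged(X, max_lag):
    """Stack time-lagged copies of each channel (lags 0 .. max_lag samples)."""
    n_t, n_ch = X.shape
    lagged = np.zeros((n_t, n_ch * (max_lag + 1)))
    for lag in range(max_lag + 1):
        lagged[lag:, lag * n_ch:(lag + 1) * n_ch] = X[:n_t - lag]
    return lagged

def fit_decoder(neural, envelope, max_lag=30, ridge=1e2):
    """Ridge-regression decoder from lagged neural data to the speech envelope."""
    X = build_lagged(neural, max_lag)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ envelope)

def reconstruct(neural, w, max_lag=30):
    return build_lagged(neural, max_lag) @ w

# Synthetic example: 20 "sensors" that mix the attended envelope with noise.
rng = np.random.default_rng(0)
env_att = np.abs(rng.standard_normal(3000))
neural = np.outer(env_att, rng.standard_normal(20)) + rng.standard_normal((3000, 20))
w = fit_decoder(neural, env_att)
print("reconstruction accuracy r =",
      round(np.corrcoef(reconstruct(neural, w), env_att)[0, 1], 2))
```

In practice the reconstruction accuracy for the attended versus the unattended envelope would be compared on held-out data; here a single synthetic stream is used only to show the mechanics.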
Nature Neuroscience | 2016
Nai Ding; Lucia Melloni; Hang Zhang; Xing Tian; David Poeppel
The most critical attribute of human language is its unbounded combinatorial nature: smaller elements can be combined into larger structures on the basis of a grammatical system, resulting in a hierarchy of linguistic units, such as words, phrases and sentences. Mentally parsing and representing such structures, however, poses challenges for speech comprehension. In speech, hierarchical linguistic structures do not have boundaries that are clearly defined by acoustic cues and must therefore be internally and incrementally constructed during comprehension. We found that, as listeners processed connected speech, cortical activity at different timescales concurrently tracked the time course of abstract linguistic structures at different hierarchical levels, such as words, phrases and sentences. Notably, the neural tracking of hierarchical linguistic structures was dissociated from the encoding of acoustic cues and from the predictability of incoming words. Our results indicate that a hierarchy of neural processing timescales underlies grammar-based internal construction of hierarchical linguistic structure.
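A rough illustration of the frequency-tagging logic behind this design (syllables, phrases and sentences presented at fixed rates, e.g. 4 Hz, 2 Hz and 1 Hz) is sketched below: complex spectra are averaged across trials so that only phase-locked activity survives, and peaks are read off at the tagged frequencies. The simulated signal, amplitudes, noise level and trial count are all hypothetical.

```python
# Sketch of frequency tagging: look for evoked spectral peaks at the sentence
# (1 Hz), phrase (2 Hz) and syllable (4 Hz) rates. Simulated data; amplitudes,
# noise level, and trial counts are arbitrary illustration values.
import numpy as np

fs, dur, n_trials = 100, 10.0, 40
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(1)

trials = np.array([
    1.0 * np.cos(2 * np.pi * 4 * t)          # syllable-rate tracking
    + 0.5 * np.cos(2 * np.pi * 2 * t)        # phrase-rate tracking
    + 0.5 * np.cos(2 * np.pi * 1 * t)        # sentence-rate tracking
    + 2.0 * rng.standard_normal(t.size)      # non-phase-locked background
    for _ in range(n_trials)
])

# Averaging complex spectra across trials keeps only phase-locked activity.
evoked_spectrum = np.abs(np.fft.rfft(trials, axis=1).mean(axis=0)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
for f in (1, 2, 4):
    print(f"{f} Hz:", round(evoked_spectrum[np.argmin(np.abs(freqs - f))], 3))
```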
Frontiers in Human Neuroscience | 2014
Nai Ding; Jonathan Z. Simon
Auditory cortical activity is entrained to the temporal envelope of speech, which corresponds to the syllabic rhythm of speech. Such entrained cortical activity can be measured from subjects naturally listening to sentences or spoken passages, providing a reliable neural marker of online speech processing. A central question, however, remains unanswered: is cortical entrained activity more closely related to speech perception or to non-speech-specific auditory encoding? Here, we review a few hypotheses about the functional roles of cortical entrainment to speech, e.g., encoding acoustic features, parsing syllabic boundaries, and selecting sensory information in complex listening environments. It is likely that speech entrainment is not a homogeneous response and that these hypotheses apply separately to entrained activity generated by different neural sources. The relationship between entrained activity and speech intelligibility is also discussed. A tentative conclusion is that theta-band entrainment (4–8 Hz) encodes speech features critical for intelligibility while delta-band entrainment (1–4 Hz) is related to the perceived, non-speech-specific acoustic rhythm. To further understand the functional properties of speech entrainment, a splitter’s approach will be needed to investigate (1) not just the temporal envelope but which specific acoustic features are encoded and (2) not just speech intelligibility but which specific psycholinguistic processes are encoded by entrained cortical activity. Similarly, the anatomical and spectro-temporal details of entrained activity need to be taken into account when investigating its functional properties.
The Journal of Neuroscience | 2013
Nai Ding; Jonathan Z. Simon
Speech recognition is remarkably robust to the listening background, even when the energy of background sounds strongly overlaps with that of speech. How the brain transforms the corrupted acoustic signal into a reliable neural representation suitable for speech recognition, however, remains elusive. Here, we hypothesize that this transformation is performed at the level of auditory cortex through adaptive neural encoding, and we test the hypothesis by recording, using MEG, the neural responses of human subjects listening to a narrated story. Spectrally matched stationary noise, which has maximal acoustic overlap with the speech, is mixed in at various intensity levels. Despite the severe acoustic interference caused by this noise, it is here demonstrated that low-frequency auditory cortical activity is reliably synchronized to the slow temporal modulations of speech, even when the noise is twice as strong as the speech. Such a reliable neural representation is maintained by intensity contrast gain control and by adaptive processing of temporal modulations at different time scales, corresponding to the neural δ and θ bands. Critically, the precision of this neural synchronization predicts how well a listener can recognize speech in noise, indicating that the precision of the auditory cortical representation limits the performance of speech recognition in noise. Together, these results suggest that, in a complex listening environment, auditory cortex can selectively encode a speech stream in a background-insensitive manner, and this stable neural representation of speech provides a plausible basis for background-invariant recognition of speech.
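One way to make the "precision of synchronization predicts recognition" relationship concrete is to quantify delta/theta-band coherence between a neural channel and the speech envelope at each noise level and correlate it with behavioral scores. The sketch below does this on simulated data; the SNR values, intelligibility scores, coherence band and all signals are placeholders, not the study's measurements or its exact metric.

```python
# Hedged sketch: neural-envelope coherence in the delta/theta range at several
# noise levels, correlated with word-recognition scores. All data simulated.
import numpy as np
from scipy.signal import coherence

fs = 100
rng = np.random.default_rng(2)
envelope = np.abs(rng.standard_normal(fs * 60))           # 1-min speech envelope

snrs_db = [-9, -6, -3, 0, 3, 6]
intelligibility = [20, 45, 70, 85, 93, 97]                # hypothetical % correct
tracking = []
for snr in snrs_db:
    gain = 10 ** (snr / 20)                               # stronger tracking at high SNR
    neural = gain * envelope + rng.standard_normal(envelope.size)
    f, coh = coherence(neural, envelope, fs=fs, nperseg=fs * 4)
    tracking.append(coh[(f >= 1) & (f <= 8)].mean())      # delta/theta coherence

r = np.corrcoef(tracking, intelligibility)[0, 1]
print("tracking-intelligibility correlation:", round(r, 2))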
Neuroscience & Biobehavioral Reviews | 2017
Nai Ding; Aniruddh D. Patel; Lin Chen; Henry Butler; Cheng Luo; David Poeppel
Speech and music have structured rhythms. Here we discuss a major acoustic correlate of spoken and musical rhythms, the slow (0.25-32 Hz) temporal modulations in sound intensity, and compare the modulation properties of speech and music. We analyze these modulations using over 25 h of speech and over 39 h of recordings of Western music. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics). A different, but similarly consistent, modulation spectrum is observed for music, including classical music played by single instruments of different types, as well as symphonic, jazz, and rock music. The temporal modulations of speech and music show broad but well-separated peaks around 5 Hz and 2 Hz, respectively. These acoustically dominant time scales may be intrinsic features of speech and music, a possibility which should be investigated using more culturally diverse samples in each domain. Distinct modulation timescales for speech and music could facilitate their perceptual analysis and neural processing.
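A much-simplified version of a modulation-spectrum computation is sketched below: take the broadband Hilbert envelope of a sound and compute the envelope's power spectrum over 0.25-32 Hz. The paper's analysis works on a cochlear-like filterbank across many carrier bands; the single-band version here, and every parameter value, are simplifying assumptions for illustration only.

```python
# Simplified modulation-spectrum sketch: broadband Hilbert envelope, then the
# envelope's power spectrum restricted to 0.25-32 Hz. Not the paper's
# filterbank-based analysis; parameters are illustrative.
import numpy as np
from scipy.signal import hilbert, welch

def modulation_spectrum(sound, fs, fmin=0.25, fmax=32.0):
    envelope = np.abs(hilbert(sound))                      # temporal envelope
    f, pxx = welch(envelope - envelope.mean(), fs=fs, nperseg=int(fs * 8))
    keep = (f >= fmin) & (f <= fmax)
    return f[keep], pxx[keep]

# Synthetic "speech-like" example: a noise carrier modulated near 5 Hz.
fs = 16000
t = np.arange(0, 60, 1 / fs)
carrier = np.random.default_rng(3).standard_normal(t.size)
sound = (1 + 0.8 * np.cos(2 * np.pi * 5 * t)) * carrier
freqs, mspec = modulation_spectrum(sound, fs)
print("peak modulation frequency:", freqs[np.argmax(mspec)], "Hz")
```

For real speech or music the input would be an audio waveform rather than modulated noise; the expected outcome is a broad peak near the syllabic rate for speech and a lower-frequency peak for music.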
Journal of Neurophysiology | 2009
Nai Ding; Jonathan Z. Simon
Natural sounds such as speech contain multiple levels and multiple types of temporal modulations. Because of nonlinearities of the auditory system, however, the neural response to multiple, simultaneous temporal modulations cannot be predicted from the neural responses to single modulations. Here we show the cortical neural representation of an auditory stimulus simultaneously frequency modulated (FM) at a high rate, f_FM ≈ 40 Hz, and amplitude modulated (AM) at a slow rate, f_AM < 15 Hz. Magnetoencephalography recordings show that the fast FM and slow AM stimulus features evoke two separate but not independent auditory steady-state responses (aSSR) at f_FM and f_AM, respectively. The power, rather than the phase locking, of both aSSRs decreases with increasing stimulus f_AM. The aSSR at f_FM is itself simultaneously amplitude modulated and phase modulated with fundamental frequency f_AM, showing that the slow stimulus AM is not only encoded in the neural response at f_AM but also in the instantaneous amplitude and phase of the neural response at f_FM. Both the amplitude modulation and the phase modulation of the aSSR at f_FM are most salient for low stimulus f_AM but remain observable at the highest tested f_AM (13.8 Hz). The instantaneous amplitude of the aSSR at f_FM is successfully predicted by a model containing temporal integration on two time scales, approximately 25 and 200 ms, followed by a static compression nonlinearity.
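To make the analysis idea concrete, the sketch below simulates a 40 Hz steady-state response whose amplitude follows a 5 Hz modulation, band-passes around 40 Hz, and recovers the slow modulation from the instantaneous (Hilbert) amplitude. The simulated response, the noise level, and the filter settings are illustrative assumptions, not the study's stimulus or pipeline.

```python
# Minimal sketch: recover the slow AM carried by the instantaneous amplitude of
# a ~40 Hz steady-state response. Parameter values are illustrative only.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 1000
t = np.arange(0, 20, 1 / fs)
f_fm, f_am = 40.0, 5.0

# Simulated steady-state response at f_FM whose amplitude follows the slow AM.
response = (1 + 0.5 * np.cos(2 * np.pi * f_am * t)) * np.cos(2 * np.pi * f_fm * t)
response += 0.5 * np.random.default_rng(4).standard_normal(t.size)

# Band-pass around f_FM, then take the Hilbert envelope to expose the slow AM.
b, a = butter(4, [f_fm - 10, f_fm + 10], btype="band", fs=fs)
inst_amp = np.abs(hilbert(filtfilt(b, a, response)))

spec = np.abs(np.fft.rfft(inst_amp - inst_amp.mean())) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
print("AM of the 40 Hz response peaks at",
      freqs[np.argmax(spec[freqs < 15])], "Hz")           # expect ~f_AM
```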
Journal of Computational Neuroscience | 2013
Nai Ding; Jonathan Z. Simon
Natural sensory inputs, such as speech and music, are often rhythmic. Recent studies have consistently demonstrated that these rhythmic stimuli cause the phase of oscillatory, i.e. rhythmic, neural activity, recorded as local field potential (LFP), electroencephalography (EEG) or magnetoencephalography (MEG), to synchronize with the stimulus. This phase synchronization, when not accompanied by any increase of response power, has been hypothesized to result from phase resetting of ongoing, spontaneous neural oscillations measurable by LFP, EEG, or MEG. In this article, however, we argue that the same phenomenon can be easily explained without any phase resetting, with the stimulus-synchronized activity generated independently of background neural oscillations. It is demonstrated with a simple (but general) stochastic model that, purely because of statistical properties, phase synchronization, as measured by ‘inter-trial phase coherence’, is much more sensitive to stimulus-synchronized neural activity than is power. These results question the usefulness of analyzing the power and phase of stimulus-synchronized activity as separate and complementary measures, particularly when attempting to demonstrate whether stimulus-synchronized neural activity is generated by phase resetting of ongoing neural oscillations.
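A toy numerical version of this argument is sketched below: adding a weak, phase-locked component to trial-to-trial variable background activity produces a clear rise in inter-trial phase coherence while the power at the same frequency changes comparatively little. The amplitudes, trial counts and frequencies are arbitrary illustration values, not the stochastic model developed in the paper.

```python
# Toy illustration: a weak phase-locked addition is easy to see in inter-trial
# phase coherence (ITPC) but produces only a modest change in power at the same
# frequency. All amplitudes and counts are arbitrary illustration values.
import numpy as np

fs, dur, n_trials, f0 = 200, 2.0, 100, 5.0
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(5)

def itpc_and_power(trials, f0, fs):
    spectra = np.fft.rfft(trials, axis=1)
    k = int(round(f0 * trials.shape[1] / fs))        # frequency bin at f0
    coeffs = spectra[:, k]
    itpc = np.abs(np.mean(coeffs / np.abs(coeffs)))  # phase consistency across trials
    power = np.mean(np.abs(coeffs) ** 2)             # mean single-trial power at f0
    return itpc, power

background = rng.standard_normal((n_trials, t.size))       # not stimulus-locked
evoked = background + 0.05 * np.sin(2 * np.pi * f0 * t)     # weak, identical on every trial

for label, trials in [("background only", background), ("with weak evoked", evoked)]:
    itpc, power = itpc_and_power(trials, f0, fs)
    print(f"{label}: ITPC = {itpc:.2f}, power = {power:.0f}")
```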
Hearing Research | 2014
Ying-Yee Kong; Ala Mullangi; Nai Ding
This study investigates how top-down attention modulates neural tracking of the speech envelope in different listening conditions. In the quiet conditions, a single speech stream was presented and the subjects paid attention to the speech stream (active listening) or watched a silent movie instead (passive listening). In the competing speaker (CS) conditions, two speakers of opposite genders were presented diotically. Ongoing electroencephalographic (EEG) responses were measured in each condition and cross-correlated with the speech envelope of each speaker at different time lags. In quiet, active and passive listening resulted in similar neural responses to the speech envelope. In the CS conditions, however, the shape of the cross-correlation function was remarkably different between the attended and unattended speech. The cross-correlation with the attended speech showed stronger N1 and P2 responses but a weaker P1 response compared to the cross-correlation with the unattended speech. Furthermore, the N1 response to the attended speech in the CS condition was enhanced and delayed compared with the active listening condition in quiet, while the P2 response to the unattended speaker in the CS condition was attenuated compared with the passive listening in quiet. Taken together, these results demonstrate that top-down attention differentially modulates envelope-tracking neural activity at different time lags and suggest that top-down attention can both enhance the neural responses to the attended sound stream and suppress the responses to the unattended sound stream.
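A hedged sketch of the lagged cross-correlation analysis described above follows: correlate a (simulated) EEG channel with the speech envelope over a range of time lags and locate the peak. The simulated channel, its roughly 100 ms delay, the noise level and the lag range are placeholders, not the study's recordings or parameters.

```python
# Sketch of a lagged cross-correlation between one EEG channel and the speech
# envelope. Simulated data; delay, noise level and lag range are placeholders.
import numpy as np

fs = 100
rng = np.random.default_rng(6)
envelope = np.abs(rng.standard_normal(fs * 120))            # 2-minute speech envelope
n = envelope.size

delay = 10                                                  # EEG follows envelope by ~100 ms
eeg = np.roll(envelope, delay) + 2.0 * rng.standard_normal(n)

max_lag = int(0.5 * fs)                                     # examine lags from -500 to +500 ms
lags = np.arange(-max_lag, max_lag + 1)
xcorr = np.array([
    np.corrcoef(eeg[max(l, 0):n + min(l, 0)],
                envelope[max(-l, 0):n - max(l, 0)])[0, 1]
    for l in lags
])
print("peak at lag:", lags[np.argmax(xcorr)] / fs * 1000, "ms")
```

In the study, the shape of this cross-correlation function (its P1-, N1- and P2-like deflections) is what differs between attended and unattended speech; the sketch only shows how the lagged correlation itself is obtained.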
Frontiers in Human Neuroscience | 2016
Hong Zhou; Lucia Melloni; David Poeppel; Nai Ding
Brain activity can follow the rhythms of dynamic sensory stimuli, such as speech and music, a phenomenon called neural entrainment. It has been hypothesized that low-frequency neural entrainment in the neural delta and theta bands provides a potential mechanism to represent and integrate temporal information. Low-frequency neural entrainment is often studied using periodically changing stimuli and is analyzed in the frequency domain using Fourier analysis, which decomposes a periodic signal into harmonically related sinusoids. However, it is not intuitive how these harmonically related components are related to the response waveform. Here, we explain the interpretation of response harmonics, with a special focus on very low-frequency neural entrainment near 1 Hz. We illustrate why a neural response repeating at f Hz does not necessarily generate any response at f Hz in the Fourier spectrum. A strong neural response at f Hz indicates that the time scales of the neural response waveform within each cycle match the time scales of the stimulus rhythm. Therefore, neural entrainment at very low frequency implies not only that the neural response repeats at f Hz but also that each period of the neural response is a slow wave matching the time scale of an f Hz sinusoid.
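The sketch below gives a small numerical version of this point: two signals that both repeat at exactly 1 Hz, one built from a brief fast transient per cycle and one from a slow wave spanning the cycle, have very different Fourier components at 1 Hz. The waveform shapes and durations are arbitrary choices made for illustration.

```python
# Numerical illustration: both signals below repeat at exactly 1 Hz, but only
# the one whose within-cycle waveform is a slow wave has a sizeable 1 Hz
# Fourier component. Waveform shapes and durations are arbitrary choices.
import numpy as np

fs, n_cycles = 100, 20
t_cycle = np.arange(0, 1, 1 / fs)                        # one 1-second cycle

transient = np.where(t_cycle < 0.125,                    # brief 8 Hz burst, then silence
                     np.sin(2 * np.pi * 8 * t_cycle), 0.0)
slow_wave = np.sin(2 * np.pi * 1 * t_cycle)              # slow wave filling the cycle

for name, cycle in [("fast transient", transient), ("slow wave", slow_wave)]:
    signal = np.tile(cycle, n_cycles)                    # repeats at exactly 1 Hz
    spectrum = np.abs(np.fft.rfft(signal)) / signal.size
    freqs = np.fft.rfftfreq(signal.size, 1 / fs)
    one_hz = spectrum[np.argmin(np.abs(freqs - 1.0))]
    print(f"{name}: 1 Hz component = {one_hz:.3f}")
```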
The Journal of Neuroscience | 2017
Shiri Makov; Omer Sharon; Nai Ding; Michal Ben-Shachar; Yuval Nir; Elana Zion Golumbic
The extent to which the sleeping brain processes sensory information remains unclear. This is particularly true for continuous and complex stimuli such as speech, in which information is organized into hierarchically embedded structures. Recently, novel metrics for assessing the neural representation of continuous speech have been developed using noninvasive brain recordings that have thus far only been tested during wakefulness. Here we investigated, for the first time, the sleeping brain's capacity to process continuous speech at different hierarchical levels using a newly developed Concurrent Hierarchical Tracking (CHT) approach that allows monitoring the neural representation and processing depth of continuous speech online. Speech sequences were compiled with syllables, words, phrases, and sentences occurring at fixed time intervals such that different linguistic levels corresponded to distinct frequencies. This enabled us to distinguish their neural signatures in brain activity. We compared the neural tracking of intelligible versus unintelligible (scrambled and foreign) speech across states of wakefulness and sleep using high-density EEG in humans. We found that neural tracking of stimulus acoustics was comparable across wakefulness and sleep and similar across all conditions regardless of speech intelligibility. In contrast, neural tracking of higher-order linguistic constructs (words, phrases, and sentences) was observed only for intelligible speech during wakefulness and could not be detected at all during non-rapid eye movement or rapid eye movement sleep. These results suggest that, whereas low-level auditory processing is relatively preserved during sleep, higher-level hierarchical linguistic parsing is severely disrupted, thereby revealing the capacity and limits of language processing during sleep. SIGNIFICANCE STATEMENT Despite the persistence of some sensory processing during sleep, it is unclear whether high-level cognitive processes such as speech parsing are also preserved. We used a novel approach for studying the depth of speech processing across wakefulness and sleep while tracking neuronal activity with EEG. We found that responses to the auditory sound stream remained intact; however, the sleeping brain did not show signs of hierarchical parsing of the continuous stream of syllables into words, phrases, and sentences. The results suggest that sleep imposes a functional barrier between basic sensory processing and high-level cognitive processing. This paradigm also holds promise for studying residual cognitive abilities in a wide array of unresponsive states.