Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Oded Ghitza is active.

Publication


Featured research published by Oded Ghitza.


Frontiers in Psychology | 2011

Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm

Oded Ghitza

The premise of this study is that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated to account for the patterns of observed recognition phenomena. It is postulated that decoding time is governed by a cascade of neuronal oscillators, which guide template-matching operations at a hierarchy of temporal scales. Cascaded cortical oscillations in the theta, beta, and gamma frequency bands are argued to be crucial for speech intelligibility. Intelligibility is high so long as these oscillations remain phase locked to the auditory input rhythm. A model (Tempo) is presented which is capable of emulating recent psychophysical data on the intelligibility of speech sentences as a function of “packaging” rate (Ghitza and Greenberg, 2009). The data show that intelligibility of speech that is time-compressed by a factor of 3 (i.e., a high syllabic rate) is poor (above 50% word error rate), but is substantially restored when the information stream is re-packaged by the insertion of silent gaps in between successive compressed-signal intervals – a counterintuitive finding, difficult to explain using classical models of speech perception, but emerging naturally from the Tempo architecture.


Phonetica | 2009

On the Possible Role of Brain Rhythms in Speech Perception: Intelligibility of Time-Compressed Speech with Periodic and Aperiodic Insertions of Silence

Oded Ghitza; Steven Greenberg

This study was motivated by the prospective role played by brain rhythms in speech perception. The intelligibility – in terms of word error rate – of natural-sounding, synthetically generated sentences was measured using a paradigm that alters speech-energy rhythm over a range of frequencies. The material comprised 96 semantically unpredictable sentences, each approximately 2 s long (6–8 words per sentence), generated by a high-quality text-to-speech (TTS) synthesis engine. The TTS waveform was time-compressed by a factor of 3, creating a signal with a syllable rhythm three times faster than the original, and whose intelligibility is poor (<50% words correct). A waveform with an artificial rhythm was produced by automatically segmenting the time-compressed waveform into consecutive 40-ms fragments, each followed by a silent interval. The parameters varied were the length of the silent interval (0–160 ms) and whether the lengths of silence were equal (‘periodic’) or not (‘aperiodic’). The performance curve (word error rate as a function of mean duration of silence) was U-shaped. The lowest word error rate (i.e., highest intelligibility) occurred when the silence was 80 ms long and inserted periodically. This is also the condition for which word error rate increased when the silence was inserted aperiodically. These data are consistent with a model (TEMPO) in which low-frequency brain rhythms affect the ability to decode the speech signal. In TEMPO, optimum intelligibility is achieved when the syllable rhythm is within the range of the high theta-frequency brain rhythms (6–12 Hz), comparable to the rate at which segments and syllables are articulated in conversational speech.
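The repackaging manipulation described above can be sketched in a few lines of numpy. The 40-ms fragment length and the 0–160-ms silence range come from the abstract; the jitter rule used for the ‘aperiodic’ condition (uniform between zero and twice the mean) is an illustrative assumption, not the paper's exact procedure, and the decimation stand-in for time compression is only there to keep the sketch self-contained (real time compression preserves pitch, e.g. via WSOLA or a phase vocoder).

```python
import numpy as np

def repackage(signal, fs, fragment_ms=40.0, mean_silence_ms=80.0,
              aperiodic=False, rng=None):
    """Chop `signal` into consecutive fragments of `fragment_ms` and insert
    a silent interval after each one. In the 'aperiodic' condition the
    silence durations are jittered around the mean instead of fixed
    (jitter rule is an assumption for illustration)."""
    frag = int(round(fs * fragment_ms / 1000.0))
    rng = rng or np.random.default_rng(0)
    out = []
    for start in range(0, len(signal), frag):
        out.append(signal[start:start + frag])
        if aperiodic:
            sil_ms = rng.uniform(0.0, 2.0 * mean_silence_ms)
        else:
            sil_ms = mean_silence_ms
        out.append(np.zeros(int(round(fs * sil_ms / 1000.0))))
    return np.concatenate(out)

fs = 16000
# crude stand-in for 3x time-compressed speech (decimation, not WSOLA):
compressed = np.random.randn(fs * 2)[::3]
packaged = repackage(compressed, fs, fragment_ms=40.0, mean_silence_ms=80.0)
```

With 40-ms fragments and 80-ms periodic gaps, the repackaged stream restores an effective "packaging" rhythm near the theta range even though each fragment itself remains compressed.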


Frontiers in Psychology | 2012

On the Role of Theta-Driven Syllabic Parsing in Decoding Speech: Intelligibility of Speech with a Manipulated Modulation Spectrum

Oded Ghitza

Recent hypotheses on the potential role of neuronal oscillations in speech perception propose that speech is processed on multi-scale temporal analysis windows formed by a cascade of neuronal oscillators locked to the input pseudo-rhythm. In particular, Ghitza (2011) proposed that the oscillators are in the theta, beta, and gamma frequency bands with the theta oscillator the master, tracking the input syllabic rhythm and setting a time-varying, hierarchical window structure synchronized with the input. In the study described here the hypothesized role of theta was examined by measuring the intelligibility of speech with a manipulated modulation spectrum. Each critical-band signal was manipulated by controlling the degree of temporal envelope flatness. Intelligibility of speech with critical-band envelopes that are flat is poor; inserting extra information, restricted to the input syllabic rhythm, markedly improves intelligibility. It is concluded that flattening the critical-band envelopes prevents the theta oscillator from tracking the input rhythm, hence the disruption of the hierarchical window structure that controls the decoding process. Reinstating the input-rhythm information revives the tracking capability, hence restoring the synchronization between the window structure and the input, resulting in the extraction of additional information from the flat modulation spectrum.
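The core manipulation in this study, flattening the temporal envelope of a critical-band signal, can be sketched by dividing the band signal by its Hilbert envelope. This is a simplified stand-in for the paper's manipulation (which controls the *degree* of flatness per critical band); the AM-tone input is an assumed example, and the numpy-only analytic-signal helper replaces a library Hilbert transform.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (a numpy-only Hilbert transform)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def flatten_envelope(band, eps=1e-8):
    """Flatten the temporal envelope of one critical-band signal by dividing
    out its Hilbert envelope; what remains keeps the fine structure but has
    a (near-)flat modulation spectrum."""
    env = np.abs(analytic_signal(band))
    return band / np.maximum(env, eps)

# AM tone as a stand-in for one critical-band signal (assumed example):
fs = 16384
t = np.arange(4096) / fs
band = (1.0 + 0.8 * np.cos(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1024 * t)
flat = flatten_envelope(band)
```

After flattening, the 4-Hz modulation that a theta oscillator could track is gone; the paper's "reinstating" condition corresponds to mixing syllabic-rate envelope information back into such a flattened signal.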


Frontiers in Psychology | 2013

The theta-syllable: a unit of speech information defined by cortical function

Oded Ghitza

A recent commentary (Oscillators and syllables: a cautionary note. Cummins, 2012) questions the validity of a class of speech perception models inspired by the possible role of neuronal oscillations in decoding speech (e.g., Ghitza, 2011; Giraud and Poeppel, 2012). In arguing against the approach, Cummins raises a cautionary flag “from a phonetician's point of view.” Here we respond to his arguments from an auditory processing viewpoint, referring to the phenomenological model of Ghitza (2011), taken as a representative of the criticized approach. We shall conclude by proposing the theta-syllable as an information unit defined by cortical function—an alternative to the conventional, ambiguously defined syllable. In the larger context, the resulting debate should be viewed as a subtext of acoustic and auditory phonetics vs. articulatory and motor theories of speech perception.


Frontiers in Human Neuroscience | 2013

Neuronal oscillations and speech perception: critical-band temporal envelopes are the essence

Oded Ghitza; Anne-Lise Giraud; David Poeppel

A recent opinion article (Neural oscillations in speech: do not be enslaved by the envelope. Obleser et al., 2012) questions the validity of a class of speech perception models inspired by the possible role of neuronal oscillations in decoding speech (e.g., Ghitza, 2011; Giraud and Poeppel, 2012). The authors criticize, in particular, what they see as an over-emphasis of the role of temporal speech envelope information, and an over-emphasis of entrainment to the input rhythm while neglecting the role of top-down processes in modulating the entrainment of neuronal oscillations. Here we respond to these arguments, referring to the phenomenological model of Ghitza (2011), taken as a representative of the criticized approach.


PLOS Computational Biology | 2009

Representation of time-varying stimuli by a network exhibiting oscillations on a faster time scale.

Maoz Shamir; Oded Ghitza; Steven Epstein; Nancy Kopell

Sensory processing is associated with gamma frequency oscillations (30–80 Hz) in sensory cortices. This raises the question whether gamma oscillations can be directly involved in the representation of time-varying stimuli, including stimuli whose time scale is longer than a gamma cycle. We are interested in the ability of the system to reliably distinguish different stimuli while being robust to stimulus variations such as uniform time-warp. We address this issue with a dynamical model of spiking neurons and study the response to an asymmetric sawtooth input current over a range of shape parameters. These parameters describe how fast the input current rises and falls in time. Our network consists of inhibitory and excitatory populations that are sufficient for generating oscillations in the gamma range. The oscillation period is about one-third of the stimulus duration. Embedded in this network is a subpopulation of excitatory cells that respond to the sawtooth stimulus and a subpopulation of cells that respond to an onset cue. The intrinsic gamma oscillations generate a temporally sparse code for the external stimuli. In this code, an excitatory cell may fire a single spike during a gamma cycle, depending on its tuning properties and on the temporal structure of the specific input; the identity of the stimulus is coded by the list of excitatory cells that fire during each cycle. We quantify the properties of this representation in a series of simulations and show that the sparseness of the code makes it robust to uniform warping of the time scale. We find that resetting of the oscillation phase at stimulus onset is important for a reliable representation of the stimulus and that there is a tradeoff between the resolution of the neural representation of the stimulus and robustness to time-warp.
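The asymmetric sawtooth stimulus and the uniform time-warp it is tested against can be sketched as follows. The parameterization (a rise fraction of the period as the shape parameter) and the specific numbers are assumptions for illustration; the full spiking E-I network is not reproduced here.

```python
import numpy as np

def asym_sawtooth(t, period, rise_frac):
    """Asymmetric sawtooth input current: rises linearly from 0 to 1 over
    `rise_frac` of each period, then falls linearly back to 0. `rise_frac`
    is the shape parameter (how fast the current rises vs. falls)."""
    phase = (t % period) / period
    rising = phase < rise_frac
    return np.where(rising, phase / rise_frac,
                    (1.0 - phase) / (1.0 - rise_frac))

t = np.linspace(0.0, 0.3, 3000, endpoint=False)
x = asym_sawtooth(t, period=0.1, rise_frac=0.3)
# uniform 2x time-warp: scale the time axis and the period together
x_warp = asym_sawtooth(2.0 * t, period=0.2, rise_frac=0.3)
```

A uniform time-warp scales the stimulus and its period by the same factor, so the waveform shape, and hence the list of cells firing per gamma cycle in the model, is unchanged; that invariance is what the sparse code exploits.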


Journal of the Acoustical Society of America | 1993

Adequacy of auditory models to predict human internal representation of speech sounds

Oded Ghitza

A long-standing question that arises when studying a particular auditory model is how to evaluate its performance. More precisely, it is of interest to evaluate to what extent the model representation can describe the actual human internal representation. Here, this question is addressed in the context of speech perception. That is, given a speech representation based on the auditory system, to what extent can it preserve phonetic information that is perceptually relevant? To answer this question, a diagnostic system has been developed that simulates the psychophysical procedure used in the standard Diagnostic-Rhyme Test (DRT). In the psychophysical procedure, the subject has all the cognitive information needed for the discrimination task, a priori. Hence, errors in discrimination are due mainly to inaccuracies in the auditory representation of the stimulus. In the simulation, the human observer is replaced by an array of recognizers, one for each pair of words in the DRT database. The recognizer used [Ghitza and Sondhi, J. Acoust. Soc. Am. Suppl. 1 87, S107 (1990)] was designed to keep errors due to the recognition procedure to a minimum, so that the overall detected errors are due mainly to inaccuracies in the front-end representation. The system provides detailed diagnostics that show the error distributions among six phonetically distinctive features. Results are given for the behavior of two speech analysis methods, a representation based on the auditory system and one based on the Fourier power spectrum, in quiet and in a noisy environment. These results are compared with psychophysical results for the same database.


Frontiers in Psychology | 2014

Behavioral evidence for the role of cortical θ oscillations in determining auditory channel capacity for speech

Oded Ghitza

Studies on the intelligibility of time-compressed speech have shown flawless performance for moderate compression factors, a sharp deterioration for compression factors above three, and an improved performance as a result of “repackaging”—a process of dividing the time-compressed waveform into fragments, called packets, and delivering the packets at a prescribed rate. This intricate pattern of performance reflects the reliability of the auditory system in processing speech streams with different information transfer rates; the knee-point of performance defines the auditory channel capacity. This study is concerned with the cortical computation principle that determines channel capacity. Oscillation-based models of speech perception hypothesize that the speech decoding process is guided by a cascade of oscillations with theta as “master,” capable of tracking the input rhythm, with the theta cycles aligned with the intervocalic speech fragments termed θ-syllables; intelligibility remains high as long as theta is in sync with the input, and it sharply deteriorates once theta is out of sync. In the study described here the hypothesized role of theta was examined by measuring the auditory channel capacity of time-compressed speech that had undergone repackaging. For all speech speeds tested (with compression factors of up to eight), packaging rate at capacity equals 9 packets/s—aligned with the upper limit of cortical theta, θmax (about 9 Hz)—and the packet duration equals the duration of one uncompressed θ-syllable divided by the compression factor. The alignment of both the packaging rate and the packet duration with properties of cortical theta suggests that the auditory channel capacity is determined by theta. Irrespective of speech speed, the maximum information transfer rate through the auditory channel is the information in one uncompressed θ-syllable long speech fragment per one θmax cycle. Equivalently, the auditory channel capacity is 9 θ-syllables/s.
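The capacity relation reported above can be stated as a small worked example. The 9 packets/s figure and the packet-duration rule come from the abstract; taking an uncompressed θ-syllable of ~111 ms (one θmax cycle) is an illustrative assumption, since θ-syllable durations vary in real speech.

```python
THETA_MAX_HZ = 9.0                      # upper limit of cortical theta (per the paper)
CAPACITY_PACKETS_PER_S = THETA_MAX_HZ   # packaging rate at capacity: 9 packets/s

def packet_duration_ms(theta_syllable_ms, compression_factor):
    """Packet duration at capacity: the duration of one uncompressed
    θ-syllable divided by the compression factor."""
    return theta_syllable_ms / compression_factor

# Illustrative θ-syllable duration: one θmax cycle, ~111 ms.
cycle_ms = 1000.0 / THETA_MAX_HZ
d3 = packet_duration_ms(cycle_ms, 3)    # ~37-ms packets at compression 3
silent3 = cycle_ms - d3                 # ~74 ms of each cycle left silent
```

At higher compression factors the packets shrink proportionally while the delivery rate stays pinned at 9 packets/s, which is the sense in which capacity is fixed at 9 θ-syllables/s regardless of speech speed.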


Language, cognition and neuroscience | 2017

Acoustic-driven delta rhythms as prosodic markers

Oded Ghitza

Oscillation-based models of speech perception postulate a cortical computation principle by which decoding is performed within a time-varying window structure, synchronised with the input on multiple time scales. The windows are generated by a segmentation process, implemented by a cascade of oscillators. This paper tests the hypothesis that prosodic segmentation is driven by a “flexible” (in contrast to autonomous, “rigid”) oscillator in the delta range (0.5–3 Hz) by tracking prosodic rhythms, such that intelligibility is impaired when the ability of this oscillator to synchronise to these rhythms is impaired. In setting phrasal boundaries, both bottom-up acoustic-driven and top-down context-invoked processes interact in a manner that is difficult to decompose. The present experiments used context-free random-digit strings in order to focus exclusively on bottom-up processes. Two experiments are reported. Listeners performed a target identification task, listening to stimuli with prescribed chunking patterns (Experiment I) or chunking rates (Experiment II), followed by a target. Irrespective of the chunking pattern, performance is high only for targets inside a chunk, pointing to the benefit of acoustic prosodic segmentation in digit retrieval. Importantly, performance remains high as long as the chunking rate is within the frequency range of neuronal delta, but sharply deteriorates for higher rates. These data provide psychophysical evidence for the role of acoustic-driven segmentation, with flexible delta oscillations at the core, in digit retrieval.


Journal of the Acoustical Society of America | 2014

Refining a model of hearing impairment using speech psychophysics

Morten Løve Jepsen; Torsten Dau; Oded Ghitza

The premise of this study is that models of hearing, in general, and of individual hearing impairment, in particular, can be improved by using speech test results as an integral part of the modeling process. A conceptual iterative procedure is presented which, for an individual, considers measures of sensitivity, cochlear compression, and phonetic confusions using the Diagnostic Rhyme Test (DRT) framework. The suggested approach is exemplified by presenting data from three hearing-impaired listeners and results obtained with models of the hearing impairment of the individuals. The work reveals that the DRT data provide valuable information about the damaged periphery and that the non-speech and speech data are complementary in obtaining the best model for an individual.

Collaboration


Dive into Oded Ghitza's collaborations.

Top Co-Authors

Morten Løve Jepsen

Technical University of Denmark

Torsten Dau

Technical University of Denmark
