Publication


Featured research published by Ozlem Kalinli.


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Prominence Detection Using Auditory Attention Cues and Task-Dependent High Level Information

Ozlem Kalinli; Shrikanth Narayanan

Auditory attention is a complex mechanism that involves the processing of low-level acoustic cues together with higher level cognitive cues. In this paper, a novel method is proposed that combines biologically inspired auditory attention cues with higher level lexical and syntactic information to model task-dependent influences on a given spoken language processing task. A set of low-level multiscale features (intensity, frequency contrast, temporal contrast, orientation, and pitch) is extracted in parallel from the auditory spectrum of the sound based on the processing stages in the central auditory system to create feature maps that are converted to auditory gist features that capture the essence of a sound scene. The auditory attention model biases the gist features in a task-dependent way to maximize target detection in a given scene. Furthermore, the top-down task-dependent influence of lexical and syntactic information is incorporated into the model using a probabilistic approach. The lexical information is incorporated by using a probabilistic language model, and the syntactic knowledge is modeled using part-of-speech (POS) tags. The combined model is tested on automatically detecting prominent syllables in speech using the BU Radio News Corpus. The model achieves 88.33% prominence detection accuracy at the syllable level and 85.71% accuracy at the word level. These results compare well with reported human performance on this task.
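
The pipeline described above has two key computational steps: reducing the multiscale feature maps to a fixed-length "gist" descriptor, and fusing the acoustic evidence with lexical and syntactic priors. The sketch below illustrates both steps in Python; it is a minimal illustration, not the paper's implementation, and every function name, grid size, and probability value in it is an assumption.

```python
# Minimal sketch of (1) gist extraction from a 2-D feature map and
# (2) probabilistic fusion of acoustic, lexical, and syntactic evidence.
# All names and numbers here are illustrative, not from the paper.
import numpy as np

def gist_vector(feature_map: np.ndarray, grid=(4, 5)) -> np.ndarray:
    """Average a 2-D feature map over a coarse grid, yielding a
    fixed-length gist descriptor regardless of the map's resolution."""
    rows = np.array_split(feature_map, grid[0], axis=0)
    cells = [np.array_split(r, grid[1], axis=1) for r in rows]
    return np.array([c.mean() for row in cells for c in row])

# One gist vector per feature channel (intensity, frequency contrast,
# temporal contrast, orientation, pitch), concatenated into one descriptor.
rng = np.random.default_rng(0)
channels = {name: rng.random((64, 100)) for name in
            ["intensity", "freq_contrast", "temp_contrast",
             "orientation", "pitch"]}
gist = np.concatenate([gist_vector(m) for m in channels.values()])

# Fusion sketch: assuming independent evidence sources, the combined
# prominence score is proportional to the product of the posteriors.
p_acoustic = 0.7   # P(prominent | gist features), e.g. from a classifier
p_lexical = 0.6    # P(prominent | word identity), from a language model
p_syntactic = 0.8  # P(prominent | POS tag), e.g. content vs. function word
score = p_acoustic * p_lexical * p_syntactic
print(f"gist dim = {gist.size}, fused prominence score = {score:.3f}")
```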


Multimedia Signal Processing | 2009

Saliency-driven unstructured acoustic scene classification using latent perceptual indexing

Ozlem Kalinli; Shiva Sundaram; Shrikanth Narayanan

Automatic classification of real-life, complex, and unstructured acoustic scenes is a challenging task because the acoustic sources present in the audio stream are unknown in number and overlap in time. In this work, we present a novel approach to classifying such unstructured acoustic scenes. Motivated by the bottom-up attention model of the human auditory system, salient events of an audio clip are extracted in an unsupervised manner and presented to the classification system. Similar to latent semantic indexing of text documents, the classification system uses a unit-document frequency measure to index the clip in a continuous, latent space. This allows for a completely class-independent approach to audio classification. Our results on the BBC sound effects library indicate that, using the saliency-driven attention selection approach presented in this paper, a 17.5% relative improvement can be obtained in frame-based classification and a 25% relative improvement can be obtained using the latent audio indexing approach.
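
The latent indexing step is analogous to latent semantic indexing: a unit-document count matrix is factored with a truncated SVD, and new clips are folded into the resulting latent space without using any class labels. Below is a minimal sketch of that idea on synthetic counts; the unit inventory, matrix weighting, and dimensionality are assumptions, not taken from the paper.

```python
# Sketch of latent perceptual indexing by analogy with LSI: factor a
# unit-document count matrix with SVD, then fold new clips into the
# latent space. Counts here are random placeholders.
import numpy as np

rng = np.random.default_rng(1)
n_units, n_clips, k = 50, 200, 10   # acoustic units, training clips, latent dims

# Unit-document frequency matrix: counts[u, d] = occurrences of unit u in clip d.
counts = rng.poisson(0.3, size=(n_units, n_clips)).astype(float)

# Truncated SVD defines the latent space (cf. LSI on term-document matrices).
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
Uk, sk = U[:, :k], s[:k]

def index_clip(unit_counts: np.ndarray) -> np.ndarray:
    """Fold a new clip's unit counts into the k-dimensional latent space."""
    return (unit_counts @ Uk) / sk

# A new, unlabeled clip can now be compared to training clips by cosine
# similarity in the latent space; no class labels enter the indexing step.
new_clip = rng.poisson(0.3, size=n_units).astype(float)
q = index_clip(new_clip)
db = Vt[:k].T                        # latent coordinates of training clips
sims = (db @ q) / (np.linalg.norm(db, axis=1) * np.linalg.norm(q) + 1e-12)
print("nearest training clip:", int(np.argmax(sims)))
```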


Conference of the International Speech Communication Association | 2016

Analysis of Multi-Lingual Emotion Recognition Using Auditory Attention Features

Ozlem Kalinli

In this paper, we build mono-lingual and cross-lingual emotion recognition systems and report performance on English and German databases. The emotion recognition system uses biologically inspired auditory attention features together with a neural network for learning the mapping between features and emotion classes. We first build mono-lingual systems for both the Berlin Database of Emotional Speech (EMO-DB) and LDC's Emotional Prosody (Emo-Prosody) databases and achieve 82.7% and 56.7% accuracy for five-class emotion classification (neutral, sad, angry, happy, and boredom) using leave-one-speaker-out cross validation. When tested cross-lingually, five-class emotion recognition accuracy drops to 55.1% for EMO-DB and 41.4% for Emo-Prosody. Finally, we build a bilingual emotion recognition system and report experimental results and their analysis. The bilingual system performs close to the individual mono-lingual systems.
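
The paper pairs auditory attention features with a neural network classifier. The exact architecture is not given in this abstract, so the sketch below assumes a plain one-hidden-layer softmax network trained by gradient descent on synthetic five-class data, purely to illustrate the feature-to-emotion mapping.

```python
# Sketch of a feature-to-emotion classifier: one tanh hidden layer and a
# softmax output trained with cross-entropy. The architecture, sizes, and
# data are assumptions for illustration, not the paper's actual network.
import numpy as np

rng = np.random.default_rng(2)
n, d, h, c = 500, 60, 32, 5        # samples, feature dim, hidden units, classes
X = rng.normal(size=(n, d))        # stand-in auditory attention feature vectors
y = rng.integers(0, c, size=n)     # stand-in labels: neutral/sad/angry/happy/boredom
Y = np.eye(c)[y]                   # one-hot targets

W1 = rng.normal(scale=0.1, size=(d, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=(h, c)); b2 = np.zeros(c)

for step in range(300):
    # Forward pass: tanh hidden layer, softmax output.
    H = np.tanh(X @ W1 + b1)
    logits = H @ W2 + b2
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    # Backward pass for the cross-entropy loss, then a gradient step.
    dlogits = (P - Y) / n
    dW2, db2 = H.T @ dlogits, dlogits.sum(axis=0)
    dH = (dlogits @ W2.T) * (1.0 - H**2)
    dW1, db1 = X.T @ dH, dH.sum(axis=0)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 0.5 * grad

# Evaluate on the training data (a real experiment would use
# leave-one-speaker-out cross validation, as in the paper).
pred = (np.tanh(X @ W1 + b1) @ W2 + b2).argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```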


Conference of the International Speech Communication Association | 2007

A saliency-based auditory attention model with applications to unsupervised prominent syllable detection in speech

Ozlem Kalinli; Shrikanth Narayanan


International Conference on Acoustics, Speech, and Signal Processing | 2008

A top-down auditory attention model for learning task dependent influences on prominence detection in speech

Ozlem Kalinli; Shrikanth Narayanan


Conference of the International Speech Communication Association | 2013

Combination of auditory attention features with phone posteriors for better automatic phoneme segmentation

Ozlem Kalinli


Conference of the International Speech Communication Association | 2012

Automatic Phoneme Segmentation Using Auditory Attention Features

Ozlem Kalinli


Conference of the International Speech Communication Association | 2009

Continuous Speech Recognition Using Attention Shift Decoding with Soft Decision

Ozlem Kalinli; Shrikanth Narayanan


Conference of the International Speech Communication Association | 2011

Syllable Segmentation of Continuous Speech Using Auditory Attention Cues

Ozlem Kalinli


Conference of the International Speech Communication Association | 2008

Combining task-dependent information with auditory attention cues for prominence detection in speech

Ozlem Kalinli; Shrikanth Narayanan

Collaboration


Dive into Ozlem Kalinli's collaborations.

Top Co-Authors

Shrikanth Narayanan, University of Southern California
Mahnoosh Mehrabani, University of Texas at Dallas
Ruxin Chen, Sony Computer Entertainment