
Publication


Featured research published by Shahram Kalantari.


Computer Speech & Language | 2017

Cross database audio visual speech adaptation for phonetic spoken term detection

Shahram Kalantari; David Dean; Sridha Sridharan

Highlights:
- The use of visual information helps both phone recognition and spoken term detection accuracy.
- Fused HMM adaptation can be utilized to benefit from multiple databases when training audio-visual phone models.
- An additional audio adaptation step improves cross-database training accuracy for phone recognition and spoken term detection.
- A post-training step can be used to update all HMM parameters and further improve phone recognition accuracy.

Spoken term detection (STD), the process of finding all occurrences of a specified search term in a large amount of speech, has many applications in multimedia search and information retrieval. It is known that the use of video information in the form of lip movements can improve the performance of STD in the presence of audio noise. However, research in this direction has been hampered by the unavailability of large annotated audio-visual databases for development. We propose a novel approach to audio-visual spoken term detection when only a small (low-resource) audio-visual database is available for development. First, cross-database training is proposed as a novel framework using the fused hidden Markov model (HMM) technique: an audio model is trained on large, publicly available audio databases and then adapted to the visual data of the given audio-visual database. This approach is shown to perform better than the standard HMM joint-training method and also improves the performance of spoken term detection when used in the indexing stage. In another approach, the external audio models are first adapted to the audio data of the given audio-visual database and then adapted to its visual data. This also improves both phone recognition and spoken term detection accuracy. Finally, the cross-database training technique is used as HMM initialization, and an extra parameter re-estimation step is applied to the initialized models using the Baum-Welch technique.
The proposed approaches to audio-visual model training benefit from both the large out-of-domain audio databases that are available and the small audio-visual database given for development, yielding more accurate audio-visual models.
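The final step described above, re-estimating all parameters of an initialized HMM with Baum-Welch, can be illustrated on a toy model. The sketch below implements one Baum-Welch (EM) re-estimation pass for a simple discrete-observation HMM in NumPy. The paper works with continuous audio-visual HMMs, so the discrete emission model, the single-sequence setup, and the function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One Baum-Welch (EM) re-estimation step for a discrete HMM.

    obs : sequence of observation symbol indices, length T
    pi  : (N,) initial state distribution
    A   : (N, N) state transition matrix
    B   : (N, M) emission probability matrix
    Returns updated (pi, A, B).
    """
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)

    # Forward pass (alpha), scaled per frame for numerical stability.
    alpha = np.zeros((T, N))
    scale = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass (beta), reusing the forward scaling factors.
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

    # E-step: state posteriors (gamma) and transition posteriors (xi).
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((N, N))
    for t in range(T - 1):
        x = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi += x / x.sum()

    # M-step: re-estimate all parameters from expected counts.
    pi_new = gamma[0]
    A_new = xi / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[obs == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new
```

In the cross-database setting described above, `pi`, `A`, and `B` would come from the adapted (initialized) models rather than random starts; iterating this step is the "extra parameter re-estimation" the abstract refers to.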


ACM Multimedia | 2015

Acoustic Adaptation in Cross Database Audio Visual SHMM Training for Phonetic Spoken Term Detection

Shahram Kalantari; David Dean; Sridha Sridharan; Houman Ghaemmaghami; Clinton Fookes

Visual information in the form of lip movements of the speaker has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross database training of synchronous hidden Markov models (SHMMs) to make use of external large and publicly available audio databases in addition to the relatively small given audio visual database. In this work, the cross database training approach is improved by performing an additional audio adaptation step, which enables audio visual SHMMs to benefit from audio observations of the external audio models before adding visual modality to them. The proposed approach outperforms the baseline cross database training approach in clean and noisy environments in terms of phone recognition accuracy as well as spoken term detection (STD) accuracy.
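The audio adaptation step described here is, in general form, a MAP-style update that shifts externally trained model parameters toward in-domain data before the visual modality is added. As a hedged illustration (not the paper's actual SHMM adaptation), the sketch below adapts Gaussian state means with a relevance factor; `map_adapt_means`, `tau`, and the precomputed state posteriors are assumptions made for the example.

```python
import numpy as np

def map_adapt_means(mu_prior, frames, posteriors, tau=10.0):
    """MAP adaptation of Gaussian state means toward in-domain data.

    mu_prior   : (N, D) state means of the externally trained model
    frames     : (T, D) in-domain acoustic feature frames
    posteriors : (T, N) state occupancy probabilities per frame
    tau        : relevance factor; larger values trust the prior more
    Returns (N, D) adapted means.
    """
    occ = posteriors.sum(axis=0)           # (N,) soft frame counts per state
    weighted = posteriors.T @ frames       # (N, D) soft data sums per state
    # Interpolate between the prior means and the in-domain sample means,
    # weighted by how much data each state actually observed.
    return (tau * mu_prior + weighted) / (tau + occ)[:, None]
```

States with little in-domain occupancy stay close to the external model, while well-observed states move toward the new data, which is the usual motivation for this kind of adaptation before cross-modal training.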


Conference of the International Speech Communication Association | 2015

Complete-linkage clustering for voice activity detection in audio and visual speech

Houman Ghaemmaghami; David Dean; Shahram Kalantari; Sridha Sridharan; Clinton Fookes


Science & Engineering Faculty | 2015

Incorporating visual information for spoken term detection

Shahram Kalantari; David Dean; Sridha Sridharan


Science & Engineering Faculty | 2015

Cross database training of audio-visual hidden Markov models for phone recognition

Shahram Kalantari; David Dean; Houman Ghaemmaghami; Sridha Sridharan; Clinton Fookes


European Signal Processing Conference | 2014

Topic Dependent Language Modelling for Spoken Term Detection

Shahram Kalantari; David Dean; Sridha Sridharan; Roy Wallace


Science & Engineering Faculty | 2013

Visual front-end wars: Viola-Jones face detector vs Fourier Lucas-Kanade

Shahram Kalantari; Rajitha Navarathna; David Dean; Sridha Sridharan


Science & Engineering Faculty | 2014

Phonetic spoken term search using topic information

Shahram Kalantari; David Dean; Sridha Sridharan


Institute for Future Environments; Science & Engineering Faculty | 2014

Rescaling clustering trees using impact ratios for robust hierarchical speaker clustering

Houman Ghaemmaghami; David Dean; Shahram Kalantari; Sridha Sridharan


Archive | 2017

Method and system for automatically diarising a sound recording

Houman Ghaemmaghami; Shahram Kalantari; David Dean; Sridha Sridharan

Collaboration


Shahram Kalantari's top co-authors, all affiliated with Queensland University of Technology:

David Dean
Sridha Sridharan
Houman Ghaemmaghami
Clinton Fookes
Rajitha Navarathna
Roy Wallace