
Publication


Featured research published by Rohit Prasad.


computer vision and pattern recognition | 2012

Multimodal feature fusion for robust event detection in web videos

Pradeep Natarajan; Shuang Wu; Shiv Naga Prasad Vitaladevuni; Xiaodan Zhuang; Stavros Tsakalidis; Unsang Park; Rohit Prasad; Premkumar Natarajan

Combining multiple low-level visual features is a proven and effective strategy for a range of computer vision tasks. However, limited attention has been paid to combining such features with information from other modalities, such as audio and videotext, for large scale analysis of web videos. In our work, we rigorously analyze and combine a large set of low-level features that capture appearance, color, motion, audio and audio-visual co-occurrence patterns in videos. We also evaluate the utility of high-level (i.e., semantic) visual information obtained from detecting scene, object, and action concepts. Further, we exploit multimodal information by analyzing available spoken and videotext content using state-of-the-art automatic speech recognition (ASR) and videotext recognition systems. We combine these diverse features using a two-step strategy employing multiple kernel learning (MKL) and late score level fusion methods. Based on the TRECVID MED 2011 evaluations for detecting 10 events in a large benchmark set of ~45000 videos, our system showed the best performance among the 19 international teams.
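The late score-level fusion step the abstract mentions can be sketched as a weighted combination of per-modality detection scores. This is a minimal illustration, not the paper's system: the modality names, scores, and weights below are made up for the example.

```python
# Hypothetical sketch of late score-level fusion: each modality
# (visual, audio, ASR, videotext) produces an event-detection score,
# and a normalized weighted sum combines them into one fused score.
def late_fusion(scores, weights):
    """Combine per-modality scores with normalized weights."""
    total = sum(weights.values())
    return sum(scores[m] * w / total for m, w in weights.items())

# Illustrative scores and weights, not values from the paper.
scores = {"visual": 0.82, "audio": 0.40, "asr": 0.65, "videotext": 0.10}
weights = {"visual": 3.0, "audio": 1.0, "asr": 2.0, "videotext": 0.5}
fused = late_fusion(scores, weights)
```

In practice the weights would be tuned on held-out data (or replaced by a learned combiner), which is where the MKL stage of a two-step strategy comes in.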


document analysis systems | 2010

Gabor features for offline Arabic handwriting recognition

Jin Chen; Huaigu Cao; Rohit Prasad; Anurag Bhardwaj; Premkumar Natarajan

Many feature extraction approaches for off-line handwriting recognition (OHR) rely on accurate binarization of gray-level images. However, high-quality binarization of most real-world documents is extremely difficult due to the varying characteristics of noise artifacts common in such documents. Unlike most of these features, Gabor features do not require binarization of the document images, and thus are likely to be more robust to noise in document images. To demonstrate the efficacy of our proposed Gabor features, we perform subword recognition for off-line Arabic handwritten images using Support Vector Machines (SVM). We also compare the recognition performance with other binarization-based features which have been proven to be effective in capturing shape characteristics of handwritten Arabic subwords, such as GSC (a set of gradient, structure, and concavity features) and skeleton-based Graph features. Our preliminary experimental results show that Gabor features outperform Graph features and are slightly better than GSC features for Arabic subword recognition. In addition, by combining Gabor and GSC features, we obtain a significant reduction in classification error rate over using GSC or Gabor features alone.
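The key point of the abstract is that Gabor features operate directly on gray-level pixels, so no binarization step is needed. The sketch below illustrates the general idea under simple assumptions (filter sizes, orientations, and the pooling choice are illustrative, not the paper's configuration): build a small bank of oriented Gabor kernels and pool the absolute filter responses over a grayscale patch into a feature vector.

```python
import math

# Illustrative Gabor feature sketch (not the paper's implementation).
def gabor_kernel(size, theta, wavelength=4.0, sigma=2.0):
    """Real-valued Gabor kernel: Gaussian envelope times a cosine carrier
    oriented at angle theta."""
    half = size // 2
    kern = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            env = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(env * math.cos(2 * math.pi * xr / wavelength))
        kern.append(row)
    return kern

def gabor_features(patch, n_orientations=4, size=7):
    """Mean absolute filter response per orientation over a gray patch."""
    h, w = len(patch), len(patch[0])
    feats = []
    for k in range(n_orientations):
        kern = gabor_kernel(size, k * math.pi / n_orientations)
        resps = []
        for i in range(h - size + 1):
            for j in range(w - size + 1):
                s = sum(
                    patch[i + a][j + b] * kern[a][b]
                    for a in range(size)
                    for b in range(size)
                )
                resps.append(abs(s))
        feats.append(sum(resps) / len(resps))
    return feats

# A synthetic 16x16 gray patch with a vertical intensity edge,
# no thresholding or binarization applied anywhere.
patch = [[0.0] * 8 + [1.0] * 8 for _ in range(16)]
feats = gabor_features(patch)
```

The resulting per-orientation responses would then feed a classifier such as an SVM.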


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system

Spyridon Matsoukas; Jean-Luc Gauvain; Gilles Adda; Thomas Colthurst; Chia-Lin Kao; Owen Kimball; Lori Lamel; Fabrice Lefèvre; Jeff Z. Ma; John Makhoul; Long Nguyen; Rohit Prasad; Richard M. Schwartz; Holger Schwenk; Bing Xiang

This paper describes the progress made in the transcription of broadcast news (BN) and conversational telephone speech (CTS) within the combined BBN/LIMSI system from May 2002 to September 2004. During that period, BBN and LIMSI collaborated in an effort to produce significant reductions in the word error rate (WER), as directed by the aggressive goals of the Effective, Affordable, Reusable, Speech-to-text [Defense Advanced Research Projects Agency (DARPA) EARS] program. The paper focuses on general modeling techniques that led to recognition accuracy improvements, as well as engineering approaches that enabled efficient use of large amounts of training data and fast decoding architectures. Special attention is given to efforts to integrate components of the BBN and LIMSI systems, discussing the tradeoff between speed and accuracy for various system combination strategies. Results on the EARS progress test sets show that the combined BBN/LIMSI system achieved relative reductions of 47% and 51% on the BN and CTS domains, respectively.


international conference on acoustics, speech, and signal processing | 2004

Speech recognition in multiple languages and domains: the 2003 BBN/LIMSI EARS system

Richard M. Schwartz; Thomas Colthurst; Nicolae Duta; Herbert Gish; Rukmini Iyer; Chia-Lin Kao; Daben Liu; Owen Kimball; Jeff Z. Ma; John Makhoul; Spyros Matsoukas; Long Nguyen; Mohammed Noamany; Rohit Prasad; Bing Xiang; Dongxin Xu; Jean-Luc Gauvain; Lori Lamel; Holger Schwenk; Gilles Adda; Langzhou Chen

We report on the results of the first evaluations for the BBN/LIMSI system under the new DARPA EARS program. The evaluations were carried out for conversational telephone speech (CTS) and broadcast news (BN) for three languages: English, Mandarin, and Arabic. In addition to providing system descriptions and evaluation results, the paper highlights methods that worked well across the two domains and those few that worked well on one domain but not the other. For the BN evaluations, which had to run in under 10 times real time, we demonstrated that a joint BBN/LIMSI system with a time constraint achieved better results than either system alone.


SACH'06 Proceedings of the 2006 conference on Arabic and Chinese handwriting recognition | 2006

Multi-lingual offline handwriting recognition using hidden Markov models: a script-independent approach

Prem Natarajan; Shirin Saleem; Rohit Prasad; Ehry MacRostie; Krishna Subramanian

This paper introduces a script-independent methodology for multilingual offline handwriting recognition (OHR) based on the use of Hidden Markov Models (HMM). The OHR methodology extends our script-independent approach for OCR of machine-printed text images. The feature extraction, training, and recognition components of the system are all designed to be script independent. The HMM training and recognition components are based on our Byblos continuous speech recognition system. The HMM parameters are estimated automatically from the training data, without the need for laborious hand-written rules. The system does not require pre-segmentation of the data, neither at the word level nor at the character level. Thus, the system can handle languages with cursive handwritten scripts in a straightforward manner. The script independence of the system is demonstrated with experimental results in three scripts that exhibit significant differences in glyph characteristics: English, Chinese, and Arabic. Results from an initial set of experiments are presented to demonstrate the viability of the proposed methodology.


international conference on document analysis and recognition | 2009

Improvements in BBN's HMM-Based Offline Arabic Handwriting Recognition System

Shirin Saleem; Huaigu Cao; Krishna Subramanian; Matin Kamali; Rohit Prasad; Premkumar Natarajan

Offline handwriting recognition of free-flowing Arabic text is a challenging task due to the plethora of factors that contribute to the variability in the data. In this paper, we address some of these sources of variability, and present experimental results on a large corpus of handwritten documents. Specific techniques such as the application of context-dependent Hidden Markov Models (HMMs) for the cursive Arabic script, unsupervised adaptation to account for the stylistic variations across scribes, and image pre-processing to remove ruled lines are explored. In particular, we propose a novel integration of structural features in the HMM framework which alone results in a 9% relative improvement in performance. Overall, we demonstrate a relative reduction of 17% in word error rate over our baseline Arabic handwriting recognition system.


international conference on acoustics, speech, and signal processing | 2002

A scalable architecture for Directory Assistance automation

Premkumar Natarajan; Rohit Prasad; Richard M. Schwartz; John Makhoul

We present a novel architecture for providing automated telephone Directory Assistance (DA). The architecture couples a large-vocabulary statistical n-gram speech recognition engine with a statistical retrieval system. The use of a statistical n-gram allows for the recognition of unconstrained spoken queries, while the statistical retrieval engine allows for an inexact match between a particular spoken query and the training data. Allowing for unconstrained recognition and an inexact match provides the framework for high levels of automation. Once the retrieval engine returns a ranked set of frequently requested telephone numbers (FRN), the rejection module uses a classifier to compute a confidence-like score that is used to make the automation decision. With actual customer calls into an operational, automated DA call center and an FRN set size of 25000 numbers, the new architecture is capable of delivering more than 17% correct automation at a false accept rate of 0.76%.
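The rejection module's automation decision can be sketched as a thresholded confidence over the ranked retrieval scores. This is a hypothetical simplification: the paper uses a trained classifier, whereas here the confidence is just the margin between the top two candidates, and the scores and threshold are illustrative.

```python
# Hypothetical sketch of the automation decision: the retrieval engine
# returns ranked FRN candidate scores; a confidence-like value is
# derived from them and the call is automated only above a threshold.
def automation_decision(ranked_scores, threshold=0.6):
    """Return (automate, confidence) from ranked retrieval scores."""
    if not ranked_scores:
        return False, 0.0
    top = ranked_scores[0]
    runner_up = ranked_scores[1] if len(ranked_scores) > 1 else 0.0
    # Confidence grows with the margin between the top two candidates;
    # a real system would use a trained classifier instead.
    confidence = top - runner_up
    return confidence >= threshold, confidence

automate, conf = automation_decision([0.92, 0.15, 0.10])
```

Raising the threshold trades automation rate for a lower false accept rate, which is exactly the operating-point trade-off the reported 17% automation at 0.76% false accepts reflects.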


international conference on pattern recognition | 2008

Improvements in hidden Markov model based Arabic OCR

Rohit Prasad; Shirin Saleem; Matin Kamali; Ralf Meermeier; Premkumar Natarajan

This paper describes recent advances in hidden Markov model (HMM) based OCR for machine-printed Arabic documents. A combination of script-independent and script-specific techniques is applied to glyph models and language models (LM). The script-independent techniques we apply are higher-order n-gram LMs for N-best rescoring and discriminative estimation of glyph HMMs. Arabic-specific techniques include the use of context-dependent HMMs for glyph modeling and Parts-of-Arabic-Words in language modeling. We present experimental results that demonstrate a 40% relative reduction in word error rate over the baseline configuration on a corpus of machine-printed Arabic documents.
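The N-best rescoring idea can be illustrated in a few lines: each OCR hypothesis carries a glyph-model score, a word n-gram LM re-scores the word sequence, and the combined score re-ranks the list. The tiny bigram table, backoff value, and weight below are invented for the example and are not from the paper.

```python
import math

# Illustrative N-best rescoring sketch (not BBN's system).
# A made-up bigram table with log probabilities.
bigram_logprob = {
    ("<s>", "the"): math.log(0.5),
    ("the", "cat"): math.log(0.4),
    ("the", "mat"): math.log(0.1),
}

def lm_score(words, backoff=math.log(1e-4)):
    """Sum bigram log probabilities, with a crude flat backoff."""
    total = 0.0
    for prev, cur in zip(["<s>"] + words, words):
        total += bigram_logprob.get((prev, cur), backoff)
    return total

def rescore(nbest, lm_weight=0.7):
    """nbest: list of (words, glyph_log_score); return best word list
    under the combined glyph + weighted-LM score."""
    scored = [
        (glyph + lm_weight * lm_score(words), words)
        for words, glyph in nbest
    ]
    return max(scored)[1]

# "the mat" has the better glyph score, but the LM prefers "the cat".
best = rescore([(["the", "mat"], -1.0), (["the", "cat"], -1.2)])
```

The same mechanism extends to higher-order n-grams; only the LM lookup changes.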


international conference on image processing | 2011

Automated image quality assessment for camera-captured OCR

Xujun Peng; Huaigu Cao; Krishna Subramanian; Rohit Prasad; Prem Natarajan

Camera-captured optical character recognition (OCR) is a challenging area because of artifacts introduced during image acquisition with consumer-domain hand-held and smartphone cameras. Critical information is lost if the user does not get immediate feedback on whether the acquired image meets the quality requirements for OCR. To avoid such information loss, we propose a novel automated image quality assessment method that predicts the degree of degradation in OCR accuracy. Unlike other image quality assessment algorithms which only deal with blurring, the proposed method quantifies image quality degradation across several artifacts and accurately predicts the impact on OCR error rate. We present evaluation results on a set of machine-printed document images which have been captured using digital cameras with different degradations.
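The general shape of such a predictor is: extract degradation features from the raw image, then map them to a predicted OCR error rate. The sketch below uses a single crude sharpness cue (horizontal gradient energy) and a linear map with made-up weights; the paper's method uses richer features covering multiple artifact types.

```python
# Hypothetical image-quality-to-OCR-error sketch, not the paper's model.
def gradient_energy(img):
    """Mean squared horizontal gradient of a grayscale image
    (list of rows); a crude sharpness cue."""
    h, w = len(img), len(img[0])
    total = 0.0
    for i in range(h):
        for j in range(w - 1):
            total += (img[i][j + 1] - img[i][j]) ** 2
    return total / (h * (w - 1))

def predict_ocr_error(img, bias=0.9, weight=-1.5):
    """Linear map from sharpness to predicted error rate, clipped to
    [0, 1]; bias and weight are illustrative, not learned values."""
    score = bias + weight * gradient_energy(img)
    return min(1.0, max(0.0, score))

sharp = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
blurry = [[0.4, 0.5, 0.4, 0.5], [0.5, 0.4, 0.5, 0.4]]
```

A real deployment would threshold the predicted error rate to give the capture-time accept/retake feedback the abstract describes.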


Computer Speech & Language | 2013

Batch-mode semi-supervised active learning for statistical machine translation

Sankaranarayanan Ananthakrishnan; Rohit Prasad; David Stallard; Prem Natarajan

The development of high-performance statistical machine translation (SMT) systems is contingent on the availability of substantial, in-domain parallel training corpora. The latter, however, are expensive to produce due to the labor-intensive nature of manual translation. We propose to alleviate this problem with a novel, semi-supervised, batch-mode active learning strategy that attempts to maximize in-domain coverage by selecting sentences that represent a balance between domain match, translation difficulty, and batch diversity. Simulation experiments on an English-to-Pashto translation task show that the proposed strategy not only outperforms the random selection baseline, but also traditional active selection techniques based on dissimilarity to existing training data.
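A greedy batch selection balancing the three criteria the abstract names can be sketched as follows. The scoring functions and equal weighting are stand-ins: domain match is vocabulary overlap, difficulty is a crude length proxy, and diversity is novelty against words already in the batch; none of these are the paper's actual measures.

```python
# Illustrative batch-mode active selection sketch, not the paper's method.
def select_batch(candidates, in_domain_vocab, batch_size=2):
    """Greedily pick sentences balancing domain match, a crude
    difficulty proxy, and diversity against the batch so far."""
    selected, covered = [], set()
    pool = list(candidates)
    for _ in range(batch_size):
        best, best_score = None, float("-inf")
        for sent in pool:
            words = set(sent.split())
            match = len(words & in_domain_vocab) / len(words)
            novelty = len(words - covered) / len(words)  # batch diversity
            difficulty = min(len(words) / 10.0, 1.0)     # length proxy
            score = match + novelty + difficulty
            if score > best_score:
                best, best_score = sent, score
        selected.append(best)
        covered |= set(best.split())
        pool.remove(best)
    return selected

batch = select_batch(
    ["open the door", "close the door", "weather report today"],
    in_domain_vocab={"open", "close", "door", "the"},
)
```

Because `covered` grows as sentences are picked, the second selection is penalized for repeating words already in the batch, which is what makes the selection batch-aware rather than a simple top-k by score.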

Collaboration


Dive into Rohit Prasad's collaboration.

Top Co-Authors

Prem Natarajan (University of Southern California)
B. Singh (Guru Nanak Dev University)
Manoj K. Sharma (Aligarh Muslim University)
Vijay R. Sharma (Aligarh Muslim University)