Chetan Ramaiah
University at Buffalo
Publication
Featured research published by Chetan Ramaiah.
international conference on frontiers in handwriting recognition | 2012
Arti Shivram; Chetan Ramaiah; Utkarsh Porwal; Venu Govindaraju
With the explosive growth of the tablet form factor and greater availability of pen-based direct input, writer identification in online environments is increasingly becoming critical for a variety of downstream applications such as intelligent and adaptive user environments, search, retrieval, indexing and digital forensics. Extant research has approached writer identification by using writing styles as a discriminative function between writers. In contrast, we model writing styles as a shared component of an individual's handwriting. We develop a theoretical framework for this conceptualization and model this using a three-level hierarchical Bayesian model (Latent Dirichlet Allocation). In this text-independent, unsupervised model each writer's handwriting is modeled as a distribution over a finite set of writing styles that are shared amongst writers. We test our model on a novel online/offline handwriting dataset, IBM_UB_1, which is being made available to the public. Our experiments show comparable results to current benchmarks and demonstrate the efficacy of explicitly modeling shared writing styles.
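The idea of a writer as a distribution over shared writing styles can be illustrated with a minimal sketch. It shows only the mixture structure that Latent Dirichlet Allocation assumes, not the inference procedure used in the paper. All names and numbers here (`styles`, `writers`, the stroke codewords) are hypothetical.

```python
# Hypothetical shared writing styles: each style is a probability
# distribution over a small vocabulary of stroke codewords.
styles = {
    "style_0": {"loop": 0.6, "hook": 0.3, "bar": 0.1},
    "style_1": {"loop": 0.1, "hook": 0.2, "bar": 0.7},
}

# Hypothetical writers: each writer is a probability distribution
# over the same shared styles.
writers = {
    "writer_a": {"style_0": 0.9, "style_1": 0.1},
    "writer_b": {"style_0": 0.2, "style_1": 0.8},
}


def codeword_probability(writer, codeword):
    """p(codeword | writer) = sum over styles of p(style | writer) * p(codeword | style)."""
    mixture = writers[writer]
    return sum(weight * styles[style][codeword] for style, weight in mixture.items())


for name in writers:
    probs = {cw: round(codeword_probability(name, cw), 3) for cw in ("loop", "hook", "bar")}
    print(name, probs)
```

Two writers who share the same styles still differ in their mixture weights, and those weights are what serve as the identifying signature in this kind of model.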
international conference on document analysis and recognition | 2013
Arti Shivram; Chetan Ramaiah; Srirangaraj Setlur; Venu Govindaraju
In this paper we present a new dual-mode, twin-folio structured English handwriting dataset, IBM_UB_1. IBM_UB_1 is our first major release from a large multilingual handwriting corpus. Containing over 6000 pages of handwritten matter, this dataset can be used for unconstrained handwriting recognition; more importantly, the dataset's unique twin-folio structure presents a natural fit for research on writer identification, keyword spotting, indexing and various forms of handwritten document search and retrieval. We first describe two central characteristics of the dataset - the twin-folio structure and dual modality (online/offline) - and their relevance to current research problems. Secondly, we describe the dataset, its collection and construction, and provide key descriptive statistics. Finally, we evaluate the dataset on two different research domains - handwriting recognition and writer identification - and present related experimental results.
IET Biometrics | 2013
Arti Shivram; Chetan Ramaiah; Vengatesan Govindaraju
With the explosive growth of the tablet form factor and greater availability of pen-based direct input, online writer identification is increasingly becoming critical for person identification, digital forensics as well as downstream applications such as intelligent and adaptive user environments, search, indexing and retrieval of handwritten documents. Extant research has approached writer identification by using writing styles as a discriminative function between writers. In contrast, the authors model writing styles as a shared component of an individual's handwriting. They develop a theoretical framework for this conceptualisation and model it by using a three-level hierarchical Bayesian model (Latent Dirichlet Allocation). In this text-independent, unsupervised model each writer's handwriting is modelled as a distribution over a finite set of writing styles that are shared among writers. They test their model on a new online handwriting dataset, IBM_UB_1, and also offer benchmark comparisons by using the IAM-OnDB database. Their experiments show comparable results to the current benchmarks and demonstrate the efficacy of explicitly modelling the shared writing styles.
document analysis systems | 2012
Chetan Ramaiah; Utkarsh Porwal; Venu Govindaraju
Accent in handwriting can be defined as the influence of a writer's native script on his/her writing style in another script. In this paper, we approach the problem of detecting the existence of accents in handwriting. We approach this problem using two sets of writers: those who can write only in English, and multilingual writers who can also write in English. We learn the writing styles that are predominant in each set and use them as features in classification. Latent Dirichlet Allocation is used to learn the distribution over writing styles. Experimental results suggest the existence of accents in handwriting.
international conference on pattern recognition | 2014
Chetan Ramaiah; Réjean Plamondon; Venu Govindaraju
Popular CAPTCHA systems consist of garbled printed text character images with significant distortions and noise. It is believed that humans have little difficulty in deciphering the text, whereas automated systems are foiled by the added noise and distortion. However, in recent years, several text-based CAPTCHAs have been reported as broken, that is, automated systems can identify the text in the displayed image with a reasonable amount of success. An extension to the text-based CAPTCHA concept is to utilize unconstrained handwritten text, which is still considered to be a challenging problem for automated systems. In this work, we present an automated handwritten CAPTCHA generation system by adding distortions to the Sigma-Lognormal representation of a handwritten word sample. In addition, several noise models are also considered. We perform experiments on the UNIPEN dataset and demonstrate the efficacy of the approach.
international conference on frontiers in handwriting recognition | 2012
Utkarsh Porwal; Chetan Ramaiah; Arti Shivram; Venu Govindaraju
Availability of sufficient labeled data is key to the performance of any learning algorithm. However, in document analysis, obtaining a large amount of labeled data is difficult, and the scarcity of labeled samples is often a main bottleneck in the performance of algorithms. Unlabeled data samples, by contrast, are present in abundance. We propose a semi-supervised framework for writer identification for offline handwritten documents that leverages the information hidden in the unlabeled samples. The task of writer identification is a complex one, and our framework tries to model the nuances of handwriting with the use of structural learning. This framework models the complexity of the learning problem by selecting the best hypothesis space, breaking the main task into several sub-tasks. All the hypothesis spaces pertaining to the sub-tasks are used for the best model selection by retrieving a common optimal sub-structure that has high correspondence with all of the candidate hypothesis spaces. We have used the publicly available IAM data set to show the efficacy of our method.
document recognition and retrieval | 2012
Chetan Ramaiah; Gaurav Kumar; Venu Govindaraju
Handwriting styles are constantly changing over time. We approach the novel problem of estimating the approximate age of historical handwritten documents using handwriting styles. This system will have many applications in handwritten document processing engines, where specialized processing techniques can be applied based on the estimated age of the document. We propose to learn a distribution over styles across centuries using topic models and to apply a classifier over the learned weights in order to estimate the approximate age of the documents. We present a comparison of different distance metrics, such as Euclidean distance and Hellinger distance, within this application.
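For readers unfamiliar with the two metrics named in this abstract, the following is a minimal sketch of how Euclidean and Hellinger distance compare on probability vectors such as topic-model style weights. The vectors are made-up illustrations, not data from the paper, and the function names are assumptions of this sketch.

```python
import math


def euclidean(p, q):
    """Ordinary L2 distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    H(p, q) = (1 / sqrt(2)) * || sqrt(p) - sqrt(q) ||_2, which lies in [0, 1].
    """
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


# Hypothetical style-weight vectors for two documents (each sums to 1).
doc_old = [0.7, 0.2, 0.1]
doc_new = [0.1, 0.2, 0.7]

print("euclidean:", euclidean(doc_old, doc_new))
print("hellinger:", hellinger(doc_old, doc_new))
```

Hellinger distance is defined on probability distributions and is bounded by 1, which makes it a natural candidate when the features are themselves distributions over styles.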
international conference on pattern recognition | 2014
Arti Shivram; Chetan Ramaiah; Venu Govindaraju
A key factor in building effective writer identification/verification systems is the amount of data required to build the underlying models. In this research we systematically examine data sufficiency bounds for two broad approaches to online writer identification: feature space models vs. writer-style space models. We report results from 40 experiments conducted on two publicly available datasets and also test identification performance for the target models using two different feature functions. Our findings show that the writer-style space model gives higher identification performance for a given level of data and, further, achieves high performance levels at lower data cost. This model appears to require as few as 20 words per page to achieve identification performance close to 80% and reaches more than 90% accuracy with higher levels of data enrollment.
document analysis systems | 2014
Chetan Ramaiah; Venu Govindaraju
Writer identification is the process of determining the author of a handwritten specimen by utilizing characteristics inherent in the sample. In this work, we apply the concept of accents in handwriting to introduce a novel perspective for writer identification. Analogous to speech, accents in handwriting can be defined as distinctive writing quirks that are unique to a group of people sharing a common native script. Specifically, we postulate that a group of people with a common native script will share certain traits in their handwriting style that are exposed when they write in a different script. We propose a hierarchical framework for the writer identification task, wherein we first identify the accent of the writer. In the next step, we perform writer identification based on the selected accent. This framework reduces the complexity of the classification task by reducing the number of classes at the prediction stage. Experiments are performed on the UNIPEN dataset and the results lend credibility to our model.
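The two-stage idea in this abstract (predict the accent first, then choose only among writers of that accent) can be sketched as follows. A nearest-centroid rule stands in for whatever classifiers the paper actually uses, and the feature vectors, accent labels and writer labels are all hypothetical.

```python
import math


def nearest(sample, centroids):
    """Return the label whose centroid is closest (Euclidean) to the sample."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))


# Hypothetical stage-1 model: one centroid per accent group.
accent_centroids = {
    "accent_A": [0.0, 0.0],
    "accent_B": [10.0, 10.0],
}

# Hypothetical stage-2 models: writer centroids, grouped by accent.
writer_centroids = {
    "accent_A": {"writer_1": [-1.0, 0.0], "writer_2": [1.0, 0.0]},
    "accent_B": {"writer_3": [9.0, 10.0], "writer_4": [11.0, 10.0]},
}


def identify_writer(sample):
    """Stage 1 picks the accent; stage 2 searches only that accent's writers."""
    accent = nearest(sample, accent_centroids)
    writer = nearest(sample, writer_centroids[accent])
    return accent, writer


print(identify_writer([0.8, 0.2]))
print(identify_writer([9.2, 9.9]))
```

With four writers split across two accents, the second stage compares against two candidates instead of four, which is the reduction in class count that the abstract describes.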
international conference on document analysis and recognition | 2013
Chetan Ramaiah; Arti Shivram; Venu Govindaraju
Accent in speech is defined as a distinctive mode of pronunciation that is unique to a geographical region. In a similar way, we define accent in handwriting as distinctive writing characteristics that are unique to a group of people sharing a common native script. In other words, we postulate that a group of people with a common native script will share certain traits in their handwriting that can be ascertained when they write in a different script. In this paper, we establish the existence of accents in handwriting using a hierarchical Bayesian framework. We then demonstrate that the unique trait in handwriting that arises out of the writer's native script is indigenous to that script and is perceivable when writing in a different script. As a consequence, the ability to identify a person's native script based on the person's handwriting style in another script is introduced. We validated the approach by performing experiments on the UNIPEN dataset, and the experiments lend credibility to our model.