Publications


Featured research published by Sridha Sridharan.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005

Texture for script identification

Andrew Busch; Wageeh W. Boles; Sridha Sridharan

The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed.
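The kind of texture statistic the paper evaluates can be illustrated with a minimal grey-level co-occurrence matrix (GLCM) sketch. The quantisation level, pixel offset, and the two derived statistics (contrast and energy) are illustrative choices here, not the paper's exact feature set:

```python
import numpy as np

def glcm_features(img, levels=8, dx=1, dy=0):
    """Grey-level co-occurrence matrix (GLCM) for one pixel offset,
    reduced to two classic texture statistics."""
    # Quantise the image to a small number of grey levels.
    q = (img.astype(float) / 256.0 * levels).astype(int)
    h, w = q.shape
    glcm = np.zeros((levels, levels))
    # Count co-occurring grey-level pairs at offset (dy, dx).
    for yy in range(h - dy):
        for xx in range(w - dx):
            glcm[q[yy, xx], q[yy + dy, xx + dx]] += 1
    p = glcm / glcm.sum()                        # joint probability table
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))   # local grey-level variation
    energy = float(np.sum(p ** 2))               # uniformity of the texture
    return contrast, energy
```

A flat patch yields contrast 0 and energy 1, while a patch of text gives intermediate values; vectors of such statistics are what a classifier would compare across scripts.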


Digital Image Computing: Techniques and Applications | 2009

Crowd Counting Using Multiple Local Features

David Ryan; Simon Denman; Clinton Fookes; Sridha Sridharan

In public venues, crowd size is a key indicator of crowd safety and stability. Crowding levels can be detected using holistic image features; however, this requires a large amount of training data to capture the wide variations in crowd distribution. If a crowd counting algorithm is to be deployed across a large number of cameras, such a large and burdensome training requirement is far from ideal. In this paper we propose an approach that uses local features to count the number of people in each foreground blob segment, so that the total crowd estimate is the sum of the group sizes. This results in an approach that is scalable to crowd volumes not seen in the training data, and can be trained on a very small data set. As a local approach is used, the proposed algorithm can easily be used to estimate crowd density throughout different regions of the scene and be used in a multi-camera environment. A unique localised approach to ground truth annotation, which reduces the required training data, is also presented, since a localised approach to crowd counting has different training requirements to a holistic one. Testing on a large pedestrian database compares the proposed technique to existing holistic techniques and demonstrates improved accuracy, and superior performance when test conditions are unseen in the training set or a minimal training set is used.
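The blob-level counting idea can be sketched with hypothetical per-blob features (area and perimeter here) and a least-squares regressor standing in for the paper's local-feature model; the data below are synthetic:

```python
import numpy as np

# Hypothetical per-blob training data: [area, perimeter] features paired
# with annotated group sizes (stand-ins for the paper's local features
# and localised ground-truth annotation).
X = np.array([[120., 45.], [260., 70.], [430., 95.], [900., 160.]])
y = np.array([1., 2., 4., 8.])

# Least-squares linear map from blob features to a people count.
A = np.hstack([X, np.ones((len(X), 1))])   # bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def count_crowd(blobs):
    """Total crowd estimate = sum of per-blob size predictions, so the
    model never has to see whole-scene crowd levels during training."""
    B = np.hstack([blobs, np.ones((len(blobs), 1))])
    return float(np.sum(B @ w))
```

Because the estimate is a sum over blobs, a scene with more blobs than any training frame still stays within the model's per-blob operating range, which is the scalability argument.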


Systems, Man and Cybernetics | 2011

Automatically Detecting Pain in Video Through Facial Action Units

Patrick Lucey; Jeffrey F. Cohn; Iain A. Matthews; Simon Lucey; Sridha Sridharan; Jessica M. Howlett; Kenneth M. Prkachin

In a clinical setting, pain is reported either through patient self-report or via an observer. Such measures are problematic as they are: 1) subjective, and 2) give no specific timing information. Coding pain as a series of facial action units (AUs) can avoid these issues as it can be used to gain an objective measure of pain on a frame-by-frame basis. In this paper, using video data from patients with shoulder injuries, we describe an active appearance model (AAM)-based system that can automatically detect the frames in video in which a patient is in pain. This pain data set highlights the many challenges associated with spontaneous emotion detection, particularly that of expression and head movement due to the patient's reaction to pain. We show that the AAM can deal with these movements and can achieve significant improvements in both AU and pain detection performance compared to the current state-of-the-art approaches, which utilize similarity-normalized appearance features only.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Real-time adaptive background segmentation

Darren Butler; Sridha Sridharan; V. M. Bove Jr.

Automatic analysis of digital video scenes often requires the segmentation of moving objects from the background. Historically, algorithms developed for this purpose have been restricted to small frame sizes, low frame rates or offline processing. The simplest approach involves subtracting the current frame from the known background. However, as the background is unknown, the key is how to learn and model it. The paper proposes a new algorithm that represents each pixel in the frame by a group of clusters. The clusters are ordered according to the likelihood that they model the background and are adapted to deal with background and lighting variations. Incoming pixels are matched against the corresponding cluster group and are classified according to whether the matching cluster is considered part of the background. The algorithm has been subjectively evaluated against three other techniques. It demonstrates equal or better segmentation than the other techniques and proves capable of processing 320×240 video at 28 fps, excluding post-processing.
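A much-simplified sketch of the per-pixel cluster idea follows (grayscale only; the match threshold, learning rate and background test are illustrative choices, and the paper's ordering and adaptation rules are richer):

```python
import numpy as np

class ClusterBackground:
    """Simplified per-pixel cluster background model in the spirit of
    the paper: each pixel keeps K intensity clusters with weights, and a
    pixel is background when its best-matching cluster carries most of
    the accumulated weight."""

    def __init__(self, shape, K=3, match_thresh=15.0, lr=0.05):
        self.K, self.T, self.lr = K, match_thresh, lr
        self.cent = np.zeros(shape + (K,))       # cluster centroids
        self.wgt = np.full(shape + (K,), 1e-6)   # cluster weights

    def apply(self, frame):
        """Update the model with one frame; return a foreground mask."""
        f = frame.astype(float)[..., None]
        d = np.abs(self.cent - f)                # distance to each cluster
        best = np.argmin(d, axis=-1)
        hit = np.take_along_axis(d, best[..., None], -1)[..., 0] < self.T
        # Matched pixels: adapt the winning cluster, grow its weight.
        sel = np.eye(self.K, dtype=bool)[best] & hit[..., None]
        full = np.broadcast_to(f, self.cent.shape)
        self.cent[sel] += self.lr * (full[sel] - self.cent[sel])
        self.wgt[sel] += 1.0
        # Unmatched pixels: replace the weakest cluster with this value.
        weak = np.eye(self.K, dtype=bool)[np.argmin(self.wgt, -1)] & ~hit[..., None]
        self.cent[weak] = full[weak]
        self.wgt[weak] = 1.0
        # Background iff matched cluster holds the majority of weight.
        w_best = np.take_along_axis(self.wgt, best[..., None], -1)[..., 0]
        return ~(hit & (w_best > 0.5 * self.wgt.sum(-1)))  # True = fg
```

After a few frames of static background the dominant cluster at each pixel absorbs nearly all the weight, so new intensities (moving objects) are flagged as foreground while gradual lighting drift is absorbed by the centroid adaptation.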


Computer Speech & Language | 2008

Explicit modelling of session variability for speaker verification

Robbie Vogt; Sridha Sridharan

This article describes a general and powerful approach to modelling mismatch in speaker recognition by including an explicit session term in the Gaussian mixture speaker modelling framework. Under this approach, the Gaussian mixture model (GMM) that best represents the observations of a particular recording is the combination of the true speaker model with an additional session-dependent offset constrained to lie in a low-dimensional subspace representing session variability. A novel and efficient model training procedure is proposed in this work to perform the simultaneous optimisation of the speaker model and session variables required for speaker training. Using a similar iterative approach to the Gauss–Seidel method for solving linear systems, this procedure greatly reduces the memory and computational resources required by a direct solution. Extensive experimentation demonstrates that the explicit session modelling provides up to a 68% reduction in detection cost over a standard GMM-based system and significant improvements over a system utilising feature mapping, and is shown to be effective on the corpora of recent National Institute of Standards and Technology (NIST) Speaker Recognition Evaluations, exhibiting different session mismatch conditions.
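The core model can be illustrated numerically: an observed recording mean is the speaker mean plus an offset constrained to a low-rank session subspace. The least-squares projection below stands in for the paper's iterative Gauss-Seidel-style optimisation, and the speaker model is assumed known here rather than jointly estimated; dimensions and data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
D, R = 6, 2                      # supervector dim, session-subspace rank
s = rng.normal(size=D)           # speaker mean supervector (assumed known)
U = rng.normal(size=(D, R))      # low-rank session-variability subspace

# One recording's observed mean: speaker mean plus a session offset
# constrained to span(U), plus a little estimation noise.
z_true = np.array([1.5, -0.7])
obs = s + U @ z_true + 0.01 * rng.normal(size=D)

# Estimate the session factor by projecting the residual onto U.
z_hat, *_ = np.linalg.lstsq(U, obs - s, rcond=None)

raw_residual = np.linalg.norm(obs - s)               # ignoring session
compensated = np.linalg.norm(obs - (s + U @ z_hat))  # modelling session
```

The compensated residual collapses to the noise floor: the session mismatch lives in the subspace and is explained away, which is the mechanism behind the reported detection-cost reductions.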


Face and Gesture | 2011

Person-independent facial expression detection using Constrained Local Models

Sien W. Chew; Patrick Lucey; Simon Lucey; Jason M. Saragih; Jeffrey F. Cohn; Sridha Sridharan

In automatic facial expression detection, very accurate registration is desired, which can be achieved via a deformable model approach in which a dense mesh of 60–70 points on the face is used, such as an active appearance model (AAM). However, for applications where manually labeling frames is prohibitive, AAMs do not work well as they do not generalize well to unseen subjects. As such, a coarser approach is taken for person-independent facial expression detection, where just a couple of key features (such as face and eyes) are tracked using a Viola-Jones type approach. The tracked image is normally post-processed to encode for shift and illumination invariance using a linear bank of filters. Recently, it was shown that this preprocessing step is of no benefit when close to ideal registration has been obtained. In this paper, we present a system based on the Constrained Local Model (CLM) method, a generic, person-independent face alignment algorithm that achieves high accuracy. We compare its results against LBP feature extraction on the CK+ and GEMEP-FERA datasets.
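For context, the LBP baseline compared against can be sketched as follows (basic 8-neighbour LBP; work in this area typically uses uniform patterns pooled over image blocks, which this sketch omits):

```python
import numpy as np

def lbp_image(img):
    """Basic 8-neighbour local binary pattern: threshold each interior
    pixel's neighbours against the centre and pack the bits into a code."""
    img = img.astype(int)
    c = img[1:-1, 1:-1]
    h, w = img.shape
    # Neighbour offsets, clockwise from the top-left.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offs):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (nb >= c).astype(int) << bit
    return code

def lbp_histogram(img, bins=256):
    """Normalised histogram of LBP codes: the appearance descriptor."""
    hist, _ = np.histogram(lbp_image(img), bins=bins, range=(0, bins))
    return hist / hist.sum()
```

Because each code depends only on sign comparisons with the centre pixel, the descriptor is invariant to monotonic illumination changes, which is why it is a common post-registration feature choice.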


International Conference on Acoustics, Speech, and Signal Processing | 1998

A syntactic approach to automatic lip feature extraction for speaker identification

Tim Wark; Sridha Sridharan

This paper presents a novel technique for the tracking and extraction of features from lips for the purpose of speaker identification. In noisy or other adverse conditions, identification performance via the speech signal can be significantly reduced; hence, additional information which can complement the speech signal is of particular interest. In our system, syntactic information is derived from chromatic information in the lip region. A model of the lip contour is formed directly from the syntactic information, with no minimization procedure required to refine estimates. Colour features are then extracted from the lips via profiles taken around the lip contour. Further improvement in lip features is obtained via linear discriminant analysis (LDA). Speaker models are built from the lip features based on the Gaussian mixture model (GMM). Identification experiments are performed on the M2VTS database, with encouraging results.


IEEE Journal on Selected Areas in Communications | 1993

Design and cryptanalysis of transform-based analog speech scramblers

B. Goldburg; Sridha Sridharan; Ed Dawson

Four discrete orthogonal transforms have been evaluated for their suitability for use in transform-based analog speech encryption. Subjective as well as objective tests were conducted to compare the residual intelligibility and the recovered speech quality under channel conditions. The cryptanalytic strengths of the schemes were then compared by applying a novel cryptanalytic attack which exploits the redundancy of speech using a spectral vector codebook. The results indicate that the discrete cosine transform (DCT) is the best transform to use in transform-based encryption. A modification of the DCT-based scheme which significantly improves the security of the scrambler is proposed.
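The transform-based scrambling idea can be sketched as a keyed permutation of DCT coefficients; the orthonormal DCT is built by hand below, and the frame length and key are arbitrary illustrative choices:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix; its inverse is its transpose."""
    n, k = np.meshgrid(np.arange(N), np.arange(N))
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
    C[0, :] /= np.sqrt(2.0)
    return C

def scramble(frame, perm, C):
    """Transform a speech frame, permute its DCT coefficients with the
    key `perm`, and return to the time domain."""
    return C.T @ (C @ frame)[perm]

def descramble(frame, perm, C):
    """Apply the inverse permutation in the transform domain."""
    inv = np.argsort(perm)
    return C.T @ (C @ frame)[inv]
```

The scrambled frame is still a valid analog signal (so it survives a voice channel), yet its spectrum is shuffled; the cited attack works precisely because speech spectra are redundant enough for a codebook to undo such a shuffle statistically.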


Digital Image Computing: Techniques and Applications | 2012

A Database for Person Re-Identification in Multi-Camera Surveillance Networks

Alina Bialkowski; Simon Denman; Sridha Sridharan; Clinton Fookes; Patrick Lucey

Person re-identification involves recognising individuals in different locations across a network of cameras and is a challenging task due to a large number of varying factors such as pose (both subject and camera) and ambient lighting conditions. Existing databases do not adequately capture these variations, making evaluations of proposed techniques difficult. In this paper, we present a new challenging multi-camera surveillance database designed for the task of person re-identification. This database consists of 150 unscripted sequences of subjects travelling in a building environment through up to eight camera views, appearing from various angles and in varying illumination conditions. A flexible XML-based evaluation protocol is provided to allow a highly configurable evaluation setup, enabling a variety of scenarios relating to pose and lighting conditions to be evaluated. A baseline person re-identification system consisting of colour, height and texture models is demonstrated on this database.
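The colour component of such a baseline can be sketched as histogram matching; the bin count and Bhattacharyya distance are illustrative choices, and the actual baseline also uses height and texture models:

```python
import numpy as np

def colour_descriptor(img, bins=8):
    """Concatenated per-channel colour histograms, normalised to sum to
    one: a minimal stand-in for the baseline's colour model."""
    hists = [np.histogram(img[..., ch], bins=bins, range=(0, 256))[0]
             for ch in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def match(query, gallery):
    """Re-identify: return the gallery index whose descriptor has the
    smallest Bhattacharyya distance to the query descriptor."""
    dists = [-np.log(np.sum(np.sqrt(query * g)) + 1e-12) for g in gallery]
    return int(np.argmin(dists))
```

Histograms discard spatial layout, which is what makes them cheap but also what makes cross-camera illumination changes (the variations this database is designed to expose) so damaging.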


Digital Signal Processing | 2001

Adaptive Fusion of Speech and Lip Information for Robust Speaker Identification

Tim Wark; Sridha Sridharan

Wark, T., and Sridharan, S., "Adaptive Fusion of Speech and Lip Information for Robust Speaker Identification," Digital Signal Processing 11 (2001) 169–186.

This paper compares techniques for asynchronous fusion of speech and lip information for robust speaker identification. In any fusion system, the ultimate challenge is to determine the optimal way to combine all information sources under varying conditions. We propose a new method for estimating confidence levels to allow intelligent fusion of the audio and visual data. We describe a secondary classification system, where secondary classifiers are used to give approximations of the estimation errors of the output likelihoods from primary classifiers. The error estimates are combined with a dispersion measure technique, allowing an adaptive fusion strategy based on the level of data degradation at the time of testing. We compare the performance of this fusion system with two other approaches to linear fusion and show that the use of secondary classifiers is an effective technique for improving classification performance. Identification experiments are performed on the M2VTS multimodal database, with encouraging results.
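In its simplest linear form, the adaptive weighting idea reduces to confidence-weighted combination of stream log-likelihoods; the confidence values below are hypothetical stand-ins for the secondary classifiers' error estimates:

```python
import numpy as np

def fuse(audio_llk, video_llk, audio_conf, video_conf):
    """Confidence-weighted linear fusion of per-speaker log-likelihoods.
    The confidences (higher = less degraded stream) shift the decision
    weight towards the more reliable modality."""
    a = audio_conf / (audio_conf + video_conf)
    fused = a * np.asarray(audio_llk) + (1.0 - a) * np.asarray(video_llk)
    return int(np.argmax(fused))   # index of the identified speaker
```

With clean audio (high audio confidence) the decision follows the acoustic scores; as the audio degrades, the lip stream takes over, which is the behaviour the dispersion-based adaptation aims to produce automatically.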

Collaboration


Dive into Sridha Sridharan's collaborations.

Top Co-Authors

Clinton Fookes, Queensland University of Technology
Simon Denman, Queensland University of Technology
David Dean, Queensland University of Technology
Vinod Chandran, Queensland University of Technology
Simon Lucey, Carnegie Mellon University
Robert J. Vogt, Queensland University of Technology
Brendan Baker, Queensland University of Technology
Robbie Vogt, Queensland University of Technology
Ruan Lakemond, Queensland University of Technology