Leena Mary
Rajiv Gandhi Institute of Technology, Mumbai
Publications
Featured research published by Leena Mary.
International Conference on Intelligent Sensing and Information Processing | 2004
Leena Mary; B. Yegnanarayana
The objective of this paper is to demonstrate the feasibility of automatic language identification (LID) systems using spectral features. Autoassociative neural network (AANN) models are exploited to capture language-specific information, since these nonlinear models can represent the complex distribution of spectral vectors in the feature space. The LID system can easily be extended to more languages without any additional higher-level linguistic information. The effectiveness of the proposed method is demonstrated for identification of speech utterances from four Indian languages.
Archive | 2011
Leena Mary
Extraction and Representation of Prosodic Features for Speech Processing Applications deals with prosody from a speech processing point of view, with topics including: the significance of prosody for speech processing applications; why prosody needs to be incorporated in speech processing applications; and different methods for extraction and representation of prosody for applications such as speech synthesis, speaker recognition, language recognition and speech recognition. This book is for researchers and students at the graduate level.
International Conference on Acoustics, Speech, and Signal Processing | 2013
Leena Mary; K. K. Anish Babu; Aju Joseph; Gibin M. George
In this paper, we describe a technique for evaluating the quality of mimicked speech; in other words, mimicry artists are evaluated based on their competence to mimic a particular person. This evaluation is done using prosodic characteristics for text-dependent cases. Prosodic characteristics are represented using features derived from pitch contour, duration and energy. In this work, prosodic features are extracted from speech after automatic segmentation into intonational phrases. The pitch contour corresponding to each phrase is approximated using a weighted sum of Legendre polynomials. The prosodic feature set includes the weights of the first four Legendre polynomials (w0k, w1k, w2k, w3k), average jitter, average shimmer, voiced duration, total duration and change in energy of each intonational phrase. The effectiveness of the technique is demonstrated using a text-dependent database of mimicked speech. Evaluation is done by dynamic time warping of prosodic features derived from the mimicked speech and the original speech. The scores obtained from this evaluation are compared with the results of manual perception/listening tests, which clearly indicate the effectiveness of the proposed technique.
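The two core steps of this abstract, approximating a phrase's pitch contour with low-order Legendre polynomial weights and comparing feature sequences by dynamic time warping, can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation; the function names and the Euclidean local cost in the DTW are assumptions.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_weights(pitch_contour, order=3):
    """Approximate a pitch contour by a weighted sum of Legendre
    polynomials; returns the weights (w0..w3 for order=3)."""
    # Map sample positions onto [-1, 1], the natural Legendre domain.
    x = np.linspace(-1.0, 1.0, len(pitch_contour))
    return legendre.legfit(x, pitch_contour, deg=order)

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two feature
    sequences (rows = phrases, columns = feature dimensions)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A smaller DTW distance between the mimicked and original feature sequences would correspond to a closer imitation.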
2013 International Conference on Control Communication and Computing (ICCC) | 2013
A. Sreejith; Leena Mary; K. S. Riyas; Aju Joseph; Anish Augustine
This paper describes an approach to automatic labeling of prosodic events and discusses the implementation of a broad-class phonetic engine for Malayalam. A phonetic engine (PE), the first stage of automatic speech recognition, converts input speech into a sequence of phonetic symbols. A baseline phonetic engine is created from speech data collected across various regions of Kerala, consisting of read, extempore, and conversational speech. To incorporate prosodic information into the baseline phonetic engine, a new algorithm for automatic labeling of prosody is proposed. This algorithm automatically labels the pitch variations and pauses in a speech utterance in order to represent its prosody.
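The pause-labeling step of such an algorithm can be sketched with a simple short-time-energy criterion. This is only an illustrative energy-based sketch; the frame sizes, threshold ratio, and minimum pause duration below are assumptions, not the published algorithm's settings.

```python
import numpy as np

def label_pauses(signal, sr=16000, frame_ms=25, hop_ms=10,
                 energy_ratio=0.05, min_pause_ms=150):
    """Mark frames as pauses when their short-time energy falls below a
    fraction of the utterance's peak energy, keeping only runs of
    silence long enough to count as a pause."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = 1 + (len(signal) - frame) // hop
    energy = np.array([np.sum(signal[i*hop:i*hop+frame]**2) for i in range(n)])
    silent = energy < energy_ratio * energy.max()
    min_frames = int(min_pause_ms / hop_ms)
    labels, i = np.zeros(n, dtype=bool), 0
    while i < n:
        if silent[i]:
            j = i
            while j < n and silent[j]:
                j += 1
            if j - i >= min_frames:      # long enough to be a pause
                labels[i:j] = True
            i = j
        else:
            i += 1
    return labels
```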
2013 International Conference on Control Communication and Computing (ICCC) | 2013
S. Renjith; Leena Mary; K. K. Anish Babu; Aju Joseph; Gibin M. George
Speaker recognition has many applications such as access control, person authentication systems, and forensics. In forensic applications, the questioned recording may be received through different channels, under noisy conditions, and in cases of voice forgery, all of which make speaker recognition a challenging task. State-of-the-art speaker recognition systems use spectral features, which are susceptible to channel mismatch and noise. In this paper we present a novel voice forgery detection system based on prosodic features using Support Vector Machines (SVM). The effectiveness of the proposed method is illustrated on a database collected from professional mimicry artists.
2013 International Conference on Control Communication and Computing (ICCC) | 2013
Jobin George; Leena Mary; K. S. Riyas
In this paper, a new efficient method for detection and classification of vehicles from acoustic signals using ANN and KNN classifiers is presented. Automatic identification and classification of vehicles is a challenging problem in traffic planning, in contrast to the traditional practice of monitoring traffic manually. It becomes even more challenging on single- or double-lane roads with heterogeneous traffic, which is typical of the Indian scenario. In this work we propose an algorithm for automatic detection and broad classification of vehicles into three categories, namely heavy, medium and light. When a vehicle passes the microphone, the recorded acoustic signal shows a peak in energy. The energy contour is smoothed, and peaks are automatically located to detect the vehicle sound signal. Mel-frequency cepstral coefficients are then extracted from the regions around the detected peaks. These feature vectors are used to train the ANN/KNN classifiers. The efficiency of the method is illustrated using test data containing approximately 160 vehicles belonging to different categories.
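The detection stage, smoothing the short-time energy contour and locating peaks where vehicles pass, can be sketched as follows. This is a minimal NumPy illustration of the idea; the frame sizes, smoothing window, and threshold rule are assumptions, not the paper's parameters.

```python
import numpy as np

def detect_vehicle_peaks(signal, frame_len=1024, hop=512, smooth_win=9, k=1.5):
    """Locate energy peaks in an acoustic signal, one per passing vehicle."""
    # Short-time energy contour.
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.array([np.sum(signal[i*hop:i*hop+frame_len]**2)
                       for i in range(n_frames)])
    # Smooth with a moving average to suppress spurious fluctuations.
    kernel = np.ones(smooth_win) / smooth_win
    smoothed = np.convolve(energy, kernel, mode='same')
    # A frame is a peak if it is a local maximum above a global threshold.
    thresh = smoothed.mean() + k * smoothed.std()
    return [i for i in range(1, len(smoothed) - 1)
            if smoothed[i] > smoothed[i - 1]
            and smoothed[i] >= smoothed[i + 1]
            and smoothed[i] > thresh]
```

MFCC features would then be extracted from the signal regions around the returned peak frames and fed to the classifiers.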
International Conference on Signal Processing | 2008
Leena Mary; B. Yegnanarayana
In this paper, we examine the effectiveness of prosodic features for language identification. Prosodic differences among world languages include variations in intonation, rhythm, and stress. These variations are represented using features derived from the fundamental frequency (F0) contour, duration, and energy contour. For extracting the prosodic features, the speech signal is segmented into syllable-like units by locating vowel-onset points (VOP) automatically. Various parameters are then derived to represent the F0 contour, duration, and energy contour characteristics of each syllable-like unit. The features obtained by concatenating the parameters derived from three consecutive syllable-like units are used to represent the prosodic characteristics of a language. The prosodic features thus derived from different languages are used to train a multilayer feedforward neural network (MLFFNN) classifier for language identification. The effectiveness of the proposed approach is verified using the Oregon Graduate Institute (OGI) multi-language telephone speech corpus and the National Institute of Standards and Technology (NIST) 2003 language identification database.
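A rough sketch of how per-unit prosodic parameters might be concatenated across three consecutive syllable-like units follows. The specific parameter set here (F0 mean and slope, duration, log-energy change) is an illustrative assumption; the paper's exact parameters may differ.

```python
import numpy as np

def syllable_prosody(f0, energy, duration):
    """Illustrative per-unit parameters: F0 mean, F0 slope, duration,
    and log-energy change across the unit."""
    t = np.arange(len(f0))
    slope = np.polyfit(t, f0, 1)[0] if len(f0) > 1 else 0.0
    return [float(np.mean(f0)), float(slope), duration,
            float(np.log(energy[-1] + 1e-9) - np.log(energy[0] + 1e-9))]

def trisyllable_features(syllables):
    """Concatenate the parameters of three consecutive syllable-like
    units (each an (f0, energy, duration) tuple) into one vector."""
    feats = []
    for a, b, c in zip(syllables, syllables[1:], syllables[2:]):
        feats.append(syllable_prosody(*a) + syllable_prosody(*b)
                     + syllable_prosody(*c))
    return np.array(feats)
```

Each resulting row would serve as one training vector for the MLFFNN classifier.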
Archive | 2012
Leena Mary
Chapter 1 discusses the automatic extraction of prosodic features for recognizing speakers, languages and speech. In this chapter, different techniques suggested for the automatic extraction of prosodic features are described. The techniques are broadly classified as ASR-free and ASR-based approaches, and further classified on the basis of the segmentation approach.
International Conference on Signal Processing | 2016
P Piyush; Rajeev Rajan; Leena Mary; Bino I. Koshy
Road transport is one of the most common modes of transport. Road planning and traffic management are conducted based on surveys of traffic volume, which can be manual or automatic. Audio-based surveys suffer from low accuracy but have low computational cost; video-based surveys have significantly higher accuracy but demand high computational resources and time. In this paper, we propose an approach which utilizes both the audio and the video of traffic data to perform an automatic traffic survey. Vehicles are automatically detected by locating peaks in the smoothed short-time energy of the captured audio signal. Video frames are extracted around the locations of the detected peaks, so the number of video frames to be processed is reduced considerably. Vehicle images in the extracted video frames are detected using background subtraction and three-frame differencing. The noisy binary image thus obtained is transformed into a single object using morphological processing. Features such as area, perimeter, maximum length, horizontal length and 32 features generated from the vehicle shape are used to characterize the image of each vehicle. These feature vectors are used to train a multilayer feed-forward artificial neural network classifier for seven classes of vehicles. The effectiveness of the proposed algorithm is tested using query audio, obtaining an accuracy of 82%.
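The three-frame differencing step mentioned above, isolating a moving vehicle from consecutive grayscale frames, can be sketched as follows. This is a minimal NumPy illustration; the threshold value is an assumption, and the paper additionally combines this mask with background subtraction and morphological processing.

```python
import numpy as np

def three_frame_difference(prev, curr, nxt, thresh=25):
    """Binary motion mask from three consecutive grayscale frames: a
    pixel belongs to the moving object only if it changed between both
    the previous/current and current/next frame pairs."""
    d1 = np.abs(curr.astype(np.int32) - prev.astype(np.int32)) > thresh
    d2 = np.abs(nxt.astype(np.int32) - curr.astype(np.int32)) > thresh
    return (d1 & d2).astype(np.uint8)
```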
2016 International Conference on Next Generation Intelligent Systems (ICNGIS) | 2016
Christin Daniel; Leena Mary
Road planning and traffic monitoring are conducted based on surveys of traffic volume. In recent years many researchers have developed vision- and audio-based techniques for detection and classification of moving vehicles. Audio-based techniques suffer from low accuracy but have low computational cost, while vision-based approaches have significantly higher accuracy but demand high computational resources. This paper proposes a new approach which utilizes both audio and video of traffic data to perform a traffic volume survey. Vehicles are first detected from the audio signal, and video frames around the audio peaks are selectively extracted. Visual feature vectors are then extracted from the binary image of the vehicle, and audio features represented as Mel-frequency cepstral coefficients (MFCC) are extracted from the regions around the vehicle peak. Classification is done using a multilayer feed-forward neural network, which gave an overall classification accuracy of 92.67% for seven vehicle classes with the chosen set of audio-visual features.