Publication


Featured research published by Viktor Rozgic.


IEEE Transactions on Neural Systems and Rehabilitation Engineering | 2010

Multimodal Physical Activity Recognition by Fusing Temporal and Cepstral Information

Ming Li; Viktor Rozgic; Gautam Thatte; Sangwon Lee; Adar Emken; Murali Annavaram; Urbashi Mitra; Donna Spruijt-Metz; Shrikanth Narayanan

A physical activity (PA) recognition algorithm for a wearable wireless sensor network using both ambulatory electrocardiogram (ECG) and accelerometer signals is proposed. First, in the time domain, the cardiac activity mean and the motion artifact noise of the ECG signal are modeled by a Hermite polynomial expansion and principal component analysis, respectively. A set of time-domain accelerometer features is also extracted. A support vector machine (SVM) is employed for supervised classification using these time-domain features. Second, motivated by their potential for handling convolutional noise, cepstral features extracted from the ECG and accelerometer signals through frame-level analysis are modeled using Gaussian mixture models (GMMs). Third, heteroscedastic linear discriminant analysis is performed to reduce the dimension of the tri-axial accelerometer cepstral features, which are concatenated and fused at the feature level. Finally, to improve the overall recognition performance, the multimodal (ECG and accelerometer) and multidomain (time-domain SVM and cepstral-domain GMM) subsystems are fused at the score level. The classification accuracy ranges from 79.3% to 97.3% across testing scenarios, and the system outperforms the state-of-the-art single-accelerometer-based PA recognition system with a relative error reduction of over 24% on our nine-category PA database.
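The final fusion step can be illustrated with a minimal score-level fusion sketch. It assumes each subsystem (for example, the time-domain SVMs and the cepstral GMMs for each modality) outputs one score per activity class, z-normalizes each subsystem's scores so that SVM margins and GMM log-likelihoods become comparable, and combines them with a weighted sum. The normalization, the weights, the example scores, and the names `zscore` and `fuse_scores` are hypothetical choices for illustration, not the fusion rule reported in the paper.

```python
import numpy as np


def zscore(scores):
    """Normalize one subsystem's per-class scores to zero mean and unit variance."""
    scores = np.asarray(scores, dtype=float)
    centered = scores - scores.mean()
    std = scores.std()
    return centered / std if std > 0 else centered


def fuse_scores(subsystem_scores, weights):
    """Weighted sum of normalized per-class score vectors.

    Returns the fused score vector and the index of the winning class.
    """
    fused = sum(w * zscore(s) for w, s in zip(weights, subsystem_scores))
    return fused, int(np.argmax(fused))


# Hypothetical scores over three activity classes from four subsystems
# (ECG-SVM, accelerometer-SVM, ECG-GMM, accelerometer-GMM).
scores = [
    [0.2, 0.5, 0.3],
    [1.2, 3.4, 0.1],
    [-50.0, -48.0, -55.0],
    [-30.0, -29.0, -31.0],
]
fused, label = fuse_scores(scores, weights=[0.2, 0.4, 0.1, 0.3])
print(label)  # 1: every subsystem ranks class 1 highest
```

In practice the weights would be tuned on held-out data; z-normalization is only one of several common score-normalization options.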


International Conference on Acoustics, Speech, and Signal Processing | 2011

Estimation of ordinal approach-avoidance labels in dyadic interactions: Ordinal logistic regression approach

Viktor Rozgic; Bo Xiao; Athanasios Katsamanis; Brian R. Baucom; Panayiotis G. Georgiou; Shrikanth Narayanan

Behavioral Signal Processing aims to automate behavioral coding schemes such as those prevalent in psychology and mental health research. This paper describes a method to automatically quantify approach-and-avoidance (AA) behavior, described by ordinal labels that experts assigned manually using either video-only or video-with-audio. We propose a novel ordinal regression (OR) algorithm and its hidden Markov model (HMM) extension for estimating AA labels from motion-capture-based visual features and acoustic features. The proposed algorithm transforms the OR problem into multiple binary classification problems, solves them with independent score-outputting classifiers, and fits a cumulative logit logistic regression model with proportional odds (CLLRMP) to the vectors of classifier scores. The time-series extension treats the labels as HMM states, with a likelihood function derived from the probabilistic CLLRMP output. We compare the performance of the proposed algorithm using weighted binary SVMs in the second step (SVM-OLR), its time-series extension (HMM-SVM-OLR), and a baseline multi-class SVM. On the dyadic interaction dataset used, HMM-SVM-OLR achieves the highest estimation accuracies, 71.6% and 65.7%, for AA labels assigned using video-only and video-with-audio, respectively.
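The CLLRMP step can be sketched as follows, assuming the model has already been fitted. Under the proportional-odds convention used here, logit P(Y <= k | s) = theta_k - beta·s, where s is the vector of binary-classifier scores, theta_k are increasing thresholds, and a single weight vector beta is shared by all cut-points; class probabilities are differences of consecutive cumulative probabilities. The function name, the sign convention, and the example parameter values are assumptions, and fitting (for example by maximum likelihood) is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cumulative_logit_probs(score_vec, beta, thresholds):
    """Class probabilities under a proportional-odds cumulative logit model.

    Convention (one of several in use): logit P(Y <= k | s) = thresholds[k] - beta . s
    for k = 0 .. K-2, with thresholds in increasing order and a single beta
    shared by all cut-points (the proportional-odds assumption).
    """
    eta = float(np.dot(beta, score_vec))
    cum = sigmoid(np.asarray(thresholds, dtype=float) - eta)  # P(Y <= k), length K-1
    cum = np.concatenate(([0.0], cum, [1.0]))                 # pad both ends
    return np.diff(cum)                                        # P(Y = k), length K


# Three ordinal levels, so two binary classifiers ("label > 0?", "label > 1?").
beta = [1.0, 1.0]          # hypothetical fitted weights
thresholds = [-1.0, 1.0]   # hypothetical fitted cut-points
print(cumulative_logit_probs([0.0, 0.0], beta, thresholds).round(3))
```

In the HMM extension described above, per-frame probabilities of this kind would supply the state likelihoods; the paper's exact conversion is not reproduced here.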


Journal of Multimedia | 2010

Multimodal Speaker Segmentation and Identification in Presence of Overlapped Speech Segments

Viktor Rozgic; Kyu Jeong Han; Panayiotis G. Georgiou; Shrikanth Narayanan

We describe a multimodal algorithm for speaker segmentation and identification with two main contributions. First, we propose a hidden Markov model architecture that fuses three information sources: a multicamera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel likelihood model for the microphone array observations that handles overlapped speech. We propose a modification of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function that accounts for possible microphone occlusions, and we use its local maxima as the microphone array observations. The likelihood of the extracted local maxima, given the positions of the active speakers, is modeled using the Joint Probabilistic Data Association (JPDA) framework. The state of the proposed hidden Markov model is a vector of speaker activity indicators for the participants present, and the unknown parameter is the mapping from participants’ locations to the set of all possible participant identities. We present and compare two approaches to jointly estimating the states and the unknown parameter: a forward Bayesian filter that updates the estimates sequentially as new observations arrive, and batch decoding with the Viterbi algorithm. Results show that, with both decoding algorithms, the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing on a dataset with a significant portion (27.4%) of overlapped speech, achieving F-measure scores as high as 94.4%.
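The GCC-PHAT building block behind the SPR-GCC-PHAT observations can be sketched for a single microphone pair: whiten the cross-spectrum so that only phase remains, transform back, and read the time difference of arrival from the correlation peak. This is a minimal NumPy sketch assuming one pair and a single dominant source; it does not implement the steered power response over the array, the occlusion-aware modification, or the local-maxima extraction, and `gcc_phat` is a hypothetical helper name.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Delay of y relative to x, in seconds, estimated with GCC-PHAT.

    A positive result means y lags x. Also returns the correlation
    re-centered so that the middle sample corresponds to zero lag.
    """
    n = len(x) + len(y)                       # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12            # phase transform: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:                   # e.g. mic spacing / speed of sound
        max_shift = min(int(max_tau * fs), max_shift)
    max_shift = max(max_shift, 1)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -max_shift..+max_shift
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs, cc


# Synthetic check: y is x delayed by 5 samples.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.concatenate((np.zeros(5), x[:-5]))
tau, _ = gcc_phat(x, y, fs)
print(round(tau * fs))  # estimated delay in samples
```

For overlapped speech, the paper keeps several local maxima of the steered response rather than only the global peak used here.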


Multimedia Signal Processing | 2007

Multimodal Meeting Monitoring: Improvements on Speaker Tracking and Segmentation through a Modified Mixture Particle Filter

Viktor Rozgic; Carlos Busso; Panayiotis G. Georgiou; Shrikanth Narayanan

In this paper we address improvements to our multimodal system for tracking meeting participants and for speaker segmentation, with a focus on the microphone array modality. We propose an algorithm that uses Directions-of-Arrival estimated for each microphone pair as observations, tracks an unknown number of acoustically active meeting participants, and then performs speaker segmentation. We propose a modified mixture particle filter (mMPF) for tracking acoustic sources in the track-before-detection (TbD) framework. The trajectories of the sound sources are reconstructed by optimally assigning the posterior mixture components produced by the mMPF in consecutive frames. Further, we propose a sequential optimal change-point detection algorithm that discovers speech segments in the reconstructed trajectories, i.e., performs speaker segmentation. The algorithm is tested on a multi-participant meeting dataset, both on its own and as part of the multimodal system. On the task of speaker detection in the multimodal setup, we report a significant improvement over our previous state-of-the-art implementation.
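Reconstructing trajectories requires matching the mixture components of one frame to those of the next. A minimal sketch, assuming each component is summarized by its mean direction of arrival in degrees and that the number of components does not change between frames, finds the assignment that minimizes the total wrapped angular distance by exhaustive search over permutations. The names `angular_dist` and `match_components` and the cost function are illustrative assumptions; a Hungarian-algorithm solver would scale better to many components, and track births and deaths are not handled.

```python
from itertools import permutations


def angular_dist(a, b):
    """Absolute difference between two angles in degrees, wrapped to [0, 180]."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def match_components(prev_means, curr_means):
    """Optimal one-to-one matching of current components to previous tracks.

    Exhaustive search over permutations (practical only for a few components).
    Returns (perm, cost), where curr_means[perm[i]] continues track i.
    """
    if len(prev_means) != len(curr_means):
        raise ValueError("this sketch assumes equal component counts")
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(curr_means))):
        cost = sum(angular_dist(p, curr_means[j]) for p, j in zip(prev_means, perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


# Hypothetical mean DOAs (degrees) of three components in consecutive frames.
perm, cost = match_components([10, 90, 200], [205, 12, 85])
print(perm, cost)  # [1, 2, 0] 12
```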


Distributed Computing in Sensor Systems | 2009

Optimal Allocation of Time-Resources for Multihypothesis Activity-Level Detection

Gautam Thatte; Viktor Rozgic; Ming Li; Sabyasachi Ghosh; Urbashi Mitra; Shrikanth Narayanan; Murali Annavaram; Donna Spruijt-Metz

The optimal allocation of samples for activity-level detection in a wireless body area network for health-monitoring applications is considered. A wireless body area network with heterogeneous sensors is deployed in a simple star topology, with the fusion center receiving biometric samples from each of the sensors. The number of samples collected from each sensor is optimized to minimize the probability of misclassification among multiple hypotheses at the fusion center. Using experimental data from our pilot study, we find that allocating samples equally among the sensors is generally suboptimal. A lower probability of error can be achieved by allocating a greater fraction of the samples to sensors that better discriminate between certain activity levels. Because the number of samples is an integer, prior work employed an exhaustive search to determine the optimal integer allocation. However, such a search is computationally expensive. To this end, an alternative continuous-valued vector optimization is derived, which yields approximately optimal allocations with significantly lower complexity.
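The trade-off can be illustrated with a toy exhaustive integer search. The sketch assumes two sensors, three activity-level hypotheses, and independent Gaussian samples with known per-hypothesis means and a common per-sensor standard deviation. It uses the worst pairwise deflection, sum_i n_i (mu_i[a] - mu_i[b])^2 / sigma_i^2, as a proxy for the multi-hypothesis error probability (larger is better). The proxy, the parameter values, and the function names are hypothetical; they are not the paper's error expression or pilot-study data.

```python
from itertools import combinations


def pairwise_deflection(alloc, means, sigmas, a, b):
    """Separation of hypotheses a and b: sum_i n_i * (mu_i[a] - mu_i[b])**2 / sigma_i**2."""
    return sum(n * (m[a] - m[b]) ** 2 / s ** 2 for n, m, s in zip(alloc, means, sigmas))


def worst_pair(alloc, means, sigmas):
    """Smallest pairwise deflection; the hardest pair dominates the error."""
    num_hyp = len(means[0])
    return min(pairwise_deflection(alloc, means, sigmas, a, b)
               for a, b in combinations(range(num_hyp), 2))


def best_two_sensor_allocation(total, means, sigmas):
    """Exhaustive integer search over (k, total - k) maximizing the worst-pair deflection."""
    best = None
    for k in range(total + 1):
        alloc = (k, total - k)
        score = worst_pair(alloc, means, sigmas)
        if best is None or score > best[1]:
            best = (alloc, score)
    return best


# Hypothetical per-hypothesis means for two sensors and three activity levels:
# sensor 1 separates levels 0/1 well, sensor 2 separates levels 1/2 well.
means = [[0.0, 3.0, 3.5], [0.0, 0.5, 3.0]]
sigmas = [1.0, 1.0]
print(best_two_sensor_allocation(10, means, sigmas))  # ((4, 6), 37.5)
print(worst_pair((5, 5), means, sigmas))              # 32.5
```

With these toy values the equal split is beaten by the unequal allocation (4, 6). Relaxing k to a real value, the two binding pairwise curves cross at k ≈ 4.07, which rounds to the same allocation; this conveys the intuition behind a continuous relaxation, though the paper's derivation is not reproduced here.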


International Conference on Acoustics, Speech, and Signal Processing | 2007

Information Theoretic Analysis of Direct Articulatory Measurements for Phonetic Discrimination

Gloria Silva; Vivek Rangarajan; Viktor Rozgic; Shrikanth Narayanan

This paper focuses on the analysis of speech production signals (physical measurements from an electromagnetic articulograph) from the perspective of phone discrimination. We explore two signal representation schemes for the articulatory signals, one based on time-domain analysis and the other on frequency-domain analysis. We use mutual information to quantify how much discrimination information the speech production signals offer for identifying phone labels. The mutual information analyses establish that substantial discrimination information is present in the articulatory stream. Furthermore, phonological classification with articulatory signals yields higher accuracy than classification with the acoustic signal.
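A plug-in estimate of the mutual information between a discretized articulatory feature and phone labels can be sketched from co-occurrence counts, I(F; L) = sum p(f, l) log2 [p(f, l) / (p(f) p(l))]. The sketch assumes the articulatory measurements have already been quantized into discrete values; the binning and estimator used in the paper are not specified here, plug-in estimates are biased upward on small samples, and `mutual_information` is a hypothetical helper name.

```python
import math
from collections import Counter


def mutual_information(features, labels):
    """Plug-in estimate of I(F; L) in bits from paired discrete observations."""
    n = len(features)
    joint = Counter(zip(features, labels))
    count_f = Counter(features)
    count_l = Counter(labels)
    mi = 0.0
    for (f, l), c in joint.items():
        # p(f,l) * log2(p(f,l) / (p(f) p(l))), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_f[f] * count_l[l]))
    return mi


# Hypothetical quantized articulator positions paired with phone labels.
labels = ["p", "b", "p", "b"]
print(mutual_information([0, 1, 0, 1], labels))  # 1.0: feature fully determines the label
print(mutual_information([0, 0, 1, 1], labels))  # 0.0: feature carries no label information
```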


International Conference on Body Area Networks | 2009

Optimal time-resource allocation for activity-detection via multimodal sensing

Gautam Thatte; Viktor Rozgic; Ming Li; Sabyasachi Ghosh; Urbashi Mitra; Shrikanth Narayanan; Murali Annavaram; Donna Spruijt-Metz

The optimal allocation of measurements for activity-level detection in a wireless body area network (WBAN) for health-monitoring applications is considered. The WBAN with heterogeneous sensors is deployed in a simple star topology, with the fusion center receiving a fixed number of measurements from the sensors; the number of measurements allocated to each sensor is optimized to minimize the probability of detection error at the fusion center. An analysis of the two-sensor case with binary hypotheses is presented. Because the number of measurements is an integer, an exhaustive search (grid search) is traditionally employed to determine the optimal allocation. However, such a search is computationally expensive. To this end, an alternative continuous-valued vector optimization is derived, which yields approximately optimal allocations with lower complexity. Numerical case studies based on experimental data for different key activity states are presented. It is observed that the Kullback-Leibler (KL) distances between the distributions associated with the hypotheses dominate the optimal allocation of measurements.
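For intuition about why KL distances matter here, the closed-form KL divergence between two univariate Gaussians, KL = ln(sigma1/sigma0) + (sigma0^2 + (mu0 - mu1)^2) / (2 sigma1^2) - 1/2, can be combined with the fact that KL divergence adds over independent samples. The sketch assumes Gaussian per-sensor distributions under each hypothesis and independent measurements; the function names and parameter values are hypothetical, and this is not the paper's allocation procedure.

```python
import math


def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """KL( N(mu0, sigma0^2) || N(mu1, sigma1^2) ) in nats."""
    return (math.log(sigma1 / sigma0)
            + (sigma0 ** 2 + (mu0 - mu1) ** 2) / (2.0 * sigma1 ** 2)
            - 0.5)


def allocation_kl(alloc, per_sensor_params):
    """Total KL for n_i independent measurements from sensor i (KL adds over independent samples)."""
    return sum(n * gaussian_kl(*params) for n, params in zip(alloc, per_sensor_params))


# Hypothetical (mu0, sigma0, mu1, sigma1) under the two hypotheses for each sensor.
sensors = [(0.0, 1.0, 2.0, 1.0),   # per-measurement KL = 2.0 nats
           (0.0, 1.0, 0.5, 1.0)]   # per-measurement KL = 0.125 nats
print(allocation_kl((3, 2), sensors))  # 6.25
```

A sensor whose per-measurement KL distance between the hypotheses is larger contributes more discrimination per measurement, which is consistent with the observation above.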


International Symposium on Multimedia | 2008

Multimodal Speaker Segmentation in Presence of Overlapped Speech Segments

Viktor Rozgic; Kyu Jeong Han; Panayiotis G. Georgiou; Shrikanth Narayanan

We propose a multimodal speaker segmentation algorithm with two main contributions. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for dealing with overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function in the Joint Probabilistic Data Association (JPDA) framework. Results show that the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing on a dataset with a significant portion (27.4%) of overlapped speech, achieving F-measure scores as high as 94.4%.


Conference of the International Speech Communication Association | 2012

Emotion Recognition using Acoustic and Lexical Features

Viktor Rozgic; Sankaranarayanan Ananthakrishnan; Shirin Saleem; Rohit Kumar; Aravind Namandi Vembu; Rohit Prasad


Archive | 2010

A new multichannel multimodal dyadic interaction database

Viktor Rozgic; Bo Xiao; Athanasios Katsamanis; Brian R. Baucom; Panayiotis G. Georgiou; Shrikanth Narayanan

Collaboration


Top Co-Authors

Shrikanth Narayanan
University of Southern California

Panayiotis G. Georgiou
University of Southern California

Bo Xiao
University of Southern California

Donna Spruijt-Metz
University of Southern California

Gautam Thatte
University of Southern California

Ming Li
University of Southern California

Murali Annavaram
University of Southern California

Urbashi Mitra
University of Southern California

Athanasios Katsamanis
National Technical University of Athens