
Publication


Featured research published by Colin Vaz.


international conference on computational linguistics | 2014

SAIL: Sentiment Analysis using Semantic Similarity and Contrast Features

Nikolaos Malandrakis; Michael Falcone; Colin Vaz; Jesse James Bisogni; Alexandros Potamianos; Shrikanth Narayanan

This paper describes our submission to SemEval-2014 Task 9: Sentiment Analysis in Twitter. Our model is primarily a lexicon-based one, augmented by preprocessing steps including detection of multi-word expressions, negation propagation, and hashtag expansion, and by the use of pairwise semantic similarity at the tweet level. Feature extraction is repeated for sub-strings, and contrasting sub-string features are used to better capture complex phenomena like sarcasm. The resulting supervised system, using a Naive Bayes model, achieved high performance in classifying entire tweets, ranking 7th on the main set and 2nd when applied to sarcastic tweets.
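A minimal illustration of the kind of pipeline described above: toy lexicon-based polarity features computed over the whole tweet and over sub-strings, a contrast feature between the sub-strings, and a Naive Bayes classifier. The lexicon, tokenization, and feature set here are invented for illustration and are not taken from the paper.

```python
# Hedged sketch (not the authors' system): lexicon polarity features for the
# whole tweet and two sub-strings, plus a contrast feature, fed to Naive Bayes.
import numpy as np
from sklearn.naive_bayes import GaussianNB

LEXICON = {"love": 0.9, "great": 0.7, "not": -0.3, "awful": -0.8}  # toy lexicon

def polarity(tokens):
    scores = [LEXICON.get(t, 0.0) for t in tokens]
    return float(np.mean(scores)) if scores else 0.0

def features(tweet):
    toks = tweet.lower().split()
    half = len(toks) // 2 or 1
    first, second = polarity(toks[:half]), polarity(toks[half:])
    # the contrast between sub-strings helps flag sarcasm-like polarity flips
    return [polarity(toks), first, second, first - second]

X = np.array([features(t) for t in ["i love this great day", "awful not great"]])
y = np.array([1, 0])  # 1 = positive, 0 = negative (toy labels)
clf = GaussianNB().fit(X, y)
print(clf.predict([features("what a great , not awful day")]))
```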


international conference on acoustics, speech, and signal processing | 2014

Barista: A framework for concurrent speech processing by USC-SAIL

Doğan Can; James Gibson; Colin Vaz; Panayiotis G. Georgiou; Shrikanth Narayanan

We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista allows demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, to be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0.
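Barista itself is C++ built on Kaldi and libcppa, but the actor-network idea can be sketched in Python with threads and queues: each actor consumes messages, performs one processing step, and forwards results downstream. Everything below (actor names, messages) is illustrative and is not Barista's API.

```python
# Illustrative sketch only: concurrent "actors" passing messages through a
# pipeline, mimicking how a Barista network wires Kaldi-style components.
import queue, threading

def source(out_q):
    for frame in range(5):            # stand-in for incoming audio frames
        out_q.put(frame)
    out_q.put(None)                   # end-of-stream marker

def feature_actor(in_q, out_q):
    while (msg := in_q.get()) is not None:
        out_q.put(msg * 2)            # stand-in for feature extraction
    out_q.put(None)

def sink(in_q):
    while (msg := in_q.get()) is not None:
        print("decoded:", msg)

q1, q2 = queue.Queue(), queue.Queue()
actors = [threading.Thread(target=f, args=a) for f, a in
          [(source, (q1,)), (feature_actor, (q1, q2)), (sink, (q2,))]]
for t in actors: t.start()
for t in actors: t.join()
```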


conference of the international speech communication association | 2016

State-of-the-Art MRI Protocol for Comprehensive Assessment of Vocal Tract Structure and Function.

Sajan Goud Lingala; Asterios Toutios; Johannes Töger; Yongwan Lim; Yinghua Zhu; Yoon-Chul Kim; Colin Vaz; Shrikanth Narayanan; Krishna S. Nayak

Magnetic Resonance Imaging (MRI) provides a safe and flexible means to study the vocal tract, and is increasingly used in speech production research. This work details a state-of-the-art MRI protocol for comprehensive assessment of vocal tract structure and function, and presents results from representative speakers. The system incorporates (a) custom upper airway coils that are maximally sensitive to vocal tract tissues, (b) a graphical user interface for 2D real-time MRI that provides on-the-fly reconstruction for interactive localization and correction of imaging artifacts, (c) off-line constrained reconstruction for generating high spatio-temporal resolution dynamic images (83 frames per second, 2.4 mm resolution), (d) 3D static imaging of sounds sustained for 7 seconds with full vocal tract coverage and isotropic resolution (1.25 mm), (e) T2-weighted high-resolution, high-contrast depiction of soft-tissue boundaries of the full vocal tract (axial, coronal, and sagittal sweeps with 0.58 x 0.58 x 3 mm resolution), and (f) simultaneous audio recording with off-line noise cancellation and temporal alignment of audio with 2D real-time MRI. A stimuli set was designed to efficiently capture salient static and dynamic articulatory and morphological aspects of speech production in 90-minute data acquisition sessions.


international conference on acoustics, speech, and signal processing | 2014

Energy-Constrained Minimum Variance Response Filter for Robust Vowel Spectral Estimation

Colin Vaz; Andreas Tsiartas; Shrikanth Narayanan

We propose the energy-constrained minimum-variance response (ECMVR) filter to perform robust spectral estimation of vowels. We modify the distortionless constraint of the minimum-variance distortionless response (MVDR) filter and add an energy constraint to its formulation to mitigate the influence of noise on the speech spectrum. We test our ECMVR filter on a vowel classification task with different background noises at various SNR levels. Results show that vowels are classified more accurately in certain noises using MFCC and PLP features extracted from the ECMVR spectrum compared to using features extracted from the FFT and MVDR spectra.
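For context, a plain MVDR spectral estimate of a speech frame can be computed from its autocorrelation matrix as below; the ECMVR filter modifies the distortionless constraint and adds an energy constraint, which this sketch does not reproduce. The frame length, model order, and test signal are arbitrary.

```python
# Plain MVDR spectrum sketch: S(w) = 1 / (e(w)^H R^{-1} e(w)).
# Not the ECMVR filter itself; the paper's modified constraints are omitted.
import numpy as np
from scipy.linalg import toeplitz

def mvdr_spectrum(x, order=12, n_freq=256):
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) - 1 + order]
    R = toeplitz(r)                                   # autocorrelation matrix
    spec = np.empty(n_freq)
    for k, w in enumerate(np.linspace(0, np.pi, n_freq)):
        e = np.exp(1j * w * np.arange(order))         # steering vector
        spec[k] = 1.0 / np.real(np.conj(e) @ np.linalg.solve(R, e))
    return spec

fs = 8000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 500 * t) + 0.1 * np.random.randn(len(t))
print(mvdr_spectrum(frame)[:5])
```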


multimedia signal processing | 2016

Novel affective features for multiscale prediction of emotion in music

Naveen Kumar; Tanaya Guha; Che-Wei Huang; Colin Vaz; Shrikanth Narayanan

The majority of computational work on emotion in music concentrates on developing machine learning methodologies to build new, more accurate prediction systems, and usually relies on generic acoustic features. Relatively little effort has been put into developing and analyzing features that are particularly suited to the task. The contribution of this paper is twofold. First, the paper proposes two features, compressibility and sparse spectral components, that efficiently capture the emotion-related properties of music. These features are designed to capture the overall affective characteristics of music (global features). We demonstrate that they can predict emotional dimensions (arousal and valence) with high accuracy compared to generic audio features. Second, we investigate the relationship between the proposed features and the dynamic variation in the emotion ratings. To this end, we propose a novel Haar transform-based technique to predict dynamic emotion ratings using only global features.
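As a rough illustration of a compressibility-style global feature, one can measure how well a quantized signal compresses losslessly; a repetitive signal compresses better than a noisy one. This is only a stand-in, since the paper's exact feature definitions are not reproduced here.

```python
# Stand-in for a "compressibility" feature: ratio of compressed to raw size
# of an 8-bit quantized signal (the paper's definition may differ).
import numpy as np, zlib

def compressibility(x, n_bits=8):
    q = np.round((x - x.min()) / (np.ptp(x) + 1e-12) * (2 ** n_bits - 1))
    raw = q.astype(np.uint8).tobytes()
    return len(zlib.compress(raw)) / len(raw)

rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 440 * np.arange(22050) / 22050)   # highly regular
noise = rng.standard_normal(22050)                          # hard to compress
print(compressibility(tone), compressibility(noise))
```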


conference of the international speech communication association | 2016

Convex Hull Convolutive Non-Negative Matrix Factorization for Uncovering Temporal Patterns in Multivariate Time-Series Data.

Colin Vaz; Asterios Toutios; Shrikanth Narayanan

We propose the Convex Hull Convolutive Non-negative Matrix Factorization (CH-CNMF) algorithm to learn temporal patterns in multivariate time-series data. The algorithm factors a data matrix into a basis tensor that contains temporal patterns and an activation matrix that indicates the time instants when the temporal patterns occurred in the data. Importantly, the temporal patterns correspond closely to the observed data and represent a wide range of dynamics. Experiments with synthetic data show that the temporal patterns found by CH-CNMF match the data better and provide more meaningful information than the temporal patterns found by Convolutive Non-negative Matrix Factorization with sparsity constraints (CNMF-SC). Additionally, CH-CNMF applied on vocal tract constriction data yields a wider range of articulatory gestures compared to CNMF-SC. Moreover, we find that the gestures comprising the CH-CNMF basis generalize better to unseen data and capture vocal tract structure and dynamics significantly better than those comprising the CNMF-SC basis.
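The convolutive NMF signal model underlying CH-CNMF approximates the data matrix by summing time-shifted activations filtered through each temporal basis pattern, as in the sketch below; the convex-hull constraint itself is not implemented here and the shapes are illustrative.

```python
# Convolutive NMF reconstruction: V ~ sum_t W[:, :, t] @ shift(H, t).
# Sketch of the signal model only; CH-CNMF's convex-hull constraint is omitted.
import numpy as np

def cnmf_reconstruct(W, H):
    """W: (F, K, T) basis tensor, H: (K, N) activations -> (F, N) estimate."""
    F, K, T = W.shape
    _, N = H.shape
    V = np.zeros((F, N))
    for t in range(T):
        H_shift = np.zeros_like(H)
        H_shift[:, t:] = H[:, :N - t]       # shift activations right by t frames
        V += W[:, :, t] @ H_shift
    return V

rng = np.random.default_rng(1)
W = rng.random((20, 3, 5))                   # 3 temporal patterns, 5 frames long
H = rng.random((3, 100))
print(cnmf_reconstruct(W, H).shape)          # (20, 100)
```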


Proceedings of the 4th ACM Workshop on Wearable Systems and Applications | 2018

TILES audio recorder: an unobtrusive wearable solution to track audio activity

Tiantian Feng; Amrutha Nadarajan; Colin Vaz; Brandon M. Booth; Shrikanth Narayanan

Most existing speech activity trackers used in human subject studies are bulky, record raw audio content which invades participant privacy, have complicated hardware and non-customizable software, and are too expensive for large-scale deployment. The present effort seeks to overcome these challenges by proposing the TILES Audio Recorder (TAR), an unobtrusive and scalable solution to track audio activity using an affordable miniature mobile device with an open-source app. For this recorder, we make use of the Jelly Pro Mobile, a pocket-sized Android smartphone, and employ two open-source toolkits: openSMILE and TarsosDSP. TarsosDSP provides a Voice Activity Detection capability that triggers openSMILE to extract and save audio features only when the subject is speaking. Experiments show that performing feature extraction only during speech segments greatly increases battery life, enabling the subject to wear the recorder for up to 10 hours at a time. Furthermore, recording experiments with ground-truth clean speech show minimal distortion of the recorded features, as measured by root mean-square error and cosine distance. The TAR app further provides subjects with a simple user interface that allows them to both pause feature extraction at any time and easily upload data to a remote server.
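The gating idea can be sketched as follows: features are computed, and would be saved, only for frames a voice activity detector marks as speech, so raw audio never needs to be stored. The actual recorder runs TarsosDSP and openSMILE on Android; the energy threshold and stand-in features below are assumptions.

```python
# Conceptual sketch of VAD-gated feature extraction (numpy stand-in for the
# TarsosDSP + openSMILE pipeline on the device).
import numpy as np

def frames(x, size=400, hop=160):
    n = 1 + (len(x) - size) // hop
    return np.stack([x[i * hop:i * hop + size] for i in range(n)])

def vad_gated_features(x, energy_thresh=0.01):
    feats = []
    for frame in frames(x):
        energy = float(np.mean(frame ** 2))
        if energy > energy_thresh:                          # crude energy VAD
            zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
            feats.append([np.log(energy + 1e-12), zcr])     # stand-in features
    return np.array(feats)

rng = np.random.default_rng(2)
sig = np.concatenate([0.001 * rng.standard_normal(8000),    # "silence"
                      0.5 * rng.standard_normal(8000)])     # "speech-like"
print(vad_gated_features(sig).shape)
```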


international conference on acoustics, speech, and signal processing | 2016

CNMF-based acoustic features for noise-robust ASR

Colin Vaz; Dimitrios Dimitriadis; Samuel Thomas; Shrikanth Narayanan

We present an algorithm using convolutive non-negative matrix factorization (CNMF) to create noise-robust features for automatic speech recognition (ASR). Typically in noise-robust ASR, CNMF is used to remove noise from noisy speech prior to feature extraction. However, we find that denoising introduces distortion and artifacts, which can degrade ASR performance. Instead, we propose using the time-activation matrices from CNMF as acoustic model features. In this paper, we describe how to create speech and noise dictionaries that generate noise-robust time-activation matrices from noisy speech. Using the time-activation matrices created by our proposed algorithm, we achieve an 11.8% relative improvement in the word error rate on the Aurora 4 corpus compared to using log-mel filterbank energies. Furthermore, we attain a 13.8% relative improvement over log-mel filterbank energies when we combine them with our proposed features, indicating that our features contain complementary information to log-mel features.
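A simplified sketch of the idea: with fixed speech and noise dictionaries, infer the activation matrix for a noisy spectrogram via multiplicative updates and keep the speech activations as features. Plain NMF replaces CNMF here, and the dictionary sizes and random "spectrogram" are illustrative.

```python
# Simplified sketch: time-activation features from NMF with fixed dictionaries
# (the paper uses convolutive NMF; plain NMF is used here for brevity).
import numpy as np

def infer_activations(V, W, n_iter=100, eps=1e-9):
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):        # KL-divergence multiplicative update, W fixed
        H *= (W.T @ (V / (W @ H + eps))) / (W.T.sum(axis=1, keepdims=True) + eps)
    return H

rng = np.random.default_rng(3)
W_speech, W_noise = rng.random((64, 40)), rng.random((64, 10))
W = np.hstack([W_speech, W_noise])
V_noisy = rng.random((64, 200))                  # stand-in magnitude spectrogram
H = infer_activations(V_noisy, W)
speech_features = H[:W_speech.shape[1], :]       # speech time-activation features
print(speech_features.shape)                     # (40, 200)
```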


conference of the international speech communication association | 2016

Noise Aware and Combined Noise Models for Speech Denoising in Unknown Noise Conditions.

Pavlos Papadopoulos; Colin Vaz; Shrikanth Narayanan

Traditional denoising schemes require prior knowledge or statistics of the noise corrupting the signal, or estimate the noise from noise-only portions of the signal, which requires knowledge of speech boundaries. Extending denoising methods to perform well in unknown noise conditions can facilitate processing of data captured in different real-life environments, and relax rigid data acquisition protocols. In this paper, we propose two methods for denoising speech signals in unknown noise conditions. The first method has two stages. In the first stage, we use Long Term Signal Variability features to decide which noise model to use from a pool of available models. Once we determine the noise type, we use Nonnegative Matrix Factorization with a dictionary trained on that noise to denoise the signal. In the second method, we create a combined noise dictionary from different types of noise, and use that dictionary in the denoising phase. Both of our systems improve signal quality, as measured by PESQ scores, for all the noise types we tested, and for different signal-to-noise ratio levels.
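Once activations have been inferred against stacked speech and noise dictionaries (as in the activation sketch above), the denoised spectrogram can be obtained with a Wiener-style soft mask built from the two reconstructions. The mask form below is a common choice, not necessarily the one used in the paper.

```python
# Wiener-style soft mask from speech and noise reconstructions (assumed form).
import numpy as np

def soft_mask_denoise(V_noisy, V_speech_hat, V_noise_hat, eps=1e-9):
    mask = V_speech_hat / (V_speech_hat + V_noise_hat + eps)   # values in [0, 1]
    return mask * V_noisy

rng = np.random.default_rng(4)
V = rng.random((64, 150))
print(soft_mask_denoise(V, 0.7 * V, 0.3 * V).shape)            # (64, 150)
```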


conference of the international speech communication association | 2016

Illustrating the Production of the International Phonetic Alphabet Sounds Using Fast Real-Time Magnetic Resonance Imaging.

Asterios Toutios; Sajan Goud Lingala; Colin Vaz; Jangwon Kim; John H. Esling; Patricia A. Keating; Matthew Gordon; Dani Byrd; Louis Goldstein; Krishna S. Nayak; Shrikanth Narayanan

Recent advances in real-time magnetic resonance imaging (rtMRI) of the upper airway for acquiring speech production data provide unparalleled views of the dynamics of a speaker’s vocal tract at very high frame rates (83 frames per second and even higher). This paper introduces an effort to collect and make available on-line rtMRI data corresponding to a large subset of the sounds of the world’s languages as encoded in the International Phonetic Alphabet, with supplementary English words and phonetically-balanced texts, produced by four prominent phoneticians, using the latest rtMRI technology. The technique images oral as well as laryngeal articulator movements in the production of each sound category. This resource is envisioned as a teaching tool in pronunciation training, second language acquisition, and speech therapy.

Collaboration


Dive into Colin Vaz's collaborations.

Top Co-Authors

Shrikanth Narayanan (University of Southern California)
Asterios Toutios (University of Southern California)
Jangwon Kim (University of Southern California)
Pavlos Papadopoulos (University of Southern California)
Vikram Ramanarayanan (University of Southern California)
Krishna S. Nayak (University of Southern California)
Maarten Van Segbroeck (University of Southern California)
Naveen Kumar (University of Southern California)
Nikolaos Malandrakis (University of Southern California)
Ruchir Travadi (University of Southern California)