Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Alex Park is active.

Publication


Featured research published by Alex Park.


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Unsupervised Pattern Discovery in Speech

Alex Park; James R. Glass

We present a novel approach to speech processing based on the principle of pattern discovery. Our work represents a departure from traditional models of speech recognition, where the end goal is to classify speech into categories defined by a prespecified inventory of lexical units (i.e., phones or words). Instead, we attempt to discover such an inventory in an unsupervised manner by exploiting the structure of repeating patterns within the speech signal. We show how pattern discovery can be used to automatically acquire lexical entities directly from an untranscribed audio stream. Our approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which allows us to find matching acoustic patterns between spoken utterances. By aggregating information about these matching patterns across audio streams, we demonstrate how to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multiword phrases. On a corpus of academic lecture material, we demonstrate that clusters found using this technique exhibit high purity and that many of the corresponding lexical identities are relevant to the underlying audio stream.
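
The matching step at the heart of this approach lends itself to a compact illustration. The following is a minimal sketch of the segmental DTW idea, assuming MFCC-style feature vectors as input: alignment is restricted to diagonal bands, and the lowest-distortion band is kept. The band width, the length normalization, and the omission of the paper's minimum-distortion subpath extraction are simplifications for illustration, not the authors' implementation.

import numpy as np

def band_dtw(x, y, offset, width=5):
    # Align x against y inside the diagonal band i - j ~= offset; cells
    # outside the band stay infinite. Returns average per-frame distortion.
    i0, j0 = (offset, 0) if offset >= 0 else (0, -offset)
    xs, ys = x[i0:], y[j0:]
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        return np.inf
    D = np.full((n, m), np.inf)
    for i in range(n):
        for j in range(max(0, i - width), min(m, i + width + 1)):
            cost = np.linalg.norm(xs[i] - ys[j])
            if i == 0 and j == 0:
                D[i, j] = cost
                continue
            D[i, j] = cost + min(
                D[i - 1, j] if i > 0 else np.inf,
                D[i, j - 1] if j > 0 else np.inf,
                D[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
    # A path may end wherever either sequence runs out.
    return min(D[-1, :].min(), D[:, -1].min()) / max(n, m)

def segmental_dtw(x, y, width=5):
    # Scan start offsets one band apart; keep the best-scoring band.
    offsets = range(-(len(y) - 1), len(x), 2 * width + 1)
    scores = {k: band_dtw(x, y, k, width) for k in offsets}
    best = min(scores, key=scores.get)
    return best, scores[best]

# Toy usage: two random "utterances" that share a 20-frame fragment.
rng = np.random.default_rng(0)
shared = rng.normal(size=(20, 13))
utt_a = np.vstack([rng.normal(size=(15, 13)), shared])
utt_b = np.vstack([shared, rng.normal(size=(25, 13))])
print(segmental_dtw(utt_a, utt_b))  # the band containing offset 15 should win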


IEEE Automatic Speech Recognition and Understanding Workshop | 2005

Towards unsupervised pattern discovery in speech

Alex Park; James R. Glass

We present an unsupervised algorithm for discovering acoustic patterns in speech by finding matching subsequences between pairs of utterances. The approach we describe is, in theory, language and topic independent, and is particularly well suited for processing large amounts of speech from a single speaker. A variation of dynamic time warping (DTW), which we call segmental DTW, is used to perform the pairwise utterance comparisons. Using academic lecture data, we describe two potentially useful applications for the segmental DTW output: augmenting speech recognition transcriptions for information retrieval and speech segment clustering for unsupervised word discovery. Some preliminary qualitative results for both experiments are shown and the implications for future work and applications are discussed.
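
Of the two applications, the transcript-augmentation idea admits a very small sketch: treat each discovered acoustic cluster as a pseudo-word and index it alongside the (possibly errorful) recognizer output, so a query can retrieve utterances even where the recognizer failed. The cluster IDs and toy transcripts below are hypothetical; this illustrates the idea, not the authors' system.

from collections import defaultdict

def build_index(transcripts, discovered):
    # transcripts: {utt_id: recognized words}; discovered: {utt_id: cluster IDs}.
    index = defaultdict(set)
    for utt, words in transcripts.items():
        for w in words:
            index[w].add(utt)
    for utt, clusters in discovered.items():
        for c in clusters:
            index[c].add(utt)  # pseudo-words recover terms the ASR missed
    return index

transcripts = {"utt1": ["the", "magnetic", "field"],
               "utt2": ["the", "magnet", "it"]}   # "magnetic" misrecognized
discovered = {"utt1": ["C17"], "utt2": ["C17"]}   # same acoustic pattern in both
print(build_index(transcripts, discovered)["C17"])  # retrieves both utterances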


International Conference on Acoustics, Speech, and Signal Processing | 2005

Automatic processing of audio lectures for information retrieval: vocabulary selection and language modeling

Alex Park; Timothy J. Hazen; James R. Glass

This paper describes our initial progress towards developing a system for automatically transcribing and indexing audio-visual academic lectures for audio information retrieval. We investigate the problem of how to combine generic spoken data sources with subject-specific text sources for processing lecture speech. In addition to word recognition experiments, we perform audio information retrieval simulations to characterize retrieval performance when using errorful automatic transcriptions. Given an appropriately selected vocabulary, we observe that good retrieval performance can be obtained even with high recognition error rates. For language model training, we observe that the addition of spontaneous speech data to subject-specific written material results in more accurate transcriptions, but has a marginal effect on retrieval performance.
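
The source-combination step can be sketched as a linear interpolation of estimates from the two kinds of corpora, with the merged word list doubling as the selected vocabulary. The unigram model, the 0.7/0.3 weighting, and the two toy corpora below are illustrative assumptions; the paper's actual vocabulary selection and language models are richer.

from collections import Counter

def unigram_probs(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(p_subject, p_generic, lam=0.7):
    # Merged vocabulary; the weight is assumed, not taken from the paper.
    vocab = set(p_subject) | set(p_generic)
    return {w: lam * p_subject.get(w, 0.0) + (1 - lam) * p_generic.get(w, 0.0)
            for w in vocab}

subject_text = "the magnetic field induces a current in the coil".split()
spontaneous = "so um the current you know goes through the coil".split()
lm = interpolate(unigram_probs(subject_text), unigram_probs(spontaneous))
print(sorted(lm.items(), key=lambda kv: -kv[1])[:5])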


2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006

The MIT Mobile Device Speaker Verification Corpus: Data Collection and Preliminary Experiments

Ram H. Woo; Alex Park; Timothy J. Hazen

In this paper we discuss data collection and preliminary experiments for a new speaker verification corpus collected on a small handheld device in multiple environments using multiple microphones. This corpus, which has been made publicly available by MIT, is intended for explorations of the problem of robust speaker verification on handheld devices in noisy environments with limited training data. To provide a set of preliminary results, we examine text-dependent speaker verification under a variety of cross-conditional environment and microphone training constraints. Our preliminary results indicate that the presence of noise in the training data improves the robustness of our speaker verification models even when tested in mismatched environments.
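
A cross-conditional experiment of this kind reduces to scoring trials for every train/test environment pairing and reporting equal error rate (EER). The sketch below shows only the evaluation loop: the environment labels are stand-ins and the scores are synthetic, with matched conditions given better separation to mimic a matched-versus-mismatched gap.

import numpy as np

def eer(target_scores, impostor_scores):
    # Sweep candidate thresholds; EER is where the false-accept and
    # false-reject rates cross.
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    rates = [(np.mean(impostor_scores >= t), np.mean(target_scores < t))
             for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2

rng = np.random.default_rng(1)
envs = ["office", "street", "car"]              # hypothetical environment labels
for train_env in envs:
    for test_env in envs:
        sep = 2.5 if train_env == test_env else 1.2   # synthetic separation
        tgt = rng.normal(sep, 1.0, 500)         # true-speaker trial scores
        imp = rng.normal(0.0, 1.0, 500)         # impostor trial scores
        print(f"train={train_env:6s} test={test_env:6s} EER={eer(tgt, imp):.1%}")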


International Conference on Acoustics, Speech, and Signal Processing | 2006

Unsupervised Word Acquisition from Speech using Pattern Discovery

Alex Park; James R. Glass

In this paper, we present an unsupervised method for automatically discovering words from speech using a combination of acoustic pattern discovery, graph clustering, and baseform searching. The algorithm we propose represents an alternative to traditional methods of speech recognition and makes use of the acoustic similarity of multiple realizations of the same words or phrases. On a set of three academic lectures on different subjects, we show that the clustering component of the algorithm is able to successfully generate word clusters that have good coverage of subject-relevant words. Moreover, we illustrate how to use the cluster nodes to retrieve the word identity of each cluster from a large baseform dictionary. Results indicate that this algorithm may prove useful for applications such as vocabulary initialization, speech summarization, or augmentation of existing recognition systems.
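
The clustering component can be approximated compactly: build a graph whose nodes are speech fragments and whose edges are low-distortion pattern matches, then group connected fragments. The paper uses a more refined graph clustering procedure; plain union-find connected components is a simplified stand-in here, and all fragment IDs and distortion values are made up.

def cluster_fragments(fragments, matches, max_distortion=0.5):
    # matches: (fragment_a, fragment_b, distortion) triples from pattern discovery.
    parent = {f: f for f in fragments}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    for a, b, distortion in matches:
        if distortion <= max_distortion:   # keep only confident matches
            parent[find(a)] = find(b)

    clusters = {}
    for f in fragments:
        clusters.setdefault(find(f), []).append(f)
    return list(clusters.values())

# Toy usage: IDs encode (lecture, start frame); distortions are invented.
frags = ["L1:120", "L1:980", "L2:40", "L2:700", "L3:15"]
matches = [("L1:120", "L2:40", 0.2),    # likely the same word
           ("L2:40", "L3:15", 0.3),
           ("L1:980", "L2:700", 0.9)]   # too distorted, ignored
print(cluster_fragments(frags, matches))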


International Conference on Multimodal Interfaces | 2003

Towards robust person recognition on handheld devices using face and speaker identification technologies

Timothy J. Hazen; Eugene Weinstein; Alex Park

Most face and speaker identification techniques are tested on data collected in controlled environments using high quality cameras and microphones. However, using these technologies in variable environments, with the inexpensive audio and image capture hardware found in mobile devices, presents an additional challenge. In this study, we investigate the application of existing face and speaker identification techniques to a person identification task on a handheld device. These techniques have proven to perform accurately in tightly constrained experiments where the lighting conditions, visual backgrounds, and audio environments are fixed and specifically adjusted for optimal data quality. When these techniques are applied on mobile devices where the visual and audio conditions are highly variable, degradations in performance can be expected. Under these circumstances, the combination of multiple biometric modalities can improve the robustness and accuracy of the person identification task. In this paper, we present our approach for combining face and speaker identification technologies and experimentally demonstrate a fused multi-biometric system which achieves a 50% reduction in equal error rate over the better of the two independent systems.
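
Score-level fusion of the two modalities can be sketched in a few lines: normalize each system's scores onto a comparable scale and take a weighted sum. The z-normalization, the equal weighting, and the toy trial scores below are illustrative choices, not necessarily the paper's exact fusion scheme.

import numpy as np

def znorm(scores):
    # Map each modality's scores to zero mean, unit variance.
    return (scores - scores.mean()) / scores.std()

def fuse(face_scores, speaker_scores, w=0.5):
    return w * znorm(face_scores) + (1 - w) * znorm(speaker_scores)

# Toy trials: each position is one verification attempt; labels are made up.
face = np.array([2.1, 0.3, 1.8, -0.5, 2.4, 0.1])
speaker = np.array([1.5, -0.2, 0.9, 0.4, 2.0, -0.8])
labels = np.array([1, 0, 1, 0, 1, 0])      # 1 = true identity
fused = fuse(face, speaker)
print(fused[labels == 1].min() > fused[labels == 0].max())  # fully separable?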


Spoken Language Technology Workshop | 2006

A Novel DTW-Based Distance Measure for Speaker Segmentation

Alex Park; James R. Glass

We present a novel distance measure for comparing two speech segments that uses a local version of the well-known DTW algorithm. Our approach is based on the idea of finding word-level speech patterns that are repeated by the same speaker. Using this distance measure, we develop a speaker segmentation procedure and apply it to the task of segmenting multi-speaker lectures. We demonstrate that our approach is able to generate segmentations that correlate well with independently generated human segmentations. In experiments performed on over ten hours of multi-speaker lecture data, we were able to find speaker change points with precision and recall rates of 80% and 100%, respectively.
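
The segmentation procedure can be caricatured as change-point detection over a segment-to-segment distance: compare adjacent fixed-length windows and hypothesize a speaker change where the distance spikes. In the sketch below, dist() is a crude placeholder for the paper's local DTW measure, and the window size and threshold are arbitrary.

import numpy as np

def change_points(features, win=100, threshold=1.5):
    def dist(a, b):
        # Placeholder: mean-vector distance between two windows of frames.
        # The paper's DTW-based repeated-pattern measure would go here.
        return np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))

    points = []
    for t in range(win, len(features) - win, win // 2):
        if dist(features[t - win:t], features[t:t + win]) > threshold:
            points.append(t)
    return points

# Toy usage: two "speakers" with different feature statistics.
rng = np.random.default_rng(2)
spk1 = rng.normal(0.0, 1.0, size=(400, 13))
spk2 = rng.normal(1.0, 1.0, size=(400, 13))
print(change_points(np.vstack([spk1, spk2])))  # hits cluster near frame 400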


Conference of the International Speech Communication Association | 2002

ASR-Dependent Techniques for Speaker Identification

Alex Park; Timothy J. Hazen


Meeting of the Association for Computational Linguistics | 2007

Making Sense of Sound: Unsupervised Topic Segmentation over Acoustic Input

Igor Malioutov; Alex Park; Regina Barzilay; James R. Glass


Archive | 2003

Multi-Modal Face and Speaker Identification on a Handheld Device

Timothy J. Hazen; Eugene Weinstein; Ryan Kabir; Alex Park; Bernd Heisele

Collaboration


Dive into Alex Park's collaborations.

Top Co-Authors

Timothy J. Hazen, Massachusetts Institute of Technology
James R. Glass, Massachusetts Institute of Technology
Igor Malioutov, Massachusetts Institute of Technology
Douglas A. Jones, Massachusetts Institute of Technology
Douglas A. Reynolds, Massachusetts Institute of Technology
I. Lee Hetherington, Massachusetts Institute of Technology
Linda Kukolich, Massachusetts Institute of Technology
Regina Barzilay, Massachusetts Institute of Technology