Publication


Featured research published by Sameer Maskey.


conference of the international speech communication association | 2005

Comparing Lexical, Acoustic/Prosodic, Structural and Discourse Features for Speech Summarization

Sameer Maskey; Julia Hirschberg

We present results of an empirical study of the usefulness of different types of features in selecting extractive summaries of news broadcasts for our Broadcast News Summarization System. We evaluate lexical, prosodic, structural and discourse features as predictors of those news segments which should be included in a summary. We show that a summarization system that uses a combination of these feature sets produces the most accurate summaries, and that a combination of acoustic/prosodic and structural features is enough to build a ‘good’ summarizer when speech transcription is not available.


international conference on acoustics, speech, and signal processing | 2005

From text to speech summarization

Kathleen R. McKeown; Julia Hirschberg; Michel Galley; Sameer Maskey

In this paper, we present approaches used in text summarization, showing how they can be adapted for speech summarization and where they fall short. Informal style and apparent lack of structure in speech mean that the typical approaches used for text summarization must be extended for use with speech. We illustrate how features derived from speech can help determine summary content within two ongoing summarization projects at Columbia University.


IEEE Signal Processing Magazine | 2008

Speech segmentation and spoken document processing

Mari Ostendorf; Benoit Favre; Ralph Grishman; D. Hakkani-Tur; Mary P. Harper; D. Hillard; J. Hirschberg; Heng Ji; Jeremy G. Kahn; Yang Liu; Sameer Maskey; Hermann Ney; Andrew Rosenberg; Elizabeth Shriberg; Wen Wang; C. Wooters

Progress in both speech and language processing has spurred efforts to support applications that rely on spoken rather than written language input. A key challenge in moving from text-based documents to such spoken documents is that spoken language lacks explicit punctuation and formatting, which can be crucial for good performance. This article describes different levels of speech segmentation, approaches to automatically recovering segment boundary locations, and experimental results demonstrating impact on several language processing tasks. The results also show a need for optimizing segmentation for the end task rather than independently.


AI Magazine | 2004

Constructionist Design Methodology for Interactive Intelligences

Kristinn R. Thórisson; Hrvoje Benko; Denis Abramov; Andrew Arnold; Sameer Maskey; Aruchunan Vaseekaran

We present a methodology for designing and implementing interactive intelligences. The constructionist design methodology (CDM), so called because it advocates modular building blocks and incorporation of prior work, addresses factors that we see as key to future advances in AI, including support for interdisciplinary collaboration, coordination of teams, and large-scale systems integration. We test the methodology by building an interactive multifunctional system with a real-time perception-action loop. The system, whose construction relied entirely on the methodology, consists of an embodied virtual agent that can perceive both real and virtual objects in an augmented-reality room and interact with a user through coordinated gestures and speech. Wireless tracking technologies give the agent awareness of the environment and the user's speech and communicative acts. User and agent can communicate about things in the environment, their placement, and their function, as well as about more abstract topics, such as current news, through situated multimodal dialogue. The results demonstrate the CDM's strength in simplifying the modeling of complex, multifunctional systems that require architectural experimentation and exploration of unclear subsystem boundaries, undefined variables, and tangled data flow and control hierarchies.


conference of the international speech communication association | 2003

Automatic Summarization of Broadcast News using Structural Features

Julia Hirschberg; Sameer Maskey

We present a method for summarizing broadcast news that is not affected by word errors in an automatic speech recognition transcription, using information about the structure of the news program. We construct a directed graphical model to represent the probability distribution and dependencies among the structural features which we train by finding the values of parameters of the conditional probability tables. We then rank segments of the test set and extract the highest ranked ones as a summary. We present the procedure and preliminary test results.


conference of the international speech communication association | 2004

Bootstrapping Phonetic Lexicons for New Languages

Sameer Maskey; Alan W. Black; Laura Mayfield Tomokiyo

Although phonetic lexicons are critical for many speech applications, the process of building one for a new language can take a significant amount of time and effort. We present a bootstrapping algorithm to build phonetic lexicons for new languages. Our method relies on a large amount of unlabeled text, a small set of ‘seed words’ with their phonetic transcription, and the proficiency of a native speaker in correctly inspecting the generated pronunciations of the words. The method proceeds by automatically building Letter-to-Sound (LTS) rules from a small set of the most commonly occurring words in a large corpus of a given language. These LTS rules are retrained as new words are added to the lexicon in an Active Learning step. This procedure is repeated until we have a lexicon that can predict the pronunciation of any word in the target language with the accuracy desired. We tested our approach for three languages: English,


international conference on acoustics, speech, and signal processing | 2009

Resampling auxiliary data for language model adaptation in machine translation for speech

Sameer Maskey; Abhinav Sethy

Performance of n-gram language models depends to a large extent on the amount of training text material available for building the models and the degree to which this text matches the domain of interest. The language modeling community is showing a growing interest in using large collections of auxiliary textual material to supplement sparse in-domain resources. One of the problems in using such auxiliary corpora is that they may differ significantly from the specific nature of the domain of interest. In this paper, we propose three different methods for adapting language models for a Speech to Speech (S2S) translation system when auxiliary corpora are of different genre and domain. The proposed methods are based on centroid similarity, n-gram ratios and resampled language models. We show how these methods can be used to select out-of-domain textual data such as newswire text to improve an S2S system. We were able to achieve an overall relative improvement of 3.8% in BLEU score over a baseline system that uses only in-domain conversational data.
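
To make the idea of centroid-based data selection concrete, here is a minimal sketch. It is not the paper's method, whose details the abstract does not give: the function names, whitespace tokenization, raw term counts, and the fixed threshold are all assumptions chosen for illustration. It builds a term-count centroid from in-domain sentences and keeps auxiliary sentences whose cosine similarity to that centroid meets the threshold.

```python
import math
from collections import Counter
from typing import List


def tf_vector(text: str) -> Counter:
    """Bag-of-words term counts (assumed tokenization: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_by_centroid(in_domain: List[str], auxiliary: List[str],
                       threshold: float) -> List[str]:
    """Keep auxiliary sentences that are similar enough to the in-domain centroid.

    The centroid is the summed count vector; cosine similarity ignores scale,
    so summing is equivalent to averaging here.
    """
    centroid: Counter = Counter()
    for sentence in in_domain:
        centroid.update(tf_vector(sentence))
    return [s for s in auxiliary if cosine(tf_vector(s), centroid) >= threshold]


if __name__ == "__main__":
    in_domain = ["where is the hospital", "i need a doctor"]
    auxiliary = ["the doctor is at the hospital", "stock markets fell sharply"]
    print(select_by_centroid(in_domain, auxiliary, threshold=0.3))
```

In practice the threshold (hypothetical here) would be tuned on held-out in-domain data, and the selected text would then be added to the language model training set.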


conference of the international speech communication association | 2008

Intonational Phrases for Speech Summarization

Sameer Maskey; Andrew Rosenberg; Julia Hirschberg

Extractive speech summarization approaches select relevant segments of spoken documents and concatenate them to generate a summary. The extraction unit chosen, whether a sentence, syntactic constituent, or other segment, has a significant impact on the overall quality and fluency of the summary. Even though sentences tend to be the choice of most extractive speech summarizers, in this paper, we present the results of an empirical study indicating that intonational phrases are better units of extraction for summarization. Our study compared four types of input segmentation: sentences, two pause-based segmentations, and intonational phrases (IP). We found that IPs are the best candidates for extractive summarization, improving over the second highest-performing approach, sentence-based summarization, by 8.2% F-measure.
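
The select-and-concatenate step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the relevance scores are taken as given (the paper's scoring model is not shown here), and the function name, the fixed budget `k`, and joining with spaces are hypothetical choices. The segments could be sentences, pause-based chunks, or intonational phrases.

```python
from typing import List


def extractive_summary(segments: List[str], scores: List[float], k: int) -> str:
    """Pick the k highest-scoring segments and join them in their original order."""
    if len(segments) != len(scores):
        raise ValueError("segments and scores must have the same length")
    # Rank segment indices by score, highest first; ties favor earlier segments.
    ranked = sorted(range(len(segments)), key=lambda i: (-scores[i], i))
    # Restore document order so the summary reads in sequence.
    chosen = sorted(ranked[:max(k, 0)])
    return " ".join(segments[i] for i in chosen)


if __name__ == "__main__":
    segments = ["Good evening.", "The senate passed the bill.",
                "More after the break.", "The vote was 60 to 40."]
    scores = [0.1, 0.9, 0.3, 0.8]  # assumed relevance scores, for illustration only
    print(extractive_summary(segments, scores, k=2))
```

Changing what counts as a segment changes only the input list, which is why the choice of extraction unit can be compared while holding the selection step fixed.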


conference of the international speech communication association | 2006

A Phrase-Level Machine Translation Approach For Disfluency Detection Using Weighted Finite State Transducers

Sameer Maskey; Bowen Zhou; Yuqing Gao

We propose a novel algorithm to detect disfluency in speech by reformulating the problem as phrase-level statistical machine translation using weighted finite state transducers. We approach the task as translation of noisy speech to clean speech. We simplify our translation framework such that it does not require fertility and alignment models. We tested our model on the Switchboard disfluency-annotated corpus. Using an optimized decoder that is developed for phrase-based translation at IBM, we are able to detect repeats, repairs and filled pauses for more than a thousand sentences in less than a second with encouraging results.


conference of the international speech communication association | 2006

Soundbite Detection in Broadcast News Domain

Julia Hirschberg; Sameer Maskey

In this paper, we present results of a study designed to identify SOUNDBITES in Broadcast News. We describe a Conditional Random Field-based model for the detection of these included speech segments uttered by individuals who are interviewed or who are the subject of a news story. Our goal is to identify direct quotations in spoken corpora which can be directly attributable to particular individuals, as well as to associate these soundbites with their speakers. We frame soundbite detection as a binary classification problem in which each turn is categorized either as a soundbite or not. We use lexical, acoustic/prosodic and structural features on a turn level to train a CRF. We performed a 10-fold cross-validation experiment in which we obtained an accuracy of 67.4% and an F-measure of 0.566, which are 20.9% and 38.6% higher than a chance baseline, respectively.

Collaboration


Dive into Sameer Maskey's collaborations.

Top Co-Authors


David K. Park

George Washington University
