Publication


Featured research published by Salim Roukos.


Meeting of the Association for Computational Linguistics | 2002

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni; Salim Roukos; Todd Ward; Wei-Jing Zhu

Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that cannot be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is a need for quick or frequent evaluations.
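The proposed metric became known as BLEU. A minimal single-sentence, single-reference sketch of its two ingredients, clipped (modified) n-gram precision and a brevity penalty; the real metric pools counts over a whole corpus and supports multiple references:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions, scaled by a brevity penalty. One reference,
    uniform weights; the full metric aggregates over a corpus."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n])
                              for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n])
                             for i in range(len(ref) - n + 1))
        # Clip: a candidate n-gram earns credit at most as many
        # times as it occurs in the reference.
        clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        if clipped == 0 or total == 0:
            return 0.0
        log_prec += math.log(clipped / total)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_prec / max_n)
```

Clipping is what makes a degenerate candidate like "the the the the" score zero against any ordinary reference, which is key to the metric correlating with human judgment.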


Human Language Technology | 1991

Procedure for quantitatively comparing the syntactic coverage of English grammars

Steven P. Abney; S. Flickenger; Claudia Gdaniec; C. Grishman; Philip Harrison; Donald Hindle; Robert Ingria; Frederick Jelinek; Judith L. Klavans; Mark Liberman; Mitchell P. Marcus; Salim Roukos; Beatrice Santorini; Tomek Strzalkowski; Ezra Black

The problem of quantitatively comparing the performance of different broad-coverage grammars of English has to date resisted solution. Prima facie, known English grammars appear to disagree strongly with each other as to the elements of even the simplest sentences. For instance, the grammars of Steve Abney (Bellcore), Ezra Black (IBM), Dan Flickinger (Hewlett Packard), Claudia Gdaniec (Logos), Ralph Grishman and Tomek Strzalkowski (NYU), Phil Harrison (Boeing), Don Hindle (AT&T), Bob Ingria (BBN), and Mitch Marcus (U. of Pennsylvania) recognize in common only the following constituents, when each grammarian provides the single parse which he/she would ideally want his/her grammar to specify for three sample Brown Corpus sentences:

The famed Yankee Clipper, now retired, has been assisting (as (a batting coach)).
One of those capital-gains ventures, in fact, has saddled him (with Gore Court).
He said this constituted a (very serious) misuse (of the (Criminal court) processes).


International Conference on Acoustics, Speech, and Signal Processing | 1996

Statistical natural language understanding using hidden clumpings

Mark E. Epstein; Kishore Papineni; Salim Roukos; Todd Ward; S. Della Pietra

We present a new approach to natural language understanding (NLU) based on the source-channel paradigm, and apply it to ARPA's Air Travel Information Service (ATIS) domain. The model uses techniques similar to those used by IBM in statistical machine translation. The parameters are trained using the exact match algorithm; a hierarchy of models is used to facilitate the bootstrapping of more complex models from simpler models.


International Conference on Acoustics, Speech, and Signal Processing | 1993

Trigger-based language models: a maximum entropy approach

Raymond Lau; Ronald Rosenfeld; Salim Roukos

Ongoing efforts at adaptive statistical language modeling are described. To extract information from the document history, trigger pairs are used as the basic information-bearing elements. To combine statistical evidence from multiple triggers, the principle of maximum entropy (ME) is used. To combine the trigger-based model with the static model, the latter is absorbed into the ME formalism. Given consistent statistical evidence, a unique ME solution is guaranteed to exist, and an iterative algorithm exists which is guaranteed to converge to it. Among the advantages of this approach are its simplicity, generality, and incremental nature. Among its disadvantages are its computational requirements. The model described here was trained on five million words of Wall Street Journal text. It used some 40,000 unigram constraints, 200,000 bigram constraints, 200,000 trigram constraints, and 60,000 trigger constraints. After 13 iterations, it produced a language model whose perplexity was 12% lower than that of a conventional trigram, as measured on independent data.
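A toy sketch (not the paper's trigger model) of the maximum-entropy machinery the abstract describes: constraints on feature expectations, fit by Generalized Iterative Scaling, which is guaranteed to converge when the constraints are consistent. The event space, feature, and target values here are purely illustrative:

```python
import math

def gis(events, feats, targets, iters=100):
    """Fit a maximum-entropy distribution p(x) ~ exp(sum_i lam_i * f_i(x))
    over a finite event space with Generalized Iterative Scaling.
    feats[i] are binary feature functions; targets[i] are the desired
    expectations E_p[f_i]. A slack feature keeps every event's feature
    sum equal to a constant C, as GIS requires."""
    C = max(sum(f(x) for f in feats) for x in events)
    slack = lambda x: C - sum(f(x) for f in feats)
    all_feats = list(feats) + [slack]
    all_targets = list(targets) + [C - sum(targets)]
    lam = [0.0] * len(all_feats)

    def distribution():
        w = [math.exp(sum(l * f(x) for l, f in zip(lam, all_feats)))
             for x in events]
        z = sum(w)
        return [wi / z for wi in w]

    for _ in range(iters):
        p = distribution()
        for i, f in enumerate(all_feats):
            # Scale each weight so the model expectation moves
            # toward the empirical (target) expectation.
            model = sum(px * f(x) for px, x in zip(p, events))
            if all_targets[i] > 0 and model > 0:
                lam[i] += math.log(all_targets[i] / model) / max(C, 1)
    return dict(zip(events, distribution()))
```

With a single constraint (say, p('a') = 0.5 over three events), the fitted model spreads the remaining mass uniformly, which is exactly the "maximum entropy subject to the evidence" behavior the paper exploits.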


Human Language Technology | 1994

A maximum entropy model for prepositional phrase attachment

Adwait Ratnaparkhi; Jeffrey C. Reynar; Salim Roukos

A parser for natural language must often choose between two or more equally grammatical parses for the same sentence. Often the correct parse can be determined from the lexical properties of certain key words or from the context in which the sentence occurs. For example, in the sentence …
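The classic ambiguity is prepositional phrase attachment, e.g. "eat pizza with a fork" (the PP modifies the verb) versus "eat pizza with anchovies" (the PP modifies the noun). A toy illustration, not the paper's model, of deciding attachment from lexical association evidence of the kind a maximum-entropy model would weigh as features; the association scores below are made up for illustration only:

```python
# Hypothetical lexical association strengths (illustrative values).
ASSOC = {
    ('eat', 'with', 'fork'): 0.9,         # verbal attachment cue
    ('pizza', 'with', 'anchovies'): 0.8,  # nominal attachment cue
    ('eat', 'with', 'anchovies'): 0.2,
    ('pizza', 'with', 'fork'): 0.1,
}

def attach(verb, noun1, prep, noun2):
    """Return 'V' if the PP (prep, noun2) attaches to the verb,
    'N' if it attaches to noun1, by comparing association scores."""
    v_score = ASSOC.get((verb, prep, noun2), 0.0)
    n_score = ASSOC.get((noun1, prep, noun2), 0.0)
    return 'V' if v_score >= n_score else 'N'
```

A real maximum-entropy attachment model learns weights for many such lexical and contextual features from a treebank rather than looking up fixed scores.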


Meeting of the Association for Computational Linguistics | 1993

Towards History-based Grammars: Using Richer Models for Probabilistic Parsing

Ezra Black; Frederick Jelinek; John Lafferty; David M. Magerman; Robert L. Mercer; Salim Roukos

We describe a generative probabilistic model of natural language, which we call HBG, that takes advantage of detailed linguistic information to resolve ambiguity. HBG incorporates lexical, syntactic, semantic, and structural information from the parse tree into the disambiguation process in a novel way. We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree building to tease out the relevant aspects of a parse tree that will determine the correct parse of a sentence. This stands in contrast to the usual approach of further grammar tailoring via the usual linguistic introspection in the hope of generating the correct parse. In head-to-head tests against one of the best existing robust probabilistic parsing models, which we call P-CFG, the HBG model significantly outperforms P-CFG, increasing the parsing accuracy rate from 60% to 75%, a 37% reduction in error.
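The reported 37% figure follows directly from the two accuracy rates, since a 60% → 75% accuracy gain means the error rate falls from 40% to 25%:

```python
# Sanity check on the reported numbers: accuracy 60% -> 75% means the
# error rate drops from 40% to 25%, a relative reduction of 15/40.
base_acc, hbg_acc = 0.60, 0.75
reduction = ((1 - base_acc) - (1 - hbg_acc)) / (1 - base_acc)
print(f"relative error reduction: {reduction:.1%}")  # 37.5%, reported as 37%
```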


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2003

Challenges in information retrieval and language modeling: report of a workshop held at the center for intelligent information retrieval, University of Massachusetts Amherst, September 2002

James Allan; Jay Aslam; Nicholas J. Belkin; Chris Buckley; James P. Callan; W. Bruce Croft; Susan T. Dumais; Norbert Fuhr; Donna Harman; David J. Harper; Djoerd Hiemstra; Thomas Hofmann; Eduard H. Hovy; Wessel Kraaij; John D. Lafferty; Victor Lavrenko; David Lewis; Liz Liddy; R. Manmatha; Andrew McCallum; Jay M. Ponte; John M. Prager; Dragomir R. Radev; Philip Resnik; Stephen E. Robertson; Ron G. Rosenfeld; Salim Roukos; Mark Sanderson; Richard M. Schwartz; Amit Singhal

Information retrieval (IR) research has reached a point where it is appropriate to assess progress and to define a research agenda for the next five to ten years. This report summarizes a discussion of IR research challenges that took place at a recent workshop. The attendees of the workshop considered information retrieval research in a range of areas chosen to give broad coverage of topic areas that engage information retrieval researchers. Those areas are retrieval models, cross-lingual retrieval, Web search, user modeling, filtering, topic detection and tracking, classification, summarization, question answering, metasearch, distributed retrieval, multimedia retrieval, information extraction, as well as testbed requirements for future work. The potential use of language modeling techniques in these areas was also discussed. The workshop identified major challenges within each of those areas. The following are recurring themes that ran throughout:

• User and context sensitive retrieval
• Multi-lingual and multi-media issues
• Better target tasks
• Improved objective evaluations
• Substantially more labeled data
• Greater variety of data sources
• Improved formal models

Contextual retrieval and global information access were identified as particularly important long-term challenges.


Meeting of the Association for Computational Linguistics | 2004

A Mention-Synchronous Coreference Resolution Algorithm Based On the Bell Tree

Xiaoqiang Luo; Abraham Ittycheriah; Hongyan Jing; Nanda Kambhatla; Salim Roukos

This paper proposes a new approach for coreference resolution which uses the Bell tree to represent the search space and casts the coreference resolution problem as finding the best path from the root of the Bell tree to the leaf nodes. A Maximum Entropy model is used to rank these paths. The coreference performance on the 2002 and 2003 Automatic Content Extraction (ACE) data is reported. We also train a coreference system on the MUC6 data and obtain competitive results.
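The leaves of the Bell tree enumerate every way of partitioning n mentions into entities, so the search space grows as the Bell number B(n). A small sketch of that growth, computed with the Bell triangle recurrence:

```python
def bell(n):
    """Bell number B(n): the number of ways to partition n mentions
    into entities, i.e. the number of leaves of the Bell tree for n
    mentions. Computed via the Bell triangle recurrence."""
    row = [1]                      # first row of the Bell triangle
    for _ in range(n - 1):
        nxt = [row[-1]]            # each row starts with the previous row's end
        for v in row:
            nxt.append(nxt[-1] + v)
        row = nxt
    return row[-1]

# The search space explodes quickly, which is why a ranked best-path
# search through the tree is needed rather than exhaustive enumeration:
print([bell(n) for n in range(1, 8)])  # [1, 2, 5, 15, 52, 203, 877]
```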


International Conference on Acoustics, Speech, and Signal Processing | 1995

Performance of the IBM large vocabulary continuous speech recognition system on the ARPA Wall Street Journal task

Lalit R. Bahl; S. Balakrishnan-Aiyer; J.R. Bellegarda; Martin Franz; Ponani S. Gopalakrishnan; David Nahamoo; Miroslav Novak; Mukund Padmanabhan; Michael Picheny; Salim Roukos

In this paper we discuss various experimental results using our continuous speech recognition system on the Wall Street Journal task. Experiments with different feature extraction methods, varying amounts and type of training data, and different vocabulary sizes are reported.


Journal of the Acoustical Society of America | 2002

Phrase splicing and variable substitution using a trainable speech synthesizer

Robert E. Donovan; Martin Franz; Salim Roukos; Jeffrey S. Sorensen

In accordance with the present invention, a method for generating speech includes the steps of: providing input to be acoustically produced; comparing the input against training data or application-specific splice files to identify words and word sequences corresponding to the input and construct a phone sequence; using a search algorithm to identify a segment sequence that realizes the phone sequence as output speech; and concatenating the segments while modifying their characteristics to be substantially equal to the requested characteristics. Application-specific data is advantageously used to make pertinent information available for synthesizing both the phone sequence and the output speech. Also described is a system for performing operations in accordance with the disclosure.
