
Publication


Featured research published by Dragos Stefan Munteanu.


meeting of the association for computational linguistics | 2006

Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora

Dragos Stefan Munteanu; Daniel Marcu

We present a novel method for extracting parallel sub-sentential fragments from comparable, non-parallel bilingual corpora. By analyzing potentially similar sentence pairs using a signal processing-inspired approach, we detect which segments of the source sentence are translated into segments in the target sentence, and which are not. This method enables us to extract useful machine translation training data even from very non-parallel corpora, which contain no parallel sentence pairs. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system.
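The abstract describes the approach only at a high level, so the following is a minimal sketch of one way a signal-style fragment detector could work, not the authors' implementation. It assumes a hypothetical probabilistic lexicon (`lexicon`), scores each source word positively if it has a translation in the target sentence and `-1.0` otherwise, smooths the scores with a simple averaging filter, and keeps contiguous positive runs. The function names, filter width, and minimum fragment length are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon type: source word -> {target word: translation probability}
Lexicon = Dict[str, Dict[str, float]]


def initial_signal(source: List[str], target: List[str], lexicon: Lexicon) -> List[float]:
    """Score each source word: best lexicon probability among the words
    present in the target sentence, or -1.0 if none is present."""
    target_set = set(target)
    signal = []
    for word in source:
        scores = [p for t, p in lexicon.get(word, {}).items() if t in target_set]
        signal.append(max(scores) if scores else -1.0)
    return signal


def smooth(signal: List[float], width: int = 3) -> List[float]:
    """Apply a centered averaging filter (the window is truncated at the edges)."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def positive_spans(signal: List[float], min_len: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of runs of positive
    values that are at least min_len long."""
    spans = []
    start = None
    # A trailing sentinel closes any run that reaches the end of the signal.
    for i, value in enumerate(signal + [-1.0]):
        if value > 0 and start is None:
            start = i
        elif value <= 0 and start is not None:
            if i - start >= min_len:
                spans.append((start, i))
            start = None
    return spans


def extract_fragments(source: List[str], target: List[str], lexicon: Lexicon,
                      width: int = 3, min_len: int = 3) -> List[List[str]]:
    """Return source-side word spans that appear to be translated in the target."""
    filtered = smooth(initial_signal(source, target, lexicon), width)
    return [source[a:b] for a, b in positive_spans(filtered, min_len)]
```

Smoothing lets an isolated untranslated word inside a well-translated stretch survive, while a long untranslated tail is cut off. A real system would run the same procedure in the reverse direction to obtain the target-side fragment.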


language and technology conference | 2006

ParaEval: Using Paraphrases to Evaluate Summaries Automatically

Liang Zhou; Chin-Yew Lin; Dragos Stefan Munteanu; Eduard H. Hovy

ParaEval is an automated evaluation method for comparing reference and peer summaries. It facilitates a tiered-comparison strategy where recall-oriented global optimal and local greedy searches for paraphrase matching are enabled in the top tiers. We utilize a domain-independent paraphrase table extracted from a large bilingual parallel corpus using methods from Machine Translation (MT). We show that the quality of ParaEval's evaluations, measured by correlating with human judgments, closely resembles that of ROUGE's.
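As a rough illustration of tiered, recall-oriented matching, the sketch below scores a peer summary against a reference in two passes: a greedy, longest-first paraphrase match, then a literal unigram fallback for reference words still uncovered. It is a simplified stand-in, not ParaEval itself: the global optimal search mentioned in the abstract is omitted, and the paraphrase table format, the `max_len` parameter, and the function names are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Phrase = Tuple[str, ...]
# Hypothetical table format: reference phrase -> set of paraphrases
ParaphraseTable = Dict[Phrase, Set[Phrase]]


def ngrams(tokens: List[str], max_len: int) -> Set[Phrase]:
    """All contiguous n-grams of length 1..max_len."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def tiered_recall(reference: List[str], peer: List[str],
                  paraphrases: ParaphraseTable, max_len: int = 3) -> float:
    """Fraction of reference words covered by the peer summary."""
    if not reference:
        return 0.0
    peer_ngrams = ngrams(peer, max_len)
    covered = [False] * len(reference)

    # Tier 1: greedy left-to-right paraphrase matching, longest phrase first.
    i = 0
    while i < len(reference):
        matched = 0
        for n in range(min(max_len, len(reference) - i), 0, -1):
            phrase = tuple(reference[i:i + n])
            if any(alt in peer_ngrams for alt in paraphrases.get(phrase, ())):
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                covered[j] = True
            i += matched
        else:
            i += 1

    # Tier 2: literal unigram matching for words not yet covered;
    # each peer token can satisfy at most one reference word in this tier.
    remaining = Counter(peer)
    for j, word in enumerate(reference):
        if not covered[j] and remaining[word] > 0:
            covered[j] = True
            remaining[word] -= 1

    return sum(covered) / len(reference)
```

One simplification to keep in mind: peer words consumed by a paraphrase match in the first tier are not excluded from the unigram tier, so a stricter implementation would track peer-side coverage as well.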


empirical methods in natural language processing | 2002

Processing Comparable Corpora With Bilingual Suffix Trees

Dragos Stefan Munteanu; Daniel Marcu

We introduce Bilingual Suffix Trees (BST), a data structure that is suitable for exploiting comparable corpora. We discuss algorithms that use BSTs in order to create parallel corpora and learn translations of unseen words from comparable corpora. Starting with a small bilingual dictionary that was derived automatically from a corpus of 5,000 parallel sentences, we have automatically extracted a corpus of 33,926 parallel phrases of size greater than 3, and learned 9 new word translations from a comparable corpus of 1.3M words (100,000 sentences).


Archive | 2006

Exploiting comparable corpora

Daniel Marcu; Dragos Stefan Munteanu

One of the major bottlenecks in the development of Statistical Machine Translation systems for most language pairs is the lack of bilingual parallel training data. Currently available parallel corpora span relatively few language pairs and very few domains; building new ones of sufficiently large size and high quality is time-consuming and expensive. In this thesis, I propose methods that enable automatic creation of parallel corpora by exploiting a rich, diverse, and readily available resource: comparable corpora. Comparable corpora are bilingual texts that, while not parallel in the strict sense, are somewhat related and convey overlapping information. Such texts exist in large quantities on the Web; a good example is the multilingual news feeds produced by news agencies such as Agence France Presse, CNN, and BBC. I present novel methods for extracting parallel data of good quality from such comparable collections. I show how to detect parallelism at various granularity levels, and thus find parallel documents (if there are any in the collection), parallel sentences, and parallel sub-sentential fragments. In order to demonstrate the validity of this approach, I use my method to extract data from large-scale comparable corpora for various language pairs, and show that the extracted data helps improve the end-to-end performance of a state-of-the-art machine translation system.


Archive | 2003

Constructing a translation lexicon from comparable, non-parallel corpora

Daniel Marcu; Kevin Knight; Dragos Stefan Munteanu; Philipp Koehn


north american chapter of the association for computational linguistics | 2004

Improved Machine Translation Performance via Parallel Sentence Extraction from Comparable Corpora

Dragos Stefan Munteanu; Alexander M. Fraser; Daniel Marcu


Archive | 2005

Discovery of parallel text portions in comparable collections of corpora and training using comparable texts

Daniel Marcu; Dragos Stefan Munteanu


Archive | 2006

Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections

Daniel Marcu; Dragos Stefan Munteanu


Transactions of the Association for Computational Linguistics | 2013

Measuring Machine Translation Errors in New Domains

Ann Irvine; John W. Morgan; Marine Carpuat; Hal Daumé; Dragos Stefan Munteanu


Archive | 2009

Building a translation lexicon from comparable, non-parallel corpora

Daniel Marcu; Kevin Knight; Dragos Stefan Munteanu; Philipp Koehn

Collaboration


Dive into Dragos Stefan Munteanu's collaborations.

Top Co-Authors

Daniel Marcu
University of Southern California

Kevin Knight
University of Southern California

Abdessamad Echihabi
University of Southern California

Eduard H. Hovy
Carnegie Mellon University

Liang Zhou
University of Southern California

Radu Soricut
University of Southern California

Ann Irvine
Johns Hopkins University

Marine Carpuat
Hong Kong University of Science and Technology