Andras Csomai | Researchain

Publication

Featured research published by Andras Csomai.

Conference on Information and Knowledge Management | 2007

Wikify!: linking documents to encyclopedic knowledge

Rada Mihalcea; Andras Csomai

This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks. The paper also shows how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system show that the automatic annotations are reliable and hardly distinguishable from manual annotations.
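
The linking step described above can be illustrated with a minimal sketch. This is not the method from the paper: it swaps in a plain dictionary lookup with longest-match-first, and it skips keyword ranking and sense disambiguation entirely. The `TOY_PAGES` table and the `link_concepts` function are hypothetical names introduced only for illustration, and the table holds made-up entries rather than real Wikipedia data.

```python
import re

# Hypothetical toy lookup table: surface phrase -> page title.
# A real system would derive candidates from Wikipedia itself.
TOY_PAGES = {
    "word sense disambiguation": "Word-sense disambiguation",
    "keyword extraction": "Keyword extraction",
    "wikipedia": "Wikipedia",
}

def link_concepts(text, pages=TOY_PAGES):
    """Return (start, end, surface, page) tuples for phrases found in text.

    Longer phrases are matched first, and a span that overlaps an
    earlier match is skipped, so links never overlap.
    """
    taken = [False] * len(text)
    links = []
    for phrase in sorted(pages, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(taken[m.start():m.end()]):
                continue  # overlaps a longer phrase already linked
            for i in range(m.start(), m.end()):
                taken[i] = True
            links.append((m.start(), m.end(), m.group(0), pages[phrase]))
    return sorted(links)  # order links by position in the text

if __name__ == "__main__":
    for link in link_concepts("Wikipedia helps keyword extraction."):
        print(link)
```

A lookup like this links every occurrence of a known phrase; the system in the abstract additionally decides which phrases are important enough to link and which page an ambiguous phrase refers to.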

IEEE Intelligent Systems | 2008

Linking Documents to Encyclopedic Knowledge

Andras Csomai; Rada Mihalcea

Wikipedia has become one of the largest online repositories of encyclopedic knowledge. Wikipedia editions are available for more than 200 languages, with entries varying from a few pages to more than 1 million articles per language. Embedded in each Wikipedia article is an abundance of links connecting the most important words or phrases in the text to other pages, thereby letting users quickly access additional information. An automatic text-annotation system combines keyword extraction and word-sense disambiguation to identify relevant links to Wikipedia pages.

Meeting of the Association for Computational Linguistics | 2005

SenseLearner: Word Sense Disambiguation for All Words in Unrestricted Text

Rada Mihalcea; Andras Csomai

This paper describes SENSELEARNER, a minimally supervised word sense disambiguation system that attempts to disambiguate all content words in a text using WordNet senses. We evaluate the accuracy of SENSELEARNER on several standard sense-annotated data sets, and show that it compares favorably with the best results reported during the recent SENSEVAL evaluations.

Meeting of the Association for Computational Linguistics | 2007

UNT: SubFinder: Combining Knowledge Sources for Automatic Lexical Substitution

Samer Hassan; Andras Csomai; Carmen Banea; Ravi Som Sinha; Rada Mihalcea

This paper describes the University of North Texas SubFinder system. The system is able to provide the most likely set of substitutes for a word in a given context, by combining several techniques and knowledge sources. SubFinder has successfully participated in the "best" and "out of ten" (oot) tracks in the SemEval lexical substitution task, consistently ranking in the first or second place.

Meeting of the Association for Computational Linguistics | 2007

UNT-Yahoo: SuperSenseLearner: Combining SenseLearner with SuperSense and other Coarse Semantic Features

Rada Mihalcea; Andras Csomai; Massimiliano Ciaramita

We describe the SuperSenseLearner system that participated in the English all-words disambiguation task. The system relies on automatically-learned semantic models using collocational features coupled with features extracted from the annotations of coarse-grained semantic categories generated by an HMM tagger.

International Conference on Computational Linguistics | 2006

Creating a testbed for the evaluation of automatically generated back-of-the-book indexes

Andras Csomai; Rada Mihalcea

The automatic generation of back-of-the-book indexes seems to be out of sight of the Information Retrieval and Natural Language Processing communities, although the increasingly large number of books available in electronic format, as well as recent advances in keyphrase extraction, should motivate an increased interest in this topic. In this paper, we describe the background relevant to the process of creating back-of-the-book indexes, namely (1) a short overview of the origin and structure of back-of-the-book indexes, and (2) the correspondence that can be established between techniques for automatic index construction and keyphrase extraction. Since the development of any automatic system requires in the first place an evaluation testbed, we describe our work in building a gold standard collection of books and indexes, and we present several metrics that can be used for the evaluation of automatically generated indexes against the gold standard. Finally, we investigate the properties of the gold standard index, such as index size, length of index entries, and upper bounds on coverage as indicated by the presence of index entries in the document.
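
As a rough illustration of evaluating a generated index against a gold standard, the sketch below computes exact-match precision, recall, and F-measure, plus a simple coverage bound based on whether each gold entry literally occurs in the text. The abstract does not name its metrics, so exact-match scoring is an assumption here, and the function names `precision_recall_f1` and `coverage_upper_bound` are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 between two sets of index entries."""
    predicted = {e.lower() for e in predicted}
    gold = {e.lower() for e in gold}
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def coverage_upper_bound(gold, document_text):
    """Fraction of gold entries that literally appear in the document.

    An extractive system can only propose phrases present in the text,
    so this fraction caps the recall such a system could reach.
    """
    gold = {e.lower() for e in gold}
    if not gold:
        return 0.0
    text = document_text.lower()
    found = sum(1 for entry in gold if entry in text)
    return found / len(gold)

if __name__ == "__main__":
    gold = ["index", "keyphrase", "gold standard", "coverage"]
    predicted = ["index", "keyphrase", "book"]
    text = "An index lists each keyphrase; the gold standard is manual."
    print(precision_recall_f1(predicted, gold))
    print(coverage_upper_bound(gold, text))
```

Substring matching is a deliberate simplification: real index entries are often inverted or paraphrased ("indexes, back-of-the-book"), so a practical bound would need normalization before matching.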

Meeting of the Association for Computational Linguistics | 2008