Andras Csomai
University of North Texas
Publications
Featured research published by Andras Csomai.
Conference on Information and Knowledge Management | 2007
Rada Mihalcea; Andras Csomai
This paper introduces the use of Wikipedia as a resource for automatic keyword extraction and word sense disambiguation, and shows how this online encyclopedia can be used to achieve state-of-the-art results on both these tasks. The paper also shows how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system show that the automatic annotations are reliable and hardly distinguishable from manual annotations.
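The keyword-extraction step can be illustrated with a link-probability ("keyphraseness") score: among the Wikipedia articles that contain a phrase, the fraction in which that phrase appears as a link. This is a minimal sketch assuming the two counts per phrase are already available. The function names, the threshold value, and the toy counts are hypothetical, and the code is not presented as the paper's exact implementation.

```python
from typing import Dict, List, Tuple


def keyphraseness(linked_count: int, total_count: int) -> float:
    """Fraction of articles containing a phrase in which the phrase is a link."""
    if total_count == 0:
        return 0.0
    return linked_count / total_count


def rank_candidates(
    candidates: List[str],
    counts: Dict[str, Tuple[int, int]],  # phrase -> (linked_count, total_count), assumed precomputed
    threshold: float = 0.1,              # hypothetical cutoff, not taken from the paper
) -> List[Tuple[str, float]]:
    """Keep candidates whose score reaches the threshold, highest score first."""
    scored = []
    for phrase in candidates:
        linked, total = counts.get(phrase, (0, 0))
        score = keyphraseness(linked, total)
        if score >= threshold:
            scored.append((phrase, score))
    # Sort by descending score; break ties alphabetically for a stable order
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


if __name__ == "__main__":
    toy_counts = {  # invented counts for illustration only
        "word sense disambiguation": (90, 100),
        "encyclopedia": (30, 200),
        "the": (0, 1_000_000),
    }
    print(rank_candidates(["the", "encyclopedia", "word sense disambiguation"], toy_counts))
    # prints [('word sense disambiguation', 0.9), ('encyclopedia', 0.15)]
```

A ratio of this kind naturally filters out frequent function words such as "the", which occur in many articles but are almost never linked.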
IEEE Intelligent Systems | 2008
Andras Csomai; Rada Mihalcea
Wikipedia has become one of the largest online repositories of encyclopedic knowledge. Wikipedia editions are available for more than 200 languages, with entries varying from a few pages to more than 1 million articles per language. Embedded in each Wikipedia article is an abundance of links connecting the most important words or phrases in the text to other pages, thereby letting users quickly access additional information. An automatic text-annotation system combines keyword extraction and word-sense disambiguation to identify relevant links to Wikipedia pages.
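One simple baseline for the disambiguation half of such a pipeline maps each extracted keyphrase to the Wikipedia page it most often links to in existing articles. This is a minimal sketch assuming anchor-text-to-page link counts have already been harvested. The data layout, function names, and counts are hypothetical, and the system described above may use richer, context-aware disambiguation than this most-frequent-target rule.

```python
from collections import Counter
from typing import Dict, List, Optional


def most_frequent_target(anchor: str, link_counts: Dict[str, Counter]) -> Optional[str]:
    """Return the page this anchor text most often links to, or None if unseen."""
    targets = link_counts.get(anchor.lower())
    if not targets:
        return None
    # most_common(1) yields [(target, count)] for the highest-count target
    return targets.most_common(1)[0][0]


def annotate(keyphrases: List[str], link_counts: Dict[str, Counter]) -> Dict[str, str]:
    """Map each extracted keyphrase to a Wikipedia page title, skipping unknown phrases."""
    links = {}
    for phrase in keyphrases:
        target = most_frequent_target(phrase, link_counts)
        if target is not None:
            links[phrase] = target
    return links


if __name__ == "__main__":
    # Invented anchor statistics: anchor text -> counts of linked page titles
    toy_link_counts = {
        "bar": Counter({"Bar (establishment)": 40, "Bar (law)": 25}),
        "wikipedia": Counter({"Wikipedia": 120}),
    }
    print(annotate(["bar", "Wikipedia", "unseen phrase"], toy_link_counts))
    # prints {'bar': 'Bar (establishment)', 'Wikipedia': 'Wikipedia'}
```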
Meeting of the Association for Computational Linguistics | 2005
Rada Mihalcea; Andras Csomai
This paper describes SENSELEARNER, a minimally supervised word sense disambiguation system that attempts to disambiguate all content words in a text using WordNet senses. We evaluate the accuracy of SENSELEARNER on several standard sense-annotated data sets, and show that it compares favorably with the best results reported during the recent SENSEVAL evaluations.
Meeting of the Association for Computational Linguistics | 2007
Samer Hassan; Andras Csomai; Carmen Banea; Ravi Som Sinha; Rada Mihalcea
This paper describes the University of North Texas SubFinder system. The system provides the most likely set of substitutes for a word in a given context by combining several techniques and knowledge sources. SubFinder participated in the best and out-of-ten (oot) tracks of the SemEval lexical substitution task, consistently ranking first or second.
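The abstract does not say how SubFinder combines its sources. A minimal sketch of one plausible scheme, assuming each source returns scored substitute candidates: rescale each source's scores to [0, 1], sum them per candidate, then take a single top guess for a best-style answer and up to ten candidates for an oot-style answer. The source names, scores, and function names below are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max rescale one source's scores to [0, 1] so sources are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {word: 1.0 for word in scores}
    return {word: (s - lo) / (hi - lo) for word, s in scores.items()}


def combine(sources: List[Dict[str, float]]) -> List[str]:
    """Sum normalized scores over all sources and rank candidates best-first."""
    totals: Dict[str, float] = {}
    for source in sources:
        for word, s in normalize(source).items():
            totals[word] = totals.get(word, 0.0) + s
    return sorted(totals, key=lambda w: (-totals[w], w))


def best_and_oot(sources: List[Dict[str, float]], oot_size: int = 10) -> Tuple[List[str], List[str]]:
    """Return a single top guess and an up-to-ten list from the combined ranking."""
    ranked = combine(sources)
    return ranked[:1], ranked[:oot_size]


if __name__ == "__main__":
    # Invented scores from two hypothetical sources for the target "bright"
    thesaurus = {"brilliant": 3, "shiny": 2, "smart": 1}
    corpus = {"smart": 7, "brilliant": 6, "clever": 2}
    print(best_and_oot([thesaurus, corpus]))
    # prints (['brilliant'], ['brilliant', 'smart', 'shiny', 'clever'])
```

Normalizing before summing keeps a source with large raw scores from dominating the combined ranking.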
Meeting of the Association for Computational Linguistics | 2007
Rada Mihalcea; Andras Csomai; Massimiliano Ciaramita
We describe the SuperSenseLearner system that participated in the English all-words disambiguation task. The system relies on automatically learned semantic models using collocational features coupled with features extracted from the annotations of coarse-grained semantic categories generated by an HMM tagger.
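The feature combination described above can be pictured roughly as follows. Collocational features are the words at fixed offsets around the target, and the coarse-grained category features are the tags at the same offsets. This is a minimal sketch assuming the sentence has already been tagged by an external supersense tagger. The window size, feature naming scheme, and example tags are hypothetical and illustrative, not taken from the paper.

```python
from typing import Dict, List, Tuple


def extract_features(
    tokens: List[Tuple[str, str]],  # (word, supersense tag); tags assumed to come from an external tagger
    index: int,                     # position of the word to disambiguate
    window: int = 2,                # hypothetical context size on each side
) -> Dict[str, str]:
    """Collocational word features plus supersense-tag features around a target word."""
    target_word, target_tag = tokens[index]
    features = {"target": target_word.lower(), "ss[0]": target_tag}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        pos = index + offset
        # Positions outside the sentence are filled with a padding symbol
        word, tag = tokens[pos] if 0 <= pos < len(tokens) else ("<pad>", "<pad>")
        features[f"w[{offset}]"] = word.lower()  # word at this offset
        features[f"ss[{offset}]"] = tag          # coarse semantic category at this offset
    return features


if __name__ == "__main__":
    # Illustrative tags; "O" marks tokens the tagger left untagged
    sentence = [("The", "O"), ("bank", "noun.group"), ("approved", "verb.social"),
                ("the", "O"), ("loan", "noun.possession")]
    for name, value in sorted(extract_features(sentence, 1).items()):
        print(name, value)
```

The resulting dictionaries could then be fed to any standard classifier trained per word.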
International Conference on Computational Linguistics | 2006
Andras Csomai; Rada Mihalcea
The automatic generation of back-of-the-book indexes seems to be out of sight of the Information Retrieval and Natural Language Processing communities, although the increasingly large number of books available in electronic format, as well as recent advances in keyphrase extraction, should motivate an increased interest in this topic. In this paper, we describe the background relevant to the process of creating back-of-the-book indexes, namely (1) a short overview of the origin and structure of back-of-the-book indexes, and (2) the correspondence that can be established between techniques for automatic index construction and keyphrase extraction. Since the development of any automatic system requires, first of all, an evaluation testbed, we describe our work in building a gold standard collection of books and indexes, and we present several metrics that can be used to evaluate automatically generated indexes against the gold standard. Finally, we investigate the properties of the gold standard index, such as index size, length of index entries, and upper bounds on coverage as indicated by the presence of index entries in the document.
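The abstract does not list the specific metrics. This is a minimal sketch of the standard set-based precision, recall, and F-measure of a generated index against a gold-standard index, together with the coverage upper bound: the share of gold entries that occur in the document text at all. It assumes exact matching after a hypothetical lowercase-and-collapse-whitespace normalization. The function names and toy data are illustrative only.

```python
from typing import Iterable, Tuple


def normalize_entry(entry: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace
    return " ".join(entry.lower().split())


def prf(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F-measure of predicted vs. gold index entries."""
    pred_set = {normalize_entry(e) for e in predicted}
    gold_set = {normalize_entry(e) for e in gold}
    correct = len(pred_set & gold_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def coverage_upper_bound(gold: Iterable[str], document: str) -> float:
    """Share of gold entries present in the text: a recall ceiling for any
    extractor that only outputs phrases taken from the document."""
    text = normalize_entry(document)
    gold_set = {normalize_entry(e) for e in gold}
    if not gold_set:
        return 0.0
    # Substring test, so a match inside a longer word also counts; this can only loosen the bound
    found = sum(1 for e in gold_set if e in text)
    return found / len(gold_set)


if __name__ == "__main__":
    gold = ["keyphrase extraction", "index", "gold standard", "book"]
    print(prf(["Keyphrase extraction", "Index", "Wikipedia"], gold))
    print(coverage_upper_bound(gold, "An index maps each keyphrase extraction result to a page in the book."))
    # prints 0.75 for the second call
```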
Meeting of the Association for Computational Linguistics | 2008
Andras Csomai; Rada Mihalcea
Artificial Intelligence in Education | 2007
Andras Csomai; Rada Mihalcea
The Florida AI Research Society | 2007
Andras Csomai; Rada Mihalcea
Recent Advances in Natural Language Processing | 2005
Courtney D. Corley; Andras Csomai; Rada Mihalcea