Roxana Angheluta | Researchain

Publication

Featured research published by Roxana Angheluta.

Information Processing and Management | 2005

Generic technologies for single- and multi-document summarization

Marie-Francine Moens; Roxana Angheluta; Jos Dumortier

The technologies for single- and multi-document summarization that are described and evaluated in this article can be used on heterogeneous texts for different summarization tasks. They cover the extraction of important sentences from the documents, the compression of sentences to their essential or relevant content, and the detection of redundant content across sentences. The technologies were tested at the Document Understanding Conference, organized by the National Institute of Standards and Technology, USA, in 2002 and 2003. The system obtained good to very good results in this competition. We also tested our summarization system on a variety of English encyclopedia texts and on Dutch magazine articles. The results show that relying on generic linguistic resources and statistical techniques offers a basis for text summarization.

International Conference on Computational Linguistics | 2006

An unsupervised language independent method of name discrimination using second order co-occurrence features

Ted Pedersen; Anagha Kulkarni; Roxana Angheluta; Zornitsa Kozareva; Thamar Solorio

Previous work by Pedersen, Purandare and Kulkarni (2005) has resulted in an unsupervised method of name discrimination that represents the context in which an ambiguous name occurs using second order co-occurrence features. These contexts are then clustered in order to identify which are associated with different underlying named entities. The method also extracts descriptive and discriminating bigrams from each of the discovered clusters in order to serve as identifying labels. These methods have been shown to perform well with English text, although we believe them to be language independent since they rely on lexical features and use no syntactic features or external knowledge sources. In this paper we apply this methodology in exactly the same way to Bulgarian, English, Romanian, and Spanish corpora. We find that it attains discrimination accuracy that is consistently well above that of a majority classifier, thus providing support for the hypothesis that the method is language independent.
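
The idea of a second order co-occurrence representation can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy corpus, and the use of sentence-level co-occurrence counts are all assumptions made for illustration. Each context is represented by the average of the co-occurrence vectors of its words, so two contexts can be similar even when they share no words directly.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences):
    """First order step: for each word, count the words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            for j, other in enumerate(sentence):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def second_order_vector(context, vectors):
    """Second order step: average the co-occurrence vectors of the context words."""
    known = [w for w in context if w in vectors]
    if not known:
        return {}
    total = Counter()
    for w in known:
        total.update(vectors[w])
    return {feature: count / len(known) for feature, count in total.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(feature, 0.0) for feature, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus used only to build the co-occurrence counts.
corpus = [
    ["bank", "money", "loan"],
    ["money", "loan", "interest"],
    ["river", "water", "shore"],
    ["water", "shore", "fish"],
]
vectors = cooccurrence_vectors(corpus)

# Three contexts; the first two share no words but describe the same topic.
finance_a = second_order_vector(["loan", "interest"], vectors)
finance_b = second_order_vector(["money", "bank"], vectors)
nature = second_order_vector(["water", "fish"], vectors)

print(cosine(finance_a, finance_b), cosine(finance_a, nature))
```

In a full pipeline the resulting context vectors would be passed to a clustering algorithm; that step is omitted here.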

Workshop on Graph Based Methods for Natural Language Processing | 2006

Measuring Aboutness of an Entity in a Text

Marie-Francine Moens; Patrick Jeuniaux; Roxana Angheluta; Rudradeb Mitra

In many information retrieval and selection tasks it is valuable to score how much a text is about a certain entity and to compute how much the text discusses the entity with respect to a certain viewpoint. In this paper we are interested in giving an aboutness score to a text, when the input query is a person name and we want to measure the aboutness with respect to the biographical data of that person. We present a graph-based algorithm and compare its results with other approaches.

International Conference on Artificial Intelligence and Law | 2003

Concept extraction from legal cases: the use of a statistic of coincidence

Marie-Francine Moens; Roxana Angheluta

Effective retrieval of court decisions is important. Automatically identifying legal concepts in the decision texts would be very helpful. In this paper we investigate how a statistic for hypothesis testing, i.e., the likelihood ratio, can help in this task. We describe how this statistic can be used for detecting important multi-term phrases in the case texts, how it can be used to find correlated terms, and how it is a means for feature or topic signature selection in automated case categorization. The technology has been tested upon more than 600 US cases.
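
As an illustration of how a likelihood ratio can flag multi-term phrases, the sketch below computes the standard log-likelihood ratio (G²) over a 2x2 contingency table of bigram counts. It is an assumption that this particular formulation matches the one used in the paper; the function names and the sample sentence are hypothetical.

```python
import math
from collections import Counter


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of observed counts."""
    def xlogx_sum(*counts):
        return sum(k * math.log(k) for k in counts if k > 0)

    total = k11 + k12 + k21 + k22
    if total == 0:
        return 0.0
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    return 2.0 * (cells - rows - cols + total * math.log(total))


def score_bigrams(tokens):
    """Score every adjacent word pair by how strongly its two words are associated."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    first, second = Counter(), Counter()
    for (a, b), count in bigrams.items():
        first[a] += count   # bigrams that start with a
        second[b] += count  # bigrams that end with b
    n = sum(bigrams.values())
    scores = {}
    for (a, b), k11 in bigrams.items():
        k12 = first[a] - k11          # a followed by something other than b
        k21 = second[b] - k11         # b preceded by something other than a
        k22 = n - k11 - k12 - k21     # neither a first nor b second
        scores[(a, b)] = log_likelihood_ratio(k11, k12, k21, k22)
    return scores


# Hypothetical sample text, tokenized by whitespace.
tokens = ("the district court held that the district court erred "
          "and the appeal was dismissed by the district court").split()
scores = score_bigrams(tokens)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:3])
```

Note that the statistic is two-sided: it is large for pairs that co-occur far more or far less often than chance would predict, so a practical phrase detector would usually also check that the observed count exceeds the expected one.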

International Conference on Computational Linguistics | 2002

Semantic case role detection for information extraction

Rik De Busser; Roxana Angheluta; Marie-Francine Moens

If information extraction wants to make its results more accurate, it will have to resort increasingly to a coherent implementation of natural language semantics. In this paper, we will focus on the extraction of semantic case roles from texts. After setting the essential theoretical framework, we will argue that it is possible to detect case roles on the basis of morphosyntactic and lexical surface phenomena. We will give a concise overview of our methodology and of a preliminary test that seems to confirm our hypotheses.

CrossLangInduction '06 Proceedings of the International Workshop on Cross-Language Knowledge Induction | 2006

Improving name discrimination: a language salad approach

Ted Pedersen; Anagha Kulkarni; Zornitsa Kozareva; Roxana Angheluta; Thamar Solorio

This paper describes a method of discriminating ambiguous names that relies upon features found in corpora of a more abundant language. In particular, we discriminate ambiguous names in Bulgarian, Romanian, and Spanish corpora using information derived from much larger quantities of English data. We also mix together occurrences of the ambiguous name found in English with the occurrences of the name in the language in which we are trying to discriminate. We refer to this as a language salad, and find that it often results in even better performance than when only using English or the language itself as the source of information for discrimination.

Archive | 2003

Summarization of texts found on the world wide web

Marie-Francine Moens; Roxana Angheluta; Rik De Busser

Summaries of texts found on the World Wide Web are valuable. They help the user of a search engine to select information and are an aid for processing the vast amount of information found on the Web. This chapter describes the technologies that can be applied for summarizing the texts of Web pages. The focus is on technologies that currently generate the best results and are suited for the specific heterogeneous environment that makes up the World Wide Web. This chapter gives an overview of generic, query-biased and task-specific summarization, as well as single-document and multi-document summarization. Among the technologies that are discussed are semantic frame technologies, rhetorical structure analysis, learning discourse patterns, techniques relying upon lexical cohesion, and text clustering.

European Conference on Information Retrieval | 2007

Cross-document entity tracking

Roxana Angheluta; Marie-Francine Moens

The main focus of the current work is to analyze useful features for linking and disambiguating person entities across documents. The more general problem of linking and disambiguating any kind of entity is known as entity detection and tracking (EDT) or noun phrase coreference resolution. EDT has applications in many important areas of information retrieval: clustering search engine results when looking for a particular person; answering questions such as “Who was Woodward’s source in the Plame scandal?” with “senior administration official” or “Richard Armitage”; and fusing information from multiple documents. In the current work, person entities are limited to names and nominal entities. We emphasize the linguistic aspect of cross-document EDT: testing novel features useful in EDT across documents, such as the syntactic and semantic characteristics of the entities. The most important class of new features is contextual features, at varying levels of detail: events, related named entities, and local context. The validity of the features is evaluated on a corpus annotated for cross-document coreference resolution of person names and nominals, and also on a corpus annotated only for names.

Meeting of the Association for Computational Linguistics | 2002