Roxana Angheluta
Katholieke Universiteit Leuven
Publication
Featured research published by Roxana Angheluta.
Information Processing and Management | 2005
Marie-Francine Moens; Roxana Angheluta; Jos Dumortier
The technologies for single- and multi-document summarization that are described and evaluated in this article can be used on heterogeneous texts for different summarization tasks. They cover the extraction of important sentences from the documents, the compression of sentences to their essential or relevant content, and the detection of redundant content across sentences. The technologies were tested at the Document Understanding Conference, organized by the National Institute of Standards and Technology, USA, in 2002 and 2003. The system obtained good to very good results in this competition. We also tested our summarization system on a variety of English encyclopedia texts and on Dutch magazine articles. The results show that relying on generic linguistic resources and statistical techniques offers a basis for text summarization.
international conference on computational linguistics | 2006
Ted Pedersen; Anagha Kulkarni; Roxana Angheluta; Zornitsa Kozareva; Thamar Solorio
Previous work by Pedersen, Purandare and Kulkarni (2005) has resulted in an unsupervised method of name discrimination that represents the context in which an ambiguous name occurs using second-order co-occurrence features. These contexts are then clustered in order to identify which are associated with different underlying named entities. The method also extracts descriptive and discriminating bigrams from each of the discovered clusters to serve as identifying labels. These methods have been shown to perform well with English text, although we believe them to be language independent since they rely on lexical features and use no syntactic features or external knowledge sources. In this paper we apply this methodology in exactly the same way to Bulgarian, English, Romanian, and Spanish corpora. We find that it attains discrimination accuracy that is consistently well above that of a majority classifier, thus providing support for the hypothesis that the method is language independent.
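The majority classifier mentioned in this abstract is a standard baseline: it assigns every context to the most frequent entity. The following is a minimal sketch of how such a baseline accuracy can be computed; the function name and the toy labels are illustrative assumptions and do not come from the paper.

```python
from collections import Counter


def majority_baseline_accuracy(gold_labels):
    """Accuracy obtained by always predicting the most frequent gold label.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if not gold_labels:
        raise ValueError("gold_labels must not be empty")
    counts = Counter(gold_labels)
    # most_common(1) returns a list holding one (label, count) pair.
    _, top_count = counts.most_common(1)[0]
    return top_count / len(gold_labels)


# Toy example: three contexts refer to one person, one to another.
baseline = majority_baseline_accuracy(["person_a", "person_a", "person_a", "person_b"])
```

A discrimination method is only informative if its accuracy exceeds this value on the same data.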
workshop on graph based methods for natural language processing | 2006
Marie-Francine Moens; Patrick Jeuniaux; Roxana Angheluta; Rudradeb Mitra
In many information retrieval and selection tasks it is valuable to score how much a text is about a certain entity and to compute how much the text discusses the entity with respect to a certain viewpoint. In this paper we are interested in giving an aboutness score to a text, when the input query is a person name and we want to measure the aboutness with respect to the biographical data of that person. We present a graph-based algorithm and compare its results with other approaches.
international conference on artificial intelligence and law | 2003
Marie-Francine Moens; Roxana Angheluta
Effective retrieval of court decisions is important. Automatically identifying legal concepts in the decision texts would be very helpful. In this paper we investigate how a statistic for hypothesis testing, i.e., the likelihood ratio, can help in this task. We describe how this statistic can be used for detecting important multi-term phrases in the case texts, how it can be used to find correlated terms, and how it is a means for feature or topic signature selection in automated case categorization. The technology has been tested on more than 600 US cases.
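As an illustration of how a likelihood ratio can score a candidate two-term phrase, the sketch below computes the widely used log-likelihood ratio (G²) statistic over a 2x2 contingency table of co-occurrence counts. The abstract does not give the exact formulation used in the paper, so this particular form, the function name, and the count layout are assumptions.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table (illustrative assumption).

    k11: count of term A followed by term B
    k12: count of term A followed by something other than B
    k21: count of something other than A followed by B
    k22: count of all remaining bigrams
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2
```

Higher values indicate that the two terms co-occur more often than independence would predict; a table whose cells match the independence expectation scores zero.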
international conference on computational linguistics | 2002
Rik De Busser; Roxana Angheluta; Marie-Francine Moens
If information extraction wants to make its results more accurate, it will have to resort increasingly to a coherent implementation of natural language semantics. In this paper, we will focus on the extraction of semantic case roles from texts. After setting the essential theoretical framework, we will argue that it is possible to detect case roles on the basis of morphosyntactic and lexical surface phenomena. We will give a concise overview of our methodology and of a preliminary test that seems to confirm our hypotheses.
CrossLangInduction '06 Proceedings of the International Workshop on Cross-Language Knowledge Induction | 2006
Ted Pedersen; Anagha Kulkarni; Zornitsa Kozareva; Roxana Angheluta; Thamar Solorio
This paper describes a method of discriminating ambiguous names that relies upon features found in corpora of a more abundant language. In particular, we discriminate ambiguous names in Bulgarian, Romanian, and Spanish corpora using information derived from much larger quantities of English data. We also mix together occurrences of the ambiguous name found in English with the occurrences of the name in the language in which we are trying to discriminate. We refer to this as a language salad, and find that it often results in even better performance than when only using English or the language itself as the source of information for discrimination.
Archive | 2003
Marie-Francine Moens; Roxana Angheluta; Rik De Busser
Summaries of texts found on the World Wide Web are valuable. They help the user of a search engine to select information and are an aid for processing the vast amount of information found on the Web. This chapter describes the technologies that can be applied for summarizing the texts of Web pages. The focus is on technologies that currently generate the best results and are suited for the specific heterogeneous environment that makes up the World Wide Web. This chapter gives an overview of generic, query-biased and task-specific summarization, as well as single-document and multi-document summarization. Among the technologies that are discussed are semantic frame technologies, rhetorical structure analysis, learning discourse patterns, techniques relying upon lexical cohesion, and text clustering.
european conference on information retrieval | 2007
Roxana Angheluta; Marie-Francine Moens
The main focus of the current work is to analyze useful features for linking and disambiguating person entities across documents. The more general problem of linking and disambiguating any kind of entity is known as entity detection and tracking (EDT) or noun phrase coreference resolution. EDT has applications in many important areas of information retrieval: clustering results in search engines when looking for a particular person; answering questions such as “Who was Woodward’s source in the Plame scandal?” with “senior administration official” or “Richard Armitage”; and fusing information from multiple documents. In the current work, person entities are limited to names and nominal entities. We emphasize the linguistic aspect of cross-document EDT: testing novel features useful in EDT across documents, such as the syntactic and semantic characteristics of the entities. The most important class of new features is contextual features, at varying levels of detail: events, related named entities, and local context. The validity of the features is evaluated on a corpus annotated for cross-document coreference resolution of person names and nominals, and also on a corpus annotated only for names.
meeting of the association for computational linguistics | 2002
Marie-Francine Moens; Rik De Busser; Roxana Angheluta
Journal of Digital Information Management | 2005
Maria Biryukov; Roxana Angheluta; Marie-Francine Moens