Aurora Pons-Porrata | Researchain

Publication

Featured research published by Aurora Pons-Porrata.

Information Processing and Management | 2007

Topic discovery based on text mining techniques

Aurora Pons-Porrata; Rafael Berlanga-Llavori; José Ruiz-Shulcloper

In this paper, we present a topic discovery system that aims to reveal the implicit knowledge present in news streams. This knowledge is expressed as a hierarchy of topics and subtopics, where each topic contains the set of documents related to it and a summary extracted from these documents. The summaries built in this way are useful for browsing and selecting topics of interest from the generated hierarchies. Our proposal consists of a new incremental hierarchical clustering algorithm that combines partitional and agglomerative approaches, taking the main benefits of each. Finally, a new summarization method based on Testor Theory is proposed to build the topic summaries. Experimental results on the TDT2 collection demonstrate the system's usefulness and effectiveness not only as a topic detection system, but also as a classification and summarization tool.

Pattern Recognition Letters | 2010

Dynamic hierarchical algorithms for document clustering

Reynaldo Gil-García; Aurora Pons-Porrata

In this paper, two clustering algorithms called dynamic hierarchical compact and dynamic hierarchical star are presented. Both methods aim to construct a cluster hierarchy while dealing with dynamic data sets. The first creates disjoint hierarchies of clusters, while the second obtains overlapped hierarchies. The experimental results on several benchmark text collections show that these methods are not only suitable for producing hierarchical clustering solutions in dynamic environments effectively and efficiently, but also offer hierarchies that are easier to browse than those of traditional algorithms. Therefore, we advocate their use for tasks that require dynamic clustering, such as information organization, creation of document taxonomies and hierarchical topic detection.

Iberoamerican Congress on Pattern Recognition | 2003

Extended Star Clustering Algorithm

Reynaldo Gil-García; José Manuel Badía-Contelles; Aurora Pons-Porrata

In this paper we propose the extended star clustering algorithm and compare it with the original star clustering algorithm. We introduce a new concept of star and, as a consequence, obtain different star-shaped clusters. The evaluation experiments on TREC data show that the proposed algorithm outperforms the original algorithm. Our algorithm is independent of the data order and obtains a smaller number of clusters.
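For context, the original star clustering baseline mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration of the commonly described offline star algorithm (threshold the similarity graph, then repeatedly take the highest-degree unmarked vertex as a star center), not the extended algorithm proposed in the paper. The function name `star_clusters`, the similarity matrix and the threshold value are hypothetical.

```python
def star_clusters(sim, threshold):
    """Sketch of the original (offline) star clustering algorithm.

    sim is a symmetric similarity matrix (list of lists); two documents are
    linked when their similarity is at least `threshold` (assumed convention).
    Returns a list of (center, satellites) pairs.
    """
    n = len(sim)
    # Thresholded similarity graph: neighbors of each document.
    neighbors = {
        i: {j for j in range(n) if j != i and sim[i][j] >= threshold}
        for i in range(n)
    }
    unmarked = set(range(n))
    clusters = []
    # Visit vertices by decreasing degree. Ties are broken by index here,
    # which is one way the result can depend on the order of the data.
    for v in sorted(range(n), key=lambda v: (-len(neighbors[v]), v)):
        if v in unmarked:
            # v becomes a star center; all its neighbors are satellites,
            # so clusters may overlap.
            clusters.append((v, sorted(neighbors[v])))
            unmarked.discard(v)
            unmarked -= neighbors[v]
    return clusters


# Hypothetical similarities for four documents.
sim = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.7, 0.0],
    [0.1, 0.7, 1.0, 0.2],
    [0.0, 0.0, 0.2, 1.0],
]
clusters = star_clusters(sim, threshold=0.5)
```

With these values, document 1 becomes the center of a star containing documents 0 and 2, and document 3 forms a singleton cluster.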

Pattern Recognition Letters | 2010

A document clustering algorithm for discovering and describing topics

Henry Anaya-Sánchez; Aurora Pons-Porrata; Rafael Berlanga-Llavori

In this paper, we introduce a new clustering algorithm for discovering and describing the topics contained in a text collection. Our proposal relies on both the most probable term pairs generated from the collection and the estimation of the topic homogeneity associated with these pairs. Topics and their descriptions are generated from those term pairs whose support sets are homogeneous enough to represent collection topics. Experimental results obtained over three benchmark text collections demonstrate the effectiveness and utility of this new approach.

Iberoamerican Congress on Pattern Recognition | 2009

BR: A New Method for Computing All Typical Testors

Alexsey Lias-Rodríguez; Aurora Pons-Porrata

Typical testors are very useful in Pattern Recognition, especially for Feature Selection problems. The complexity of computing all typical testors of a training matrix grows exponentially with the number of features. Several methods that speed up the calculation of the set of all typical testors have been developed, but there are still problems for which this set is impossible to find. To address this, a new external scale algorithm, BR, is proposed. The experimental results demonstrate that this method clearly outperforms the two best algorithms reported in the literature.
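To make the exponential cost concrete, here is a brute-force sketch that enumerates feature subsets and keeps the irreducible testors. It is not the BR algorithm from the paper. It assumes the usual definition (a testor is a feature subset on which no two objects from different classes coincide, and a typical testor is a testor with no proper subset that is also a testor), strict equality as the comparison criterion, and at least two classes. The names `is_testor` and `typical_testors` and the toy matrix are hypothetical.

```python
from itertools import combinations


def is_testor(rows, labels, features):
    """True if no two objects from different classes agree on all `features`."""
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j] and all(
            rows[i][f] == rows[j][f] for f in features
        ):
            return False
    return True


def typical_testors(rows, labels):
    """Enumerate all typical (irreducible) testors by exhaustive search.

    Subsets are visited by increasing size, so any subset that contains an
    already found typical testor is a testor but not an irreducible one.
    The loop inspects up to 2**n - 1 subsets for n features.
    """
    n = len(rows[0])
    found = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if any(set(t) <= set(subset) for t in found):
                continue  # reducible: contains a smaller testor
            if is_testor(rows, labels, subset):
                found.append(subset)
    return found


# Hypothetical training matrix: four objects, three features, two classes.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = ["a", "a", "b", "b"]
result = typical_testors(rows, labels)
```

In this toy matrix, features 0 and 2 each separate the two classes on their own, so the typical testors are the singletons `(0,)` and `(2,)`; feature 1 alone is not a testor.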

Iberoamerican Congress on Pattern Recognition | 2003

A Method for the Automatic Summarization of Topic-Based Clusters of Documents

Aurora Pons-Porrata; José Ruiz-Shulcloper; Rafael Berlanga-Llavori

In this paper we propose an effective method to summarize document clusters. This method is based on Testor Theory, and it is applied to a group of newspaper articles in order to summarize the events that they describe. The method is also applicable to either a very large document collection or a very large document, in order to identify the main themes (topics) of the collection (or document) and to summarize them. The results obtained in the experiments demonstrate the usefulness of the proposed method.

Iberoamerican Congress on Pattern Recognition | 2008

A New Document Clustering Algorithm for Topic Discovering and Labeling

Henry Anaya-Sánchez; Aurora Pons-Porrata; Rafael Berlanga-Llavori

In this paper, we introduce a new clustering algorithm for obtaining labeled document clusters that accurately identify the topics of a text collection. To determine the topics, our approach relies on both probable term pairs generated from the collection and the estimation of the topic homogeneity associated with term pair clusters. Experimental results obtained over two benchmark text collections demonstrate the utility of this new approach.

Iberoamerican Congress on Pattern Recognition | 2007

Using typical testors for feature selection in text categorization

Aurora Pons-Porrata; Reynaldo Gil-García; Rafael Berlanga-Llavori

A major difficulty of text categorization problems is the high dimensionality of the feature space. Thus, feature selection is often performed in order to increase both the efficiency and effectiveness of the classification. In this paper, we propose a feature selection method based on Testor Theory. This criterion takes into account inter-feature relationships. We experimentally compared our method with the widely used information gain criterion using two well-known classification algorithms: k-nearest neighbour and Support Vector Machine. Two benchmark text collections were chosen as the testbeds: Reuters-21578 and Reuters Corpus Volume 1 (RCV1-v2). We found that our method consistently outperformed information gain for both classifiers and both data collections, especially when aggressive feature selection is carried out.

European Conference on Information Retrieval | 2003

Building a hierarchy of events and topics for newspaper digital libraries

Aurora Pons-Porrata; Rafael Berlanga-Llavori; José Ruiz-Shulcloper

In this paper we propose an incremental hierarchical clustering algorithm for on-line event detection. This algorithm is applied to a set of newspaper articles in order to discover the structure of topics and events that they describe. In the first level, articles with a high temporal-semantic similarity are clustered together into events. In the next levels of the hierarchy, these events are successively clustered so that composite events and topics can be discovered. The results obtained for the F1-measure and the Detection Cost demonstrate the validity of our algorithm for on-line event detection tasks.

Iberoamerican Congress on Pattern Recognition | 2008

Hierarchical Star Clustering Algorithm for Dynamic Document Collections

Reynaldo Gil-García; Aurora Pons-Porrata

In this paper, a new clustering algorithm called Dynamic Hierarchical Star is introduced. Our approach aims to construct a hierarchy of overlapped clusters, dealing with dynamic data sets. The experimental results on several benchmark text collections show that this method obtains smaller hierarchies than traditional algorithms while achieving a similar clustering quality. Therefore, we advocate its use for tasks that require dynamic overlapped clustering, such as information organization, creation of document taxonomies and hierarchical topic detection.
