Alberto Pérez García-Plaza

Publications

Featured research published by Alberto Pérez García-Plaza.

Advances in Social Networks Analysis and Mining | 2009

Content-Based Clustering for Tag Cloud Visualization

Arkaitz Zubiaga; Alberto Pérez García-Plaza; Víctor Fresno; Raquel Martínez

Social tagging systems are becoming an interesting way to retrieve web information from previously annotated data. These sites present a tag cloud made up of the most popular tags, in which neither the grouping of tags nor the content they annotate is considered. We present a methodology to obtain and visualize a cloud of related tags based on self-organizing maps, where the relations among tags are established by taking into account the textual content of the tagged documents. Each map unit can be represented by the most relevant terms of the tags it contains, making it possible to study and analyze the groups as well as to visualize and navigate through the relevant terms and tags.
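The abstract does not give the map configuration. As a rough illustration only, the following Python sketch trains a tiny one-dimensional self-organizing map and then groups tags by their best-matching unit. It assumes each tag is already represented as a numeric vector derived from the content of its tagged documents; the tag names, vectors, map size, and learning-rate and neighborhood schedules are hypothetical choices, not the paper's settings.

```python
import math
import random

def best_matching_unit(units, v):
    """Index of the unit whose prototype is closest (squared Euclidean) to v."""
    return min(range(len(units)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(units[i], v)))

def train_som(vectors, n_units=2, epochs=200, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D self-organizing map and return its unit prototypes.

    All hyperparameters are hypothetical defaults chosen for this toy example.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Initialize prototypes with evenly spaced input vectors (sample initialization).
    step = (len(vectors) - 1) / max(n_units - 1, 1)
    units = [list(vectors[round(i * step)]) for i in range(n_units)]
    for epoch in range(epochs):
        # Learning rate and neighborhood radius shrink linearly during training.
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        radius = max(radius0 * (1 - frac), 0.5)
        for v in rng.sample(vectors, len(vectors)):  # shuffled copy of the inputs
            bmu = best_matching_unit(units, v)
            for i, unit in enumerate(units):
                # Gaussian neighborhood measured along the 1-D grid of units.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    unit[d] += lr * h * (v[d] - unit[d])
    return units

# Hypothetical tags, each described by weights over three content terms.
tags = {
    "python":  [0.9, 0.1, 0.0],
    "coding":  [0.8, 0.2, 0.0],
    "recipes": [0.0, 0.1, 0.9],
    "cooking": [0.1, 0.0, 0.8],
}
units = train_som(list(tags.values()))
groups = {}
for tag, vec in tags.items():
    groups.setdefault(best_matching_unit(units, vec), []).append(tag)
print(groups)
```

With two units, the hypothetical programming tags and cooking tags end up on different units; a realistic tag cloud would use a larger, usually two-dimensional, map.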

Applied Soft Computing | 2012

Learning a taxonomy from a set of text documents

Mari-Sanna Paukkeri; Alberto Pérez García-Plaza; Víctor Fresno; Raquel Martínez Unanue; Timo Honkela

We present a methodology for learning a taxonomy from a set of text documents, each of which describes one concept. The taxonomy is obtained by clustering the concept definition documents with a hierarchical approach to the Self-Organizing Map. In this study, we compare three feature extraction approaches with varying degrees of language independence: fuzzy logic-based feature weighting and selection, statistical keyphrase extraction, and the traditional tf-idf weighting scheme. The experiments are conducted for English, Finnish, and Spanish. The results show that while the rule-based fuzzy logic systems have an advantage in automatic taxonomy learning, taxonomies can also be constructed with tolerable results using statistical methods without domain- or style-specific knowledge.
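Of the three feature extraction schemes, tf-idf is the most standard. A minimal sketch of one common variant, raw term frequency multiplied by log(N/df), assuming simple lowercase whitespace tokenization and hypothetical toy documents; the abstract does not say which tf-idf variant or preprocessing the study used.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Tokenization (lowercase + whitespace split) is an assumption for this sketch.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

# Hypothetical concept definition documents.
docs = [
    "a cat is a small domesticated animal",
    "a dog is a domesticated animal",
    "a taxonomy organizes concepts",
]
w = tf_idf(docs)
print(sorted(w[0].items(), key=lambda kv: -kv[1])[:3])
```

Terms that occur in every document (here "a") get weight zero, while terms specific to one definition receive the highest weights.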

IEEE Transactions on Knowledge and Data Engineering | 2013

Harnessing Folksonomies to Produce a Social Classification of Resources

Arkaitz Zubiaga; Víctor Fresno; Raquel Martínez; Alberto Pérez García-Plaza

In our daily lives, organizing resources such as books or web pages into a set of categories to ease future access is a common task. The size of these collections usually makes organizing them manually a vast and expensive endeavor. As an approach to effectively produce an automated classification of resources, we consider the immense amount of annotations provided by users of social tagging systems in the form of bookmarks. In this paper, we use these user-provided tags to perform a social classification of resources. For this purpose, we have created three large-scale social tagging datasets containing tagging data for different types of resources: web pages and books. These resources are accompanied by categorization data from sound, expert-driven taxonomies. We analyze the characteristics of the three social tagging systems and assess how useful social tags are for producing a social classification of resources that resembles the experts' classification as closely as possible. We analyze six different tag-based representations and compare them with other data sources using three different SVM classifier settings. Finally, we explore combining tags with other data sources through classifier committees to classify the resources as accurately as possible.
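The abstract mentions combining data sources through classifier committees. One simple way to combine committee members is majority voting over their predicted labels. The sketch below assumes each member (for example, an SVM trained on one representation) has already produced one label per resource; the function name, category labels, and tie-breaking rule are illustrative assumptions, not the paper's actual committee scheme.

```python
from collections import Counter

def committee_vote(predictions):
    """Combine per-classifier label lists by majority vote.

    predictions: one list of labels per classifier, aligned by resource.
    Ties are broken in favor of the label from the earliest classifier (an assumption).
    """
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # First label, in classifier order, that reaches the top vote count.
        combined.append(next(label for label in labels if counts[label] == best))
    return combined

# Hypothetical predictions from three classifiers for three resources.
tags_svm = ["Arts", "Science", "Sports"]
content_svm = ["Arts", "Computers", "Sports"]
title_svm = ["Business", "Science", "Sports"]
print(committee_vote([tags_svm, content_svm, title_svm]))
```

Weighted voting or stacking are common alternatives when committee members differ markedly in accuracy.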

Expert Systems With Applications | 2012

Reorganizing clouds

Alberto Pérez García-Plaza; Arkaitz Zubiaga; Víctor Fresno; Raquel Martínez

Highlights:
- Representing tags by co-occurrences yields more accurate clusters than representing them by content.
- Representations based on co-occurrences significantly reduce the computational cost of the process.
- Some of the studied functions to weight co-occurrences outperform approaches used in earlier works.
- Our dataset provides a reliable basis for building approaches that find tag relations while evaluating them on sound quantitative criteria.
- Language modeling techniques help discover that the usage of some tags is biased toward unexpected meanings.

Finding and visualizing semantic relations among tags within a tag cloud enhances user experience, particularly regarding access to and retrieval of web pages on social tagging systems. Several approaches have been proposed to visualize tag relations in these systems. However, previous research relies on qualitative evaluation methods and does not provide robust, sound comparison criteria. To enable quantitative evaluation, we present a benchmark social tagging dataset in which a subset of 140 tags from a well-known social bookmarking site, Delicious, has been manually categorized according to the Open Directory Project (ODP). The manual categorization serves as a ground truth that enables quantitative evaluation and makes it possible to identify the best among different clustering approaches. With this dataset we also explore different tag representation approaches to present a reorganized tag cloud using self-organizing maps. In addition, we present an approach to enrich the resulting tag cloud with the most characteristic terms for each tag and group of tags, enabling further filtered navigation by both tag and document content and easing a deeper qualitative evaluation of the clusters.
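Representing a tag by co-occurrences means describing it by how often it is assigned together with every other tag on the same bookmarks. A minimal sketch, assuming raw co-occurrence counts compared with cosine similarity; the abstract says several weighting functions were studied but does not name them, so the raw-count choice and the sample bookmarks are assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations

def cooccurrence_vectors(bookmarks):
    """Map each tag to {other_tag: number of bookmarks where both tags appear}."""
    vectors = defaultdict(lambda: defaultdict(int))
    for tags in bookmarks:
        for a, b in combinations(sorted(set(tags)), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical bookmarks, each given as the list of tags a user assigned.
bookmarks = [
    ["python", "programming", "tutorial"],
    ["java", "programming", "tutorial"],
    ["recipes", "cooking", "food"],
    ["baking", "cooking", "food"],
]
vecs = cooccurrence_vectors(bookmarks)
print(cosine(vecs["python"], vecs["java"]), cosine(vecs["python"], vecs["recipes"]))
```

"python" and "java" never co-occur directly, yet their co-occurrence profiles match, which is the kind of relation such a representation can surface without reading document content.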

Web Intelligence | 2008

Web Page Clustering Using a Fuzzy Logic Based Representation and Self-Organizing Maps

Alberto Pérez García-Plaza; Víctor Fresno; Raquel Martínez

This article introduces and evaluates a fuzzy logic-based representation for HTML document clustering using Self-Organizing Maps. The representation is built on heuristic combinations of criteria, derived from the HTML markup, by means of a fuzzy rule-based system. We evaluate the model using different feature vector sizes. Experimental results on a standard benchmark dataset show an improvement in clustering quality when the fuzzy logic-based model is used instead of the vector space model with traditional term weighting functions.
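To make "heuristic combinations of criteria by means of a fuzzy rule-based system" concrete, here is a toy zero-order Sugeno-style inference over two criteria: a normalized term frequency and a degree of appearance in the page title. The triangular membership functions, the three rules, and their output values are all hypothetical; the paper's system uses its own criteria and rule base, which this sketch does not reproduce.

```python
def triangle(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a == b or b == c gives a shoulder)."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy sets over inputs normalized to [0, 1].
def low(x):
    return triangle(x, 0.0, 0.0, 0.5)

def medium(x):
    return triangle(x, 0.0, 0.5, 1.0)

def high(x):
    return triangle(x, 0.5, 1.0, 1.0)

def term_importance(freq, title):
    """Weighted average of rule outputs; AND = min, OR = max (hypothetical rule base)."""
    rules = [
        (max(high(freq), high(title)), 1.0),   # frequent OR in title -> important
        (min(medium(freq), low(title)), 0.5),  # moderately frequent, not in title
        (min(low(freq), low(title)), 0.0),     # rare and not in title -> unimportant
    ]
    total = sum(strength for strength, _ in rules)
    return sum(s * out for s, out in rules) / total if total else 0.0

print(term_importance(0.2, 1.0), term_importance(0.5, 0.0), term_importance(0.0, 0.0))
```

The nonlinearity comes from the rules: a rare term that appears in the title can outrank a moderately frequent term that does not, which a linear tf-style weight would not capture.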

Expert Systems With Applications | 2012

Improving the accuracy of COPLIMO to estimate the payoff of a software product line

Ruben Heradio; David Fernandez-Amoros; Luis Torre-Cubillo; Alberto Pérez García-Plaza

Software product line engineering pursues the efficient development of families of similar products. COPLIMO is an economic model that relies on COCOMO II to estimate the benefits of adopting a product line approach compared to developing the products one by one. Although COPLIMO is an ideal economic model to support decision making on the incremental development of a product line, it makes some simplifying assumptions that may produce large distortions in the estimates (e.g., COPLIMO assumes that all the products have the same size). This paper proposes a COPLIMO reformulation that avoids such assumptions and, consequently, improves the accuracy of the estimates. To support our proposal, we present an algorithm that infers the additional information required by our COPLIMO reformulation from feature diagrams, a widespread notation for modeling the domain of a product line.
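COPLIMO builds on COCOMO II, whose effort equation has the form PM = A · Size^E · ∏EM, with E typically slightly above 1. The sketch below is not COPLIMO or the paper's reformulation; it only illustrates why an equal-size assumption can distort estimates, under assumed values (A = 2.94, E = 1.10, all effort multipliers equal to 1, hypothetical product sizes in KSLOC). Because Size^E is convex for E > 1, summing the effort of each product differs from multiplying the effort of an "average-size" product by the number of products.

```python
def cocomo_effort(size_ksloc, a=2.94, e=1.10, effort_multiplier=1.0):
    """Person-months from a COCOMO II-style equation PM = A * Size^E * EM (assumed A, E, EM)."""
    return a * size_ksloc ** e * effort_multiplier

def effort_per_product(sizes):
    """Total effort when each product's own size is used."""
    return sum(cocomo_effort(s) for s in sizes)

def effort_equal_size(sizes):
    """Total effort when every product is assumed to have the mean size."""
    mean = sum(sizes) / len(sizes)
    return len(sizes) * cocomo_effort(mean)

sizes = [10.0, 40.0, 100.0]  # hypothetical product sizes in KSLOC
print(round(effort_per_product(sizes), 1), round(effort_equal_size(sizes), 1))
```

By Jensen's inequality the equal-size total underestimates whenever sizes differ and E > 1; the gap grows with the spread of product sizes.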

International Multiconference on Computer Science and Information Technology | 2010

Learning taxonomic relations from a set of text documents

Mari-Sanna Paukkeri; Alberto Pérez García-Plaza; Sini Pessala; Timo Honkela

This paper presents a methodology for learning taxonomic relations from a set of documents, each of which explains one of the concepts. Three feature extraction approaches with varying degrees of language independence are compared in this study. The first is a language-independent approach based on statistical keyphrase extraction, and the second is based on a combination of rule-based stemming and fuzzy logic-based feature weighting and selection. The third approach is the traditional tf-idf weighting scheme with commonly used rule-based stemming. The concept hierarchy is obtained by combining Self-Organizing Map clustering with agglomerative hierarchical clustering. Experiments are conducted for both English and Finnish. The results show that concept hierarchies can also be constructed automatically using statistical methods without heavy language-specific preprocessing.
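As an illustration of the second step only, a minimal average-linkage agglomerative clustering in plain Python. It assumes the inputs are already prototype vectors (for example, SOM unit vectors) compared by Euclidean distance; the linkage criterion and the toy vectors are assumptions, since the abstract does not specify them.

```python
import math
from itertools import combinations

def average_linkage(points):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance).

    Clusters are frozensets of point indices, merged until one cluster remains.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []

    def dist(c1, c2):
        # Average Euclidean distance over all cross-cluster pairs (average linkage).
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Hypothetical prototype vectors, e.g. taken from trained SOM units.
prototypes = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]]
for a, b, d in average_linkage(prototypes):
    print(sorted(a), sorted(b), round(d, 2))
```

The merge sequence is a dendrogram: reading it bottom-up gives the concept hierarchy, with the final merge acting as the root. This naive version recomputes distances each round, which is fine for the small number of units a SOM typically has.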

IEEE Transactions on Fuzzy Systems | 2017

Using Fuzzy Logic to Leverage HTML Markup for Web Page Representation

Alberto Pérez García-Plaza; Víctor Fresno; Raquel Martínez Unanue; Arkaitz Zubiaga

The selection of a suitable document representation approach plays a crucial role in the performance of a document clustering task. Being able to pick out representative words within a document can lead to substantial improvements in document clustering. In the case of web documents, the HTML markup that defines the layout of the content provides additional structural information that can be further exploited to identify representative words. In this paper, we introduce a fuzzy term weighting approach that makes the most of the HTML structure for document clustering. We set forth and build on the hypothesis that a good representation can take advantage of how humans skim through documents to extract the most representative words. The authors of web pages use HTML tags to convey the most important message of a web page through page elements that attract the readers’ attention, such as page titles or emphasized elements. We define a set of criteria to exploit the information provided by these page elements, and introduce a fuzzy combination of these criteria that we evaluate within the context of a web page clustering task. Our proposed approach, called abstract fuzzy combination of criteria (AFCC), can adapt to datasets whose features are distributed differently, achieving good results compared with other similar fuzzy logic-based approaches and TF-IDF across different datasets.
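Criteria such as "appears in the title" or "appears emphasized" require knowing where each word sits in the markup. A minimal sketch using Python's standard `html.parser` to count, per term, occurrences in the whole text, in the title, and inside emphasis elements. Which tags count as emphasis here (b, strong, em, i, h1 to h3) is an assumption, as is the simple lowercase whitespace tokenization; the paper's exact criteria definitions are not reproduced.

```python
from collections import Counter
from html.parser import HTMLParser

EMPHASIS_TAGS = {"b", "strong", "em", "i", "h1", "h2", "h3"}  # assumed emphasis set

class CriteriaExtractor(HTMLParser):
    """Counts term occurrences overall, in the title, and in emphasized elements."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements
        self.frequency = Counter()
        self.title = Counter()
        self.emphasis = Counter()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements like <br>.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        terms = data.lower().split()
        self.frequency.update(terms)
        if "title" in self.stack:
            self.title.update(terms)
        if EMPHASIS_TAGS.intersection(self.stack):
            self.emphasis.update(terms)

# Hypothetical page.
page = """<html><head><title>Fuzzy clustering</title></head>
<body><p>Web page <b>clustering</b> with fuzzy rules.</p></body></html>"""
extractor = CriteriaExtractor()
extractor.feed(page)
extractor.close()
print(extractor.title, extractor.emphasis)
```

These per-term counts would then be normalized and fed as inputs to a fuzzy combination of criteria.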

IEEE International Conference on Fuzzy Systems | 2012

Fitting document representation to specific datasets by adjusting membership functions

Alberto Pérez García-Plaza; Víctor Fresno; Raquel Martínez

In this work we deal with the problem of web page clustering from the point of view of document representation. Fuzzy rule-based systems have been successfully used to represent web documents by means of heuristic combinations of criteria. In these systems, rules were established based on the way humans read documents and have been analyzed in previous works. However, the membership function parameters were fixed by default, assuming that any document would follow similar patterns regardless of the rest of the documents in the collection. In this work we analyze to what extent collection information could be used to adjust the membership functions in order to improve document representation and, therefore, clustering results. We compare our proposal with the original approach on which it is based and with other similar or commonly used approaches. We also perform statistical significance tests to ensure that our modifications have a real effect on the original representation. Results show that adjusting document representation parameters to specific collections leads to better clustering results.
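The abstract does not give the adjustment procedure. As a hypothetical illustration of the general idea only, the sketch below places the breakpoints of three membership functions (low, medium, high) at the minimum, median, and maximum of a criterion's values observed in a collection, instead of fixed defaults such as 0, 0.5, and 1. This min/median/max rule and the sample values are assumptions, not the paper's method.

```python
import statistics

def fit_breakpoints(values):
    """Hypothetical rule: (low, mid, high) breakpoints from collection statistics."""
    return min(values), statistics.median(values), max(values)

def ramp(v, a, b):
    """Linear ramp from 0 at a to 1 at b, clipped to [0, 1]."""
    if b == a:
        return 1.0 if v >= b else 0.0
    return min(max((v - a) / (b - a), 0.0), 1.0)

def memberships(x, breakpoints):
    """Degrees of membership of x in low/medium/high sets defined by the breakpoints."""
    lo, mid, hi = breakpoints
    low = 1.0 - ramp(x, lo, mid)
    medium = min(ramp(x, lo, mid), 1.0 - ramp(x, mid, hi))
    high = ramp(x, mid, hi)
    return {"low": low, "medium": medium, "high": high}

# Hypothetical normalized term frequencies observed in one collection.
freqs = [0.01, 0.02, 0.02, 0.05, 0.2]
adjusted = fit_breakpoints(freqs)
print(memberships(0.05, adjusted))
print(memberships(0.05, (0.0, 0.5, 1.0)))  # fixed default breakpoints
```

With fixed defaults, a value of 0.05 is mostly "low"; relative to this skewed collection it is mostly "medium", which is the kind of shift a collection-aware representation can exploit.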

International Conference on Computational Linguistics | 2012

Fuzzy combinations of criteria: an application to web page representation for clustering

Alberto Pérez García-Plaza; Víctor Fresno; Raquel Martínez

Document representation is an essential step in web page clustering. Web pages are usually written in HTML, whose markup offers useful information for selecting the most important features to represent them. In this paper we investigate the use of nonlinear combinations of criteria by means of a fuzzy system to find those important features. We start our research from a term weighting function called Fuzzy Combination of Criteria (FCC) that relies on term frequency, document title, emphasis, and term positions in the text. Next, we analyze its drawbacks and explore adding contextual information extracted from the anchor texts of inlinks, proposing an alternative way of combining criteria based on our experimental results. Finally, we apply a statistical significance test to compare the original representation with our proposal.
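Using inlink anchor texts as context means collecting, for each page, the link text that other pages use when pointing to it. A minimal sketch with Python's standard `html.parser`, assuming pages are given as a {url: html} mapping and that href values are already absolute URLs (no URL normalization is done); the URLs and page contents are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse internal whitespace in the anchor text.
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None

def inlink_anchor_texts(pages):
    """Map each target URL to anchor texts of links pointing to it from other pages."""
    anchors = defaultdict(list)
    for source, html in pages.items():
        parser = AnchorCollector()
        parser.feed(html)
        parser.close()
        for href, text in parser.links:
            if href != source and text:  # skip self-links and empty anchors
                anchors[href].append(text)
    return dict(anchors)

# Hypothetical pages keyed by absolute URL.
pages = {
    "http://a.example/": '<p>See the <a href="http://c.example/">fuzzy logic tutorial</a>.</p>',
    "http://b.example/": '<a href="http://c.example/">Intro to fuzzy sets</a>',
    "http://c.example/": "<p>Fuzzy sets and membership functions.</p>",
}
print(inlink_anchor_texts(pages))
```

The collected anchor texts for a target page could then be tokenized and treated as an extra criterion alongside frequency, title, emphasis, and position.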
