Fermín L. Cruz
University of Seville
Publications
Featured research published by Fermín L. Cruz.
Computer Networks | 2012
F. Javier Ortega; José A. Troyano; Fermín L. Cruz; Carlos G. Vallejo; Fernando Enríquez
Trust and Reputation Systems constitute an essential part of many social networks, due to the great expansion of these on-line communities in the past few years. As a consequence of this growth, some users try to disturb the normal atmosphere of these communities, or even to take advantage of them in order to obtain some kind of benefit. The concept of trust is therefore a key point in the performance of on-line systems such as on-line marketplaces, review aggregators, social news sites, and forums. In this work we propose a method to compute a ranking of the users in a social network according to their trustworthiness. The aim of our method is to prevent malicious users from illicitly gaining high reputation in the network by demoting them in the ranking of users. We propose a novel system that propagates both positive and negative opinions of the users through the network, in such a way that the opinions each user expresses about others influence their global trust score. Our proposal has been evaluated in different challenging situations. The experiments include the generation of random graphs, the use of a real-world dataset extracted from a social news site, and a combination of a real dataset and generation techniques, in order to test our proposal in different environments. The results show that our method performs well in every situation, showing the propagation of trust and distrust to be a reliable mechanism in a Trust and Reputation System.
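The abstract describes the propagation idea only at a high level; the snippet below is a toy sketch of the underlying intuition (opinions weighted by the current trust of their authors, iterated to a fixed point), with made-up user names, and is not the system proposed in the paper.

```python
# Toy sketch (not the paper's algorithm): rank users by a trust score in which
# each received opinion is weighted by the current trust of its author, so that
# praise from trusted users counts more and malicious users are demoted.

def rank_users(opinions, rounds=10):
    """opinions: list of (author, target, value) with value in {+1, -1}."""
    users = {u for a, t, _ in opinions for u in (a, t)}
    trust = {u: 1.0 for u in users}
    for _ in range(rounds):
        received = {u: 0.0 for u in users}
        for author, target, value in opinions:
            received[target] += value * max(trust[author], 0.0)  # distrusted authors count for nothing
        trust = received
    return sorted(users, key=lambda u: trust[u], reverse=True)

opinions = [("alice", "bob", +1), ("bob", "alice", +1),
            ("mallory", "mallory2", +1), ("alice", "mallory", -1)]
print(rank_users(opinions))  # mallory and its sockpuppet end up near the bottom
```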
Expert Systems With Applications | 2013
Fermín L. Cruz; José A. Troyano; Fernando Enríquez; F. Javier Ortega; Carlos G. Vallejo
Nowadays, people not only browse the web but also contribute content to it. Among other things, they write their thoughts and opinions in review sites, forums, social networks, blogs and other websites. These opinions constitute a valuable resource for businesses, governments and consumers. In recent years, some researchers have proposed opinion extraction systems, mostly domain-independent ones, to automatically extract structured representations of the opinions contained in those texts. In this work, we tackle this task with a domain-oriented approach, defining a set of domain-specific resources which capture valuable knowledge about how people express opinions in a given domain. These resources are automatically induced from a set of annotated documents. Experiments were carried out on three different domains (user-generated reviews of headphones, hotels and cars), comparing our approach to other state-of-the-art, domain-independent techniques. The results confirm the importance of the domain in building accurate opinion extraction systems. Experiments on the influence of the dataset size and an example of aggregation and visualization of the extracted opinions are also presented.
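As a rough illustration of what feature-level opinion extraction produces, the following toy example pulls (feature, opinion word, polarity) triples out of a review sentence using a tiny hand-written lexicon; the actual resources induced in the paper are far richer and are learned from annotated data.

```python
# Toy illustration of feature-based opinion extraction (not the resources or
# system described in the paper): given a small, hand-written domain lexicon of
# product features and opinion words, pull out (feature, opinion, polarity)
# triples from review sentences.

import re

FEATURES = {"battery", "sound", "screen"}            # hypothetical domain features
OPINION_WORDS = {"great": +1, "poor": -1, "amazing": +1, "weak": -1}

def extract_opinions(sentence):
    tokens = re.findall(r"[a-z']+", sentence.lower())
    triples = []
    for i, tok in enumerate(tokens):
        if tok in FEATURES:
            # Look for an opinion word in a small window around the feature.
            window = tokens[max(0, i - 2): i + 3]
            for w in window:
                if w in OPINION_WORDS:
                    triples.append((tok, w, OPINION_WORDS[w]))
    return triples

print(extract_opinions("Great sound but the battery is weak."))
# [('sound', 'great', 1), ('battery', 'weak', -1)]
```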
Proceedings of the 2nd international workshop on Search and mining user-generated contents | 2010
Fermín L. Cruz; José A. Troyano; Fernando Enríquez; F. Javier Ortega; Carlos G. Vallejo
Feature-based opinion extraction is a task related to information extraction which consists of extracting structured opinions on features of some object from reviews or other subjective textual sources. In recent years, this problem has been studied by several researchers, generally in an unsupervised, domain-independent manner. In contrast, in this work we propose a redefinition of the problem from a more practical point of view, and describe a domain-specific, resource-based opinion extraction system. We focus on the description and generation of those resources, and briefly report on the extraction system architecture and a few initial experiments. The results suggest that domain-specific knowledge is a valuable resource for building precise opinion extraction systems.
Expert Systems With Applications | 2014
Fermín L. Cruz; José A. Troyano; Beatriz Pontes; F. Javier Ortega
Many tasks related to sentiment analysis rely on sentiment lexicons, lexical resources containing information about the emotional implications of words (e.g., whether the sentiment orientation of a word is positive or negative). In this work, we present an automatic method for building lemma-level sentiment lexicons, which has been applied to obtain lexicons for English, Spanish and three other official languages of Spain. Our lexicons are multi-layered, allowing applications to trade off between the number of available words and the accuracy of the estimations. Our evaluations show high accuracy values in all cases. As a previous step to the lemma-level lexicons, we built a synset-level lexicon for English similar to SentiWordNet 3.0, one of the most widely used sentiment lexicons nowadays. We made several improvements to the original SentiWordNet 3.0 building method, yielding significantly better estimations of positivity and negativity according to our evaluations. The resource containing all the lexicons, ML-SentiCon, is publicly available.
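The abstract describes the lexicons as multi-layered so that applications can trade coverage for accuracy; the toy lookup below illustrates that idea under an assumed layout (disjoint layers ordered from most to least reliable), with invented lemmas and scores rather than the real ML-SentiCon data.

```python
# Minimal sketch (assumed data layout, not the published ML-SentiCon format):
# a multi-layered lemma-level lexicon, where lower layers contain fewer lemmas
# with more reliable polarity estimates and higher layers add coverage.

LAYERS = [
    {"good": 0.75, "bad": -0.70},   # layer 1: small, high-confidence core
    {"noisy": 0.10, "so-so": 0.0},  # layer 2: adds coverage, lower accuracy
]

def polarity(lemma, max_layer=2):
    """Return the polarity of a lemma using at most `max_layer` layers.

    Restricting max_layer trades coverage (fewer known lemmas) for
    accuracy (only the most reliable estimates are consulted).
    """
    for layer in LAYERS[:max_layer]:
        if lemma in layer:
            return layer[lemma]
    return None  # lemma not covered by the requested layers

print(polarity("noisy", max_layer=1))  # None: not in the high-confidence core
print(polarity("noisy", max_layer=2))  # 0.10: found in the broader layer
```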
Knowledge Based Systems | 2015
Marc Franco-Salvador; Fermín L. Cruz; José A. Troyano; Paolo Rosso
Highlights: we propose a new generic meta-learning-based approach to polarity categorization; we study the impact of word sense disambiguation and vocabulary expansion-based features; we obtain state-of-the-art results on single and cross-domain polarity categorization; our approach does not perform any domain adaptation and is therefore generic; and it obtains the most stable results across the tested domains. Current approaches to single and cross-domain polarity classification usually use bag-of-words, n-gram or lexical resource-based classifiers. In this paper, we propose the use of meta-learning to combine and enrich those approaches by also adding other knowledge-based features. In addition to the aforementioned classical approaches, our system uses the BabelNet multilingual semantic network to generate features derived from word sense disambiguation and vocabulary expansion. Experimental results show state-of-the-art performance on single and cross-domain polarity classification. Contrary to other approaches, ours is generic: these results were obtained without any domain adaptation technique. Moreover, the use of meta-learning allows our approach to obtain the most stable results across domains. Finally, our empirical analysis provides interesting insights on the use of semantic network-based features.
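As a rough sketch of the meta-learning idea (base classifiers over different feature views combined by a meta-classifier), the snippet below stacks two toy text classifiers with scikit-learn; the BabelNet-derived features used in the paper are only indicated by a comment, since querying BabelNet is out of scope here.

```python
# Minimal sketch (assumptions: scikit-learn available; toy data and toy feature
# views). Illustrates meta-learning by stacking: base classifiers over different
# feature views are combined by a meta-classifier.

from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great battery and sound", "awful build quality", "love it", "broke in a week"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

base_learners = [
    ("bow", make_pipeline(CountVectorizer(), LogisticRegression())),
    ("ngrams", make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())),
    # A real system would add further views built from BabelNet-derived features
    # (word senses, expanded vocabulary) instead of these toy views.
]

meta_model = StackingClassifier(estimators=base_learners,
                                final_estimator=LogisticRegression(), cv=2)
meta_model.fit(texts, labels)
print(meta_model.predict(["sound is great", "battery broke"]))
```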
Information Processing and Management | 2012
Fermín L. Cruz; Carlos G. Vallejo; Fernando Enríquez; José A. Troyano
In this paper we present a relevance ranking algorithm named PolarityRank. This algorithm is inspired by PageRank, the method used by Google to compute webpage relevance, and generalizes it to deal with graphs containing not only positively but also negatively weighted arcs. Besides the definition of the algorithm, the paper includes an algebraic justification, a proof of convergence and an empirical study in which PolarityRank is applied to two unrelated tasks where a graph with positive and negative weights can be built: the calculation of word semantic orientation and instance selection from a learning dataset.
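The snippet below gives one plausible reading of such a generalization, in which every node receives a positive and a negative score and negatively weighted arcs swap their roles; it is only a sketch consistent with the abstract, not necessarily the exact PolarityRank equations.

```python
# Sketch of a PageRank-style iteration over a signed graph (a plausible reading
# of the abstract, not necessarily the exact PolarityRank formulation).
# Each node gets two scores: pos (support received) and neg (opposition received).
# Positive arcs propagate the source's pos into the target's pos and its neg into
# the target's neg; negative arcs swap the roles.

def signed_rank(nodes, arcs, damping=0.85, iterations=100):
    """arcs: list of (source, target, weight) with weight > 0 or < 0."""
    out_weight = {n: sum(abs(w) for s, _, w in arcs if s == n) for n in nodes}
    pos = {n: 1.0 / len(nodes) for n in nodes}
    neg = {n: 1.0 / len(nodes) for n in nodes}

    for _ in range(iterations):
        new_pos = {n: (1 - damping) / len(nodes) for n in nodes}
        new_neg = {n: (1 - damping) / len(nodes) for n in nodes}
        for source, target, weight in arcs:
            share = abs(weight) / out_weight[source]
            if weight > 0:
                new_pos[target] += damping * share * pos[source]
                new_neg[target] += damping * share * neg[source]
            else:
                new_pos[target] += damping * share * neg[source]
                new_neg[target] += damping * share * pos[source]
        pos, neg = new_pos, new_neg

    return {n: pos[n] - neg[n] for n in nodes}  # net score used for ranking
```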
Information Fusion | 2013
Fernando Enríquez; Fermín L. Cruz; F. Javier Ortega; Carlos G. Vallejo; José A. Troyano
The paper is devoted to a comparative study of classifier combination methods, which have been successfully applied to multiple tasks including Natural Language Processing (NLP) tasks. There is a variety of classifier combination techniques, and the major difficulty is to choose the one that best fits a particular task. In our study we explored the performance of a number of combination methods, such as voting, Bayesian merging, behavior knowledge space, bagging, stacking, feature sub-spacing and cascading, for the part-of-speech tagging task using nine corpora in five languages. The results show that some methods that are currently not very popular can demonstrate much better performance. In addition, we studied how corpus size and quality influence the performance of the combination methods. We also provide the results of applying the classifier combination methods to other NLP tasks, such as named entity recognition and chunking. We believe that our study is the most exhaustive comparison of combination methods applied to NLP tasks made so far.
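Of the combination schemes listed, voting is the simplest; the snippet below shows majority voting over per-token predictions of several POS taggers, with invented tagger outputs and a first-tagger tie-break.

```python
# Minimal sketch of the simplest combination scheme mentioned above: majority
# voting over per-token predictions of several POS taggers (tagger outputs are
# made up for the example; ties are broken by the first tagger's prediction).

from collections import Counter

def vote(predictions_per_tagger):
    """predictions_per_tagger: list of tag sequences, one per tagger."""
    combined = []
    for token_tags in zip(*predictions_per_tagger):
        counts = Counter(token_tags)
        top_tag, top_count = counts.most_common(1)[0]
        if list(counts.values()).count(top_count) > 1:
            combined.append(token_tags[0])  # tie: fall back to the first tagger
        else:
            combined.append(top_tag)
    return combined

tagger_outputs = [
    ["DET", "NOUN", "VERB"],   # tagger A
    ["DET", "VERB", "VERB"],   # tagger B
    ["DET", "NOUN", "ADV"],    # tagger C
]
print(vote(tagger_outputs))  # ['DET', 'NOUN', 'VERB']
```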
Information Fusion | 2016
Juan M. Cotelo; Fermín L. Cruz; Fernando Enríquez; José A. Troyano
Highlights: we explore the idea of integrating both textual and structural information; using only structural information gives results similar to those yielded by the bag-of-words model; complementing textual content with structural information achieves the best results; a proper combination scheme is critical when integrating both types of models; and experimental results show that our combination proposal is quite effective. Twitter is a worldwide social media platform where millions of people frequently express ideas and opinions about any topic. This widespread success makes the analysis of tweets an interesting and possibly lucrative task, since those tweets are rarely objective and are becoming a target for large-scale analysis. In this paper, we explore the idea of integrating two fundamental aspects of a tweet, its textual content and its underlying structural information, when addressing the tweet categorization task. Thus, we analyze not only the textual content of tweets but also the structural information provided by the relationships between tweets and users, and we propose different methods for effectively combining both kinds of feature models extracted from these knowledge sources. In order to test our approach, we address the specific task of determining the political opinion of Twitter users within their political context, observing that our most refined knowledge integration approach performs remarkably better (about 5 points above) than the classic textual-based model.
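One straightforward way to combine the two views, shown below as an assumed example rather than the paper's method, is feature-level fusion: concatenate a bag-of-words representation of each tweet with structural features derived from the user graph and train a single classifier; the structural features here are invented placeholders.

```python
# Minimal sketch of one possible combination scheme (feature-level fusion):
# concatenate a bag-of-words view of each tweet with a structural view derived
# from the user/retweet graph, then train a single classifier. The structural
# features (e.g., a user-community indicator) are invented placeholders.

import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

tweets = ["vote for party A", "party B all the way", "great rally by A", "B wins the debate"]
labels = [0, 1, 0, 1]
# Placeholder structural features: which user community each author belongs to.
structural = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])

textual = CountVectorizer().fit_transform(tweets)
features = hstack([textual, structural])  # textual + structural views side by side

clf = LogisticRegression().fit(features, labels)
print(clf.predict(features))  # sanity check on the training data itself
```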
Journal of the Association for Information Science and Technology | 2014
Juan M. Cotelo; Fermín L. Cruz; José A. Troyano
Twitter is a social network in which people publish brief, instant, publicly accessible messages. With its exponential growth and the public nature and transversality of its contents, more and more researchers are using Twitter as a source of data for multiple purposes. In this context, the ability to retrieve those messages (tweets) related to a certain topic becomes critical. In this work, we define the topic-related tweet retrieval task and propose a dynamic, graph-based method with which to address it. We have applied our method to capture a data set containing tweets related to the participation of the Spanish team in the Euro 2012 soccer competition, measuring precision and recall against other simple but commonly used approaches. The results demonstrate the effectiveness of our method, which significantly increases coverage of the chosen topic and is able to capture related but a priori unknown subtopics.
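The paper's graph-based method is not reproduced here; the snippet below only illustrates the general "dynamic retrieval" idea with an assumed hashtag-expansion loop, in which hashtags that co-occur with the seed query are promoted to query terms in successive rounds.

```python
# Sketch of a dynamic, term-expansion style retrieval loop (only an illustration
# of the general idea, not the paper's graph-based method). Starting from seed
# terms, tweets are retrieved, the hashtags that co-occur most often with the
# current query are promoted to new query terms, and the process repeats.

from collections import Counter

def expand_query(seed_terms, tweets, rounds=3, new_terms_per_round=2):
    query = set(seed_terms)
    for _ in range(rounds):
        matched = [t for t in tweets if any(term in t.lower() for term in query)]
        cooccurring = Counter(
            tok for t in matched for tok in t.lower().split()
            if tok.startswith("#") and tok not in query
        )
        query.update(tag for tag, _ in cooccurring.most_common(new_terms_per_round))
    matched = [t for t in tweets if any(term in t.lower() for term in query)]
    return query, matched

tweets = ["Spain wins! #euro2012", "what a goal #euro2012 #laroja", "great match #laroja"]
query, hits = expand_query({"#euro2012"}, tweets)
print(query)     # seed plus dynamically discovered hashtags such as '#laroja'
print(len(hits)) # coverage grows as related subtopics are captured
```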
Proceedings of the 2nd international workshop on Search and mining user-generated contents | 2010
F. Javier Ortega; Craig Macdonald; José A. Troyano; Fermín L. Cruz
In this work we tackle the problem of spam detection on the Web. Spam web pages have become a problem for Web search engines, due to the negative effects that this phenomenon can have on their retrieval results. Our approach is based on a random-walk algorithm that obtains a ranking of pages according to their relevance and their spam likelihood. We introduce the novelty of taking into account the content of the web pages to characterize the web graph and to obtain an a-priori estimation of the spam likelihood of the web pages. Our graph-based algorithm computes two scores for each node in the graph. Intuitively, these values represent how good or bad (spam-like or not) a web page is, according to its textual content and its relations in the graph. Our experiments show that our proposed technique outperforms other link-based techniques for spam detection.
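A rough sketch of the general idea, under the assumption of personalized random walks seeded by content-based priors (using networkx's pagerank rather than the authors' algorithm): each page accumulates a "good" score from pages whose content looks legitimate and a "bad" score from pages whose content looks spammy.

```python
# Sketch only: personalized random walks seeded by content-based priors, not the
# authors' exact algorithm. Each page gets a "good" score propagated from pages
# whose content looks legitimate and a "bad" score propagated from spammy ones.

import networkx as nx

def good_bad_ranking(graph, spam_prior):
    """graph: directed link graph; spam_prior: page -> spam probability from a content classifier."""
    good_seed = {p: 1.0 - s for p, s in spam_prior.items()}
    bad_seed = dict(spam_prior)
    good = nx.pagerank(graph, personalization=good_seed)
    bad = nx.pagerank(graph, personalization=bad_seed)
    # Rank pages by how much more "goodness" than "badness" they accumulate.
    return sorted(graph.nodes, key=lambda p: good[p] - bad[p], reverse=True)

g = nx.DiGraph([("home", "blog"), ("blog", "home"), ("farm1", "farm2"), ("farm2", "home")])
priors = {"home": 0.1, "blog": 0.2, "farm1": 0.9, "farm2": 0.8}  # toy content-classifier outputs
print(good_bad_ranking(g, priors))  # spam-like pages end up at the bottom of the ranking
```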