Publication


Featured research published by Francisco Rangel.


Information Processing and Management | 2016

On the impact of emotions on author profiling

Francisco Rangel; Paolo Rosso

In this paper, we investigate the impact of emotions on author profiling, specifically on identifying age and gender. First, we propose the EmoGraph method for modelling the way people use language to express themselves on the basis of an emotion-labelled graph. We apply this representation model to identify gender and age in the Spanish partition of the PAN-AP-13 corpus, obtaining results comparable to the best-performing systems of the PAN Lab of CLEF.


cross language evaluation forum | 2013

Recent Trends in Digital Text Forensics and Its Evaluation

Tim Gollub; Martin Potthast; Anna Beyer; Matthias Busse; Francisco Rangel; Paolo Rosso; Efstathios Stamatatos; Benno Stein

This paper outlines the concepts and achievements of our evaluation lab on digital text forensics, PAN 13, which called for original research and development on plagiarism detection, author identification, and author profiling. We present a standardized evaluation framework for each of the three tasks and discuss the evaluation results of the altogether 58 submitted contributions. For the first time, instead of accepting the output of software runs, we collected the software itself and ran it on a computer cluster at our site. As evaluation and experimentation platform we use TIRA, which is being developed at the Webis Group in Weimar. TIRA can handle large-scale software submissions by means of virtualization, sandboxed execution, tailored unit testing, and staged submission. In addition to the achieved evaluation results, a major achievement of our lab is that we now have the largest collection of state-of-the-art approaches with regard to the mentioned tasks for further analysis at our disposal.


cross language evaluation forum | 2015

Overview of the PAN/CLEF 2015 Evaluation Lab

Efstathios Stamatatos; Martin Potthast; Francisco Rangel; Paolo Rosso; Benno Stein

This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of text mining research focusing on the identification of personal traits of authors left behind in texts unintentionally. PAN 2015 comprises three tasks: plagiarism detection, author identification, and author profiling, each studying important variations of these problems. In plagiarism detection, community-driven corpus construction is introduced as a new way of developing evaluation resources with diversity. In author identification, cross-topic and cross-genre author verification is introduced, where the texts of known and unknown authorship do not match in topic and/or genre. A new corpus covering four languages was built for this challenging, yet realistic, task. In author profiling, in addition to the usual author demographics, such as gender and age, five personality traits are introduced (openness, conscientiousness, extraversion, agreeableness, and neuroticism), and a new corpus of Twitter messages covering four languages was developed. In total, 53 teams participated in all three tasks of PAN 2015 and, following the practice of previous editions, software submissions were required and evaluated within the TIRA experimentation framework.


cross language evaluation forum | 2014

Improving the Reproducibility of PAN’s Shared Tasks:

Martin Potthast; Tim Gollub; Francisco Rangel; Paolo Rosso; Efstathios Stamatatos; Benno Stein

This paper reports on the PAN 2014 evaluation lab, which hosts three shared tasks on plagiarism detection, author identification, and author profiling. To improve the reproducibility of shared tasks in general, and PAN’s tasks in particular, the Webis group developed a new web service called TIRA, which facilitates software submissions. Unlike many other labs, PAN asks participants to submit running software instead of their run output. To deal with the organizational overhead involved in handling software submissions, the TIRA experimentation platform helps to significantly reduce the workload for both participants and organizers, while the submitted software is kept in a running state. This year, we addressed the matter of responsibility for the successful execution of submitted software in order to put participants back in charge of executing their software at our site. In sum, 57 pieces of software have been submitted to our lab; together with the 58 software submissions of last year, this forms the largest collection of software for our three tasks to date, all of which is readily available for further analysis. The report concludes with a brief summary of each task.


cross language evaluation forum | 2015

Language Variety Identification Using Distributed Representations of Words and Documents

Marc Franco-Salvador; Francisco Rangel; Paolo Rosso; Mariona Taulé; M. Antònia Martí

Language variety identification is an author profiling subtask which aims to detect lexical and semantic variations in order to classify different varieties of the same language. In this work we focus on the use of distributed representations of words and documents using the continuous Skip-gram model. We compare this model with three recent approaches: Information Gain Word-Patterns, TF-IDF graphs and Emotion-labeled Graphs, in addition to several baselines. We evaluate the models introducing the Hispablogs dataset, a new collection of Spanish blogs from five different countries: Argentina, Chile, Mexico, Peru and Spain. Experimental results show state-of-the-art performance in language variety identification. In addition, our empirical analysis provides interesting insights on the use of the evaluated approaches.


conference on intelligent text processing and computational linguistics | 2016

A Low Dimensionality Representation for Language Variety Identification

Francisco Rangel; Marc Franco-Salvador; Paolo Rosso

Language variety identification aims at labelling texts in a native language (e.g. Spanish, Portuguese, English) with its specific variation (e.g. Argentina, Chile, Mexico, Peru, Spain; Brazil, Portugal; UK, US). In this work we propose a low dimensionality representation (LDR) to address this task with five different varieties of Spanish: Argentina, Chile, Mexico, Peru and Spain. We compare our LDR method with common state-of-the-art representations and show an increase in accuracy of ~35%. Furthermore, we compare LDR with two reference distributed representation models. Experimental results show competitive performance while dramatically reducing the dimensionality (and increasing the big data suitability) to only 6 features per variety. Additionally, we analyse the behaviour of the employed machine learning algorithms and the most discriminating features. Finally, we employ an alternative dataset to test the robustness of our low dimensionality representation with another set of similar languages.


cross language evaluation forum | 2015

On the Multilingual and Genre Robustness of EmoGraphs for Author Profiling in Social Media

Francisco Rangel; Paolo Rosso

Author profiling aims at identifying different traits, such as age and gender, of an author on the basis of her writings. We propose the novel EmoGraph graph-based approach, where morphosyntactic categories are enriched with semantic and affective information. In this work we focus on testing the robustness of EmoGraphs when applied to age and gender identification. Results with the PAN-AP-14 corpus show the competitiveness of the representation across genres and languages. Finally, some interesting insights are shown, for example with topic- and emotion-bounded genres such as hotel reviews.


forum for information retrieval evaluation | 2016

PAN@FIRE: Overview of the PR-SOCO Track on Personality Recognition in SOurce COde

Francisco Rangel; Fabio A. González; Felipe Restrepo; Manuel Montes; Paolo Rosso

Author profiling consists of predicting an author’s demographics (e.g. age, gender, personality) from her writing. After mainly addressing age and gender identification at PAN@CLEF, and also personality recognition in Twitter (http://pan.webis.de/), in this PAN@FIRE track on Personality Recognition from SOurce COde (PR-SOCO) we have addressed the problem of predicting an author’s personality from her source code. In this paper, we analyse 48 runs sent by 11 participants. Given a set of source codes written in Java by students who also answered a personality test, participants had to predict the big five traits. Results have been evaluated with two complementary measures (RMSE and Pearson product-moment correlation), which made it possible to identify whether systems with low error rates may work due to random chance. No matter the approach, openness is the trait for which the best results were obtained on both measures.
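
As a rough illustration of why RMSE and Pearson correlation complement each other, the sketch below scores two made-up predictors against made-up trait scores. All numbers and function names are hypothetical and are not taken from the PR-SOCO data or its evaluation scripts. A predictor that hovers around the mean can reach a lower RMSE than one that follows the true trend with a constant offset, while only the latter correlates with the true scores.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson product-moment correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined here; 0.0 is a convention of this sketch
    return cov / (sd_x * sd_y)


# Hypothetical trait scores for five authors (illustrative values only).
true_scores = [40, 45, 50, 55, 60]
near_mean = [51, 49, 50, 49, 51]   # stays close to the mean, ignores the trend
shifted = [50, 55, 60, 65, 70]     # follows the trend with a constant offset

for name, pred in [("near_mean", near_mean), ("shifted", shifted)]:
    print(name, round(rmse(true_scores, pred), 3), round(pearson(true_scores, pred), 3))
```

Reading the two measures together separates a system that merely predicts values near the average from one that actually ranks authors in the right order.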


cross language evaluation forum | 2018

Overview of PAN 2018

Efstathios Stamatatos; Francisco Rangel; Michael Tschuggnall; Benno Stein; Mike Kestemont; Paolo Rosso; Martin Potthast

PAN 2018 explores several authorship analysis tasks enabling a systematic comparison of competitive approaches and advancing research in digital text forensics. More specifically, this edition of PAN introduces a shared task in cross-domain authorship attribution, where texts of known and unknown authorship belong to distinct domains, and another task in style change detection that distinguishes between single-author and multi-author texts. In addition, a shared task in multimodal author profiling examines, for the first time, a combination of information from both texts and images posted by social media users to estimate their gender. Finally, the author obfuscation task studies how a text by a certain author can be paraphrased so that existing author identification tools are confused and cannot recognize the similarity with other texts of the same author. New corpora have been built to support these shared tasks. A relatively large number of software submissions (41 in total) was received and evaluated. Best paradigms are highlighted while baselines indicate the pros and cons of submitted approaches.


applications of natural language to data bases | 2018

Identifying and Classifying Influencers in Twitter only with Textual Information

Victoria Nebot; Francisco Rangel; Rafael Berlanga; Paolo Rosso

Online Reputation Management systems aim at identifying and classifying Twitter influencers due to their importance for brands. Current methods mainly rely on metrics provided by Twitter such as followers, retweets, etc. In this work we follow the research initiated at RepLab 2014, but relying only on the textual content of tweets. Moreover, we have proposed a workflow to identify influencers and classify them into an interest group from a reputation point of view, besides the classification proposed at RepLab. We have evaluated two families of classifiers, which do not require feature engineering, namely: deep learning classifiers and traditional classifiers with embeddings. Additionally, we also use two baselines: a simple language model classifier and the “majority class” classifier. Experiments show that most of our methods outperform the reported results in RepLab 2014, especially the proposed Low Dimensionality Statistical Embedding.

Collaboration


Dive into Francisco Rangel's collaborations.

Top Co-Authors

Paolo Rosso

Polytechnic University of Valencia

Marc Franco-Salvador

Polytechnic University of Valencia

Antonio Reyes

Polytechnic University of Valencia

Fabio A. González

National University of Colombia

Felipe Restrepo

National University of Colombia
