David Vilares | Researchain

Publication

Featured researches published by David Vilares.

Association for Information Science and Technology | 2015

On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages

David Vilares; Miguel A. Alonso; Carlos Gómez-Rodríguez

Millions of micro texts are published every day on Twitter. Identifying the sentiment present in them can be helpful for measuring the frame of mind of the public, their satisfaction with respect to a product, or their support of a social event. In this context, polarity classification is a subfield of sentiment analysis focused on determining whether the content of a text is objective or subjective, and in the latter case, if it conveys a positive or a negative opinion. Most polarity detection techniques tend to take into account individual terms in the text and even some degree of linguistic knowledge, but they do not usually consider syntactic relations between words. This article explores how relating lexical, syntactic, and psychometric information can be helpful to perform polarity classification on Spanish tweets. We provide an evaluation for both shallow and deep linguistic perspectives. Empirical results show an improved performance of syntactic approaches over pure lexical models when using large training sets to create a classifier, but this tendency is reversed when small training collections are used.

Natural Language Engineering | 2015

A syntactic approach for opinion mining on Spanish reviews

David Vilares; Miguel A. Alonso; Carlos Gómez-Rodríguez

We describe an opinion mining system which classifies the polarity of Spanish texts. We propose an NLP approach that undertakes pre-processing, tokenisation and POS tagging of texts to then obtain the syntactic structure of sentences by means of a dependency parser. This structure is then used to address three of the most significant linguistic constructions for the purpose in question: intensification, subordinate adversative clauses and negation.
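As a rough illustration of how a dependency structure can drive the treatment of negation and intensification, the following minimal Python sketch scores a hand-built tree. The lexicons, the weights, the `Node` class and the `score` function are all hypothetical and are not taken from the paper; adversative clauses are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List

# Toy lexicons, assumed for illustration only (not the paper's resources).
POLARITY = {"bueno": 2.0, "malo": -2.0}
INTENSIFIERS = {"muy": 1.5, "poco": 0.5}
NEGATORS = {"no"}


@dataclass
class Node:
    """A word in a dependency tree together with its dependents."""
    word: str
    children: List["Node"] = field(default_factory=list)


def score(node: Node) -> float:
    """Return the polarity of the subtree rooted at `node`.

    Intensifier dependents scale the accumulated polarity, a negator
    dependent flips its sign, and any other dependent adds its own
    subtree score.
    """
    total = POLARITY.get(node.word, 0.0)
    factor = 1.0
    negated = False
    for child in node.children:
        if child.word in INTENSIFIERS:
            factor *= INTENSIFIERS[child.word]
        elif child.word in NEGATORS:
            negated = True
        else:
            total += score(child)
    total *= factor
    return -total if negated else total


# "no es muy bueno", with "bueno" as head and the other words as dependents.
tree = Node("bueno", [Node("no"), Node("es"), Node("muy")])
print(score(tree))  # -3.0
```

Flipping the sign on negation is the simplest possible choice; a real system could instead shift the polarity by a fixed amount or restrict the scope of the negator.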

Journal of Information Science | 2015

The megaphone of the people? Spanish SentiStrength for real-time analysis of political tweets

David Vilares; Mike Thelwall; Miguel A. Alonso

Twitter is an important platform for sharing opinions about politicians, parties and political decisions. These opinions can be exploited as a source of information to monitor the impact of politics on society. This article analyses the sentiment of 2,704,523 tweets referring to Spanish politicians and parties from a month in 2014–2015. The article makes three specific contributions: (a) enriching SentiStrength, a fast unsupervised sentiment strength detection system, for Spanish political tweeting; (b) analysing how linguistic phenomena such as negation, idioms and character duplication influence Spanish sentiment strength detection accuracy; and (c) analysing Spanish political tweets to rank political leaders, parties and personalities for popularity. Sentiment in Twitter for key politicians broadly reflects the main official polls for popularity but not for voting intention. In addition, the data suggests that the primary role of Twitter in politics is to select and amplify political events published by traditional media.
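Character duplication (for example "buenoooo") is one of the phenomena mentioned above. The sketch below shows one plausible way to detect and normalise it in Python. The function name `normalise_elongation` and the threshold of three repeated characters are assumptions made for illustration; this is not a description of SentiStrength's actual rule.

```python
import re
from typing import Tuple

# Three or more consecutive copies of the same character.
# Legitimate Spanish double letters such as "ll" or "rr" are left alone.
_REPEAT = re.compile(r"(.)\1{2,}")


def normalise_elongation(token: str) -> Tuple[str, bool]:
    """Collapse elongated character runs and report whether any were found.

    Returns the normalised token and a flag that a downstream scorer
    could use as an emphasis signal.
    """
    collapsed = _REPEAT.sub(r"\1", token)
    return collapsed, collapsed != token


print(normalise_elongation("buenoooo"))  # ('bueno', True)
print(normalise_elongation("llama"))     # ('llama', False)
```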

Information Processing and Management | 2017

Supervised sentiment analysis in multilingual environments

David Vilares; Miguel A. Alonso; Carlos Gómez-Rodríguez

This article tackles the problem of performing multilingual polarity classification on Twitter, comparing three techniques: (1) a multilingual model trained on a multilingual dataset, obtained by fusing existing monolingual resources, that does not need any language recognition step, (2) a dual monolingual model with perfect language detection on monolingual texts and (3) a monolingual model that acts based on the decision provided by a language identification tool. The techniques were evaluated on monolingual, synthetic multilingual and code-switching corpora of English and Spanish tweets. In the latter case we introduce the first code-switching Twitter corpus with sentiment labels. The samples are labelled according to two well-known criteria used for this purpose: the SentiStrength scale and a trinary scale (positive, neutral and negative categories). The experimental results show the robustness of the multilingual approach (1) and also that it outperforms the monolingual models on some monolingual datasets.
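The difference between technique (1), a single multilingual model, and technique (3), a monolingual model chosen by a language identifier, can be sketched with toy lexicon-based classifiers in Python. Everything here is hypothetical: the lexicons, the naive `identify_language` function and the classifiers are stand-ins, not the supervised models or tools evaluated in the article.

```python
from typing import Callable, Dict

Classifier = Callable[[str], str]

# Toy lexicons, assumed for illustration only.
EN = {"good": 1, "bad": -1}
ES = {"bueno": 1, "malo": -1}


def make_lexicon_classifier(lexicon: Dict[str, int]) -> Classifier:
    """Build a trinary classifier that sums word polarities from `lexicon`."""
    def classify(text: str) -> str:
        total = sum(lexicon.get(tok, 0) for tok in text.lower().split())
        if total > 0:
            return "positive"
        if total < 0:
            return "negative"
        return "neutral"
    return classify


en_model = make_lexicon_classifier(EN)
es_model = make_lexicon_classifier(ES)
# Technique (1): one model built from the fused monolingual resources.
multilingual_model = make_lexicon_classifier({**EN, **ES})


def identify_language(text: str) -> str:
    """Naive stand-in for a language identification tool."""
    tokens = text.lower().split()
    es_hits = sum(tok in ES for tok in tokens)
    en_hits = sum(tok in EN for tok in tokens)
    return "es" if es_hits > en_hits else "en"


def pipeline_model(text: str) -> str:
    """Technique (3): route the text to the model of the detected language."""
    models = {"en": en_model, "es": es_model}
    return models[identify_language(text)](text)


# On a code-switching input the pipeline only sees one language.
print(multilingual_model("good malo"))  # neutral
print(pipeline_model("good malo"))      # positive
```

On monolingual input such as "muy bueno" both routes agree; they diverge only when a single message mixes languages, which is the case the code-switching corpus is meant to probe.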

International Symposium on Neural Networks | 2016

Lyapunov filtering of objectivity for Spanish sentiment model

Iti Chaturvedi; Erik Cambria; David Vilares

Objective sentences lack sentiments and, hence, can reduce the accuracy of a sentiment classifier. Traditional methods prior to 2001 used hand-crafted templates to identify subjectivity and did not generalize well for resource-deficient languages such as Spanish. Later works published between 2002 and 2009 proposed the use of deep neural networks to automatically learn a dictionary of features (in the form of convolution kernels) that is portable to new languages. Recently, recurrent neural networks have been used to model alternating subjective and objective sentences within a single review. Such networks are difficult to train for a large vocabulary of words due to the problem of vanishing gradients. Hence, in this paper we consider the use of a Lyapunov linear matrix inequality to classify Spanish text as subjective or objective by combining Spanish features and features obtained from the corresponding translated English text. The aligned features for each sentence are next evolved using multiple kernel learning. The proposed Lyapunov deep neural network outperforms baselines by over 10% and the features learned in the hidden layers improve our understanding of subjective sentences in Spanish.

Journal of Information Science | 2015

A linguistic approach for determining the topics of Spanish Twitter messages

David Vilares; Miguel A. Alonso; Carlos Gómez-Rodríguez

The vast number of opinions and reviews provided in Twitter is helpful in order to make interesting findings about a given industry, but given the huge number of messages published every day, it is important to detect the relevant ones. In this respect, the Twitter search functionality is not a practical tool when we want to poll messages dealing with a given set of general topics. This article presents an approach to classify Twitter messages into various topics. We tackle the problem from a linguistic angle, taking into account part-of-speech, syntactic and semantic information, showing how language processing techniques should be adapted to deal with the informal language present in Twitter messages. The TASS 2013 General corpus, a collection of tweets that has been specifically annotated to perform text analytics tasks, is used as the dataset in our evaluation framework. We carry out a wide range of experiments to determine which kinds of linguistic information have the greatest impact on this task and how they should be combined in order to obtain the best-performing system. The results lead us to conclude that relating features by means of contextual information adds complementary knowledge over pure lexical models, making it possible to outperform them on standard metrics for multilabel classification tasks.

Document Engineering | 2013

Supervised polarity classification of Spanish tweets based on linguistic knowledge

David Vilares; Miguel A. Alonso; Carlos Gómez-Rodríguez

We describe a system that classifies the polarity of Spanish tweets. We adopt a hybrid approach, which combines machine learning and linguistic knowledge acquired by means of NLP. We use part-of-speech tags, syntactic dependencies and semantic knowledge as features for a supervised classifier. Lexical particularities of the language used in Twitter are taken into account in a pre-processing step. Experimental results improve over those of pure machine learning approaches and confirm the practical utility of the proposal.

Knowledge-Based Systems | 2017

Universal, unsupervised (rule-based), uncovered sentiment analysis

David Vilares; Carlos Gómez-Rodríguez; Miguel A. Alonso

We present a novel unsupervised approach for multilingual sentiment analysis driven by compositional syntax-based rules. On the one hand, we exploit some of the main advantages of unsupervised algorithms: (1) the interpretability of their output, in contrast with most supervised models, which behave as a black box and (2) their robustness across different corpora and domains. On the other hand, by introducing the concept of compositional operations and exploiting syntactic information in the form of universal dependencies, we tackle one of their main drawbacks: their rigidity on data that are differently structured depending on the language. Experiments show an improvement both over existing unsupervised methods, and over state-of-the-art supervised models when evaluating outside their corpus of origin. The system is freely available.

Artificial Intelligence Review | 2017

How important is syntactic parsing accuracy? An empirical evaluation on rule-based sentiment analysis

Carlos Gómez-Rodríguez; Iago Alonso-Alonso; David Vilares

Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of an application for which parsing has recently proven useful. In recent years, there have been significant advances in the accuracy of parsing algorithms. In this article, we perform an empirical, task-oriented evaluation to determine how parsing accuracy influences the performance of a state-of-the-art rule-based sentiment analysis system that determines the polarity of sentences from their parse trees. In particular, we evaluate the system using four well-known dependency parsers, including both current models with state-of-the-art accuracy and more inaccurate models which, however, require fewer computational resources. The experiments show that all of the parsers produce similarly good results in the sentiment analysis task, without their accuracy having any relevant influence on the results. Since parsing is currently a task with a relatively high computational cost that varies strongly between algorithms, this suggests that sentiment analysis researchers and users should prioritize speed over accuracy when choosing a parser; and parsing researchers should investigate models that improve speed further, even at some cost to accuracy.

Empirical Methods in Natural Language Processing | 2017

Detecting Perspectives in Political Debates

David Vilares; Yulan He

We explore how to detect people’s perspectives that occupy a certain proposition. We propose a Bayesian modelling approach where topics (or propositions) and their associated perspectives (or viewpoints) are modeled as latent variables. Words associated with topics or perspectives follow different generative routes. Based on the extracted perspectives, we can extract the top associated sentences from text to generate a succinct summary which allows a quick glimpse of the main viewpoints in a document. The model is evaluated on debates from the House of Commons of the UK Parliament, revealing perspectives from the debates without the use of labelled data and obtaining better results than previous related solutions under a variety of evaluations.
