Barbara Poblete
University of Chile
Publication
Featured research published by Barbara Poblete.
knowledge discovery and data mining | 2010
Marcelo Mendoza; Barbara Poblete; Carlos Castillo
In this article we explore the behavior of Twitter users under an emergency situation. In particular, we analyze the activity related to the 2010 earthquake in Chile and characterize Twitter in the hours and days following this disaster. Furthermore, we perform a preliminary study of certain social phenomena, such as the dissemination of false rumors and confirmed news. We analyze how this information propagated through the Twitter network, with the purpose of assessing the reliability of Twitter as an information source under extreme circumstances. Our analysis shows that the propagation of tweets that correspond to rumors differs from that of tweets that spread news, because rumors tend to be questioned more than news by the Twitter community. This result shows that it is possible to detect rumors by using aggregate analysis on tweets.
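The aggregate analysis mentioned in this abstract can be pictured with a minimal sketch. It is not the authors' method: it assumes each tweet about a claim has already been hand-labeled with a stance ("affirm", "deny", "question", or anything else for unrelated tweets), and the function names and the 0.3 threshold are hypothetical choices for illustration only.

```python
from collections import Counter

def questioning_share(stances):
    """Fraction of stance-bearing tweets that deny or question the claim.

    `stances` is a list of labels; the label set is an assumption of this sketch.
    """
    counts = Counter(stances)
    relevant = counts["affirm"] + counts["deny"] + counts["question"]
    if relevant == 0:
        return 0.0  # no stance-bearing tweets, nothing to aggregate
    return (counts["deny"] + counts["question"]) / relevant

def looks_like_rumor(stances, threshold=0.3):
    """Flag a claim when the community questions it unusually often.

    The threshold is a hypothetical value, not one reported in the paper.
    """
    return questioning_share(stances) >= threshold
```

A claim whose tweets are mostly affirmations stays below the threshold, while one that is frequently denied or questioned is flagged for review.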
Internet Research | 2013
Carlos Castillo; Marcelo Mendoza; Barbara Poblete
Purpose – Twitter is a popular microblogging service which has proven, in recent years, its potential for propagating news and information about developing events. The purpose of this paper is to focus on the analysis of information credibility on Twitter. The purpose of our research is to establish if an automatic discovery process of relevant and credible news events can be achieved. Design/methodology/approach – The paper follows a supervised learning approach for the task of automatic classification of credible news events. A first classifier decides if an information cascade corresponds to a newsworthy event. Then a second classifier decides if this cascade can be considered credible or not. The paper undertakes this effort training over a significant amount of labeled data, obtained using crowdsourcing tools. The paper validates these classifiers under two settings: the first, a sample of automatically detected Twitter “trends” in English, and second, the paper tests how well this model transfers to...
conference on information and knowledge management | 2011
Barbara Poblete; Ruth Garcia; Marcelo Mendoza; Alejandro Jaimes
Social media services have spread throughout the world in just a few years. They have become not only a new source of information, but also new mechanisms for societies world-wide to organize themselves and communicate. Therefore, social media has a very strong impact in many aspects -- at the personal level, in business, and in politics, among many others. In spite of its fast adoption, little is known about social media usage in different countries, and whether patterns of behavior remain the same or not. Providing a deep understanding of differences between countries can be useful in many ways, e.g., to improve the design of social media systems (which features work best for which country?) and to influence marketing and political campaigns. Moreover, this type of analysis can provide relevant insight into how societies might differ. In this paper we present a summary of a large-scale analysis of Twitter for an extended period of time. We analyze in detail various aspects of social media for the ten countries we identified as most active. We collected one year's worth of data and report differences and similarities in terms of activity, sentiment, use of languages, and network structure. To the best of our knowledge, this is the first on-line social network study of such characteristics.
Knowledge Based Systems | 2014
Felipe Bravo-Marquez; Marcelo Mendoza; Barbara Poblete
People react to events, topics and entities by expressing their personal opinions and emotions. These reactions can correspond to a wide range of intensities, from very mild to strong. An adequate processing and understanding of these expressions has been the subject of research in several fields, such as business and politics. In this context, Twitter sentiment analysis, which is the task of automatically identifying and extracting subjective information from tweets, has received increasing attention from the Web mining community. Twitter provides an extremely valuable insight into human opinions, as well as new challenging Big Data problems. These problems include the processing of massive volumes of streaming data, as well as the automatic identification of human expressiveness within short text messages. In that area, several methods and lexical resources have been proposed in order to extract sentiment indicators from natural language texts at both syntactic and semantic levels. These approaches address different dimensions of opinions, such as subjectivity, polarity, intensity and emotion. This article is the first study of how these resources, which are focused on different sentiment scopes, complement each other. With this purpose we identify scenarios in which some of these resources are more useful than others. Furthermore, we propose a novel approach for sentiment classification based on meta-level features. This supervised approach boosts existing sentiment classification of subjectivity and polarity detection on Twitter. Our results show that the combination of meta-level features provides significant improvements in performance. However, we observe that there are important differences that rely on the type of lexical resource, the dataset used to build the model, and the learning strategy. Experimental results indicate that manually generated lexicons are focused on emotional words, being very useful for polarity prediction. 
On the other hand, lexicons generated with automatic methods include neutral words, introducing noise in the detection of subjectivity. Our findings indicate that polarity and subjectivity prediction are different dimensions of the same problem, but they need to be addressed using different subspace features. Lexicon-based approaches are recommended for polarity, and stylistic part-of-speech based approaches are meaningful for subjectivity. With this research we offer a more global insight into the resource components for the complex task of classifying human emotion and opinion.
Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining | 2013
Felipe Bravo-Marquez; Marcelo Mendoza; Barbara Poblete
Twitter sentiment analysis or the task of automatically retrieving opinions from tweets has received an increasing interest from the web mining community. This is due to its importance in a wide range of fields such as business and politics. People express sentiments about specific topics or entities with different strengths and intensities, where these sentiments are strongly related to their personal feelings and emotions. A number of methods and lexical resources have been proposed to analyze sentiment from natural language texts, addressing different opinion dimensions. In this article, we propose an approach for boosting Twitter sentiment classification using different sentiment dimensions as meta-level features. We combine aspects such as opinion strength, emotion and polarity indicators, generated by existing sentiment analysis methods and resources. Our research shows that the combination of sentiment dimensions provides significant improvement in Twitter sentiment classification tasks such as polarity and subjectivity.
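As a rough illustration of what "meta-level features" could look like, the sketch below turns the outputs of several sentiment resources into one feature vector per tweet, which a supervised classifier could then consume. The three tiny lexicons, their entries, and the function name are all hypothetical stand-ins; the paper's actual resources and feature set are not reproduced here.

```python
# Hypothetical toy lexicons standing in for real polarity, strength,
# and emotion resources; the entries are invented for illustration.
POLARITY_LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}
STRENGTH_LEXICON = {"good": 0.5, "great": 0.9, "bad": 0.5, "awful": 0.9}
EMOTION_LEXICON = {"great": "joy", "awful": "anger"}

def meta_features(tweet):
    """Build one feature vector from several sentiment dimensions.

    Returns [positive word count, negative word count,
             summed opinion strength, joy word count, anger word count].
    """
    tokens = tweet.lower().split()
    positive = sum(1 for t in tokens if POLARITY_LEXICON.get(t, 0) > 0)
    negative = sum(1 for t in tokens if POLARITY_LEXICON.get(t, 0) < 0)
    strength = sum(STRENGTH_LEXICON.get(t, 0.0) for t in tokens)
    joy = sum(1 for t in tokens if EMOTION_LEXICON.get(t) == "joy")
    anger = sum(1 for t in tokens if EMOTION_LEXICON.get(t) == "anger")
    return [positive, negative, strength, joy, anger]
```

Each position in the vector comes from a different resource, so a downstream classifier can learn which dimension matters for polarity and which for subjectivity.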
knowledge discovery and data mining | 2013
Jheser Guzman; Barbara Poblete
On-line social networks have become a massive communication and information channel for users world-wide. In particular, the microblogging platform Twitter is characterized by short-text message exchanges at extremely high rates. In this type of scenario, the detection of emerging topics in text streams becomes an important research area, essential for identifying relevant new conversation topics, such as breaking news and trends. Although emerging topic detection in text is a well-established research area, its application to large volumes of streaming text data is quite novel. This makes scalability, efficiency and speed the key aspects of any emerging topic detection algorithm in this type of environment. Our research addresses the aforementioned problem by focusing on detecting significant and unusual bursts in keyword arrival rates, or bursty keywords. We propose a scalable and fast on-line method that uses normalized individual frequency signals per term and a windowing variation technique. This method reports keyword bursts which can be composed of single or multiple terms, ranked according to their importance. The average complexity of our method is O(n log n), where n is the number of messages in the time window. This complexity allows our approach to be scalable for large streaming datasets. If bursts are only detected and not ranked, the algorithm retains linear complexity O(n), making it the fastest in comparison to the current state-of-the-art. We validate our approach by comparing our performance to similar systems using the TREC Tweet 2011 Challenge tweets, obtaining 91% of matches with LDA, an off-line gold standard used in similar evaluations. In addition, we study Twitter messages related to the SuperBowl football events in 2011 and 2013.
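The idea of comparing normalized per-term frequencies across time windows can be sketched as follows. This is a simplified illustration and not the published algorithm: whitespace tokenization, the comparison of only two adjacent windows, and the `min_ratio` and `floor` parameters are all assumptions made for the example.

```python
from collections import Counter

def relative_frequencies(messages):
    """Map each term to its share of all term occurrences in one window."""
    counts = Counter(term for msg in messages for term in msg.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}

def bursty_keywords(previous_window, current_window, min_ratio=3.0, floor=0.05):
    """Return (term, ratio) pairs whose normalized frequency jumped.

    `min_ratio` and `floor` are hypothetical tuning parameters: `floor`
    keeps unseen terms from producing a division by zero, and `min_ratio`
    is the growth factor that counts as a burst.
    """
    prev = relative_frequencies(previous_window)
    curr = relative_frequencies(current_window)
    bursts = []
    for term, freq in curr.items():
        baseline = max(prev.get(term, 0.0), floor)
        ratio = freq / baseline
        if ratio >= min_ratio:
            bursts.append((term, ratio))
    # Ranking needs a sort; detection alone is a single pass over the terms.
    bursts.sort(key=lambda pair: pair[1], reverse=True)
    return bursts
```

In this sketch, detection is one pass over the terms of the current window, and the sort is needed only when the bursts must be ranked.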
lasers and electro optics society meeting | 2003
Ricardo A. Baeza-Yates; Barbara Poblete
We present the evolution of the structure of the Chilean Web between 2000 and 2002. Our results show that although the Web grows as expected, a significant part of it also disappears. In addition, some components are much more stable than others. We also compare the expected life cycle of a Web site in the structure with the actual data.
knowledge discovery and data mining | 2007
Barbara Poblete; Myra Spiliopoulou; Ricardo A. Baeza-Yates
In this paper we study privacy preservation for the publication of search engine query logs. We introduce a new privacy concern, website privacy, as a special case of business privacy. We define the possible adversaries who could be interested in disclosing website information and the vulnerabilities in the query log, which they could exploit. We elaborate on anonymization techniques to protect website information, discuss different types of attacks that an adversary could use and propose an anonymization strategy for one of these attacks. We then present a graph-based heuristic to validate the effectiveness of our anonymization method and perform an experimental evaluation of this approach. Our experimental results show that the query log can be appropriately anonymized against the specific attack, while retaining a significant volume of useful data.
Computer Networks | 2006
Ricardo A. Baeza-Yates; Barbara Poblete
In this paper we present a large-scale study on the evolution of the Web structure of the Chilean domain (.cl) from 2000 to 2004, focusing on the Web site transitions in the structure. This is the most detailed study of its kind and the one covering the largest time span. Our results show that there are many stable Web sites, but also a majority of chaotic changes. We also present the first known results on the death behavior of Web sites.
latin american web congress | 2012
Felipe Bravo-Marquez; Daniel Gayo-Avello; Marcelo Mendoza; Barbara Poblete
In this work we conduct an empirical study of opinion time series created from Twitter data regarding the 2008 U.S. elections. The focus of our proposal is to establish whether a time series is appropriate or not for generating a reliable predictive model. We analyze time series obtained from Twitter messages related to the 2008 U.S. elections using ARMA/ARIMA and GARCH models. The first models are used in order to assess the conditional mean of the process and the second ones to assess the conditional variance or volatility. The main argument we discuss is that opinion time series that exhibit volatility should not be used for long-term forecasting purposes. We present an in-depth analysis of the statistical properties of these time series. Our experiments show that these time series are not fit for predicting future opinion trends. Due to the fact that researchers have not provided enough evidence to support the alleged predictive power of opinion time series, we discuss how more rigorous validation of predictive models generated from time series could benefit the opinion mining field.