Publication


Featured research published by Pedro Saleiro.


dependable autonomic and secure computing | 2015

POPmine: Tracking Political Opinion on the Web

Pedro Saleiro; Silvio Amir; Mário J. Silva; Carlos Soares

The automatic content analysis of mass media in the social sciences has become necessary and possible with the rise of social media and computational power. One particularly promising avenue of research concerns the use of opinion mining. We design and implement the POPmine system, which collects texts from web-based conventional media (news items on mainstream media sites) and social media (blogs and Twitter) and processes those texts by recognizing topics and political actors, analyzing relevant linguistic units, and generating indicators of both the frequency and the polarity (positivity/negativity) of mentions of political actors across sources, types of sources, and time.


european conference on information retrieval | 2016

TimeMachine: Entity-Centric Search and Visualization of News Archives

Pedro Saleiro; Jorge Teixeira; Carlos Soares; Eugénio C. Oliveira

We present a dynamic web tool that allows interactive search and visualization of large news archives using an entity-centric approach. Users can search for entities using keyword phrases that express news stories or events, and the system retrieves the entities most relevant to the user query based on automatically extracted and indexed entity profiles. From the computational journalism perspective, TimeMachine allows users to explore media content through time using automatic identification of entity names, jobs, quotations and relations between entities from co-occurrence networks extracted from the news articles. A TimeMachine demo is available at http://maquinadotempo.sapo.pt/.


intelligent data analysis | 2016

Learning from the News: Predicting Entity Popularity on Twitter

Pedro Saleiro; Carlos Soares

In this work, we tackle the problem of predicting entity popularity on Twitter based on the news cycle. We apply a supervised learning approach and extract four types of features: (i) signal, (ii) textual, (iii) sentiment and (iv) semantic, which we use to predict whether the popularity of a given entity will be high or low in the following hours. We ran several experiments on six different entities in a dataset of over 150M tweets and 5M news items and obtained F1 scores over 0.70. Error analysis indicates that news performs better at predicting entity popularity on Twitter when it is the primary information source of the event, as opposed to events such as live TV broadcasts, political debates or football matches.
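The F1 score mentioned in this abstract is the harmonic mean of precision and recall. A minimal sketch of how it could be computed for binary high/low popularity labels is given here; the function name `f1_score` and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def f1_score(y_true, y_pred, positive="high"):
    """Return the F1 score of the `positive` class for two label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and predictions for five time windows.
gold = ["high", "high", "low", "low", "high"]
predicted = ["high", "low", "low", "high", "high"]
score = f1_score(gold, predicted)
```

With these toy labels there are two true positives, one false positive and one false negative, so precision and recall are both 2/3.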


international c* conference on computer science & software engineering | 2016

Sentiment Aggregate Functions for Political Opinion Polling using Microblog Streams

Pedro Saleiro; Luís S. Gomes; Carlos Soares

The automatic content analysis of mass media in the social sciences has become necessary and possible with the rise of social media and computational power. One particularly promising avenue of research concerns the use of sentiment analysis in microblog streams. However, one of the main challenges is aggregating sentiment polarity in a timely fashion so that it can be fed to the prediction method. We investigated a large set of sentiment aggregate functions and performed a regression analysis using political opinion polls as the gold standard. Our dataset contains nearly 233 000 tweets, classified according to their polarity (positive, negative or neutral), regarding the five main Portuguese political leaders during the Portuguese bailout (2011-2014). Results show that different sentiment aggregate functions exhibit different feature importance over time while the error remains almost unchanged.
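To make the idea of a sentiment aggregate function concrete, here is a minimal sketch that turns the polarity labels of the tweets in one time window into a few numeric indicators. The four aggregates shown (shares, a normalized polarity difference and a smoothed log ratio) are common illustrative choices and are assumptions; the abstract does not list the functions the authors actually studied.

```python
import math
from collections import Counter


def aggregate_sentiment(labels):
    """Aggregate per-tweet polarity labels for one entity and time window."""
    counts = Counter(labels)
    pos = counts["positive"]
    neg = counts["negative"]
    total = pos + neg + counts["neutral"]
    return {
        # Share of all tweets that are positive or negative.
        "share_positive": pos / total if total else 0.0,
        "share_negative": neg / total if total else 0.0,
        # Difference of positive and negative counts, ignoring neutral tweets.
        "polarity": (pos - neg) / (pos + neg) if (pos + neg) else 0.0,
        # Add-one smoothed log ratio, defined even when a count is zero.
        "log_ratio": math.log((pos + 1) / (neg + 1)),
    }


# Hypothetical labels for one political leader in one month.
window = ["positive"] * 3 + ["negative"] + ["neutral"] * 4
features = aggregate_sentiment(window)
```

Each aggregate yields one feature per entity and time window, which could then be used as a regressor against poll results.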


international acm sigir conference on research and development in information retrieval | 2017

RELink: A Research Framework and Test Collection for Entity-Relationship Retrieval

Pedro Saleiro; Natasa Milic-Frayling; Eduarda Mendes Rodrigues; Carlos Soares

Improvements in entity-relationship (E-R) search techniques have been hampered by a lack of test collections, particularly for complex queries involving multiple entities and relationships. In this paper we describe a method for generating E-R test queries to support comprehensive E-R search experiments. Queries and relevance judgments are created from content that exists in a tabular form where columns represent entity types and the table structure implies one or more relationships among the entities. Editorial work involves creating natural language queries based on relationships represented by the entries in the table. We have publicly released the RELink test collection comprising 600 queries and relevance judgments obtained from a sample of Wikipedia List-of-lists-of-lists tables. The latter comprise tuples of entities that are extracted from columns and labelled by corresponding entity types and relationships they represent. In order to facilitate research in complex E-R retrieval, we have created and released as open source the RELink Framework that includes Apache Lucene indexing and search specifically tailored to E-R retrieval. RELink includes entity and relationship indexing based on the ClueWeb-09-B Web collection with FACC1 text span annotations linked to Wikipedia entities. With ready-to-use search resources and a comprehensive test collection, we support the community in pursuing E-R research at scale.


international conference on intelligent transportation systems | 2016

Mining social media for open innovation in transportation systems

Daniela Ulloa; Pedro Saleiro; Rosaldo J. F. Rossetti; Elis Silva

This work proposes a novel framework for the development of new products and services in transportation through an open innovation approach based on automatic content analysis of social media data. The framework extracts users' comments from Online Social Networks (OSN) and processes and analyzes the text through information extraction and sentiment analysis techniques to obtain relevant information about product reception in the market. A use case was developed using the mobile application Uber, which is today one of the fastest growing technology companies in the world. We measured how a controversial, highly diffused event influences the volume of tweets about Uber and the perception of its users. While there is no change in the image of Uber, a large increase in the number of tweets mentioning the company is observed, which amounted to free and important diffusion of its product.


international c* conference on computer science & software engineering | 2016

SentiBubbles: Topic Modeling and Sentiment Visualization of Entity-centric Tweets

João Manuel de Oliveira; Mike Pinto; Pedro Saleiro; Jorge Teixeira

Social media users tend to mention entities when reacting to news events. The main purpose of this work is to create entity-centric aggregations of tweets on a daily basis. By applying topic modeling and sentiment analysis, we create data visualization insights about current events and people's reactions to those events from an entity-centric perspective.


New Generation Computing | 2017

TexRep: A Text Mining Framework for Online Reputation Monitoring

Pedro Saleiro; Eduarda Mendes Rodrigues; Carlos Soares; Eugénio C. Oliveira

This work aims to understand, formalize and explore the scientific challenges of using unstructured text data from different Web sources for Online Reputation Monitoring. We present TexRep, an adaptable text mining framework specifically tailored for Online Reputation Monitoring that can be reused in multiple application scenarios, from politics to finance. This framework is able to collect texts from online media, such as Twitter, identify entities of interest, and classify sentiment polarity and intensity. The framework supports multiple data aggregation methods, as well as visualization and modeling techniques that can be used for both descriptive analytics, such as analyzing how political polls evolve over time, and predictive analytics, such as predicting elections. We present case studies that illustrate and validate TexRep for Online Reputation Monitoring. In particular, we provide an evaluation of the TexRep Entity Filtering and Sentiment Analysis modules using well-known external benchmarks. We also present an illustrative example of a TexRep application in the political domain.


portuguese conference on artificial intelligence | 2017

Learning Word Embeddings from the Portuguese Twitter Stream: A Study of Some Practical Aspects

Pedro Saleiro; Luís Sarmento; Eduarda Mendes Rodrigues; Carlos Soares; Eugénio C. Oliveira

This paper describes a preliminary study for producing and distributing a large-scale database of embeddings from the Portuguese Twitter stream. We start by experimenting with a relatively small sample and focusing on three challenges: volume of training data, vocabulary size and intrinsic evaluation metrics. Using a single GPU, we were able to scale up vocabulary size from 2048 embedded words and 500K training examples to 32768 words over 10M training examples while keeping a stable validation loss and an approximately linear trend in training time per epoch. We also observed that using less than 50% of the available training examples for each vocabulary size might result in overfitting. Results on intrinsic evaluation show promising performance for a vocabulary size of 32768 words. Nevertheless, intrinsic evaluation metrics suffer from over-sensitivity to their corresponding cosine similarity thresholds, indicating that a wider range of metrics needs to be developed to track progress.
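The threshold sensitivity noted at the end of this abstract can be illustrated with a small sketch: a metric that counts word pairs as "similar" when their cosine similarity exceeds a cutoff can change sharply when the cutoff moves slightly. The function names and the two-dimensional toy vectors are assumptions made for illustration; they are not the paper's evaluation metrics or data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fraction_similar(pairs, threshold):
    """Fraction of vector pairs whose cosine similarity reaches the threshold."""
    hits = sum(1 for u, v in pairs if cosine_similarity(u, v) >= threshold)
    return hits / len(pairs)


# Hypothetical embedding pairs with similarities of 1.0, about 0.707 and 0.0.
pairs = [([1.0, 0.0], [1.0, 0.0]),
         ([1.0, 0.0], [1.0, 1.0]),
         ([1.0, 0.0], [0.0, 1.0])]
loose = fraction_similar(pairs, 0.70)
strict = fraction_similar(pairs, 0.75)
```

Moving the cutoff from 0.70 to 0.75 halves the score on this toy data, because the middle pair sits just above the lower threshold.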


portuguese conference on artificial intelligence | 2017

Transportation in Social Media: An Automatic Classifier for Travel-Related Tweets

João Pereira; Arian Pasquali; Pedro Saleiro; Rosaldo J. F. Rossetti

In recent years, researchers in the field of intelligent transportation systems have made several efforts to extract valuable information from social media streams. However, collecting domain-specific data from any social media is a challenging task that demands appropriate and robust classification methods. In this work we focus on exploring geo-located tweets in order to create a travel-related tweet classifier using a combination of bag-of-words and word embeddings. The resulting classification makes it possible to identify interesting spatio-temporal relations in São Paulo and Rio de Janeiro.
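One simple way to combine bag-of-words and word embeddings is to concatenate a term-count vector with the average embedding of the tweet's tokens. The sketch below shows that idea only; the function name `featurize`, the tiny vocabulary and the two-dimensional embeddings are hypothetical, and the abstract does not say how the authors combined the two representations.

```python
def featurize(tokens, vocabulary, embeddings, dim):
    """Concatenate bag-of-words counts with the mean embedding of known tokens."""
    # Bag-of-words part: one count per vocabulary word, in vocabulary order.
    bow = [tokens.count(word) for word in vocabulary]
    # Embedding part: average the vectors of tokens that have an embedding.
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if vectors:
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    else:
        mean = [0.0] * dim
    return bow + mean


# Hypothetical vocabulary and toy embeddings.
vocabulary = ["bus", "traffic"]
embeddings = {"bus": [1.0, 0.0], "late": [0.0, 1.0]}
features = featurize(["bus", "late", "bus"], vocabulary, embeddings, dim=2)
```

The resulting vector could be passed to any standard classifier to separate travel-related tweets from the rest.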
