Damiano Spina
RMIT University
Publication
Featured research published by Damiano Spina.
cross language evaluation forum | 2013
Enrique Amigó; Jorge Carrillo de Albornoz; Irina Chugur; Adolfo Corujo; Julio Gonzalo; Tamara Martín; Edgar Meij; Maarten de Rijke; Damiano Spina
This paper summarizes the goals, organization, and results of RepLab 2013, the second RepLab competitive evaluation campaign for Online Reputation Management Systems. RepLab focused on the process of monitoring the reputation of companies and individuals, and asked participant systems to annotate different types of information on tweets containing the names of several companies: first, tweets had to be classified as related or unrelated to the entity; relevant tweets then had to be classified according to their polarity for reputation (does the content of the tweet have positive or negative implications for the reputation of the entity?) and clustered into coherent topics; finally, clusters had to be ranked according to their priority (potential reputation problems had to come first). The gold standard consists of more than 140,000 tweets annotated by a group of trained annotators supervised and monitored by reputation experts.
acm conference on hypertext | 2012
Arkaitz Zubiaga; Damiano Spina; Enrique Amigó; Julio Gonzalo
We deal with shrinking the stream of tweets for scheduled events in real-time, following two steps: (i) sub-event detection, which determines if something new has occurred, and (ii) tweet selection, which picks a tweet to describe each sub-event. By comparing summaries in three languages to live reports by journalists, we show that simple text analysis methods which do not involve external knowledge lead to summaries that cover 84% of the sub-events on average, and 100% of key types of sub-events (such as goals in soccer).
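The two steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the spike rule (a tweet count at least `ratio` times the previous bucket's), the frequency-based tweet scoring, the function names, and the sample buckets are all assumptions made for illustration.

```python
from collections import Counter
from typing import List


def detect_subevents(buckets: List[List[str]], ratio: float = 1.5) -> List[int]:
    """Return indices of buckets whose tweet count jumps by `ratio` or more.

    Assumed rule: a sudden rise in tweeting rate signals a new sub-event.
    """
    return [
        i
        for i in range(1, len(buckets))
        if buckets[i - 1] and len(buckets[i]) >= ratio * len(buckets[i - 1])
    ]


def select_tweet(bucket: List[str]) -> str:
    """Pick the tweet whose distinct words are most frequent in the bucket."""
    freq = Counter(word for tweet in bucket for word in tweet.lower().split())
    return max(bucket, key=lambda t: sum(freq[w] for w in set(t.lower().split())))


# Hypothetical one-minute buckets of tweets from a soccer match.
buckets = [
    ["kick off", "come on"],
    ["nice pass", "good game"],
    ["goal by messi", "goal !!!", "what a goal by messi", "messi scores"],
]
subevents = detect_subevents(buckets)
summary = [select_tweet(buckets[i]) for i in subevents]
```

With these sample buckets only the third one is flagged, and the selected tweet is the one that shares the most vocabulary with the rest of its bucket. Summing word frequencies favors longer tweets; a real system would likely normalize or weight terms differently.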
conference on information and knowledge management | 2011
Arkaitz Zubiaga; Damiano Spina; Víctor Fresno; Raquel Martínez
Twitter summarizes the great deal of messages posted by users in the form of trending topics that reflect the top conversations being discussed at a given moment. These trending topics tend to be connected to current affairs, and different happenings can give rise to them: for instance, a sports event broadcast on TV or a viral meme introduced by a community of users. Detecting the type of origin can facilitate information filtering, enhance real-time data processing, and improve user experience. In this paper, we introduce a typology to categorize the triggers that spark trending topics: news, current events, memes, and commemoratives. We define a set of straightforward language-independent features that rely on the social spread of the trends to discriminate among those types of trending topics. Our method provides an efficient way to immediately and accurately categorize trending topics without the need for external data, outperforming a content-based approach.
association for information science and technology | 2015
Arkaitz Zubiaga; Damiano Spina; Raquel Martínez; Víctor Fresno
In this work, we explore the types of triggers that spark trends on Twitter, introducing a typology with the following four types: news, ongoing events, memes, and commemoratives. While previous research has analyzed trending topics over the long term, we look at the earliest tweets that produce a trend, with the aim of categorizing trends early on. This allows us to provide a filtered subset of trends to end users. We experiment with a set of straightforward language‐independent features based on the social spread of trends and categorize them using the typology. Our method provides an efficient way to accurately categorize trending topics without the need for external data, enabling news organizations to discover breaking news in real‐time, or to quickly identify viral memes that might inform marketing decisions, among others. The analysis of social features also reveals patterns associated with each type of trend, such as tweets about ongoing events being shorter as many were likely sent from mobile devices, or memes having more retweets originating from a few trend‐setters.
cross language evaluation forum | 2014
Enrique Amigó; Jorge Carrillo de Albornoz; Irina Chugur; Adolfo Corujo; Julio Gonzalo; Edgar Meij; Damiano Spina
This paper describes the organisation and results of RepLab 2014, the third competitive evaluation campaign for Online Reputation Management systems. This year the focus lay on two new tasks: reputation dimensions classification and author profiling, which complement the aspects of reputation analysis studied in the previous campaigns. The participants were asked (1) to classify tweets applying a standard typology of reputation dimensions and (2) to categorise Twitter profiles by type of author as well as rank them according to their influence. New data collections were provided for the development and evaluation of systems that participated in this benchmarking activity.
international acm sigir conference on research and development in information retrieval | 2014
Damiano Spina; Julio Gonzalo; Enrique Amigó
Reputation management experts have to monitor Twitter (among other sources) constantly and decide, at any given time, what is being said about the entity of interest (a company, organization, personality...). Solving this reputation monitoring problem automatically as a topic detection task is both essential, because manual processing of data is either costly or prohibitive, and challenging, because topics of interest for reputation monitoring are usually fine-grained and suffer from data sparsity. We focus on a solution for the problem that (i) learns a pairwise tweet similarity function from previously annotated data, using all kinds of content-based and Twitter-based features; (ii) applies a clustering algorithm on the previously learned similarity function. Our experiments indicate that (i) Twitter signals can be used to improve the topic detection process with respect to using content signals only; (ii) learning a similarity function is a flexible and efficient way of introducing supervision in the topic detection clustering process. The performance of our best system is substantially better than state-of-the-art approaches and gets close to the inter-annotator agreement rate. A detailed qualitative inspection of the data further reveals two types of topics detected by reputation experts: reputation alerts / issues (which usually spike in time) and organizational topics (which are usually stable across time).
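Step (ii), clustering on top of a pairwise similarity function, can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not name the clustering algorithm, so single-link clustering with a fixed threshold is used here, and a plain word-overlap (Jaccard) score stands in for the similarity function that the paper learns from annotated data. The function names, threshold, and sample tweets are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity (word overlap); the paper learns this function."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def cluster(
    tweets: List[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[List[int]]:
    """Single-link clustering: link tweets whose similarity >= threshold."""
    parent = list(range(len(tweets)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(tweets)), 2):
        if similarity(tweets[i], tweets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(tweets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical tweets about a bank.
tweets = [
    "bank raises mortgage rates again",
    "mortgage rates raised by bank",
    "bank sponsors local marathon",
    "local marathon sponsored by the bank",
]
clusters = cluster(tweets, jaccard, threshold=0.4)
```

Because `cluster` accepts any callable as `similarity`, a function trained on annotated tweet pairs could replace `jaccard` without changing the clustering code, which is one way to read the abstract's point about introducing supervision through the similarity function.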
Expert Systems With Applications | 2013
Damiano Spina; Julio Gonzalo; Enrique Amigó
A major problem in monitoring the online reputation of companies, brands, and other entities is that entity names are often ambiguous (apple may refer to the company, the fruit, the singer, etc.). The problem is particularly hard in microblogging services such as Twitter, where texts are very short and there is little context to disambiguate. In this paper we address the filtering task of determining, out of a set of tweets that contain a company name, which ones do refer to the company. Our approach relies on the identification of filter keywords: those whose presence in a tweet reliably confirms (positive keywords) or rules out (negative keywords) that the tweet refers to the company. We describe an algorithm to extract filter keywords that does not use any previously annotated data about the target company. The algorithm allows us to classify 58% of the tweets with 75% accuracy; and those can be used to feed a machine learning algorithm to obtain a complete classification of all tweets with an overall accuracy of 73%. In comparison, a 10-fold validation of the same machine learning algorithm provides an accuracy of 85%, i.e., our unsupervised algorithm has a 14% loss with respect to its supervised counterpart. Our study also shows that (i) filter keywords for Twitter do not directly derive from the public information about the company on the Web: a manual selection of keywords from relevant web sources only covers 15% of the tweets with 86% accuracy; (ii) filter keywords can indeed be a productive way of classifying tweets: the five best possible keywords cover, on average, 28% of the tweets for a company in our test collection.
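As a rough illustration of how filter keywords classify tweets, the Python sketch below labels a tweet as related when it contains only positive keywords, unrelated when it contains only negative keywords, and leaves it undecided otherwise. The keyword sets, function name, and sample tweets are hypothetical; the paper's keyword extraction algorithm and the machine learning step that classifies the remaining tweets are not shown.

```python
from typing import Optional, Set


def classify_with_filter_keywords(
    tweet: str,
    positive: Set[str],
    negative: Set[str],
) -> Optional[bool]:
    """Return True (related), False (unrelated), or None (undecided)."""
    tokens = set(tweet.lower().split())
    has_positive = bool(tokens & positive)
    has_negative = bool(tokens & negative)
    if has_positive and not has_negative:
        return True
    if has_negative and not has_positive:
        return False
    return None  # no filter keyword present, or conflicting keywords


# Hypothetical filter keywords for the ambiguous name "apple".
POSITIVE = {"iphone", "ipad", "macbook"}
NEGATIVE = {"pie", "juice", "orchard"}

tweets = [
    "new apple iphone released today",
    "grandma's apple pie recipe",
    "i like apple",
]
labels = [classify_with_filter_keywords(t, POSITIVE, NEGATIVE) for t in tweets]

# Coverage: fraction of tweets that the keywords alone can decide.
coverage = sum(label is not None for label in labels) / len(labels)
```

In this toy example two of the three tweets are decided by keywords, so coverage is 2/3; the undecided tweet is the kind that would be passed on to the classifier trained on the keyword-labeled tweets.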
european conference on information retrieval | 2016
Liu Yang; Qingyao Ai; Damiano Spina; Ruey-Cheng Chen; Liang Pang; W. Bruce Croft; Jiafeng Guo; Falk Scholer
Retrieving finer grained text units such as passages or sentences as answers for non-factoid Web queries is becoming increasingly important for applications such as mobile Web search. In this work, we introduce the answer sentence retrieval task for non-factoid Web queries, and investigate how this task can be effectively solved under a learning to rank framework. We design two types of features, namely semantic and context features, beyond traditional text matching features. We compare learning to rank methods with multiple baseline methods including query likelihood and the state-of-the-art convolutional neural network based method, using an answer-annotated version of the TREC GOV2 collection. Results show that features used previously to retrieve topical sentences and factoid answer sentences are not sufficient for retrieving answer sentences for non-factoid queries, but with semantic and context features, we can significantly outperform the baseline methods.
cross language evaluation forum | 2011
Damiano Spina; Enrique Amigó; Julio Gonzalo
Monitoring the online reputation of a company starts by retrieving all (fresh) information where the company is mentioned; and a major problem in this context is that company names are often ambiguous (apple may refer to the company, the fruit, the singer, etc.). The problem is particularly hard in microblogging, where there is little context to disambiguate: this was the task addressed in the WePS-3 CLEF lab exercise in 2010. This paper introduces a novel fingerprint representation technique to visualize and compare system results for the task. We apply this technique to the systems that originally participated in WePS-3, and then we use it to explore the usefulness of filter keywords (those whose presence in a tweet reliably signals either the positive or the negative class) and finding the majority class (whether positive or negative tweets are predominant for a given company name in a tweet stream) as signals that help address the problem. Our study shows that both are key signals to solve the task, and we also find that, remarkably, the vocabulary associated with a company on the Web does not seem to match the vocabulary used in Twitter streams: even a manual extraction of filter keywords from web pages has substantially lower recall than an oracle selection of the best terms from the Twitter stream.
exploiting semantic annotations in information retrieval | 2015
Ruey-Cheng Chen; Damiano Spina; W. Bruce Croft; Mark Sanderson; Falk Scholer
Finding answer passages from the Web is a challenging task. One major difficulty is to retrieve sentences that may not have many terms in common with the question. In this paper, we experiment with two semantic approaches for finding non-factoid answers using a learning-to-rank retrieval setting. We show that using semantic representations learned from external resources such as Wikipedia or Google News may substantially improve the quality of top-ranked retrieved answers.