Ilaria Bordino
Yahoo!
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Ilaria Bordino.
PLOS ONE | 2012
Ilaria Bordino; Stefano Battiston; Guido Caldarelli; Matthieu Cristelli; Antti Ukkonen; Ingmar Weber
We live in a computerized and networked society where many of our actions leave a digital trace and affect other people’s actions. This has lead to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enable us to investigate also the user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.
international conference on data mining | 2008
Ilaria Bordino; Debora Donato; Aristides Gionis; Stefano Leonardi
The problem of mining frequent patterns in networks has many applications, including analysis of complex networks, clustering of graphs, finding communities in social networks, and indexing of graphical and biological databases. Despite this wealth of applications, the current state of the art lacks algorithmic tools for counting the number of subgraphs contained in a large network. In this paper we develop data-stream algorithms that approximate the number of all subgraphs of three and four vertices in directed and undirected networks. We use the frequency of occurrence of all subgraphs to prove their significance in order to characterize different kinds of networks: we achieve very good precision in clustering networks with similar structure. The significance of our method is supported by the fact that such high precision cannot be achieved when performing clustering based on simpler topological properties, such as degree, assortativity, and eigenvector distributions. We have also tested our techniques using swap randomization.
conference on information and knowledge management | 2013
Ilaria Bordino; Yelena Mejova; Mounia Lalmas
In many cases, when browsing the Web users are searching for specific information or answers to concrete questions. Sometimes, though, users find unexpected, yet interesting and useful results, and are encouraged to explore further. What makes a result serendipitous? We propose to answer this question by exploring the potential of entities extracted from two sources of user-generated content -- Wikipedia, a user-curated online encyclopedia, and Yahoo! Answers, a more unconstrained question/answering forum -- in promoting serendipitous search. In this work, the content of each data source is represented as an entity network, which is further enriched with metadata about sentiment, writing quality, and topical category. We devise an algorithm based on lazy random walk with restart to retrieve entity recommendations from the networks. We show that our method provides novel results from both datasets, compared to standard web search engines. However, unlike previous research, we find that choosing highly emotional entities does not increase user interest for many categories of entities, suggesting a more complex relationship between topic matter and the desirable metadata attributes in serendipitous search.
web search and data mining | 2013
Ilaria Bordino; Gianmarco De Francisci Morales; Ingmar Weber; Francesco Bonchi
We study the problem of anticipating user search needs, based on their browsing activity. Given the current web page p that a user is visiting we want to recommend a small and diverse set of search queries that are relevant to the content of p, but also non-obvious and serendipitous. We introduce a novel method that is based on the content of the page visited, rather than on past browsing patterns as in previous literature. Our content-based approach can be used even for previously unseen pages. We represent the topics of a page by the set of Wikipedia entities extracted from it. To obtain useful query suggestions for these entities, we exploit a novel graph model that we call EQGraph (Entity-Query Graph), containing entities, queries, and transitions between entities, between queries, as well as from entities to queries. We perform Personalized PageRank computation on such a graph to expand the set of entities extracted from a page into a richer set of entities, and to associate these entities with relevant query suggestions. We develop an efficient implementation to deal with large graph instances and suggest queries from a large and diverse pool. We perform a user study that shows that our method produces relevant and interesting recommendations, and outperforms an alternative method based on reverse IR.
international conference on data mining | 2008
Ilaria Bordino; Paolo Boldi; Debora Donato; Massimo Santini; Sebastiano Vigna
Recently, a new temporal dataset has been made public: it is made of a series of twelve 100 M pages snapshots of the .uk domain. The Web graphs of the twelve snapshots have been merged into a single time-aware graph that provide constant-time access to temporal information. In this paper we present the first statistical analysis performed on this graph, with the goal of checking whether the information contained in the graph is reliable (i.e. whether it depends essentially on appearance and disappearance of pages and links, or on the crawler behaviour). We perform a number of tests that show that the graph is actually reliable, and provide the first public data on the evolution of the Web that use a large scale and a significant diversity in the sites considered.
PLOS ONE | 2016
Gabriele Ranco; Ilaria Bordino; Giacomo Bormetti; Guido Caldarelli; Fabrizio Lillo; Michele Treccani
The new digital revolution of big data is deeply changing our capability of understanding society and forecasting the outcome of many social and economic systems. Unfortunately, information can be very heterogeneous in the importance, relevance, and surprise it conveys, affecting severely the predictive power of semantic and statistical methods. Here we show that the aggregation of web users’ behavior can be elicited to overcome this problem in a hard to predict complex system, namely the financial market. Specifically, our in-sample analysis shows that the combined use of sentiment analysis of news and browsing activity of users of Yahoo! Finance greatly helps forecasting intra-day and daily price changes of a set of 100 highly capitalized US stocks traded in the period 2012–2013. Sentiment analysis or browsing activity when taken alone have very small or no predictive power. Conversely, when considering a news signal where in a given time interval we compute the average sentiment of the clicked news, weighted by the number of clicks, we show that for nearly 50% of the companies such signal Granger-causes hourly price returns. Our result indicates a “wisdom-of-the-crowd” effect that allows to exploit users’ activity to identify and weigh properly the relevant and surprising news, enhancing considerably the forecasting power of the news sentiment.
international conference on data engineering | 2014
Ilaria Bordino; Nicolas Kourtellis; Nikolay Laptev; Youssef Billawala
Web traffic represents a powerful mirror for various real-world phenomena. For example, it was shown that web search volumes have a positive correlation with stock trading volumes and with the sentiment of investors. Our hypothesis is that user browsing behavior on a domain-specific portal is a better predictor of user intent than web searches.
ACM Transactions on Information Systems | 2015
Aris Anagnostopoulos; Luca Becchetti; Ilaria Bordino; Stefano Leonardi; Ida Mele; Piotr Sankowski
We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection in such a way that we can efficiently provide high-quality answers to user queries using only the selected subset. This approach has applications when space is a constraint or when the query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving a performance close to the optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analysis provide strong evidence of the potential value of query covering in diverse application scenarios.
conference on information and knowledge management | 2014
Olivier Van Laere; Ilaria Bordino; Yelena Mejova; Mounia Lalmas
We present DEESSE [1], a tool that enables an exploratory and serendipitous exploration - at entity level, of the content of two different social media: Wikipedia, a user-curated online encyclopedia, and Yahoo Answers, a more unconstrained question/answering forum. DEESSE represents the content of each source as an entity network, which is further enriched with metadata about sentiment, writing quality, and topical category. Given a query entity, entity results are retrieved from the network by employing an algorithm based on a random walk with restart to the query. Following the emerging paradigm of composite retrieval, we organize the results into topically coherent bundles instead of showing them in a simple ranked list.
international world wide web conferences | 2013
Yelena Mejova; Ilaria Bordino; Mounia Lalmas; Aristides Gionis
In many cases, when browsing the Web, users are searching for specific information. Sometimes, though, users are also looking for something interesting, surprising, or entertaining. Serendipitous search puts interestingness on par with relevance. We investigate how interesting are the results one can obtain via serendipitous search, and what makes them so, by comparing entity networks extracted from two prominent social media sites, Wikipedia and Yahoo! Answers.