Gabriele Tolomei | Researchain

Publication

Featured research published by Gabriele Tolomei.

web search and data mining | 2011

Identifying task-based sessions in search engine query logs

Claudio Lucchese; Salvatore Orlando; Raffaele Perego; Fabrizio Silvestri; Gabriele Tolomei

The research challenge addressed in this paper is to devise effective techniques for identifying task-based sessions, i.e. sets of possibly noncontiguous queries issued by the user of a Web Search Engine for carrying out a given task. In order to evaluate and compare different approaches, we built, by means of a manual labeling process, a ground-truth where the queries of a given query log have been grouped in tasks. Our analysis of this ground-truth shows that users tend to perform more than one task at the same time, since about 75% of the submitted queries involve a multi-tasking activity. We formally define the Task-based Session Discovery Problem (TSDP) as the problem of best approximating the manually annotated tasks, and we propose several variants of well-known clustering algorithms, as well as a novel efficient heuristic algorithm, specifically tuned for solving the TSDP. These algorithms also exploit the collaborative knowledge collected by Wiktionary and Wikipedia for detecting query pairs that are not similar from a lexical content point of view, but are actually semantically related. The proposed algorithms have been evaluated on the above ground-truth, and are shown to perform better than state-of-the-art approaches, because they effectively take into account the multi-tasking behavior of users.

ACM Transactions on Information Systems | 2013

Discovering tasks from search engine query logs

Claudio Lucchese; Salvatore Orlando; Raffaele Perego; Fabrizio Silvestri; Gabriele Tolomei

Although Web search engines still answer user queries with lists of ten blue links to webpages, people are increasingly issuing queries to accomplish their daily tasks (e.g., finding a recipe, booking a flight, reading online news, etc.). In this work, we propose a two-step methodology for discovering tasks that users try to perform through search engines. First, we identify user tasks from individual user sessions stored in search engine query logs. In our vision, a user task is a set of possibly noncontiguous queries (within a user search session), which refer to the same need. Second, we discover collective tasks by aggregating similar user tasks, possibly performed by distinct users. To discover user tasks, we propose query similarity functions based on unsupervised and supervised learning approaches. We present a set of query clustering methods that exploit these functions in order to detect user tasks. All the proposed solutions were evaluated on a manually-built ground truth, and two of them performed better than state-of-the-art approaches. To detect collective tasks, we propose four methods that cluster previously discovered user tasks, which in turn are represented by the bag-of-words extracted from their composing queries. These solutions were also evaluated on another manually-built ground truth.
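As a rough illustration of the first step (grouping possibly noncontiguous queries of one session into tasks), the sketch below links queries whose lexical similarity exceeds a threshold and takes the connected components as candidate tasks. This is not the authors' method: the character-trigram Jaccard similarity, the `threshold` value, and the function names are all assumptions chosen for illustration.

```python
from itertools import combinations


def trigrams(query):
    """Return the set of character trigrams of a lowercased query."""
    text = query.lower().strip()
    # Fall back to the whole string for queries shorter than three characters.
    return {text[i:i + 3] for i in range(len(text) - 2)} or {text}


def jaccard(a, b):
    """Jaccard similarity between two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(queries, threshold=0.3):
    """Group queries into candidate tasks via connected components.

    Two queries are linked when their trigram Jaccard similarity is at
    least `threshold` (a hypothetical value, not taken from the paper).
    """
    parent = list(range(len(queries)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    grams = [trigrams(q) for q in queries]
    for i, j in combinations(range(len(queries)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for idx, query in enumerate(queries):
        groups.setdefault(find(idx), []).append(query)
    return list(groups.values())


session = [
    "cheap flights rome",
    "pasta recipe",
    "flights rome paris",
    "easy pasta recipe",
]
tasks = cluster_queries(session)
```

In this toy session the two interleaved needs (flights and recipes) end up in separate groups even though their queries are not contiguous. A purely lexical similarity like this one would miss semantically related queries that share no wording.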

knowledge discovery and data mining | 2015

Promoting Positive Post-Click Experience for In-Stream Yahoo Gemini Users

Mounia Lalmas; Janette Lehmann; Guy Shaked; Fabrizio Silvestri; Gabriele Tolomei

Click-through rate (CTR) is the most common metric used to assess the performance of an online advert; another measure of an advert's performance is the user post-click experience. In this paper, we describe the method we have implemented in Yahoo Gemini to measure the post-click experience on Yahoo mobile news streams via an automatic analysis of advert landing pages. We measure the post-click experience by means of two well-known metrics, dwell time and bounce rate. We show that these metrics can be used as a proxy for an advert's post-click experience, and that a negative post-click experience has a negative effect on user engagement and future ad clicks. We then put forward an approach that analyses advert landing pages, and show how these can affect dwell time and bounce rate. Finally, we develop a prediction model for advert quality based on dwell time, which was deployed on the Yahoo mobile news stream app running on iOS. The results show that, using dwell time as a proxy for post-click experience, we can prioritise higher quality ads. We demonstrate the impact of this on users via A/B testing.
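The two metrics named above can be sketched as follows. This is a minimal illustration, assuming each ad visit is recorded as a hypothetical `(click_time, return_time)` pair in seconds, and assuming a bounce is a visit whose dwell time falls below a cutoff; the 5-second `bounce_threshold` is an illustrative value, not one reported in the paper.

```python
def dwell_times(visits):
    """Dwell time per visit: seconds between the ad click and the return."""
    return [return_time - click_time for click_time, return_time in visits]


def bounce_rate(visits, bounce_threshold=5.0):
    """Fraction of visits whose dwell time is below `bounce_threshold`.

    The threshold is an assumption made for this sketch.
    """
    times = dwell_times(visits)
    if not times:
        return 0.0
    return sum(t < bounce_threshold for t in times) / len(times)


# Hypothetical visits to one advert's landing page, in seconds.
visits = [(0, 2), (10, 40), (50, 53), (100, 160)]
times = dwell_times(visits)
average_dwell = sum(times) / len(times)
rate = bounce_rate(visits)
```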

international world wide web conferences | 2014

Quite a mess in my cookie jar!: leveraging machine learning to protect web authentication

Stefano Calzavara; Gabriele Tolomei; Michele Bugliesi; Salvatore Orlando

Browser-based defenses have recently been advocated as an effective mechanism to protect web applications against the threats of session hijacking, fixation, and related attacks. In existing approaches, all such defenses ultimately rely on client-side heuristics to automatically detect cookies containing session information, to then protect them against theft or otherwise unintended use. While clearly crucial to the effectiveness of the resulting defense mechanisms, these heuristics have not, as yet, undergone any rigorous assessment of their adequacy. In this paper, we conduct the first such formal assessment, based on a gold set of cookies we collect from 70 popular websites of the Alexa ranking. To obtain the gold set, we devise a semi-automatic procedure that draws on a novel notion of authentication token, which we introduce to capture multiple web authentication schemes. We test existing browser-based defenses in the literature against our gold set, unveiling several pitfalls both in the heuristics adopted and in the methods used to assess them. We then propose a new detection method based on supervised learning, where our gold set is used to train a binary classifier, and report on experimental evidence that our method outperforms existing proposals. Interestingly, the resulting classification, together with our hands-on experience in the construction of the gold set, provides new insight on how web authentication is implemented in practice.

requirements engineering foundation for software quality | 2013

Using clustering to improve the structure of natural language requirements documents

Alessio Ferrari; Stefania Gnesi; Gabriele Tolomei

[Context and motivation] System requirements are normally provided in the form of natural language documents. Such documents need to be properly structured, in order to ease the overall uptake of the requirements by the readers of the document. A structure that allows a proper understanding of a requirements document shall satisfy two main quality attributes: (i) requirements relatedness: each requirement is conceptually connected with the requirements in the same section; (ii) sections independence: each section is conceptually separated from the others. [Question/Problem] Automatically identifying the parts of the document that lack requirements relatedness and sections independence may help improve the document structure. [Principal idea/results] To this end, we define a novel clustering algorithm named Sliding Head-Tail Component (S-HTC). The algorithm groups together similar requirements that are contiguous in the requirements document. We claim that such an algorithm allows discovering the structure of the document in the way it is perceived by the reader. If the structure originally provided by the document does not match the structure discovered by the algorithm, hints are given to identify the parts of the document that lack requirements relatedness and sections independence. [Contribution] We evaluate the effectiveness of the algorithm with a pilot test on a requirements standard of the railway domain (583 requirements).

international conference on ultra modern telecommunications | 2009

Challenges in designing an interest-based distributed aggregation of users in P2P systems

Matteo Mordacchini; Patrizio Dazzi; Gabriele Tolomei; Ranieri Baraglia; Fabrizio Silvestri; Salvatore Orlando

Most users retrieve and access resources in complex systems, such as Distributed Virtual Environments (DVEs) or the Web, by querying centralized search engines. Such systems normally compute their answers by estimating query-document similarities to rank the results, and also by computing global ranks of the result pages from the hyperlink structure of the Web. User interests typically follow a sort of clustering property: users interested in a topic in the past are likely to be interested in the same topic in the future as well. It follows that search results considered relevant by a user belonging to a group of homogeneous users will likely also be of interest to other users from the same group. In this paper, we propose the architecture of a peer-to-peer system that exploits a collaborative search mechanism, based on interest similarities among users. The paper discusses the challenges associated with such a system in scenarios like DVEs and the Web, where it is based on a self-organized network of users, grouped according to the interests detected from the queries they previously submitted to search engines. The final aim is to enhance the quality of both the results and the experience perceived by users.

theory and practice of digital libraries | 2011

Improving Europeana search experience using query logs

Diego Ceccarelli; Sergiu Gordea; Claudio Lucchese; Franco Maria Nardini; Gabriele Tolomei

Europeana is a long-term project funded by the European Commission with the goal of making Europe's cultural and scientific heritage accessible to the public. Since 2008, about 1500 institutions have contributed to Europeana, enabling people to explore the digital resources of Europe's museums, libraries and archives. The huge amount of collected multi-lingual multi-media data is made available today through the Europeana portal, a search engine allowing users to explore such content through textual queries. One of the most important techniques for enhancing users' search experience in large information spaces is the exploitation of the knowledge contained in query logs. In this paper we present a characterization of the Europeana query log, showing statistics on common behavioral patterns of the Europeana users. Our analysis highlights some significant differences between the Europeana query log and the historical data collected by general purpose Web Search Engine logs. In particular, we find that both query and search session distributions show different behaviors. Finally, we use this information for designing a query recommendation technique with the goal of enhancing the functionality of the Europeana portal.

ACM Transactions on The Web | 2015

A Supervised Learning Approach to Protect Client Authentication on the Web

Stefano Calzavara; Gabriele Tolomei; Andrea Casini; Michele Bugliesi; Salvatore Orlando

Browser-based defenses have recently been advocated as an effective mechanism to protect potentially insecure web applications against the threats of session hijacking, fixation, and related attacks. In existing approaches, all such defenses ultimately rely on client-side heuristics to automatically detect cookies containing session information, to then protect them against theft or otherwise unintended use. While clearly crucial to the effectiveness of the resulting defense mechanisms, these heuristics have not, as yet, undergone any rigorous assessment of their adequacy. In this article, we conduct the first such formal assessment, based on a ground truth of 2,464 cookies we collect from 215 popular websites of the Alexa ranking. To obtain the ground truth, we devise a semiautomatic procedure that draws on the novel notion of authentication token, which we introduce to capture multiple web authentication schemes. We test existing browser-based defenses in the literature against our ground truth, unveiling several pitfalls both in the heuristics adopted and in the methods used to assess them. We then propose a new detection method based on supervised learning, where our ground truth is used to train a set of binary classifiers, and report on experimental evidence that our method outperforms existing proposals. Interestingly, the resulting classifiers, together with our hands-on experience in the construction of the ground truth, provide new insight on how web authentication is actually implemented in practice.

international world wide web conferences | 2013

SEED: a framework for extracting social events from press news

Salvatore Orlando; Francesco Pizzolon; Gabriele Tolomei

Every day, people exchange a huge amount of data through the Internet. Mostly, such data consist of unstructured texts, which often contain references to structured information (e.g., person names, contact records, etc.). In this work, we propose a novel solution to discover social events from actual press news edited by humans. Concretely, our method is divided into two steps, each one addressing a specific Information Extraction (IE) task: first, we use a technique to automatically recognize four classes of named entities from press news: DATE, LOCATION, PLACE, and ARTIST. Second, we detect social events by extracting ternary relations between such entities, also exploiting evidence from external sources (i.e., the Web). Finally, we evaluate both stages of our proposed solution on a real-world dataset. Experimental results highlight the quality of our first-step Named-Entity Recognition (NER) approach, which indeed performs consistently with state-of-the-art solutions. Lastly, we show how to precisely select true events from the list of all candidate events (i.e., all the ternary relations), which result from our second-step Relation Extraction (RE) method. Indeed, we discover that true social events can be detected if enough evidence of them is found in the result list of Web search engines.

conference on information and knowledge management | 2013

Twitter anticipates bursts of requests for Wikipedia articles

Gabriele Tolomei; Salvatore Orlando; Diego Ceccarelli; Claudio Lucchese

Most of the tweets that users exchange on Twitter make implicit mentions of named entities, which in turn can be mapped to corresponding Wikipedia articles using proper Entity Linking (EL) techniques. Some of those become trending entities on Twitter due to a long-lasting or a sudden effect on the volume of tweets where they are mentioned. We argue that the set of trending entities discovered from Twitter may help predict the volume of requests for the related Wikipedia articles. To validate this claim, we apply an EL technique to extract trending entities from a large dataset of public tweets. Then, we analyze the time series derived from the hourly trending score (i.e., an index of popularity) of each entity as measured by Twitter and Wikipedia, respectively. Our results reveal that Twitter actually leads Wikipedia by one or more hours.
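One common way to check whether one hourly series leads another is lagged cross-correlation, sketched below. The abstract does not say which analysis the authors used, so this is an assumption made for illustration, as are the function names and the toy series (a Twitter score and a copy of it delayed by two hours).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lead(leader, follower, max_lag):
    """Return (lag, correlation) maximizing corr(leader[t], follower[t + lag]).

    A positive lag means `leader` anticipates `follower` by that many steps.
    """
    n = len(leader)
    best = (0, pearson(leader, follower))
    for lag in range(1, max_lag + 1):
        corr = pearson(leader[:n - lag], follower[lag:])
        if corr > best[1]:
            best = (lag, corr)
    return best


# Toy hourly trending scores: the second series repeats the first two hours later.
twitter = [0, 1, 5, 2, 0, 0, 3, 1, 0, 0]
wikipedia = [0, 0, 0, 1, 5, 2, 0, 0, 3, 1]
lag, corr = best_lead(twitter, wikipedia, max_lag=4)
```

On this toy data the best lag is 2, meaning the first series anticipates the second by two hours.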
