Alberto Tonon | Researchain

Publication

Featured research published by Alberto Tonon.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Combining inverted indices and structured search for ad-hoc object retrieval

Alberto Tonon; Gianluca Demartini; Philippe Cudré-Mauroux

Retrieving semi-structured entities to answer keyword queries is an increasingly important feature of many modern Web applications. The fast-growing Linked Open Data (LOD) movement makes it possible to crawl and index very large amounts of structured data describing hundreds of millions of entities. However, entity retrieval approaches have yet to find efficient and effective ways of ranking and navigating through those large data sets. In this paper, we address the problem of Ad-hoc Object Retrieval over large-scale LOD data by proposing a hybrid approach that combines IR and structured search techniques. Specifically, we propose an architecture that exploits an inverted index to answer keyword queries as well as a semi-structured database to improve the search effectiveness by automatically generating queries over the LOD graph. Experimental results show that our ranking algorithms exploiting both IR and graph indices outperform state-of-the-art entity retrieval techniques by up to 25% over the BM25 baseline.
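For readers unfamiliar with the BM25 baseline named in the abstract above, the following is a minimal sketch of standard BM25 scoring over tokenized documents. It is not the authors' implementation: the function name `bm25_scores`, the toy documents, and the parameter defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Illustrative sketch of the standard BM25 formula; k1 and b are
    commonly used defaults, not values taken from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection of entity descriptions.
docs = [
    "tom hanks american actor".split(),
    "concord california city".split(),
    "actor from concord california".split(),
]
scores = bm25_scores(["tom", "hanks"], docs)
```

Only the first document contains the query terms, so it is the only one with a non-zero score.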

International Semantic Web Conference | 2013

TRank: Ranking Entity Types Using the Web of Data

Alberto Tonon; Michele Catasta; Gianluca Demartini; Philippe Cudré-Mauroux; Karl Aberer

Much of today's Web search and browsing activity is centered around entities. For this reason, Search Engine Result Pages (SERPs) increasingly contain information about the searched entities such as pictures, short summaries, related entities, and factual information. A key facet that is often displayed on the SERPs and that is instrumental for many applications is the entity type. However, an entity is usually not associated with a single generic type in the background knowledge bases but rather with a set of more specific types, which may be relevant or not given the document context. For example, one can find on the Linked Open Data cloud the fact that Tom Hanks is a person, an actor, and a person from Concord, California. All those types are correct but some may be too general to be interesting (e.g., person), while others may be interesting but already known to the user (e.g., actor), or may be irrelevant given the current browsing context (e.g., person from Concord, California). In this paper, we define the new task of ranking entity types given an entity and its context. We propose and evaluate new methods to find the most relevant entity type based on collection statistics and on the graph structure interconnecting entities and types. An extensive experimental evaluation over several document collections at different levels of granularity (e.g., sentences, paragraphs) and different type hierarchies (including DBpedia, Freebase, and schema.org) shows that hierarchy-based approaches provide more accurate results when picking entity types to be displayed to the end-user while still being highly scalable.
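To illustrate what a hierarchy-based ranking could look like, here is a minimal sketch that orders an entity's types by their depth in a type hierarchy, so that more specific types come first. This is one simple heuristic and is not claimed to be one of the methods evaluated in the paper; the `parent` mapping and the type names are hypothetical.

```python
def depth(type_name, parent):
    """Number of edges from `type_name` up to the root of the hierarchy.

    Assumes `parent` describes a tree (no cycles).
    """
    d = 0
    while type_name in parent:
        type_name = parent[type_name]
        d += 1
    return d


def rank_types_by_depth(types, parent):
    """Order types from most specific (deepest) to most generic."""
    return sorted(types, key=lambda t: depth(t, parent), reverse=True)


# Hypothetical fragment of a type hierarchy (child -> parent).
parent = {
    "Agent": "Thing",
    "Person": "Agent",
    "Artist": "Person",
    "Actor": "Artist",
}
ranked = rank_types_by_depth(["Person", "Actor", "Thing"], parent)
```

A context-aware ranker would combine a structural signal like this with evidence from the surrounding document; the sketch covers only the structural part.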

Information Retrieval | 2015

Pooling-based continuous evaluation of information retrieval systems

Alberto Tonon; Gianluca Demartini; Philippe Cudré-Mauroux

The dominant approach to evaluating the effectiveness of information retrieval (IR) systems is by means of reusable test collections built following the Cranfield paradigm. In this paper, we propose a new IR evaluation methodology based on pooled test-collections and on the continuous use of either crowdsourcing or professional editors to obtain relevance judgements. Instead of building a static collection for a finite set of systems known a priori, we propose an IR evaluation paradigm where retrieval approaches are evaluated iteratively on the same collection. Each new retrieval technique takes care of obtaining its missing relevance judgements and hence contributes to augmenting the overall set of relevance judgements of the collection. We also propose two metrics, Fairness Score and opportunistic number of relevant documents, which we then use to define new pooling strategies. The goal of this work is to study the behavior of standard IR metrics, IR system ranking, and of several pooling techniques in a continuous evaluation context by comparing continuous and non-continuous evaluation results on classic test collections. We use both standard and crowdsourced relevance judgements, and we actually run a continuous evaluation campaign over several existing IR systems.
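The continuous setting described above can be sketched with classic depth-k pooling: the pool is the union of the top-k documents of every run, and a newly submitted run only needs judgements for its top-k documents that are not yet judged. The function names, run names, and document identifiers below are hypothetical, and the paper's own pooling strategies and metrics (such as Fairness Score) are not reproduced here.

```python
def depth_k_pool(runs, k):
    """Return the set of documents ranked in the top k by at least one run."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def unjudged(ranking, judgements, k):
    """Top-k documents of a run that have no relevance judgement yet."""
    return [doc for doc in ranking[:k] if doc not in judgements]


# Hypothetical initial runs used to build the pool.
runs = {
    "sysA": ["d1", "d2", "d3"],
    "sysB": ["d2", "d4", "d1"],
}
pool = depth_k_pool(runs, k=2)

# Hypothetical judgements collected for the pooled documents.
judgements = {"d1": True, "d2": False, "d4": True}

# A new system arrives later and must obtain its missing judgements.
new_run = ["d5", "d1", "d4"]
missing = unjudged(new_run, judgements, k=2)
```

Here the new run contributes one document, `d5`, that still has to be judged before the run can be scored on equal terms with the earlier systems.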

Journal of Web Semantics | 2014

B-hist

Michele Catasta; Alberto Tonon; Gianluca Demartini; Jean-Eudes Ranvier; Karl Aberer; Philippe Cudré-Mauroux

Web search is increasingly entity-centric: as a large fraction of common queries target specific entities, search results get progressively augmented with semi-structured and multimedia information about those entities. However, search over personal web browsing history still mostly revolves around keyword search. In this paper, we present a novel approach to answer queries over web browsing logs that takes into account entities appearing in the web pages, user activities, as well as temporal information. Our system, B-hist, aims at providing web users with an effective tool for searching and accessing information they previously looked up on the web by supporting multiple ways of filtering results using clustering and entity-centric search. In the following, we present our system and motivate our User Interface (UI) design choices by detailing the results of a survey on web browsing and history search. In addition, we present an empirical evaluation of our entity-based approach used to cluster web pages.

Journal of Web Semantics | 2016

Contextualized ranking of entity types based on knowledge graphs

Alberto Tonon; Michele Catasta; Roman Prokofyev; Gianluca Demartini; Karl Aberer; Philippe Cudré-Mauroux

A large fraction of online queries targets entities. For this reason, Search Engine Result Pages (SERPs) increasingly contain information about the searched entities such as pictures, short summaries, related entities, and factual information. A key facet that is often displayed on the SERPs and that is instrumental for many applications is the entity type. However, an entity is usually not associated with a single generic type in the background knowledge graph but rather with a set of more specific types, which may be relevant or not given the document context. For example, one can find on the Linked Open Data cloud the fact that Tom Hanks is a person, an actor, and a person from Concord, California. All these types are correct but some may be too general to be interesting (e.g., person), while others may be interesting but already known to the user (e.g., actor), or may be irrelevant given the current browsing context (e.g., person from Concord, California). In this paper, we define the new task of ranking entity types given an entity and its context. We propose and evaluate new methods to find the most relevant entity type based on collection statistics and on the knowledge graph structure interconnecting entities and types. An extensive experimental evaluation over several document collections at different levels of granularity (e.g., sentences, paragraphs) and different type hierarchies (including DBpedia, Freebase, and schema.org) shows that hierarchy-based approaches provide more accurate results when picking entity types to be displayed to the end-user.

International Semantic Web Conference | 2015

SANAPHOR: Ontology-Based Coreference Resolution

Roman Prokofyev; Alberto Tonon; Michael Luggen; Loic Vouilloz; Djellel Eddine Difallah; Philippe Cudré-Mauroux

We tackle the problem of resolving coreferences in textual content by leveraging Semantic Web techniques. Specifically, we focus on noun phrases that coreference identifiable entities that appear in the text; the challenge in this context is to improve the coreference resolution by leveraging potential semantic annotations that can be added to the identified mentions. Our system, SANAPHOR, first applies state-of-the-art techniques to extract entities, noun phrases, and candidate coreferences. Then, we propose an approach to type noun phrases using an inverted index built on top of a Knowledge Graph (e.g., DBpedia). Finally, we use the semantic relatedness of the introduced types to improve the state-of-the-art techniques by splitting and merging coreference clusters. We evaluate SANAPHOR on CoNLL datasets, and show how our techniques consistently improve the state of the art in coreference resolution.

European Semantic Web Conference | 2017

ArmaTweet: Detecting Events by Semantic Tweet Analysis

Alberto Tonon; Philippe Cudré-Mauroux; Albert Blarer; Vincent Lenders; Boris Motik

Armasuisse Science and Technology, the R&D agency for the Swiss Armed Forces, is developing a Social Media Analysis (SMA) system to help detect events such as natural disasters and terrorist activity by analysing Twitter posts. The system currently supports only keyword search, which cannot identify complex events such as ‘politician dying’ or ‘militia terror act’ since the keywords that correctly identify such events are typically unknown. In this paper we present ArmaTweet, an extension of SMA developed in a collaboration between armasuisse and the Universities of Fribourg and Oxford that supports semantic event detection. Our system extracts a structured representation from the tweets’ text using NLP technology, which it then integrates with DBpedia and WordNet in an RDF knowledge graph. Security analysts can thus describe the events of interest precisely and declaratively using SPARQL queries over the graph. Our experiments show that ArmaTweet can detect many complex events that cannot be detected by keywords alone.

International Semantic Web Conference | 2016

VoldemortKG: Mapping schema.org and Web Entities to Linked Open Data

Alberto Tonon; Victor Felder; Djellel Eddine Difallah; Philippe Cudré-Mauroux

Increasingly, webpages mix entities coming from various sources and represented in different ways. It can thus happen that the same entity is both described by using schema.org annotations and by creating a text anchor pointing to its Wikipedia page. Often, those representations provide complementary information which is not exploited since those entities are disjoint. We explored the extent to which entities represented in different ways repeat on the Web, how they are related, and how they complement (or link) to each other. Our initial experiments showed that we can unveil a previously unexploited knowledge graph by applying simple instance matching techniques on a large collection of schema.org annotations and Wikipedia. The resulting knowledge graph aggregates entities (often tail entities) scattered across several webpages, and complements existing Wikipedia entities with new facts and properties. In order to facilitate further investigation in how to mine such information, we are releasing (i) an excerpt of all Common Crawl webpages containing both Wikipedia and schema.org annotations, (ii) the toolset to extract this information and perform knowledge graph construction and mapping onto DBpedia, as well as (iii) the resulting knowledge graph (VoldemortKG) obtained via label matching techniques.
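The label matching mentioned in the abstract can be sketched as a lookup of normalized names against normalized knowledge base labels. This is an assumption-laden illustration, not the released toolset: the normalization steps, the function names, the annotation identifiers, and the `dbr:` URIs below are all hypothetical.

```python
import unicodedata


def normalize(label):
    """Lowercase, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def match_by_label(annotations, kb_labels):
    """Map annotation ids to knowledge base URIs with an equal normalized label.

    If several URIs share a normalized label, the last one wins; a real
    pipeline would need a disambiguation step here.
    """
    index = {normalize(label): uri for uri, label in kb_labels.items()}
    matches = {}
    for ann_id, name in annotations.items():
        uri = index.get(normalize(name))
        if uri is not None:
            matches[ann_id] = uri
    return matches


# Hypothetical knowledge base labels (URI -> label).
kb_labels = {"dbr:Tom_Hanks": "Tom Hanks", "dbr:Fribourg": "Fribourg"}
# Hypothetical schema.org name values found on web pages.
annotations = {"page1#person": "tom  HANKS", "page2#place": "Zürich"}
matches = match_by_label(annotations, kb_labels)
```

Exact matching on normalized labels is cheap and precise for unambiguous names, but it misses aliases and cannot separate entities that share a label.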

International World Wide Web Conferences | 2014

Hippocampus: answering memory queries using transactive search

Michele Catasta; Alberto Tonon; Djellel Eddine Difallah; Gianluca Demartini; Karl Aberer; Philippe Cudré-Mauroux

Memory queries are queries in which the user is trying to recall something from their past personal experiences. Neither Web search nor structured queries can effectively answer this type of query, even when supported by Human Computation solutions. In this paper, we propose a new approach to answer memory queries that we call Transactive Search: The user-requested memory is reconstructed from a group of people by exchanging pieces of personal memories in order to reassemble the overall memory, which is stored in a distributed fashion among members of the group. We experimentally compare our proposed approach against a set of advanced search techniques including the use of Machine Learning methods over the Web of Data, online Social Networks, and Human Computation techniques. Experimental results show that Transactive Search is significantly more effective than existing search approaches for memory queries.

World Wide Web | 2018

CrimeTelescope: crime hotspot prediction based on urban and social media data fusion

Dingqi Yang; Terence Heaney; Alberto Tonon; Leye Wang; Philippe Cudré-Mauroux

Crime is a complex social issue impacting a considerable number of individuals within a society. Preventing and reducing crime is a top priority in many countries. Given limited policing and crime reduction resources, it is often crucial to identify effective strategies to deploy the available resources. Towards this goal, crime hotspot prediction has previously been suggested. Crime hotspot prediction leverages past data in order to identify geographical areas likely to host crimes in the future. However, most of the existing techniques in crime hotspot prediction solely use historical crime records to identify crime hotspots, while ignoring the predictive power of other data such as urban or social media data. In this paper, we propose CrimeTelescope, a platform that predicts and visualizes crime hotspots based on a fusion of different data types. Our platform continuously collects crime data as well as urban and social media data on the Web. It then extracts key features from the collected data based on both statistical and linguistic analysis. Finally, it identifies crime hotspots by leveraging the extracted features, and offers visualizations of the hotspots on an interactive map. Based on real-world data collected from New York City, we show that combining different types of data can effectively improve the crime hotspot prediction accuracy (by up to 5.2%), compared to classical approaches based on historical crime records only. In addition, we demonstrate the usability of our platform through a System Usability Scale (SUS) survey on a full prototype of CrimeTelescope.
