Andrea Marino
University of Pisa
Publications
Featured research published by Andrea Marino.
web search and data mining | 2012
Ugo Scaiella; Paolo Ferragina; Andrea Marino; Massimiliano Ciaramita
Search results clustering (SRC) is a challenging algorithmic problem that requires grouping together the results returned by one or more search engines in topically coherent clusters, and labeling the clusters with meaningful phrases describing the topics of the results included in them. In this paper we propose to solve SRC via an innovative approach that consists of modeling the problem as the labeled clustering of the nodes of a newly introduced graph of topics. The topics are Wikipedia-pages identified by means of recently proposed topic annotators [9, 11, 16, 20] applied to the search results, and the edges denote the relatedness among these topics computed by taking into account the linkage of the Wikipedia-graph. We tackle this problem by designing a novel algorithm that exploits the spectral properties and the labels of that graph of topics. We show the superiority of our approach with respect to academic state-of-the-art work [6] and well-known commercial systems (CLUSTY and LINGO3G) by performing an extensive set of experiments on standard datasets and user studies via Amazon Mechanical Turk. We test several standard measures for evaluating the performance of all systems and show a relative improvement of up to 20%.
Theoretical Computer Science | 2013
Pilu Crescenzi; Roberto Grossi; Michel Habib; Leonardo Lanzi; Andrea Marino
We propose a new algorithm for the classical problem of computing the diameter of undirected unweighted graphs, namely, the maximum distance among all the pairs of nodes, where the distance of a pair of nodes is the number of edges contained in the shortest path connecting these two nodes. Although its worst-case complexity is O(nm) time, where n is the number of nodes and m is the number of edges of the graph, we experimentally show that our algorithm works in O(m) time in practice, requiring few breadth-first searches to complete its task on almost 200 real-world graphs.
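The abstract above contrasts the O(nm) worst case with practical O(m) behavior. As a point of reference, the textbook O(nm) method it improves upon — one breadth-first search per node, taking the maximum eccentricity — can be sketched in a few lines (a minimal pure-Python illustration, not the paper's algorithm; the adjacency-list dictionary format is an assumption):

```python
from collections import deque

def bfs_eccentricity(adj, src):
    """Eccentricity of src: distance to the farthest reachable node."""
    dist = {src: 0}
    queue = deque([src])
    ecc = 0
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                ecc = max(ecc, dist[v])
                queue.append(v)
    return ecc

def diameter(adj):
    """Textbook O(nm) diameter: one BFS per node (connected graph assumed).
    adj maps each node to an iterable of its neighbors."""
    return max(bfs_eccentricity(adj, u) for u in adj)
```

On a path with 4 nodes this returns 3, and on a 5-cycle it returns 2; the paper's contribution is avoiding the full sweep of n BFSes on real-world graphs.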
international world wide web conferences | 2014
Paolo Boldi; Andrea Marino; Massimo Santini; Sebastiano Vigna
Although web crawlers have been around for twenty years by now, there is virtually no freely available, open-source crawling software that guarantees high throughput, overcomes the limits of single-machine tools and at the same time scales linearly with the amount of resources available. This paper aims at filling this gap.
Signal Processing-image Communication | 2014
Irene Amerini; Roberto Caldelli; Pierluigi Crescenzi; A. Del Mastio; Andrea Marino
Camera identification is a well-known problem in image forensics, addressing the issue of identifying the camera with which a digital image has been shot. In this paper, we turn our attention to the task of clustering images belonging to a heterogeneous set into groups coming from the same camera, and of doing this in a blind manner; this means that no side information is required about the sources or, above all, about the number of expected clusters. A novel methodology based on the Normalized Cuts (NC) criterion is presented and evaluated in comparison with other state-of-the-art techniques, such as Multi-Class Spectral Clustering (MCSC) and Hierarchical Agglomerative Clustering (HAC). The proposed method fits the problem of blind image clustering well because it does not require a priori knowledge of the number of classes into which the dataset has to be divided, but only a stop threshold; such a threshold has been properly defined by means of a ROC-curve approach relying on the goodness of cluster aggregation. Several experimental tests have been carried out in different operative conditions, and the proposed methodology globally presents superior performance in terms of clustering accuracy and robustness, as well as a reduced computational burden.
european symposium on algorithms | 2010
Pierluigi Crescenzi; Roberto Grossi; Claudio Imbrenda; Leonardo Lanzi; Andrea Marino
The diameter of an unweighted graph is the maximum pairwise distance among its connected vertices. It is one of the main measures in real-world graphs and complex networks. The double sweep is a simple method to find a lower bound for the diameter: it chooses a random vertex and performs two breadth-first searches (BFSes), returning the maximum length among the shortest paths thus found. We propose an algorithm called fringe, which uses few BFSes to find a matching upper bound for almost all the graphs in our dataset of 44 real-world graphs. For the few graphs where it cannot, we compute the diameter exhaustively using a cluster of machines totaling 40 cores. In all cases, the diameter is surprisingly equal to the lower bound found after very few executions of the double sweep method. The lesson learned is that the latter can be used to find the diameter of real-world graphs in many more cases than expected, and our fringe algorithm can quickly validate this finding for most of them.
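The double-sweep heuristic described in the abstract is simple enough to sketch directly: BFS from a random vertex, then BFS again from the farthest vertex found; the second eccentricity is a diameter lower bound. This is a minimal pure-Python illustration (not the paper's code; the adjacency-list format is an assumption):

```python
from collections import deque
import random

def bfs_farthest(adj, src):
    """BFS from src; return (farthest_node, its_distance)."""
    dist = {src: 0}
    queue = deque([src])
    far, far_d = src, 0
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if dist[v] > far_d:
                    far, far_d = v, dist[v]
                queue.append(v)
    return far, far_d

def double_sweep(adj, seed=None):
    """Diameter lower bound: BFS from a random vertex, then BFS from the
    farthest vertex found; return the second eccentricity."""
    rng = random.Random(seed)
    start = rng.choice(list(adj))
    u, _ = bfs_farthest(adj, start)
    _, lower_bound = bfs_farthest(adj, u)
    return lower_bound
```

On a path graph the bound is always exact, since the first BFS necessarily reaches an endpoint; the paper's empirical finding is that the bound is exact surprisingly often on real-world graphs too.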
symposium on experimental and efficient algorithms | 2012
Pierluigi Crescenzi; Roberto Grossi; Leonardo Lanzi; Andrea Marino
In this paper we propose a new algorithm for computing the diameter of directed unweighted graphs. Even though, in the worst case, this algorithm has complexity O(nm), where n is the number of nodes and m is the number of edges of the graph, we experimentally show that in practice our method works in O(m) time. Moreover, we show how to extend our algorithm to the case of directed weighted graphs and, even in this case, we present preliminary, very positive experimental results.
theory and practice of algorithms in computer systems | 2011
Pierluigi Crescenzi; Roberto Grossi; Leonardo Lanzi; Andrea Marino
The distance for a pair of vertices in a graph G is the length of the shortest path between them. The distance distribution for G specifies how many vertex pairs are at distance h, for all feasible values h. We study three fast randomized algorithms to approximate the distance distribution in large graphs. The Eppstein-Wang (EW) algorithm exploits sampling through a limited (logarithmic) number of Breadth-First Searches (BFSes). The Size-Estimation Framework (SEF) by Cohen employs random ranking and least-element lists to provide several estimators. Finally, the Approximate Neighborhood Function (ANF) algorithm by Palmer, Gibbons, and Faloutsos makes use of the probabilistic counting technique introduced by Flajolet and Martin, in order to estimate the number of distinct elements in a large multiset. We investigate how good the approximation of the distance distribution is when the three algorithms are run in similar settings. The analysis of ANF derives from the results on the probabilistic counting method, while that of SEF is given by Cohen. As for EW (originally designed for another problem), we extend its simple analysis in order to bound its error with high probability and to show its convergence. We then perform an experimental study on 30 real-world graphs, showing that our implementation of EW combines the accuracy of SEF with the performance of ANF.
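The sampling idea behind EW can be illustrated directly: run BFS from a handful of uniformly sampled sources and histogram the distances observed. This is a hedged sketch of the sampling principle only, not the EW algorithm with its error bounds; the adjacency-list format and function names are assumptions:

```python
from collections import deque, Counter
import random

def bfs_distances(adj, src):
    """All BFS distances from src, as a dict node -> distance."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def sampled_distance_distribution(adj, k, seed=None):
    """Estimate the fraction of vertex pairs at each distance h by running
    BFS from k uniformly sampled sources (the sampling idea behind EW)."""
    rng = random.Random(seed)
    hist = Counter()
    total = 0
    for src in rng.sample(list(adj), k):
        for v, d in bfs_distances(adj, src).items():
            if d > 0:
                hist[d] += 1
                total += 1
    return {h: count / total for h, count in hist.items()}
```

With k equal to the number of nodes the estimate is exact (every BFS is run); the interest of the sampled variant is that a logarithmic number of sources already concentrates around the true distribution.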
Emerging Infectious Diseases | 2013
Marino Faccini; Luigi Codecasa; Giorgio Ciconali; Serafina Cammarata; Catia Rosanna Borriello; Costanza De Gioia; Alessandro Za; Andrea Marino; Valentina Vighi; Maurizio Ferrarese; Giovanni P. Gesu; Ester Mazzola; Silvana Castaldi
Investigation of an outbreak of tuberculosis (TB) in a primary school in Milan, Italy, found 15 schoolchildren had active TB disease and 173 had latent TB infection. TB was also identified in 2 homeless men near the school. Diagnostic delay, particularly in the index case-patient, contributed to the transmission of infection.
international colloquium on automata languages and programming | 2016
Alessio Conte; Roberto Grossi; Andrea Marino; Luca Versari
Due to the sheer size of real-world networks, delay and space become quite relevant measures for the cost of enumeration in network analytics. This paper presents efficient algorithms for listing maximal cliques in networks, providing the first sublinear-space bounds with guaranteed delay per enumerated clique, thus comparing favorably with the known literature.
european symposium on algorithms | 2015
Michele Borassi; David Coudert; Pierluigi Crescenzi; Andrea Marino
The (Gromov) hyperbolicity is a topological property of a graph, which has been recently applied in several different contexts, such as the design of routing schemes, network security, computational biology, the analysis of graph algorithms, and the classification of complex networks. Computing the hyperbolicity of a graph can be very time consuming: indeed, the best available algorithm has running time \(\mathcal{O}(n^{3.69})\), which is clearly prohibitive for big graphs. In this paper, we provide a new and more efficient algorithm: although its worst-case complexity is \(\mathcal{O}(n^4)\), in practice it is much faster, allowing, for the first time, the computation of the hyperbolicity of graphs with up to 200,000 nodes. We experimentally show that our new algorithm drastically outperforms the best previously available algorithms, by analyzing a big dataset of real-world networks. Finally, we apply the new algorithm to compute the hyperbolicity of random graphs generated with the Erdős–Rényi model, the Chung–Lu model, and the Configuration Model.
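The quantity being computed can be made concrete via the four-point condition: for every quadruple of vertices, take the three pairwise distance sums, and the hyperbolicity is the maximum over quadruples of half the gap between the two largest sums. A naive \(\mathcal{O}(n^4)\) evaluation (a minimal illustration of the definition, not the paper's pruned algorithm; the adjacency-list format is an assumption) looks like this:

```python
from collections import deque
from itertools import combinations

def all_pairs_distances(adj):
    """BFS from every node; returns dist[s][t] for a connected graph."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[s] = d
    return dist

def hyperbolicity(adj):
    """Brute-force Gromov hyperbolicity via the four-point condition:
    max over quadruples of half the gap between the two largest of the
    three pairwise distance sums."""
    d = all_pairs_distances(adj)
    best = 0.0
    for x, y, z, t in combinations(adj, 4):
        sums = sorted([d[x][y] + d[z][t],
                       d[x][z] + d[y][t],
                       d[x][t] + d[y][z]])
        best = max(best, (sums[2] - sums[1]) / 2)
    return best
```

Trees have hyperbolicity 0 (every quadruple ties its two largest sums), while a 4-cycle already has hyperbolicity 1; the challenge the paper tackles is pruning the quadruple space so the quartic bound is rarely met in practice.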