Mauro Sozio | Researchain

Publication

Featured researches published by Mauro Sozio.

international world wide web conferences | 2009

SOFIE: a self-organizing framework for information extraction

Fabian M. Suchanek; Mauro Sozio; Gerhard Weikum

This paper presents SOFIE, a system for automated ontology extension. SOFIE can parse natural language documents, extract ontological facts from them and link the facts into an ontology. SOFIE uses logical reasoning on the existing knowledge and on the new knowledge in order to disambiguate words to their most probable meaning, to reason on the meaning of text patterns and to take into account world knowledge axioms. This allows SOFIE to check the plausibility of hypotheses and to avoid inconsistencies with the ontology. The framework of SOFIE unites the paradigms of pattern matching, word sense disambiguation and ontological reasoning in one unified model. Our experiments show that SOFIE delivers high-quality output, even from unstructured Internet documents.

knowledge discovery and data mining | 2010

The community-search problem and how to plan a successful cocktail party

Mauro Sozio; Aristides Gionis

A lot of research in graph mining has been devoted to the discovery of communities. Most of the work has focused on the scenario where communities need to be discovered with reference only to the input graph. However, for many interesting applications one is interested in finding the community formed by a given set of nodes. In this paper we study a query-dependent variant of the community-detection problem, which we call the community-search problem: given a graph G and a set of query nodes in the graph, we seek a subgraph of G that contains the query nodes and is densely connected. We motivate a measure of density based on minimum degree and distance constraints, and we develop an optimum greedy algorithm for this measure. We proceed by characterizing a class of monotone constraints and we generalize our algorithm to compute optimum solutions satisfying any set of monotone constraints. Finally, we modify the greedy algorithm and present two heuristic algorithms that find communities of size no greater than a specified upper bound. Our experimental evaluation on real datasets demonstrates the efficiency of the proposed algorithms and the quality of the solutions we obtain.
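The greedy idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it ignores the distance constraints, keeps only the connected component containing the query nodes at each step, and recomputes degrees from scratch for readability rather than speed. The function names and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import deque


def component_of(adj, alive, start):
    """Return the nodes of `alive` reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in alive and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def greedy_community_search(adj, query):
    """Greedy minimum-degree peeling around a set of query nodes.

    adj   -- dict mapping each node to the set of its neighbours (undirected)
    query -- non-empty iterable of nodes that must stay in the community

    Returns (community, min_degree). community is None when the query
    nodes are not connected in the input graph.
    """
    query = set(query)
    if not query:
        raise ValueError("query must contain at least one node")
    alive = set(adj)
    best, best_score = None, -1
    while True:
        # Keep only the connected component that contains the query nodes.
        comp = component_of(adj, alive, next(iter(query)))
        if not query <= comp:
            break  # the last removal disconnected the query nodes
        alive = comp
        deg = {u: sum(1 for v in adj[u] if v in alive) for u in alive}
        u_min = min(alive, key=lambda u: deg[u])
        if deg[u_min] > best_score:
            best, best_score = set(alive), deg[u_min]
        if u_min in query:
            break  # a query node has minimum degree, so peeling cannot help
        alive.remove(u_min)
    return best, best_score
```

On a 4-clique {a, b, c, d} with a pendant path d-e-f and query nodes a and b, this sketch peels f and then e, and returns the clique with minimum degree 3.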

workshop on algorithms in bioinformatics | 2004

Fast Hare: A Fast Heuristic for Single Individual SNP Haplotype Reconstruction

Alessandro Panconesi; Mauro Sozio

We study the single individual SNP haplotype reconstruction problem. We introduce a simple heuristic and show experimentally that it is very fast and accurate. In particular, when compared with the dynamic programming algorithm of [8], it is much faster and also more accurate. We expect Fast Hare to be very useful in practical applications. We also introduce a combinatorial problem related to the SNP haplotype reconstruction problem that we call Min Element Removal. We prove its NP-hardness in the gapless case and its O(log n)-approximability in the general case.

international conference on data engineering | 2009

STAR: Steiner-Tree Approximation in Relationship Graphs

Gjergji Kasneci; Maya Ramanath; Mauro Sozio; Fabian M. Suchanek; Gerhard Weikum

Large graphs and networks are abundant in modern information systems: entity-relationship graphs over relational data or Web-extracted entities, biological networks, social online communities, knowledge bases, and many more. Often such data comes with expressive node and edge labels that allow an interpretation as a semantic graph, and edge weights that reflect the strengths of semantic relations between entities. Finding close relationships between a given set of two, three, or more entities is an important building block for many search, ranking, and analysis tasks. From an algorithmic point of view, this translates into computing the best Steiner trees between the given nodes, a classical NP-hard problem. In this paper, we present a new approximation algorithm, coined STAR, for relationship queries over large relationship graphs. We prove that for n query entities, STAR yields an O(log(n))-approximation of the optimal Steiner tree in pseudopolynomial run-time, and show that in practical cases the results returned by STAR are qualitatively comparable to or even better than the results returned by a classical 2-approximation algorithm. We then describe an extension to our algorithm to return the top-k Steiner trees. Finally, we evaluate our algorithm over both main-memory and completely disk-resident graphs containing millions of nodes. Our experiments show that in terms of efficiency STAR outperforms the best state-of-the-art database methods by a large margin, and also returns qualitatively better results.

very large data bases | 2011

Social content matching in MapReduce

Gianmarco De Francisci Morales; Aristides Gionis; Mauro Sozio

Matching problems are ubiquitous. They occur in economic markets, labor markets, internet advertising, and elsewhere. In this paper we focus on an application of matching for social media. Our goal is to distribute content from information suppliers to information consumers. We seek to maximize the overall relevance of the matched content from suppliers to consumers while regulating the overall activity, e.g., ensuring that no consumer is overwhelmed with data and that all suppliers have chances to deliver their content. We propose two matching algorithms, GreedyMR and StackMR, geared for the MapReduce paradigm. Both algorithms have provable approximation guarantees, and in practice they produce high-quality solutions. While both algorithms scale extremely well, we can show that StackMR requires only a poly-logarithmic number of MapReduce steps, making it an attractive option for applications with very large datasets. We experimentally show the trade-offs between quality and efficiency of our solutions on two large datasets coming from real-world social-media web sites.
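The matching problem described here, where every supplier and consumer has a bound on how many items it may be matched to, can be illustrated with the classical sequential greedy heuristic for weighted matching with capacities. The sketch below is a centralized illustration only: it is not GreedyMR or StackMR and contains no MapReduce logic. The function name, the edge-tuple format, and the `capacity` dictionary are assumptions made for this example.

```python
def greedy_b_matching(edges, capacity):
    """Sequential greedy heuristic for weighted matching with capacities.

    edges    -- iterable of (supplier, consumer, relevance) tuples
    capacity -- dict mapping every node to the maximum number of
                matched edges it may take part in

    Edges are scanned in order of decreasing relevance, and an edge is
    kept whenever both endpoints still have spare capacity.
    """
    residual = dict(capacity)  # copy so the caller's dict is not modified
    chosen = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        if residual[u] > 0 and residual[v] > 0:
            chosen.append((u, v, w))
            residual[u] -= 1
            residual[v] -= 1
    return chosen
```

With unit capacities and edges (s1, c1, 0.9), (s1, c2, 0.8), (s2, c1, 0.7), (s2, c2, 0.1), the sketch keeps (s1, c1) and (s2, c2) for a total relevance of 1.0, whereas the best matching reaches 1.5. This shows the kind of quality gap that an approximation guarantee has to bound.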

symposium on principles of database systems | 2008

Near-optimal dynamic replication in unstructured peer-to-peer networks

Mauro Sozio; Thomas Neumann; Gerhard Weikum

Replicating data in distributed systems is often needed for availability and performance. In unstructured peer-to-peer networks, with epidemic messaging for query routing, replicating popular data items is also crucial to ensure high probability of finding the data within a bounded search distance from the requestor. This paper considers such networks and aims to maximize the probability of successful search. Prior work along these lines has analyzed the optimal degrees of replication for data items with non-uniform but global request rates, but did not address the issue of where replicas should be placed and was very limited in its capabilities for handling heterogeneity and dynamics of network and workload. This paper presents the integrated P2R2 algorithm for dynamic replication that addresses all these issues, and determines both the degrees of replication and the placement of the replicas in a provably near-optimal way. We prove that the P2R2 algorithm can guarantee a successful-search probability that is within a factor of 2 of the optimal solution. The algorithm is efficient and can handle workload evolution. We prove that, whenever the access patterns are in steady state, our algorithm converges to the desired near-optimal placement. We further show by simulations that the convergence rate is fast and that our algorithm outperforms prior methods.

principles of distributed computing | 2005

Primal-dual based distributed algorithms for vertex cover with semi-hard capacities

Fabrizio Grandoni; Jochen Könemann; Alessandro Panconesi; Mauro Sozio

In this paper we consider the weighted, capacitated vertex cover problem with hard capacities (capVC). Here, we are given an undirected graph G=(V,E), non-negative vertex weights wt_v for all vertices v ∈ V, and node capacities B_v ≥ 1 for all v ∈ V. A feasible solution to a given capVC instance consists of a vertex cover C ⊆ V. Each edge e ∈ E is assigned to one of its endpoints in C, and the number of edges assigned to any vertex v ∈ C is at most B_v. The goal is to minimize the total weight of C. For a parameter ε>0 we give a deterministic, distributed algorithm for the capVC problem that computes a vertex cover C of weight at most (2+ε) · opt, where opt is the weight of a minimum-weight feasible solution to the given instance. The number of edges assigned to any node v ∈ C is at most (4+ε) · B_v. The running time of our algorithm is O(log(nW)/ε), where n is the number of nodes in the network and W = wt_max/wt_min is the ratio of largest to smallest weight. This result is complemented by a lower bound saying that any distributed algorithm for capVC which requires a poly-logarithmic number of rounds is bound to violate the capacity constraints by a factor of two. The main feature of the algorithm is that it is derived in a systematic fashion starting from a primal-dual sequential algorithm.
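The feasibility conditions in this abstract can be made concrete with a small checker. This is only a sketch of the problem definition, not of the distributed algorithm. The function names, the data layout (edges as two-element tuples, an `assignment` dictionary from edge to endpoint), and the `slack` parameter are assumptions made for illustration; `slack` models a solution that is allowed to exceed each capacity B_v by a constant factor, as in the (4+ε) · B_v bound.

```python
def is_feasible_capvc(edges, cover, assignment, capacity, slack=1.0):
    """Check a candidate capVC solution.

    edges      -- list of (u, v) tuples
    cover      -- set of chosen vertices C
    assignment -- dict mapping each edge tuple to the endpoint that covers it
    capacity   -- dict mapping each vertex v to its capacity B_v
    slack      -- allowed capacity violation factor (1.0 means none)
    """
    load = {}
    for e in edges:
        v = assignment.get(e)
        # Every edge must be assigned to one of its own endpoints inside C.
        if v is None or v not in e or v not in cover:
            return False
        load[v] = load.get(v, 0) + 1
    # No vertex may receive more edges than slack times its capacity.
    return all(load[v] <= slack * capacity[v] for v in load)


def cover_weight(cover, weight):
    """Total weight of the cover, the objective to be minimized."""
    return sum(weight[v] for v in cover)
```

For the path a-b-c with B_b = 1, the cover {b} with both edges assigned to b fails the check at `slack=1.0` and passes at `slack=2.0`, which illustrates what violating a capacity constraint by a factor of two means.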

Distributed Computing | 2010

Fast primal-dual distributed algorithms for scheduling and matching problems

Alessandro Panconesi; Mauro Sozio

In this paper we give efficient distributed algorithms computing approximate solutions to general scheduling and matching problems. All approximation guarantees are within a constant factor of the optimum. By “efficient”, we mean that the number of communication rounds is poly-logarithmic in the size of the input. In the scheduling problem, we have a bipartite graph with computing agents on one side and resources on the other. Agents that share a resource can communicate in one time step. Each agent has a list of jobs, each with its own length and profit, to be executed on a neighbouring resource within a given time window. Each job is also associated with a rational number between zero and one (its width), specifying the amount of resource required by the job. Resources can execute multiple jobs non-preemptively, provided their total width at any given time is at most one. The goal is to maximize the profit of the jobs that are scheduled. We then adapt our scheduling algorithm to solve the weighted b-matching problem, the generalization of the weighted matching problem in which, for each vertex v, at most b(v) edges incident to v can be included in the matching. For this problem we obtain a randomized distributed algorithm with approximation guarantee of

SIAM Journal on Computing | 2008