Philippe Cudré-Mauroux

Publications

Featured research published by Philippe Cudré-Mauroux.

International Conference on Management of Data | 2003

P-Grid: a self-organizing structured P2P system

Karl Aberer; Philippe Cudré-Mauroux; Anwitaman Datta; Zoran Despotovic; Manfred Hauswirth; Magdalena Punceva; Roman Schmidt

1 Self-organizing Structured P2P Systems

In the P2P community, a fundamental distinction is made between unstructured and structured P2P systems for resource location. In unstructured P2P systems, peers are in principle unaware of the resources that neighboring peers in the overlay network maintain. Typically, they resolve search requests by flooding techniques. Gnutella [9] is the most prominent example of this class. In contrast, in structured P2P systems peers maintain information about which resources neighboring peers offer. Thus, queries can be directed and, as a consequence, substantially fewer messages are needed. This comes at the cost of increased maintenance effort when the overlay network changes as peers join or leave. The most prominent class of approaches to structured P2P systems is distributed hash tables (DHTs), for example Chord [17]. Unstructured P2P systems have generated substantial interest because of emergent global-scale phenomena. For example, the Gnutella overlay network exhibits the following characteristics [15]:

1. The network has a small diameter, which ensures that a message-flooding approach to search works with a relatively low time-to-live (approximately 7).
2. The node degrees of the overlay network follow a power-law distribution. Thus, a few peers have a large number of incoming links, whereas most peers have very few such links.

These properties result from the way Gnutella performs network maintenance: each peer maintains a fixed number of active links. Using the network maintenance protocol, a peer discovers new peers in the network by flooding discovery
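A short simulation illustrates the flooding search and its time-to-live bound described above. This is a minimal sketch of a generic breadth-first flood. It is not P-Grid's structured routing, and it is not Gnutella's actual wire protocol. The function `flood_search`, the ring topology and the message-counting convention (every forwarded copy counts, duplicates included) are assumptions made for illustration.

```python
from collections import deque

def flood_search(overlay, start, has_resource, ttl):
    """Flood a search request through an unstructured overlay, bounded by a time-to-live.

    overlay: dict mapping each peer to a list of its neighbours (hypothetical topology).
    has_resource: predicate telling whether a peer holds the requested resource.
    Returns (peers holding the resource that the flood reached, messages sent).
    """
    seen = {start}
    hits = {start} if has_resource(start) else set()
    messages = 0
    frontier = deque([(start, ttl)])  # (peer, remaining TTL)
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward the request
        for neighbour in overlay[peer]:
            messages += 1  # every forwarded copy costs a message, duplicates included
            if neighbour in seen:
                continue  # peers drop requests they have already processed
            seen.add(neighbour)
            if has_resource(neighbour):
                hits.add(neighbour)
            frontier.append((neighbour, remaining - 1))
    return hits, messages

# A ring of six peers; only peer 3 holds the resource.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(flood_search(ring, 0, lambda p: p == 3, ttl=3))  # ({3}, 10)
```

With `ttl=2`, the same request sends 6 messages and never reaches peer 3. Flooding therefore only works well when the overlay's diameter is small relative to the TTL.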

International World Wide Web Conferences | 2012

ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking

Gianluca Demartini; Djellel Eddine Difallah; Philippe Cudré-Mauroux

We tackle the problem of entity linking for large collections of online pages. Our system, ZenCrowd, identifies entities in natural-language text using state-of-the-art techniques and automatically connects them to the Linked Open Data cloud. We show how one can take advantage of human intelligence to improve the quality of the links by dynamically generating micro-tasks on an online crowdsourcing platform. We develop a probabilistic framework to make sensible decisions about candidate links and to identify unreliable human workers. We evaluate ZenCrowd in a real deployment and show how combining probabilistic reasoning and crowdsourcing techniques can significantly improve the quality of the links, while limiting the amount of work performed by the crowd.
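The idea of jointly deciding on candidate links and spotting unreliable workers can be illustrated with a much simpler scheme than the one described above. The following sketch uses iterative reliability-weighted majority voting as a stand-in. It is not ZenCrowd's probabilistic model. The function `aggregate_votes`, the 0.5 starting reliability and the fixed iteration count are all assumptions.

```python
def aggregate_votes(votes, iterations=5, prior=0.5):
    """Jointly estimate link decisions and worker reliability (simplified illustration).

    votes: dict mapping (worker, link) -> bool, True meaning "this link is correct".
    Returns (decisions: link -> bool, reliability: worker -> agreement rate in [0, 1]).
    """
    workers = {w for w, _ in votes}
    links = {l for _, l in votes}
    reliability = {w: prior for w in workers}  # every worker starts equally trusted
    decisions = {}
    for _ in range(iterations):
        # Step 1: accept a link if the reliability-weighted "yes" mass is at least the "no" mass.
        for link in links:
            score = sum(reliability[w] if answer else -reliability[w]
                        for (w, l), answer in votes.items() if l == link)
            decisions[link] = score >= 0
        # Step 2: a worker's reliability becomes their agreement rate with the current decisions.
        for worker in workers:
            answers = [(l, a) for (w, l), a in votes.items() if w == worker]
            agreed = sum(1 for l, a in answers if a == decisions[l])
            reliability[worker] = agreed / len(answers)
    return decisions, reliability

votes = {
    ("a", "L1"): True,  ("b", "L1"): True,  ("c", "L1"): False,
    ("a", "L2"): False, ("b", "L2"): False, ("c", "L2"): True,
    ("a", "L3"): True,  ("b", "L3"): False, ("c", "L3"): False,
}
decisions, reliability = aggregate_votes(votes)
print(sorted(decisions.items()))  # [('L1', True), ('L2', False), ('L3', False)]
```

In this toy run, worker `c` ends up with the lowest reliability, so its votes carry less weight in later rounds. A proper probabilistic model would also treat consistently wrong workers as informative rather than just down-weighting them.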

International Semantic Web Conference | 2004

GridVine: building internet-scale semantic overlay networks

Karl Aberer; Philippe Cudré-Mauroux; Manfred Hauswirth; Tim Van Pelt

This paper addresses the problem of building scalable semantic overlay networks. Our approach follows the principle of data independence by separating a logical layer, the semantic overlay for managing and mapping data and metadata schemas, from a physical layer consisting of a structured peer-to-peer overlay network for efficient routing of messages. The physical layer is used to implement various functions at the logical layer, including attribute-based search, schema management, and schema mapping management. Separating the physical from the logical layer allows us to process logical operations in the semantic overlay using different physical execution strategies. In particular, we identify iterative and recursive strategies for traversing semantic overlay networks as two important alternatives. At the logical layer, we support semantic interoperability through schema inheritance and Semantic Gossiping. Thus, our system provides a complete solution for implementing semantic overlay networks that support both scalability and interoperability.

Very Large Data Bases | 2010

HYRISE: a main memory hybrid storage engine

Martin Grund; Jens H. Krüger; Hasso Plattner; Alexander Zeier; Philippe Cudré-Mauroux; Samuel Madden

In this paper, we describe a main memory hybrid database system called HYRISE, which automatically partitions tables into vertical partitions of varying widths depending on how the columns of the table are accessed. For columns accessed as a part of analytical queries (e.g., via sequential scans), narrow partitions perform better, because, when scanning a single column, cache locality is improved if the values of that column are stored contiguously. In contrast, for columns accessed as a part of OLTP-style queries, wider partitions perform better, because such transactions frequently insert, delete, update, or access many of the fields of a row, and co-locating those fields leads to better cache locality. Using a highly accurate model of cache misses, HYRISE is able to predict the performance of different partitionings, and to automatically select the best partitioning using an automated database design algorithm. We show that, on a realistic workload derived from customer applications, HYRISE can achieve a 20% to 400% performance improvement over pure all-column or all-row designs, and that it is both more scalable and produces better designs than previous vertical partitioning approaches for main memory systems.
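The trade-off described above can be illustrated with a toy cost model. Narrow partitions suit column scans, and wide partitions suit row-oriented OLTP access. This sketch is not HYRISE's cache-miss model or its design algorithm. The following are all simplifying assumptions:

- a scan reads every row of each partition it touches;
- a point access pays at least one miss per partition touched;
- the 64-byte cache line, the column widths and the workload are hypothetical.

The exhaustive search over column groupings is only feasible for a handful of columns.

```python
from math import ceil

CACHE_LINE = 64  # bytes; assumed cache-line size

def set_partitions(items):
    """Yield every way to split `items` into non-empty, disjoint groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):  # put `first` into an existing group ...
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller      # ... or into a group of its own

def cost(layout, widths, workload, rows):
    """Estimated cache misses of a workload under a given column grouping (toy model)."""
    total = 0
    for kind, columns, freq in workload:
        for group in (g for g in layout if any(c in g for c in columns)):
            group_width = sum(widths[c] for c in group)
            if kind == "scan":
                # a scan streams the whole partition: rows * partition width bytes
                total += freq * ceil(rows * group_width / CACHE_LINE)
            else:
                # a point access to one row touches each partition at least once
                total += freq * ceil(group_width / CACHE_LINE)
    return total

def best_layout(widths, workload, rows):
    """Pick the cheapest grouping by brute force over all set partitions of the columns."""
    return min(set_partitions(list(widths)), key=lambda p: cost(p, widths, workload, rows))

widths = {"id": 4, "price": 8, "name": 32, "qty": 4}  # bytes per value (hypothetical table)
workload = [
    ("scan", ["price"], 10),                          # analytical: aggregate one column
    ("point", ["id", "price", "name", "qty"], 1000),  # OLTP: read or update whole rows
]
print(best_layout(widths, workload, rows=1_000_000))  # [['price'], ['id', 'name', 'qty']]
```

Under this model, the scanned column is split off into its own narrow partition. If the workload is instead dominated by whole-row accesses, the single wide partition wins.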

International World Wide Web Conferences | 2003

The chatty web: emergent semantics through gossiping

Karl Aberer; Philippe Cudré-Mauroux; Manfred Hauswirth

This paper describes a novel approach for obtaining semantic interoperability among data sources in a bottom-up, semi-automatic manner without relying on pre-existing, global semantic models. We assume that large amounts of data exist that have been organized and annotated according to local schemas. Seeing semantics as a form of agreement, our approach enables the participating data sources to incrementally develop global agreement in an evolutionary and completely decentralized process that solely relies on pair-wise, local interactions: Participants provide translations between schemas they are interested in and can learn about other translations by routing queries (gossiping). To support the participants in assessing the semantic quality of the achieved agreements we develop a formal framework that takes into account both syntactic and semantic criteria. The assessment process is incremental and the quality ratings are adjusted along with the operation of the system. Ultimately, this process results in global agreement, i.e., the semantics that all participants understand. We discuss strategies to efficiently find translations and provide results from a case study to justify our claims. Our approach applies to any system which provides a communication infrastructure (existing websites or databases, decentralized systems, P2P systems) and offers the opportunity to study semantic interoperability as a global phenomenon in a network of information sharing parties.
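Gossiping sends queries along chains of pair-wise schema translations. This can be illustrated with attribute-level translations stored as dictionaries. The sketch below is a simplification, not the paper's formal framework. It composes translations along a path, and it scores a translation cycle by the fraction of attributes that come back to themselves. That score is one intuitive signal of whether the translations along the cycle agree. The schemas, attribute names and both functions are hypothetical.

```python
def compose(*mappings):
    """Chain attribute translations (dicts) left to right; untranslatable attributes become None."""
    def translate(attr):
        for mapping in mappings:
            if attr is None:
                return None
            attr = mapping.get(attr)
        return attr
    return translate

def cycle_agreement(attrs, *cycle):
    """Fraction of attributes that come back unchanged after following a cycle of translations."""
    translate = compose(*cycle)
    return sum(1 for a in attrs if translate(a) == a) / len(attrs)

# Hypothetical schemas A, B, C and the pair-wise translations their owners provided.
a_to_b = {"title": "name", "creator": "author", "date": "year"}
b_to_c = {"name": "label", "author": "artist", "year": "created"}
c_to_a = {"label": "title", "artist": "creator", "created": "subject"}  # last entry looks wrong

# Gossiping: a query on A's "creator" reaches C by following two translations.
print(compose(a_to_b, b_to_c)("creator"))  # artist
# Going around the cycle A -> B -> C -> A exposes the faulty "created" -> "subject" translation.
print(cycle_agreement(["title", "creator", "date"], a_to_b, b_to_c, c_to_a))  # 0.6666666666666666
```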

Very Large Data Bases | 2009

A demonstration of SciDB: a science-oriented DBMS

Philippe Cudré-Mauroux; Hideaki Kimura; Kian-Tat Lim; Jennie Rogers; Roman Simakov; Emad Soroush; Pavel Velikhov; Daniel L. Wang; Magdalena Balazinska; Jacek Becla; David J. DeWitt; Bobbi Heath; David Maier; Samuel Madden; Jignesh M. Patel; Michael Stonebraker; Stanley B. Zdonik

At CIDR 2009, we presented a collection of requirements for SciDB, a DBMS that would meet the needs of scientific users. These included a nested-array data model, science-specific operations such as regrid, and support for uncertainty, lineage, and named versions. In this paper, we present an overview of SciDB's key features and outline a demonstration of the first version of SciDB on data and operations from one of our lighthouse users, the Large Synoptic Survey Telescope (LSST).
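Regrid, one of the science-specific operations listed above, coarsens an array by aggregating blocks of cells into single cells. The pure-Python sketch below illustrates the idea on a dense 2-D array with mean aggregation. It is not SciDB's syntax or implementation. The function name `regrid_mean` and the rule that partial edge blocks are averaged over the cells they contain are assumptions.

```python
def regrid_mean(grid, bi, bj):
    """Coarsen a 2-D array (list of equal-length rows) by averaging bi x bj blocks.

    Blocks at the edges that do not fill a full bi x bj tile are averaged over the
    cells they actually contain.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for i0 in range(0, rows, bi):
        out_row = []
        for j0 in range(0, cols, bj):
            block = [grid[i][j]
                     for i in range(i0, min(i0 + bi, rows))
                     for j in range(j0, min(j0 + bj, cols))]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

grid = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
print(regrid_mean(grid, 2, 2))  # [[2.5, 4.5], [10.5, 12.5]]
```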

Database Systems for Advanced Applications | 2004

Emergent semantics principles and issues

Karl Aberer; Philippe Cudré-Mauroux; Aris M. Ouksel; Tiziana Catarci; Mohand-Said Hacid; Arantza Illarramendi; Vipul Kashyap; Massimo Mecella; Eduardo Mena; Erich J. Neuhold; Olga De Troyer; Thomas Risse; Monica Scannapieco; Fèlix Saltor; Luca De Santis; Stefano Spaccapietra; Steffen Staab; Rudi Studer

Information and communication infrastructures underwent a rapid and extreme decentralization process over the past decade: From a world of statically and partially connected central servers rose an intricate web of millions of information sources loosely connecting one to another. Today, we expect to witness the extension of this revolution with the wide adoption of meta-data standards like RDF or OWL underpinning the creation of a semantic web. Again, we hope for global properties to emerge from a multiplicity of pair-wise, local interactions, resulting eventually in a self-stabilizing semantic infrastructure. This paper represents an effort to summarize the conditions under which this revolution would take place as well as an attempt to underline its main properties, limitations and possible applications.

International Conference on Data Engineering | 2010

TrajStore: An adaptive storage system for very large trajectory data sets

Philippe Cudré-Mauroux; Eugene Wu; Samuel Madden

The rise of GPS and broadband-speed wireless devices has led to tremendous excitement about a range of applications broadly characterized as “location based services”. Current database storage systems, however, are inadequate for manipulating the very large and dynamic spatio-temporal data sets required to support such services. Proposals in the literature either present new indices without discussing how to cluster data, potentially resulting in many disk seeks for lookups of densely packed objects, or use static quadtrees or other partitioning structures, which become rapidly suboptimal as the data or queries evolve. As a result of these performance limitations, we built TrajStore, a dynamic storage system optimized for efficiently retrieving all data in a particular spatiotemporal region. TrajStore maintains an optimal index on the data and dynamically co-locates and compresses spatially and temporally adjacent segments on disk. By letting the storage layer evolve with the index, the system adapts to incoming queries and data and is able to answer most queries via a very limited number of I/Os, even when the queries target regions containing hundreds or thousands of different trajectories.
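The underlying idea is to cluster spatially adjacent data so that a region query reads only a few storage pages. It can be sketched with an adaptive quadtree whose leaves stand in for disk pages and split when they overflow. This is a simplified illustration. It does not reproduce TrajStore's cost-based index, its temporal dimension or its compression. The following are assumptions:

- trajectory segments are reduced to points (for example, segment midpoints);
- the capacity of 4 points per page and the depth limit are arbitrary;
- "pages read" merely stands in for I/Os.

```python
class QuadNode:
    """Adaptive quadtree over 2-D points; each leaf stands in for one disk page."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=12):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity    # points per page before the page is split
        self.depth = depth
        self.max_depth = max_depth  # guards against endless splitting of duplicate points
        self.points = []            # contents of this page while the node is a leaf
        self.children = None        # four quadrants once the node has been split

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def insert(self, x, y):
        if self.children is not None:
            self._child_for(x, y).insert(x, y)
            return
        self.points.append((x, y))
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        # Children are ordered to match _child_for: index = (x >= mx) + 2 * (y >= my).
        self.children = [QuadNode(x0, y0, mx, my, *args), QuadNode(mx, y0, x1, my, *args),
                         QuadNode(x0, my, mx, y1, *args), QuadNode(mx, my, x1, y1, *args)]
        points, self.points = self.points, []
        for x, y in points:  # redistribute so nearby points stay co-located in one page
            self._child_for(x, y).insert(x, y)

    def query(self, qx0, qy0, qx1, qy1):
        """Return (points inside the query box, number of leaf pages read)."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
            return [], 0  # box does not overlap this subtree: no page is read
        if self.children is None:
            hits = [(x, y) for x, y in self.points if qx0 <= x <= qx1 and qy0 <= y <= qy1]
            return hits, 1
        hits, pages = [], 0
        for child in self.children:
            child_hits, child_pages = child.query(qx0, qy0, qx1, qy1)
            hits += child_hits
            pages += child_pages
        return hits, pages

root = QuadNode(0, 0, 100, 100)
for x in range(0, 100, 10):
    for y in range(0, 100, 10):
        root.insert(x, y)
hits, pages = root.query(0, 0, 15, 15)
print(sorted(hits), pages)
```

The query reads only the pages whose regions overlap the box; every other page in the tree is skipped. Because pages split where data accumulates, dense areas end up with smaller pages.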

International World Wide Web Conferences | 2013

Pick-a-crowd: tell me what you like, and I'll tell you what to do

Djellel Eddine Difallah; Gianluca Demartini; Philippe Cudré-Mauroux

Crowdsourcing makes it possible to build hybrid online platforms that combine scalable information systems with the power of human intelligence to complete tasks that are difficult for current algorithms to tackle. Examples include hybrid database systems that use the crowd to fill in missing values or to sort items according to subjective dimensions such as picture attractiveness. Current approaches to crowdsourcing adopt a pull methodology: tasks are published on specialized Web platforms, where workers pick their preferred tasks on a first-come, first-served basis. While this approach has many advantages, such as simplicity and short completion times, it does not guarantee that each task is performed by the most suitable worker. In this paper, we propose and extensively evaluate a different crowdsourcing approach based on a push methodology. Our proposed system carefully selects which workers should perform a given task based on worker profiles extracted from social networks. Workers and tasks are automatically matched using an underlying categorization structure that exploits entities extracted from the task descriptions on the one hand, and categories liked by the user on social platforms on the other. We experimentally evaluate our approach on tasks of varying complexity and show that our push methodology consistently yields better results than usual pull strategies.
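The push methodology matches workers to tasks through shared categories. This minimal sketch assumes two inputs. Each task has already been mapped to a set of categories derived from its extracted entities, and each worker has been mapped to the set of categories they like. Workers are then ranked by the Jaccard similarity of the two sets. Jaccard similarity and the function names are illustrative choices, not necessarily the scoring used in the paper, and the task and worker data are hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_workers(task_categories, worker_likes, k=2):
    """Return the k workers whose liked categories best match the task (ties broken by name)."""
    scored = sorted(worker_likes.items(),
                    key=lambda item: (-jaccard(task_categories, item[1]), item[0]))
    return [worker for worker, _ in scored[:k]]

task = {"Football", "Sports", "Europe"}
workers = {
    "alice": {"Football", "Sports", "Cooking"},
    "bob": {"Movies", "Music"},
    "carol": {"Football", "Europe", "Sports", "Travel"},
}
print(rank_workers(task, workers))  # ['carol', 'alice']
```

Unlike a pull model, the platform chooses who receives the task. Here `bob` shares no categories with the task and would not be asked.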

IEEE Internet Computing | 2007

GridVine: An Infrastructure for Peer Information Management

Philippe Cudré-Mauroux; Suchit Agarwal; Karl Aberer

GridVine is a semantic overlay infrastructure based on a peer-to-peer (P2P) access structure. Built following the principle of data independence, it separates a logical layer, in which data, schemas, and schema mappings are managed, from a physical layer consisting of a structured P2P network that supports decentralized indexing, key load-balancing, and efficient routing. The system is decentralized, yet fosters semantic interoperability through pair-wise schema mappings and query reformulation. GridVine's heterogeneous but semantically related information sources can be queried transparently using iterative query reformulation. The authors discuss a reference implementation of the system and several mechanisms for resolving queries collaboratively.
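Iterative query reformulation over pair-wise schema mappings can be sketched as a breadth-first traversal of the mapping graph. A query on one schema is rewritten through each applicable mapping in turn, so that related sources can be reached transparently. The schemas, attribute names and the `reformulate` function are hypothetical. GridVine's actual mapping language and its P2P indexing layer are not modelled here.

```python
from collections import deque

def reformulate(query_attr, source_schema, mappings):
    """Rewrite an attribute query into every schema reachable through pair-wise mappings.

    mappings: dict (from_schema, to_schema) -> {attribute: attribute}.
    Returns {schema: attribute} for each schema the query could be reformulated into.
    """
    results = {source_schema: query_attr}
    queue = deque([source_schema])
    while queue:
        schema = queue.popleft()
        attr = results[schema]
        for (src, dst), mapping in mappings.items():
            if src == schema and dst not in results and attr in mapping:
                results[dst] = mapping[attr]  # one reformulation step along a mapping
                queue.append(dst)
    return results

mappings = {
    ("A", "B"): {"organism": "species"},
    ("B", "C"): {"species": "taxon"},
    ("C", "D"): {"resolution": "res"},
}
print(reformulate("organism", "A", mappings))  # {'A': 'organism', 'B': 'species', 'C': 'taxon'}
```

Schema D is not reached for this query because no mapping covers the attribute along that edge. A query on C's `resolution` attribute would reach D.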
