Esther Pacitti
French Institute for Research in Computer Science and Automation
Publications
Featured research published by Esther Pacitti.
Very Large Data Bases | 2000
Esther Pacitti; Eric Simon
Many distributed database applications need to replicate data to improve data availability and query response time. The two-phase commit protocol guarantees mutual consistency of replicated data but does not provide good performance. Lazy replication has been used as an alternative in several types of applications, such as on-line financial transactions and telecommunication systems. In this case, mutual consistency is relaxed and the concept of freshness is used to measure the deviation between replica copies. In this paper, we propose two update propagation strategies that improve freshness. Both use immediate propagation: updates to a primary copy are propagated towards a slave node as soon as they are detected at the master node, without waiting for the commitment of the update transaction. Our performance study shows that our strategies can improve data freshness by up to five times compared with the deferred approach.
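The difference between deferred and immediate propagation is easy to see in code. Below is a minimal sketch of a master node under both strategies, with hypothetical names (Master, slave_channel); it illustrates the general idea only, not the paper's actual protocol.

```python
import queue

class Master:
    """Toy master node; `immediate` switches the propagation strategy."""
    def __init__(self, slave_channel, immediate=True):
        self.slave_channel = slave_channel  # message channel to a slave node
        self.immediate = immediate
        self.pending = []                   # write set of the running transaction

    def write(self, txn_id, item, value):
        if self.immediate:
            # Immediate propagation: ship each update as soon as it is
            # detected, without waiting for the transaction to commit.
            self.slave_channel.put(("WRITE", txn_id, item, value))
        else:
            self.pending.append((txn_id, item, value))

    def commit(self, txn_id):
        if not self.immediate:
            # Deferred propagation: ship the whole write set after commit.
            for op in self.pending:
                self.slave_channel.put(("WRITE", *op))
            self.pending.clear()
        self.slave_channel.put(("COMMIT", txn_id))

channel = queue.Queue()
m = Master(channel, immediate=True)
m.write("t1", "x", 42)
m.commit("t1")
```

In both cases the slave applies the writes only once COMMIT arrives; the freshness gain of immediate propagation comes from overlapping the transfer with the transaction's execution at the master.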
Distributed and Parallel Databases | 2006
Reza Akbarinia; Esther Pacitti; Patrick Valduriez
A major problem of unstructured P2P systems is their heavy network traffic, caused mainly by the high number of query answers, many of which are irrelevant to users. One solution to this problem is Top-k queries, whereby the user specifies a limited number (k) of the most relevant answers. In this paper, we present FD, a fully distributed framework for executing Top-k queries in unstructured P2P systems, with the objective of reducing network traffic. FD consists of a family of algorithms that are simple but effective. FD is completely distributed, does not depend on the existence of certain peers, and addresses the volatility of peers during query execution. We validated FD through implementation over a 64-node cluster and simulation using the BRITE topology generator and SimJava. Our performance evaluation shows that FD can achieve major performance gains in terms of communication and response time.
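At its core, any fully distributed Top-k execution combines local scoring with in-network merging of partial top-k lists, so that only k candidates per hop travel back toward the query originator. The sketch below shows that generic pattern, not FD's actual algorithms:

```python
import heapq

def local_topk(score, items, k):
    # Each peer scores its local items and keeps only its k best pairs.
    return heapq.nlargest(k, ((score(it), it) for it in items))

def merge_topk(partial_lists, k):
    # A peer on the return path merges its neighbors' partial lists and
    # forwards only the k best pairs, bounding per-hop traffic by k
    # (duplicate elimination omitted for brevity).
    return heapq.nlargest(k, (p for pl in partial_lists for p in pl))

peer_results = [local_topk(len, ["p2p", "gossip", "freshness"], 2),
                local_topk(len, ["replica", "dht"], 2)]
print(merge_topk(peer_results, 2))  # [(9, 'freshness'), (7, 'replica')]
```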
Archive | 2007
Michel J. Daydé; J. M. L. M. Palma; Alvaro L. G. A. Coutinho; Esther Pacitti; João Correia Lopes
Contents:

1: Grid Computing
- An Opportunistic Algorithm for Scheduling Workflows on Grids
- A Service Oriented System for on Demand Dynamic Structural Analysis over Computational Grids
- Scalable Desktop Grid System
- Analyzing Overheads and Scalability Characteristics of OpenMP Applications
- Parallel Fuzzy c-Means Cluster Analysis
- Peer-to-Peer Models for Resource Discovery in Large-Scale Grids: A Scalable Architecture

2: Cluster Computing
- JaceV: A Programming and Execution Environment for Asynchronous Iterative Computations on Volatile Nodes
- Aspect Oriented Pluggable Support for Parallel Computing
- Model for Simulation of Heterogeneous High-Performance Computing Environments
- On Evaluating Decentralized Parallel I/O Scheduling Strategies for Parallel File Systems
- Distributed Security Constrained Optimal Power Flow Integrated to a DSM Based Energy Management System for Real Time Power Systems Security Control
- Metaserver Locality and Scalability in a Distributed NFS
- Top-k Query Processing in the APPA P2P System
- Posterior Task Scheduling Algorithms for Heterogeneous Computing Systems
- Design and Implementation of an Environment for Component-Based Parallel Programming
- Anahy: A Programming Environment for Cluster Computing
- DWMiner: A Tool for Mining Frequent Item Sets Efficiently in Data Warehouses
- A Parallel Implementation of the K Nearest Neighbours Classifier in Three Levels: Threads, MPI Processes and the Grid
- On the Use of the MMC Language to Utilize SIMD Instruction Set
- A Versatile Pipelined Hardware Implementation for Encryption and Decryption Using Advanced Encryption Standard

3: Numerical Methods
- Combinatorial Scientific Computing: The Enabling Power of Discrete Algorithms in Computational Science
- Improving the Numerical Simulation of an Airflow Problem with the BlockCGSI Algorithm
- EdgePack: A Parallel Vertex and Node Reordering Package for Optimizing Edge-Based Computations in Unstructured Grids
- Parallel Processing of Matrix Multiplication in a CPU and GPU Heterogeneous Environment
- Robust Two-Level Lower-Order Preconditioners for a Higher-Order Stokes Discretization with Highly Discontinuous Viscosities
- The Impact of Parallel Programming Models on the Performance of Iterative Linear Solvers for Finite Element Applications
- Efficient Parallel Algorithm for Constructing a Unit Triangular Matrix with Prescribed Singular Values
- A Rewriting System for the Vectorization of Signal Transforms
- High Order Fourier-Spectral Solutions to Self Adjoint Elliptic Equations
- Multiresolution Simulations Using Particles
- Evaluation of Several Variants of Explicitly Restarted Lanczos Eigensolvers and Their Parallel Implementations
- PyACTS: A High-Level Framework for Fast Development of High Performance Applications
- Sequential and Parallel Resolution of the Two-Group Transient Neutron Diffusion Equation Using Second-Degree Iterative Methods
- Enhancing the Performance of Multigrid Smoothers in Simultaneous Multithreading Architectures
- Block Iterative Algorithms for the Solution of Parabolic Optimal Control Problems
- Evaluation of Linear Solvers for Astrophysics Transfer Problems

4: Large Scale Simulations in Physics
- Scalable Cosmological Simulations on Parallel Machines
- Performance Evaluation of Scientific Applications on Modern Parallel Vector Systems
- Numerical Simulation of Three-Phase Flow in Heterogeneous Porous Media
- Simulation of Laser Propagation in a Plasma with a Frequency Wave Equation
- A Particle Gradient Evolutionary Algorithm Based on Statistical Mechanics and Convergence Analysis

5: Computing in Biosciences
- A Computational Framework for Cardiac Modeling Based on Distributed Computing and Web Applications
- Triangular Clique Based Multilevel Approaches to Identify Protein Functional Modules
- BioPortal: A Portal for Deployment of Bioinformatics Applications on Cluster and Grid Environments

Workshop 1: Computational Grids and Clusters
- Adaptive Distributed Metamodeling
- Distributed General Logging Architecture for Grid Environments
- Interoperability Between UNICORE and ITBL
- Using Failure Injection Mechanisms to Experiment and Evaluate a Grid Failure Detector
- Semantic-Based Service Trading: Application to Linear Algebra
- Management of Services Based on a Semantic Description Within the GRID-TLSE Project
- Extending the Services and Sites of Production Grids by the Support of Advanced Portals

Workshop 2: High-Performance Data Management in Grid Environments
- PSO-Grid Data Replication Service
- Execution Management of Scientific Models on Computational Grids
- Replica Refresh Strategies in a Database Cluster
- A Practical Evaluation of a Data Consistency Protocol for Efficient Visualization in Grid Applications
- Experiencing Data Grids
IEEE International Conference on High Performance Computing Data and Analytics | 2004
Patrick Valduriez; Esther Pacitti
Peer-to-peer (P2P) computing offers new opportunities for building highly distributed data systems. Unlike client-server computing, P2P can operate without central coordination and offers important advantages: a very dynamic environment where peers can join and leave the network at any time, direct and fast communication between peers, and scalability to large numbers of peers. However, most deployed P2P systems have severe limitations: file-level sharing, read-only access, simple search, and poor scaling. In this paper, we discuss the issues of providing high-level data management services (schema, queries, replication, availability, etc.) in a P2P system. This implies revisiting distributed database technology in major ways. We illustrate how we address some of these issues in the APPA data management system under development in the Atlas group.
Journal of Grid Computing | 2007
Esther Pacitti; Patrick Valduriez; Marta Mattoso
Initially developed for the scientific community, Grid computing is now gaining much interest in important areas such as enterprise information systems. This makes data management critical, since the techniques must scale up while addressing the autonomy, dynamicity, and heterogeneity of the data sources. In this paper, we discuss the main open problems and new issues related to Grid data management. We first recall the main principles behind data management in distributed systems and the basic techniques. Then we specify the requirements for Grid data management. Finally, we introduce the main techniques needed to address these requirements. This implies revisiting distributed database techniques in major ways, in particular by using P2P techniques.
International Conference on Parallel and Distributed Systems | 2005
Cédric Coulon; Esther Pacitti; Patrick Valduriez
In a database cluster, preventive replication can provide strong consistency without the limitations of synchronous replication. However, the original proposal (E. Pacitti et al., 2003) assumes full replication and has performance limitations. In this paper, we address these two limitations in order to scale up to large cluster configurations. Our first contribution is a refreshment algorithm that reduces the waiting delay introduced by preventive replication and prevents inconsistencies for partially replicated databases. Our second contribution is an optimization that improves transaction throughput. We describe the implementation of our algorithm in our RepDB* prototype over a cluster of 64 nodes running PostgreSQL. Our experimental results using the TPC-C benchmark show that our algorithm provides excellent scale-up and speed-up.
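Preventive replication orders transactions instead of locking: every transaction carries a timestamp, and a node waits long enough before executing it that no older transaction can still be in flight. Below is a minimal sketch of that basic delivery rule, assuming known upper bounds on network delay and clock skew; it shows the general idea only, whereas the paper's refreshment algorithm reduces this waiting delay for partially replicated configurations.

```python
import heapq

MAX_NET_DELAY = 0.2   # assumed upper bound on message delay (seconds)
CLOCK_SKEW = 0.01     # assumed upper bound on drift between node clocks

class Refresher:
    """Executes replicated transactions in global timestamp order."""
    def __init__(self):
        self.waiting = []  # min-heap of (timestamp, txn) pairs

    def receive(self, timestamp, txn):
        heapq.heappush(self.waiting, (timestamp, txn))

    def deliverable(self, now):
        # A transaction with timestamp ts is safe once now >= ts + Max + eps:
        # by then, any transaction with a smaller timestamp must have arrived,
        # so executing in timestamp order cannot create inconsistencies.
        ready = []
        while self.waiting and now >= self.waiting[0][0] + MAX_NET_DELAY + CLOCK_SKEW:
            ready.append(heapq.heappop(self.waiting)[1])
        return ready
```

With partial replication, a transaction is propagated only to the nodes holding copies of the data it touches, which is why the ordering argument needs the extra care the paper provides.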
European Conference on Parallel Processing | 2003
Esther Pacitti; M. Tamer Özsu; Cédric Coulon
We consider the use of a cluster of PC servers for Application Service Providers, where applications and databases must remain autonomous. We use data replication to improve data availability and query load balancing (and thus performance). However, replicating databases at several nodes can create consistency problems, which must be managed through special protocols. In this paper, we present a lazy preventive data replication solution that assures strong consistency without the constraints of eager replication. We first present a peer-to-peer cluster architecture in which we identify the replication manager. Cluster nodes can support autonomous, heterogeneous databases that are treated as black boxes. Then we present the multi-master refresher algorithm and show all the system components necessary for implementation. Finally, we describe our prototype on a cluster of 8 nodes and experimental results showing that our algorithm scales up and introduces a negligible loss of data freshness (almost equal to mutual consistency).
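Because the databases are black boxes, the replication logic must live outside the DBMS: a replication manager on each node intercepts incoming update transactions, propagates them to the other masters, and feeds the local copy through the same timestamp-ordered queue as remote ones. A minimal sketch with hypothetical names, building on the Refresher sketched earlier:

```python
class ReplicationManager:
    """Interposed between clients and an autonomous, black-box DBMS node."""
    def __init__(self, refresher, peers):
        self.refresher = refresher  # timestamp-ordered delivery queue
        self.peers = peers          # network stubs for the other master nodes

    def submit(self, now, txn_sql):
        ts = now  # timestamp from the local clock (clocks loosely synchronized)
        for peer in self.peers:
            peer.send(ts, txn_sql)           # multi-master: propagate everywhere
        self.refresher.receive(ts, txn_sql)  # the local copy waits its turn too,
        # so every node executes the same timestamp order against its own
        # DBMS through the standard client interface.
```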
International Conference on Management of Data | 2007
Reza Akbarinia; Esther Pacitti; Patrick Valduriez
Distributed Hash Tables (DHTs) provide a scalable solution for data sharing in P2P systems. To ensure high data availability, DHTs typically rely on data replication, yet without data currency guarantees. Supporting data currency in replicated DHTs is difficult as it requires the ability to return a current replica despite peers leaving the network or concurrent updates. In this paper, we give a complete solution to this problem. We propose an Update Management Service (UMS) to deal with data availability and efficient retrieval of current replicas based on timestamping. For generating timestamps, we propose a Key-based Timestamping Service (KTS) which performs distributed timestamp generation using local counters. Through probabilistic analysis, we compute the expected number of replicas which UMS must retrieve for finding a current replica. Except for the cases where the availability of current replicas is very low, the expected number of retrieved replicas is typically small, e.g. if at least 35% of available replicas are current then the expected number of retrieved replicas is less than 3. We validated our solution through implementation and experimentation over a 64-node cluster and evaluated its scalability through simulation up to 10,000 peers using SimJava. The results show the effectiveness of our solution. They also show that our algorithm used in UMS achieves major performance gains, in terms of response time and communication cost, compared with a baseline algorithm.
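The quoted figure follows from a simple geometric argument: if each probed replica is current independently with probability p, the expected number of probes before hitting a current one is 1/p, and 1/0.35 ≈ 2.9 < 3. The sketch below illustrates the division of labor between the two services, with hypothetical stubs (dht.store, dht.replicas) standing in for the DHT; it is not the paper's implementation.

```python
from collections import defaultdict

class KTS:
    """Key-based timestamping: a monotonic counter per key. In the real
    service, each counter lives at the peer responsible for that key."""
    def __init__(self):
        self.counters = defaultdict(int)

    def new_timestamp(self, key):
        self.counters[key] += 1
        return self.counters[key]

    def last_timestamp(self, key):
        return self.counters[key]

class UMS:
    """Update Management Service: timestamped updates, current-replica reads."""
    def __init__(self, kts, dht):
        self.kts, self.dht = kts, dht

    def update(self, key, value):
        ts = self.kts.new_timestamp(key)
        self.dht.store(key, (ts, value))          # replicated at several peers

    def retrieve_current(self, key):
        latest = self.kts.last_timestamp(key)
        for ts, value in self.dht.replicas(key):  # probe replicas one by one
            if ts == latest:                      # found a current replica
                return value
        return None                               # every reachable replica is stale
```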
Information Systems | 2007
Stéphane Gançarski; Hubert Naacke; Esther Pacitti; Patrick Valduriez
We consider the use of a database cluster for Application Service Providers (ASP). In the ASP context, applications and databases can be update-intensive and must remain autonomous. In this paper, we describe the Leganet system, which performs freshness-aware transaction routing in a database cluster. We use multi-master replication and relaxed replica freshness to increase load balancing. Our transaction routing takes into account the freshness requirements of queries at the relation level, using a cost function that factors in the cluster load and the cost of refreshing replicas to the required level. We implemented the Leganet prototype on an 11-node Linux cluster running Oracle8i. Using experimentation and emulation up to 128 nodes, our validation based on the TPC-C benchmark demonstrates the performance benefits of our approach.
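The routing decision can be phrased as a cost minimization: for a query with per-relation freshness bounds, each node costs its current load plus whatever refreshment is needed to meet those bounds, and the router picks the cheapest node. A minimal sketch under assumed interfaces (node.load, node.staleness, node.refresh_cost, query.freshness_bounds), not the Leganet cost model itself:

```python
def route(query, nodes):
    """Send the query to the node minimizing load + required refresh cost."""
    def cost(node):
        # Refresh cost: bring every relation the query reads within its
        # freshness bound; replicas already fresh enough cost nothing extra.
        refresh = sum(
            node.refresh_cost(rel, bound)
            for rel, bound in query.freshness_bounds.items()
            if node.staleness(rel) > bound
        )
        return node.load() + refresh
    return min(nodes, key=cost)
```

Relaxing the bounds lets more nodes qualify cheaply, which is how relaxed freshness translates into better load balancing.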
Extending Database Technology | 2009
Manal El Dick; Esther Pacitti; Bettina Kemme
Many websites with a large user base, e.g., websites of nonprofit organizations, do not have the financial means to install large web servers or use specialized content distribution networks such as Akamai. For those websites, we have developed Flower-CDN, a locality-aware, P2P-based content distribution network (CDN) in which the users interested in a website support the distribution of its content. The idea is that peers keep the content they retrieve and later serve it to other peers that are close to them in locality. Our architecture is a hybrid between structured and unstructured networks. When a new client requests some content from a website, a locality-aware DHT quickly finds a peer in its neighborhood that has the content available. Additionally, all peers in a given locality that maintain content of a particular website build an unstructured content overlay. Within this overlay, peers gossip information about their content, allowing the system to maintain accurate information despite churn. In our performance evaluation, we compare Flower-CDN with an existing P2P CDN that is strictly DHT-based and not locality-aware. Flower-CDN reduces lookup latency by a factor of 9 and transfer distance by a factor of 2. We also show that Flower-CDN's gossip has low overhead and can be adjusted according to hit ratio requirements and bandwidth availability.
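The locality awareness hinges on how DHT keys are built: hashing the website together with the client's locality makes the lookup land on a directory peer for that website in that region, which can then point the client at nearby content holders. A minimal sketch with hypothetical names (dht.route, content_holders), not Flower-CDN's actual interfaces:

```python
import hashlib

def locality_key(website: str, locality: str) -> int:
    # Combining locality and website in the key keeps each region's
    # directory entries on a peer responsible for that region.
    digest = hashlib.sha1(f"{locality}/{website}".encode()).hexdigest()
    return int(digest, 16)

def find_nearby_content(dht, website, client_locality):
    directory_peer = dht.route(locality_key(website, client_locality))
    # The directory peer indexes peers in this locality that cache the
    # website's content; those peers also gossip among themselves in an
    # unstructured overlay to keep this index accurate despite churn.
    return directory_peer.content_holders(website)
```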