Péter Urbán
École Polytechnique Fédérale de Lausanne
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Péter Urbán.
international conference on information networking | 2001
Péter Urbán; Xavier Défago; André Schiper
Designing, tuning, and analyzing the performance of distributed algorithms and protocols are complex tasks. A major factor that contributes to this complexity is the fact that there is no single environment to support all phases of the development of a distributed algorithm. This paper presents Neko, an easy to use Java platform that provides a uniform and extensible environment for the various phases of algorithm design and performance evaluation: prototyping, tuning, simulation, deployment, etc.
european dependable computing conference | 2002
Fernando Pedone; André Schiper; Péter Urbán; David Cavin
Agreement problems, such as consensus, atomic broadcast, and group membership, are central to the implementation of fault-tolerant distributed systems. Despite the diversity of algorithms that have been proposed for solving agreement problems in the past years, almost all solutions are Crash-Detection Based (CDB). We say that an algorithm is CDB if it uses some information about the status crashed/not crashed of processes. In this paper, we revisit the issue of non-CDB algorithms considering ordering oracles. Ordering oracles have a theoretical interest as well as a practical interest. To illustrate their use, we present solutions to consensus and atomic broadcast, and evaluate the performance of the atomic broadcast algorithm in a cluster of workstations.
international conference on computer communications and networks | 2000
Péter Urbán; Xavier Défago; André Schiper
Resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Nevertheless, the metrics that are commonly used to predict their performance take little or no account of contention. We define two performance metrics for distributed algorithms that account for network contention as well as CPU contention. We then illustrate the use of these metrics by comparing four atomic broadcast algorithms, and show that our metrics allow for a deeper understanding of performance issues than conventional metrics.
dependable systems and networks | 2003
Péter Urbán; Ilya Shnayderman; André Schiper
Protocols that solve agreement problems are essential building blocks for fault tolerant distributed systems. While many protocols have been published, little has been done to analyze their performance, especially the performance of their fault tolerance mechanisms. In this paper, we present a performance evaluation methodology that can be generalized to analyze many kinds of fault-tolerant algorithms. We use the methodology to compare two atomic broadcast algorithms with different fault tolerance mechanisms: unreliable failure detectors and group membership. We evaluated the steady state latency in (1) runs with neither crashes nor suspicions, (2) runs with crashes and (3) runs with no crashes in which correct processes are wrongly suspected to have crashed, as well as (4) the transient latency after a crash. We found that the two algorithms have the same performance in Scenario 1, and that the group membership based algorithm has an advantage in terms of performance and resiliency in Scenario 2, whereas the failure detector based algorithm offers better performance in the other scenarios. We discuss the implications of our results to the design of fault tolerant distributed systems.
dependable systems and networks | 2002
A. Coccoli; Péter Urbán; Andrea Bondavalli; André Schiper
Protocols which solve agreement problems are essential building blocks for fault tolerant distributed applications. While many protocols have been published, little has been done to analyze their performance. This paper represents a starting point for such studies, by focusing on the consensus problem, a problem related to most other agreement problems. The paper analyzes the latency of a consensus algorithm designed for the asynchronous model with failure detectors, by combining experiments on a cluster of PCs and simulation using stochastic activity networks. We evaluated the latency in runs (1) with no failures nor failure suspicions, (2) with failures but no wrong suspicions and (3) with no failures but with (wrong) failure suspicions. We validated the adequacy and the usability of the stochastic activity network model by comparing experimental results with those obtained from the model. This has led us to identify limitations of the model and the measurements, and suggests new directions for evaluating the performance of agreement protocols.
international conference on parallel and distributed systems | 2002
Richard Ekwall; Péter Urbán; André Schiper
When processes on two different machines communicate, they most often do so using the TCP protocol. While TCP is appropriate for a wide range of applications, it has shortcomings in other application areas. One of these areas is fault tolerant distributed computing. For some of those applications, TCP does not address link failures adequately: TCP breaks the connection if connectivity is lost for some duration (typically minutes). This is sometimes undesirable. The paper proposes robust TCP connections, a solution to the problem of broken TCP connections. The paper presents a session layer protocol on top of TCP that ensures reconnection, and provides exactly-once delivery for all transmitted data. A prototype has been implemented as a Java library. The prototype has less than 10% overhead on TCP sockets with respect to the most important performance figures.
symposium on reliable distributed systems | 2001
Péter Urbán; Xavier Défago; André Schiper
Fault tolerance can be achieved in distributed systems by replication. However Fischer, Lynch and Paterson (1985) have proven an impossibility result about consensus in the asynchronous system model, and similar impossibility results exist for atomic broadcast and group membership. We investigate, with the aid of an experiment conducted in a LAN, whether these impossibility results set limits to the robustness of a replicated server exposed to extremely high loads. The experiment consists of client processes that send requests to a replicated server (three replicas) using an atomic broadcast primitive. It has parameters that allow us to control the load on the hosts and the network, as well as the timeout value used by our heartbeat failure detection mechanism. Our main observation is that the atomic broadcast algorithm never stops delivering messages, not even under arbitrarily high load and very small timeout values (1 ms). So, by trying to illustrate the practical impact of impossibility results, we discovered that we had implemented a very robust replicated service.
european dependable computing conference | 1996
Jörn Altmann; András Pataricza; Tamás Bartha; Péter Urbán; A. Petri
The paper presents a novel modelling technique for system-level fault diagnosis in massive parallel multiprocessors, based on a re-formulation of the problem of syndrome decoding to a constraint satisfaction problem (CSP). The CSP based approach is able to handle detailed and inhomogeneous functional fault models on a similar level as the Russel-Kime model [18]. Multiple-valued logic is used to describe system components having multiple fault modes. The granularity of the models can be adjusted to the diagnostic resolution of the target without altering the methodology. Two algorithms for the Parsytec GCel massively parallel system are used as illustrations in the paper: the centralized method uses a detailed system model, and provides a fine-granular diagnostic image for off-line evaluation. The distributed method makes fast decisions for reconfiguration control, using a simplified model.
Archive | 2000
Xavier Défago; André Schiper; Péter Urbán
The International Arab Journal of Information Technology | 2002
Naohiro Hayashibara; Péter Urbán; André Schiper; Takuya Katayama