Flaviu Cristian
University of California, San Diego
Publications
Featured research published by Flaviu Cristian.
Distributed Computing | 1989
Flaviu Cristian
A probabilistic method is proposed for reading remote clocks in distributed systems subject to unbounded random communication delays. The method can achieve clock synchronization precisions superior to those attainable by previously published clock synchronization algorithms. Its use is illustrated by presenting a time service which maintains externally (and hence, internally) synchronized clocks in the presence of process, communication and clock failures.
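The core idea of probabilistic remote clock reading can be sketched as follows. A client timestamps a request, receives the remote clock value, and measures the round trip on its own clock. Because each message takes at least some minimum delay, a short round trip bounds the reading error. The client retries until a round trip falls below a chosen threshold, trading a small probability of failure for a tighter precision. This is a minimal sketch under stated assumptions: clock drift during the round trip is ignored, and `query`, `min_delay`, `max_round_trip`, `max_attempts` and the simulated server are hypothetical names introduced here, not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Reading:
    estimate: float   # estimated remote clock value when the reply arrived
    max_error: float  # bound on |estimate - actual remote clock value|


def read_remote_clock(
    query: Callable[[], Tuple[float, float, float]],
    min_delay: float,
    max_round_trip: float,
    max_attempts: int,
) -> Optional[Reading]:
    """Retry until a round trip is short enough to meet the precision target.

    `query` is a hypothetical transport hook returning
    (local_send_time, remote_clock_value, local_receive_time).
    Clock drift during the round trip is ignored in this sketch.
    """
    for _ in range(max_attempts):
        t0, remote, t1 = query()
        rtt = t1 - t0  # round trip measured on the local clock
        if rtt <= max_round_trip:
            # The reply spent between min_delay and rtt - min_delay in transit,
            # so the remote clock now reads somewhere in
            # [remote + min_delay, remote + rtt - min_delay]; take the midpoint.
            return Reading(estimate=remote + rtt / 2, max_error=rtt / 2 - min_delay)
    return None  # no fast enough round trip: report a clock reading failure


def make_simulated_query(offset: float, min_delay: float, rng: random.Random):
    """Simulated server whose clock is `offset` ahead, with unbounded random delays."""
    now = 0.0

    def query() -> Tuple[float, float, float]:
        nonlocal now
        t0 = now
        to_server = min_delay + rng.expovariate(50.0)  # exponential tail: no upper bound
        to_client = min_delay + rng.expovariate(50.0)
        remote = t0 + to_server + offset  # server reads its clock on arrival
        now = t0 + to_server + to_client
        return t0, remote, now

    return query
```

For example, `read_remote_clock(make_simulated_query(5.0, 0.01, random.Random(1)), 0.01, 0.05, 200)` keeps retrying until some round trip is at most 0.05, which caps the reported error at 0.05 / 2 - 0.01 = 0.015. Lowering `max_round_trip` tightens the precision but raises the expected number of attempts.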
IEEE Transactions on Parallel and Distributed Systems | 1999
Flaviu Cristian; Christof Fetzer
We propose a formal definition for the timed asynchronous distributed system model. We present extensive measurements of actual message and process scheduling delays and hardware clock drifts. These measurements confirm that this model adequately describes current distributed systems such as a network of workstations. We also explain why services needed in practice, such as consensus or leader election, which are not implementable in the time-free model, are implementable in the timed asynchronous system model.
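One practical consequence of a timed model is that processes can reason about real time using only their local hardware clocks, provided the clock drift is bounded. The sketch below assumes a drift bound `rho` (over a real interval of length t, a correct clock advances between (1 - rho)·t and (1 + rho)·t) and a one-way message timeout `delta`. It shows how a duration measured locally translates into real-time bounds, how long a local timer must run to be sure a given real time has passed, and how a measured round trip can be checked against 2·delta. The function names are hypothetical illustrations, not definitions from the paper.

```python
from typing import Tuple


def real_time_bounds(local_duration: float, rho: float) -> Tuple[float, float]:
    """Real-time interval consistent with a duration measured on a drifting clock.

    Assumes the clock's drift rate is bounded by rho: over a real interval of
    length t, the clock advances between (1 - rho) * t and (1 + rho) * t.
    """
    if not 0 <= rho < 1:
        raise ValueError("rho must be in [0, 1)")
    return local_duration / (1 + rho), local_duration / (1 - rho)


def local_timeout_for(real_timeout: float, rho: float) -> float:
    """Local-clock timeout that guarantees at least real_timeout real time has passed."""
    # Waiting L local time units takes at least L / (1 + rho) real time.
    return real_timeout * (1 + rho)


def round_trip_is_timely(local_rtt: float, delta: float, rho: float) -> bool:
    """True if the measured round trip certainly took at most 2 * delta real time."""
    _, upper = real_time_bounds(local_rtt, rho)
    return upper <= 2 * delta
```

With rho = 1e-4, a round trip measured as 1.9 time units is certainly within 2·delta for delta = 1.0, while one measured as exactly 2.0 is not, because drift could stretch it past the bound.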
Distributed Computing | 1991
Flaviu Cristian
Reaching agreement on the identity of correctly functioning processors of a distributed system in the presence of random communication delays, failures and processor joins is a fundamental problem in fault-tolerant distributed systems. Assuming a synchronous communication network that is not subject to partition occurrences, we specify the processor-group membership problem and we propose three simple protocols for solving it. The protocols provide all correct processors with consistent views of the processor-group membership and guarantee bounded processor failure detection and join delays.
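Bounded failure detection in a synchronous network is commonly obtained with timestamped heartbeats. The sketch below is a generic heartbeat monitor, not one of the paper's three protocols, and it does not by itself guarantee that all processors hold identical views. It assumes, hypothetically, that each correct processor broadcasts a heartbeat stamped with its synchronized clock every `period` time units, that the network delivers each heartbeat within `max_delay`, and that clock skew is negligible. Under those assumptions a crashed processor leaves the view at most `period + max_delay` after its last heartbeat, and a newly heard processor simply appears in it.

```python
from typing import Dict, FrozenSet


class HeartbeatMembership:
    """Hypothetical heartbeat-based membership view with bounded failure detection.

    Assumptions (not taken from the paper): every correct processor broadcasts a
    heartbeat stamped with its synchronized clock every `period` time units, and
    the network delivers each heartbeat within `max_delay`. Clock skew is ignored.
    """

    def __init__(self, period: float, max_delay: float) -> None:
        self.period = period
        self.max_delay = max_delay
        self.last_sent: Dict[str, float] = {}  # newest heartbeat timestamp per sender

    def on_heartbeat(self, sender: str, timestamp: float) -> None:
        # A sender heard for the first time is added, which is how joins appear here.
        prev = self.last_sent.get(sender, float("-inf"))
        self.last_sent[sender] = max(prev, timestamp)

    def view(self, now: float) -> FrozenSet[str]:
        # A correct sender always has a heartbeat stamped within the last
        # period + max_delay that has already arrived, so it stays in the view;
        # a crashed sender drops out at most period + max_delay after it stops.
        horizon = now - (self.period + self.max_delay)
        return frozenset(p for p, t in self.last_sent.items() if t >= horizon)
```

Keying the check on send timestamps rather than arrival times is what gives the clean `period + max_delay` bound; using local arrival times would add another `max_delay` to the detection delay.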
Symposium on Principles of Database Systems | 1985
Amr El Abbadi; Dale Skeen; Flaviu Cristian
A data management protocol for executing transactions on a replicated database is presented. The protocol ensures one-copy serializability, i.e., the concurrent execution of transactions on a replicated database is equivalent to some serial execution of the same transactions on a non-replicated database. The protocol tolerates a large class of failures, including processor and communication link crashes, partitioning of the communication network, lost messages, and slow responses of processors and communication links. Processor and link recoveries are also handled. The protocol implements the reading of a replicated object efficiently by reading the nearest available copy of the object. When reads outnumber writes, the protocol performs better than other known protocols.
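The efficiency claim rests on reading a single nearby copy while writes go to every reachable copy. The sketch below shows a simplified read-one/write-all-in-view scheme, not the paper's full protocol: transactions, concurrency control and the real view-change protocol are omitted. It assumes, as a hypothetical simplification, that each operation runs in a view (the set of sites the caller can reach), that an object is accessible only if the view holds a majority of its copies, and that a simplified `install_view` step refreshes stale copies before the view is used. Because any two majorities intersect, the refreshed copies carry the latest committed write. The `distance` map and all method names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Copy:
    value: int = 0
    version: int = 0


class ViewReplicatedObject:
    """Simplified read-one/write-all-in-view access to one replicated object."""

    def __init__(self, distance: Dict[str, float]) -> None:
        # One copy per site; `distance` is a hypothetical cost of reaching each site.
        self.distance = distance
        self.copies: Dict[str, Copy] = {site: Copy() for site in distance}

    def _accessible(self, view: Iterable[str]) -> List[str]:
        in_view = [s for s in view if s in self.copies]
        if 2 * len(in_view) <= len(self.copies):
            raise RuntimeError("object not accessible: view lacks a majority of copies")
        return in_view

    def install_view(self, view: Iterable[str]) -> None:
        # Simplified view change: every copy in the new view adopts the newest
        # version found there, so rejoining sites stop serving stale data.
        in_view = self._accessible(view)
        newest = max((self.copies[s] for s in in_view), key=lambda c: c.version)
        for s in in_view:
            self.copies[s] = Copy(newest.value, newest.version)

    def read(self, view: Iterable[str]) -> int:
        # Read one: the nearest copy in the (already installed) view.
        in_view = self._accessible(view)
        nearest = min(in_view, key=lambda s: self.distance[s])
        return self.copies[nearest].value

    def write(self, view: Iterable[str], value: int) -> None:
        # Write all: every copy in the view receives the new value and version.
        in_view = self._accessible(view)
        version = max(self.copies[s].version for s in in_view) + 1
        for s in in_view:
            self.copies[s] = Copy(value, version)
```

Reads touch one copy and writes touch all copies in the view, which is why this style of scheme pays off when reads outnumber writes.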
Real-Time Systems | 1990
Flaviu Cristian
We propose a synchronous atomic broadcast protocol for distributed real-time systems based on redundant broadcast channels. The protocol can tolerate a finite number f of concurrent processor crash failures, channel adapter performance failures and channel omission failures. Its message cost is optimal: when no failures occur only f+1 messages are sent per broadcast. The cost implications of providing tolerance to other failure classes are also investigated.
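A common way to obtain the atomic (same order everywhere) property in a synchronous setting is to deliver each message at a fixed latency after its send timestamp and to break ties deterministically. The receiver sketch below follows that general timed-delivery idea; it is an illustration, not the paper's protocol. It assumes synchronized clocks, a hypothetical delivery latency `delta`, and that every copy a correct sender transmits reaches every correct receiver before timestamp + `delta` over at least one of the redundant channels. Duplicate copies arriving on other channels are discarded, and messages are delivered in (timestamp, sender) order.

```python
import heapq
from typing import List, Set, Tuple

# (send timestamp, sender id, per-sender sequence number, payload)
Message = Tuple[float, str, int, str]


class DeltaDeliveryReceiver:
    """Hypothetical receiver for a timed atomic broadcast with delivery latency `delta`."""

    def __init__(self, delta: float) -> None:
        self.delta = delta
        self.pending: List[Message] = []       # min-heap ordered by (timestamp, sender, seq)
        self.seen: Set[Tuple[str, int]] = set()  # grows without bound in this sketch

    def on_receive(self, msg: Message, now: float) -> None:
        ts, sender, seq, _ = msg
        key = (sender, seq)
        if key in self.seen or now >= ts + self.delta:
            return  # duplicate copy from another channel, or arrived too late
        self.seen.add(key)
        heapq.heappush(self.pending, msg)

    def deliver_due(self, now: float) -> List[str]:
        # Deliver, in (timestamp, sender) order, every message whose deadline has passed.
        out: List[str] = []
        while self.pending and self.pending[0][0] + self.delta <= now:
            out.append(heapq.heappop(self.pending)[3])
        return out
```

Since every correct receiver holds the same set of timely messages once a deadline passes, and all of them sort by the same key, they deliver the same sequence.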
ACM SIGOPS European Workshop | 1990
Flaviu Cristian; Bob Dancey; Jon Dehn
The Advanced Automation System is a distributed real-time system under development by IBM's Systems Integration Division for the US Federal Aviation Administration. The system is intended to replace the present en-route and terminal approach US air traffic control computer systems over the next decade. High availability of air traffic control services is an essential requirement of the system. This paper discusses the general approach to fault-tolerance adopted in AAS, by reviewing some of the questions which were asked during the system design, various alternative solutions considered, and the reasons for the design choices made.
Symposium on Reliable Distributed Systems | 1991
Flaviu Cristian; Farnam Jahanian
The authors present a timestamp-based protocol for checkpointing the global state of a long-lived distributed computation in an environment in which processor clocks are approximately synchronized. The protocol is based on periodic checkpointing of local process states and logging of incoming messages during a short bounded interval. It tolerates process crash and performance failures as well as network omission and performance failures. The proposed approach has the advantage of optimistic logging protocols in that it does not require synchronous logging of each message on stable storage. The approach also has the advantage of pessimistic logging protocols in that it avoids the domino effect by recovering to the most recent successful local checkpoint.
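The reason the logging interval can be short and bounded is that, with approximately synchronized clocks and bounded delays, only messages that cross a checkpoint line need to be recorded, and those can only arrive shortly after the receiver's own checkpoint. The sketch below makes that reasoning concrete under hypothetical assumptions not taken from the paper: checkpoint k is taken when a process's clock reads k·`period`, clocks differ by at most `eps`, and messages take at most `max_delay`. A message sent before the sender's checkpoint k reaches the receiver no later than `eps + max_delay` after the receiver's checkpoint k. Messages that cross in the other direction because of clock skew are not handled here.

```python
import math


def checkpoint_index(local_time: float, period: float) -> int:
    """Index of the most recent checkpoint at or before local_time (checkpoints at k * period)."""
    return math.floor(local_time / period)


def must_log(send_ts: float, recv_time: float, period: float) -> bool:
    """True if a message was sent before some checkpoint and received after it.

    send_ts is read on the sender's clock, recv_time on the receiver's clock.
    Such in-transit messages belong to the checkpointed channel state.
    """
    return checkpoint_index(send_ts, period) < checkpoint_index(recv_time, period)


def logging_window(eps: float, max_delay: float) -> float:
    """How long after its own checkpoint a receiver may still see in-transit messages.

    The sender's checkpoint happens at most eps real time after the receiver's,
    and a message sent before it arrives at most max_delay later.
    """
    return eps + max_delay
```

For checkpoints every 10 time units, a message stamped 9.9 and received at 10.05 must be logged, while one stamped 10.1 and received at 10.2 need not be.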
IEEE International Symposium on Fault-Tolerant Computing | 1988
Flaviu Cristian
The author describes his system model and failure assumptions by precisely specifying the processor group membership problem. He then gives two protocols for solving this problem. The protocols provide all correct processors with consistent views of the processor group membership. They also guarantee bounded processor failure detection and join processing delays despite any number of performance failures that do not cause network partitioning. The first protocol provides very fast processor failure detection but can require a significant message traffic overhead, even when no failures occur. To reduce this overhead, the author derives the second protocol, which has a (provable) minimal message overhead in the absence of failures but provides a longer failure detection delay and is more complex. He concludes by comparing his approach with other known approaches.
IEEE Transactions on Software Engineering | 1984
Flaviu Cristian
The design of programs which are both correct and robust is investigated. It is argued that the notion of an exception is a valuable tool for structuring the specification, design, verification, and modification of such programs. The syntax and semantics of a language with procedures and exception handling are presented. A deductive system is proposed for proving total correctness and robustness properties of programs written in this language. The system is both sound and complete. It supports proof modularization, in that it allows one to reason separately about fault-free and fault-tolerant system properties. Since the programming language considered closely resembles CLU or Ada, the presented deductive system is easily adaptable for verifying total correctness and robustness properties of programs written in these, or similar, languages.
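The structuring idea is that a procedure has several possible outcomes (a normal one and one or more exceptional ones), each with its own precondition and postcondition, and that callers reason about each outcome separately. The Python sketch below illustrates that style of specification on a hypothetical `pop` procedure; Python is used only for illustration and is not the CLU- or Ada-like language of the paper, and the docstring conditions are informal rather than the paper's deductive system.

```python
from typing import List


class EmptyStack(Exception):
    """Exceptional outcome of pop: signalled instead of returning a value."""


def pop(stack: List[int]) -> int:
    """Remove and return the top element.

    Normal outcome:      pre: stack non-empty; post: returns the old top, stack shrinks by one.
    Exceptional outcome: pre: stack empty;     post: EmptyStack signalled, stack unchanged.
    """
    if not stack:
        raise EmptyStack()
    return stack.pop()


def pop_or_default(stack: List[int], default: int) -> int:
    """Caller that handles the exceptional outcome, making the combined procedure total."""
    try:
        return pop(stack)  # normal path: reason with pop's normal postcondition
    except EmptyStack:
        return default     # exceptional path: reason with pop's exceptional postcondition
```

Because `pop` leaves the stack unchanged when it signals `EmptyStack`, the handler in `pop_or_default` can rely on that postcondition without re-examining the stack.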
Symposium on Reliable Distributed Systems | 1994
Flaviu Cristian; Christof Fetzer
We propose an improved probabilistic method for reading remote clocks in systems subject to unbounded communication delays and use this method to design a fault-tolerant probabilistic internal clock synchronization protocol. This protocol masks clock reading failures and arbitrary failures of processes. Because of probabilistic reading, our protocol achieves better synchronization precisions than those achievable by previously known deterministic algorithms. Another advantage of the proposed protocol is that it uses a linear, instead of quadratic, number of messages, and that message exchanges are staggered in time instead of all happening in narrow synchronization intervals. The drift rate of the synchronized clocks is optimal.
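Masking arbitrary process failures in internal clock synchronization usually comes down to a convergence function that discards readings that could be faulty before computing an adjustment. The sketch below uses the well-known fault-tolerant midpoint (drop the f smallest and f largest readings, then take the midpoint of the rest), with the usual requirement of at least 3f + 1 readings. It is a generic illustration and not necessarily the convergence function used in the paper; the `offsets` input (estimated remote-minus-local clock differences, including 0.0 for the local clock) is an assumption of this sketch.

```python
from typing import Sequence


def fault_tolerant_midpoint(offsets: Sequence[float], f: int) -> float:
    """Correction to apply to the local clock, tolerating up to f arbitrary readings.

    offsets: estimated (remote - local) clock differences, including 0.0 for
    the local clock itself. Discarding the f smallest and f largest values
    means any remaining faulty reading lies between two correct ones.
    """
    if f < 0 or len(offsets) < 3 * f + 1:
        raise ValueError("need f >= 0 and at least 3f + 1 readings")
    kept = sorted(offsets)[f:len(offsets) - f]
    return (kept[0] + kept[-1]) / 2
```

With readings `[0.0, 0.002, -0.001, 100.0]` and f = 1, the wild value 100.0 and the lowest value are discarded, and the correction is the midpoint of 0.0 and 0.002, about 0.001.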