Aleta Ricciardi | Researchain

Publication

Featured research published by Aleta Ricciardi.

principles of distributed computing | 1991

Using process groups to implement failure detection in asynchronous environments

Aleta Ricciardi; Kenneth P. Birman

Agreement on the membership of a group of processes in a distributed system is a basic problem that arises in a wide range of applications. Such groups occur when a set of processes cooperate to perform some task, share memory, monitor one another, subdivide a computation, and so forth. The group membership problem is discussed as it relates to failure detection in asynchronous, distributed systems. A rigorous, formal specification for group membership is presented under this interpretation. A solution is then presented for this problem.

ieee international symposium on fault tolerant computing | 1993

Virtually-synchronous communication based on a weak failure suspector

Andre Schiper; Aleta Ricciardi

Failure detectors (or, more accurately, failure suspectors, or FS) appear to be a fundamental service upon which to build fault-tolerant, distributed applications. It is shown that an FS with very weak semantics (i.e. that delivers failure and recovery information in no specific order) suffices to implement virtually synchronous communication (VSC) in an asynchronous system subject to process crash failures and network partitions. The VSC paradigm is particularly useful in asynchronous systems and greatly simplifies building fault-tolerant applications that mask failures by replicating processes. The authors suggest a three-component architecture to implement virtually synchronous communication: (1) at the lowest level, the FS component; on top of it, (2a) a component that defines new views, and (2b) a component that reliably multicasts messages within a view.

international workshop on variable structure systems | 1993

Understanding partitions and the 'no partition' assumption

Aleta Ricciardi; Andre Schiper; Kenneth P. Birman

Discusses partitions in asynchronous message-passing systems. In such systems, slow processes and slow links can lead to virtual partitions that are indistinguishable from real ones. To overcome the impossibility of detecting crashed processes in an asynchronous system, the system model incorporates a failure suspector to detect (possibly erroneously) process failures. Based on failure suspicions, the authors give a definition of partitions that accounts for real partitions as well as virtual ones. It is shown that under certain assumptions about the process behavior, any incorrect failure suspicion inevitably partitions the system. It is then shown how to interpret the absence-of-partition assumption.

Proceedings of the International Conference | 1996

The NILE System Architecture: Fault-Tolerant, Wide-Area Access to Computing and Data Resources

Aleta Ricciardi; Michael Ogg; Eric Rothfus

Nile is a multidisciplinary project building a distributed computing environment for HEP. It provides wide-area, fault-tolerant, integrated access to processing and data resources for collaborators of the CLEO experiment, though the goals and principles are applicable to many domains. Nile has three main objectives: a realistic distributed system architecture design, the design of a robust data model, and a Fast-Track implementation providing a prototype design environment which will also be used by CLEO physicists. This paper focuses on the software and wide-area system architecture design and the computing issues involved in making Nile services highly available.

1 The Challenge and Goals

The main goals of the Nile project are to build a scalable environment that gives access to a widely distributed set of resources for storing and processing HEP events, to increase the processing speed of computations, and to broaden access to event data so that analyses can be performed at geographically dispersed sites. The main obstacles in achieving these goals are the amount of data recorded by the CLEO experiment [1], and the data processing demands of the computing environment in the form of CPU cycles, network bandwidth and latency, and storage. To add to the complexity, different forms of computation impose different burdens on processing resources (either CPU bound or I/O bound).

In the CLEO II experiment, a typical hadronic event is 8 kB, and grows to 20 kB when the results of event reconstruction are added to the event record. About 10^6 events are recorded each day, resulting in the production of about 1 TB of analyzed data per year. The actual amount of data transferred for each event during analysis depends upon the details of the Nile Data Model [3]. Approximately 3,000 SPECints of distributed CPU power per year are necessary for continuous event reconstruction on the incoming data. Another 7,000 SPECints are necessary for continuous Monte Carlo simulation of events, and 2,000 SPECints for analysis. Current plans for the improvement of the CESR storage ring and the CLEO III upgrade [2] will increase these requirements by a factor of ten within four years.

international workshop on distributed algorithms | 1992

The Cost of Order in Asynchronous Systems

Aleta Ricciardi; Kenneth P. Birman; Patrick Stephenson

We consider the Group Membership Problem (GMP) in asynchronous systems. This problem consists of maintaining a list of processes belonging to the system, and updating it as processes join (are started) and leave (terminate or fail). Our investigations led to four independent properties that characterize instances of this problem. We closely examine three membership services, comparing the message cost to implement them, as well as their fault-tolerance and ability to adapt to environmental changes. We also examine their relative merits by comparing the cost to a distributed application that employs each of the membership services. We show that in typical system executions Strong GMP is less expensive to implement, is always more responsive to dynamic aspects in the environment, and allows applications to accomplish more work with less effort. As Strong GMP is the sole instance providing a linear order on membership changes, these results emphasize the benefits of providing Order as well as the cost of not providing it when it is available so cheaply.

principles of distributed computing | 1995

Architecture decisions for wide area applications

Michael Ogg; Aleta Ricciardi

A circuit which locks the signaling circuit of a telephone station. A microprocessor disconnects a keypad from an associated tone generator upon detection of operation of a lock button. The microprocessor connects the keypad to the tone generator upon detection of operation of the lock button and a predetermined unlock code provided by the keypad.

Proceedings of the International Conference | 1996

The NILE Data Model

Michael Ogg; Aleta Ricciardi

Nile is a multi-disciplinary project building a distributed computing environment for HEP. Nile will provide fault-tolerant, integrated access to processing and data resources for collaborators of the CLEO experiment, though the goals and principles are applicable to many domains. Nile currently has three main objectives: a realistic distributed system architecture design, the design of a robust data model, and a Fast-Track implementation providing a prototype design environment to be used by CLEO physicists. In this paper, we describe the Data Model, its design issues, and its interactions with the Nile System Architecture.

Archive | 2003