Florin Sultan | Researchain

Publication

Featured research published by Florin Sultan.

International Conference on Distributed Computing Systems | 2002

Migratory TCP: connection migration for service continuity in the Internet

Florin Sultan; Kiran Srinivasan; Deepa Iyer; Liviu Iftode

Today's Internet services are commonly built over TCP, the standard Internet connection-oriented reliable transport protocol. The endpoint naming scheme of TCP, based on network-layer (IP) addresses, creates an implicit binding between a service and the IP address of the server providing it for the lifetime of a client connection. After the connection is established, this makes a TCP client vulnerable to any adverse condition affecting the server endpoint or the internetwork in between: congestion or failure in the network, or a server that is overloaded, has failed, or is under DoS attack. Studies quantifying network stability and route availability show that connectivity failures can significantly impact Internet services. As a result, although highly available servers can be deployed, sustaining continuous service remains a problem. We propose a cooperative service model, in which a pool of similar servers, possibly geographically distributed across the Internet, cooperate in sustaining a service by migrating client connections within the pool. The control traffic between servers, needed to support migrated connections, can be carried either over the Internet or over a private network. From the client's viewpoint, at any point during the lifetime of its service session, the remote endpoint of its connection may transparently migrate between servers.
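To make the cooperative service model concrete, here is a minimal Python toy simulation. It assumes a connection is identified by a logical id rather than by the server's IP address, and that the origin server can still hand over the connection's state over the server-to-server control channel when the client moves away from it. The classes `Server`, `ServerPool`, and `Client` and the `bytes_received` counter are hypothetical; the sketch does not model M-TCP's actual wire protocol, sequence numbers, or migration handshake.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    """One server in the cooperative pool (hypothetical toy model)."""
    name: str
    healthy: bool = True  # False models overload, DoS, or a congested path from the client
    # Connection state keyed by a logical connection id, not by the server's IP address.
    sessions: dict = field(default_factory=dict)

class ServerPool:
    def __init__(self, servers):
        self.servers = {s.name: s for s in servers}

    def migrate(self, conn_id, origin, target):
        """Transfer one connection's state from origin to target over the
        server-to-server control channel (Internet or private network)."""
        state = self.servers[origin].sessions.pop(conn_id)
        self.servers[target].sessions[conn_id] = state
        return target

class Client:
    def __init__(self, pool, conn_id, server):
        self.pool, self.conn_id, self.server = pool, conn_id, server
        pool.servers[server].sessions[conn_id] = {"bytes_received": 0}

    def send(self, nbytes):
        srv = self.pool.servers[self.server]
        if not srv.healthy:
            # Move the remote endpoint to any healthy peer; the session continues.
            target = next(n for n, s in self.pool.servers.items() if s.healthy)
            self.server = self.pool.migrate(self.conn_id, srv.name, target)
            srv = self.pool.servers[self.server]
        srv.sessions[self.conn_id]["bytes_received"] += nbytes
        return srv.name, srv.sessions[self.conn_id]["bytes_received"]
```

For example, with servers `A` and `B`, a client that sent 100 bytes to `A` and then sends 50 more after `A` is marked unhealthy gets `("B", 150)`: the endpoint moved, but the session's byte count carried over.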

Conference on High Performance Computing (Supercomputing) | 2000

Scalable Fault-Tolerant Distributed Shared Memory

Florin Sultan; Liviu Iftode; Thu D. Nguyen

This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a home-based lazy release consistency (HLRC) DSM system with independent checkpointing and logging to volatile memory, targeting shared-memory computing on very large LAN-based clusters. In these environments, where global coordination may be expensive, independent checkpointing becomes critical to scalability. However, independent checkpointing is only practical if we can control the size of the log and checkpoints in the absence of global coordination. In this paper we describe the design of our fault-tolerant DSM system and present our solutions to the problems of checkpoint and log management. We also present experimental results showing that our fault tolerance support is lightweight, adding only low messaging, logging, and checkpointing overheads, and that our management algorithms can be expected to effectively bound the size of the checkpoints and logs of real applications.
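The combination of independent checkpointing and logging to volatile memory can be illustrated with a generic sender-based message-logging toy in Python. It assumes that updates are key/value writes, that each sender numbers its updates per receiver, that updates from different senders touch disjoint keys (so replay order across senders does not matter), and that checkpoints survive a crash. The `Node` class and its methods are hypothetical and do not reproduce the HLRC protocol.

```python
import copy

class Node:
    """Toy DSM node with sender-based volatile logging (hypothetical sketch)."""

    def __init__(self, nid):
        self.nid = nid
        self.memory = {}      # local view of shared data: key -> value
        self.recv_seq = {}    # sender id -> last sequence number applied from it
        self.send_seq = {}    # receiver id -> last sequence number sent to it
        self.send_log = []    # volatile log of sent updates: (receiver, seq, key, value)
        self.ckpt = ({}, {})  # (memory snapshot, recv_seq snapshot); assumed to survive a crash

    def send(self, dst, key, value):
        seq = self.send_seq.get(dst.nid, 0) + 1
        self.send_seq[dst.nid] = seq
        self.send_log.append((dst.nid, seq, key, value))  # log before delivery
        dst.receive(self.nid, seq, key, value)

    def receive(self, src, seq, key, value):
        if seq <= self.recv_seq.get(src, 0):
            return  # already reflected in memory (e.g. a replayed duplicate)
        self.memory[key] = value
        self.recv_seq[src] = seq

    def take_checkpoint(self):
        # Independent: taken without coordinating with any other node.
        self.ckpt = (copy.deepcopy(self.memory), dict(self.recv_seq))

    def crash(self):
        # Toy scope: only the receive-side state needed for this example is lost.
        self.memory, self.recv_seq = {}, {}

    def recover(self, peers):
        # Restore the last checkpoint, then replay surviving peers' logged updates.
        mem, seqs = self.ckpt
        self.memory, self.recv_seq = copy.deepcopy(mem), dict(seqs)
        for p in peers:
            for dst, seq, key, value in p.send_log:
                if dst == self.nid:
                    self.receive(p.nid, seq, key, value)
```

Because only a single node fails, the senders' volatile logs are still available when the failed node restarts, which is what makes logging to volatile memory sufficient here.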

IEEE Internet Computing | 2005

Recovering Internet service sessions from operating system failures

Florin Sultan; Aniruddha Bohra; Stephen Smaldone; Yufei Pan; Pascal Gallard; Iulian Neamtiu; Liviu Iftode

Current Internet service architectures lack support for salvaging stateful client sessions when the underlying operating system fails due to hangs, crashes, deadlocks, or panics. The Backdoors (BD) system is designed to detect such failures and recover service sessions in clusters of Internet servers by extracting lightweight state associated with client service sessions from server memory. The BD architecture combines hardware and software mechanisms to enable accurate monitoring and remote healing actions, even in the presence of failures that render a system unavailable.
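Failure detection and session extraction in a BD-style monitor can be sketched as follows, assuming the monitor can read the server's memory directly and that the OS increments a heartbeat counter while it is healthy. `HostMemory`, `BackdoorMonitor`, the heartbeat field, and the session layout are hypothetical stand-ins, not the BD implementation.

```python
from dataclasses import dataclass, field

@dataclass
class HostMemory:
    """Stand-in for the server's RAM as seen by the monitor (hypothetical layout)."""
    heartbeat: int = 0                            # advanced by the OS while it is alive
    sessions: dict = field(default_factory=dict)  # conn id -> lightweight session state

class BackdoorMonitor:
    def __init__(self, mem, max_missed=3):
        self.mem = mem
        self.max_missed = max_missed
        self.last_seen = mem.heartbeat
        self.missed = 0

    def poll(self):
        """Called periodically; returns True once the host OS is judged failed."""
        if self.mem.heartbeat != self.last_seen:
            self.last_seen, self.missed = self.mem.heartbeat, 0
        else:
            self.missed += 1
        return self.missed >= self.max_missed

    def extract_sessions(self):
        # Read session state straight from memory, without the host OS's help.
        return dict(self.mem.sessions)
```

A healthy server in the cluster would then take the extracted records and resume the corresponding client sessions; that hand-off is left out here.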

Symposium on Reliable Distributed Systems | 2003

Service continuations: an operating system mechanism for dynamic migration of Internet service sessions

Florin Sultan; Aniruddha Bohra; Liviu Iftode

We propose service continuations (SC), an operating system mechanism that supports seamless dynamic migration of Internet service sessions between cooperating multi-process servers. Service continuations provide a server application with a simple, easy-to-use abstraction and a means to migrate the service state along with the serviced connection. SC supports transparent resumption of service to the client by another server, and guaranteed integrity and consistency of communication channels used by server processes. SC is a generic, application-independent mechanism that can be used to provide service continuity and availability for today's complex Internet services. We have implemented SC in FreeBSD and used them successfully in three real servers: the Apache Web server, the PostgreSQL transactional database server, and the Icecast streaming server. We present results of an experimental evaluation showing that using SC adds negligible run-time overhead to existing servers and that SC enables efficient dynamic migration of client sessions.
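The application-visible side of a service continuation can be pictured as a pair of hooks: one exports the session state at a point where it is consistent, and one rebuilds the session on another server. This is a hypothetical Python illustration; `StreamingSession`, `export_continuation`, and `resume` are invented names, not the FreeBSD SC interface, and migration of the connection itself (handled by the OS in SC) is not modelled.

```python
import json

class StreamingSession:
    """Toy server-side session for a streaming service (hypothetical example)."""

    def __init__(self, track, position=0):
        self.track = track
        self.position = position  # bytes already streamed to the client

    def serve(self, nbytes):
        self.position += nbytes
        return self.position

    # Continuation hooks (hypothetical API).
    def export_continuation(self) -> bytes:
        # Snapshot only the state needed to resume service elsewhere.
        return json.dumps({"track": self.track, "position": self.position}).encode()

    @classmethod
    def resume(cls, blob: bytes):
        # Rebuild the session on the destination server from the snapshot.
        state = json.loads(blob.decode())
        return cls(state["track"], state["position"])
```

After serving 1500 bytes on one server, `StreamingSession.resume(s.export_continuation())` on another server continues from position 1500, so the client does not restart the stream.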

Workshop on Hot Topics in Operating Systems | 2001

Transport layer support for highly-available network services

Florin Sultan; Kiran Srinivasan; Liviu Iftode

We advocate a transport layer protocol for highly available network services by means of transparent migration of the server endpoint of a live connection between cooperating servers that provide the same service. We propose a transport protocol that: (i) offers a better alternative than simple retransmission to the same server, which may be suffering from overload or a DoS attack, may be down, or may not be easily reachable due to congestion; and (ii) decouples a given service from the unique/fixed identity of its provider. Our protocol can be viewed as an extension of TCP that remains compatible with it.

IEEE Transactions on Parallel and Distributed Systems | 2002

Lazy garbage collection of recovery state for fault-tolerant distributed shared memory

Florin Sultan; Thu D. Nguyen; Liviu Iftode

In this paper, we address the problem of garbage collection in a single-failure fault-tolerant home-based lazy release consistency (HLRC) distributed shared-memory (DSM) system based on independent checkpointing and logging. Our solution uses laziness in garbage collection and exploits consistency constraints of the HLRC memory model for low overhead and scalability. We prove safe bounds on the state that must be retained in the system to guarantee correct recovery after a failure. We devise two algorithms for garbage collection of checkpoints and logs, checkpoint garbage collection (CGC), and lazy log trimming (LLT). The proposed approach targets large-scale distributed shared-memory computing on local-area clusters of computers. In such systems, using global synchronization or extra communication for garbage collection is inefficient or simply impractical due to system scale and temporary disconnections in communication. The challenge lies in controlling the size of the logs and the number of checkpoints without global synchronization while tolerating transient disruptions in communication. Our garbage collection scheme is completely distributed, does not force processes to synchronize, does not add extra messages to the base DSM protocol, and uses only the available DSM protocol information. Evaluation results for real applications show that it effectively bounds the number of past checkpoints to be retained and the size of the logs in stable storage.
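Trimming logs lazily, using only information already carried by the protocol, can be sketched as follows. It assumes sender-side logging and that receivers piggyback, on ordinary DSM messages, the highest sequence number covered by their latest checkpoint. The `LoggingSender` class and `on_protocol_message` are hypothetical, and this is not the paper's CGC or LLT algorithm, which also reasons about HLRC consistency constraints and checkpoint retention.

```python
from collections import defaultdict

class LoggingSender:
    """Keeps a log of updates sent to each receiver and trims it lazily (hypothetical)."""

    def __init__(self):
        self.log = defaultdict(list)     # receiver -> [(seq, payload), ...] in seq order
        self.next_seq = defaultdict(int)

    def send(self, receiver, payload):
        self.next_seq[receiver] += 1
        seq = self.next_seq[receiver]
        self.log[receiver].append((seq, payload))
        return seq

    def on_protocol_message(self, receiver, ckpt_seq):
        """Handle a normal DSM message from `receiver` that piggybacks the highest
        seq covered by its latest checkpoint. No extra messages are exchanged."""
        # Entries at or below ckpt_seq will never be needed for replay again.
        self.log[receiver] = [(s, p) for s, p in self.log[receiver] if s > ckpt_seq]

    def log_size(self):
        return sum(len(entries) for entries in self.log.values())
```

Trimming happens only when a message from the receiver happens to arrive, so a temporarily disconnected receiver simply delays garbage collection without blocking anyone.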

Journal of Parallel and Distributed Computing | 1996

A Hyperbolic Model for Communication in Layered Parallel Processing Environments

Ion Stoica; Florin Sultan; David Keyes

We introduce a model for communication costs in parallel processing environments, called the hyperbolic model, which generalizes two-parameter dedicated-link models in an analytically simple way. The communication system is modeled as a directed communication graph in which terminal nodes represent the application processes and internal nodes, called communication blocks (CBs), reflect the layered structure of the underlying communication architecture. A CB is characterized by a two-parameter hyperbolic function of the message size that represents the service time needed for processing the message. Rules are given for reducing a communication graph consisting of many CBs to an equivalent two-parameter form, while maintaining a good approximation for the service time. We demonstrate a tight fit of the estimates of the cost of communication based on the model with actual measurements of the communication and synchronization time between end processes. We compare the hyperbolic model with other two-parameter models and, in appropriate limits, show its compatibility with the LogP model.

Information Sciences | 1997

A simple algorithm for computing minimum spanning trees in the Internet

Hussein M. Abdel-Wahab; Ion Stoica; Florin Sultan; K. Wilson

A central problem in wide-area networks is to efficiently multicast a message to all members (hosts) of a target group. One of the most effective methods to multicast a message is to send the message along the edges of a spanning tree connecting all the members of the group. In this paper, we propose a new fully distributed algorithm to build a minimum spanning tree (MST) in a generic communication network. During the execution, the algorithm maintains a collection of disjoint trees spanning all the group members. Every tree, which initially consists of only one node, independently expands by joining the closest tree, until all of the nodes are connected in a single tree. The resulting communication topology is both robust (there are no singularities subject to failures) and scalable (every node stores a limited amount of local information that is independent of the size of the network).
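The tree-growing idea can be simulated centrally in the style of Borůvka's algorithm: every tree picks its closest outgoing edge and joins the tree at the other end, round after round, until one tree remains. This Python sketch is not the paper's distributed protocol (no messages or per-node local state are modelled). Edges are `(weight, u, v)` tuples with weights standing in for distances, and comparing whole tuples breaks weight ties so that no cycle can form.

```python
def boruvka_mst(n, edges):
    """Return the MST edges of a connected graph with nodes 0..n-1.
    edges: list of (weight, u, v) tuples."""
    parent = list(range(n))  # union-find forest: one tree per node initially

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst = []
    trees = n
    while trees > 1:
        # Each tree independently picks its closest edge to another tree.
        cheapest = {}
        for e in edges:
            _, u, v = e
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge lies inside one tree
            for r in (ru, rv):
                if r not in cheapest or e < cheapest[r]:
                    cheapest[r] = e
        if not cheapest:
            raise ValueError("graph is not connected")
        # Join each tree with the tree at the other end of its chosen edge.
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                mst.append((w, u, v))
                trees -= 1
    return mst
```

For the four-node graph with edges `(1,0,1)`, `(4,0,2)`, `(3,1,2)`, `(2,2,3)`, `(5,1,3)`, the sketch returns the three edges of weight 1, 2, and 3, for a total weight of 6. The number of trees at least halves each round, so at most about log2(n) rounds are needed.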

European Conference on Parallel Processing | 1996

Evaluating the Hyperbolic Model on a Variety of Architectures

Ion Stoica; Florin Sultan; David E. Keyes

We illustrate the application of the hyperbolic model, which generalizes standard two-parameter dedicated-link models for communication costs in message-passing environments, to four rather different distributed-memory architectures: Ethernet NOW, FDDI NOW, IBM SP2, and Intel Paragon. We first evaluate the parameters of the model from simple communication patterns. Then overall communication time estimates, which compare favorably with experimental measurements, are deduced for the message traffic in a scientific application code. For transformational computing on dedicated systems, for which message traffic is describable in terms of a finite number of regular patterns, the model offers a good compromise between the competing objectives of flexibility, tractability, and reliability of prediction.
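The abstract does not give the hyperbolic function itself, so the following sketch instead fits the classic two-parameter dedicated-link model that it generalizes, T(m) = α + βm (start-up latency α plus per-byte cost β), by least squares to timings from a simple pattern such as ping-pong. `fit_linear_cost` is a hypothetical helper; the hyperbolic model's own parameter estimation may differ.

```python
def fit_linear_cost(samples):
    """Least-squares fit of T(m) = alpha + beta * m.
    samples: list of (message_size_bytes, measured_time_seconds)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_m = sum(m for m, _ in samples) / n
    mean_t = sum(t for _, t in samples) / n
    sxx = sum((m - mean_m) ** 2 for m, _ in samples)
    if sxx == 0:
        raise ValueError("need at least two distinct message sizes")
    sxy = sum((m - mean_m) * (t - mean_t) for m, t in samples)
    beta = sxy / sxx                # seconds per byte (inverse bandwidth)
    alpha = mean_t - beta * mean_m  # start-up latency in seconds
    return alpha, beta
```

With measurements at a few message sizes, `alpha` approximates the fixed per-message cost and `1 / beta` the asymptotic bandwidth; a model that tracks layered communication architectures more closely, such as the hyperbolic one, would replace this straight line.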

Infotech@Aerospace | 2005

Defending Network-Centric Systems Using Backdoors

Liviu Iftode; Arati Baliga; Aniruddha Bohra; Stephen Smaldone; Florin Sultan

As computing systems become increasingly dependent on networking, they also become more vulnerable to network malfunction or misuse. Human intervention is not a solution when computer system monitoring and repair must be done quickly and reliably regardless of scale, network availability, or system impairment. Future network-centric systems must be built around a defensive architecture that allows computers to take care of themselves. In this paper, we argue that the solution to building self-defending computer architectures is a Backdoor, which can support automated observation of and intervention on a computer system's memory without involving its operating system. Backdoors can therefore execute even when the functionality of the operating system of a critical system has been severely compromised and the system is no longer accessible through the primary network. Backdoors can be realized in hardware over a programmable network interface or in software over a virtual machine monitor.
