Naghmeh Ivaki | Researchain

Publication

Featured research published by Naghmeh Ivaki.

International Conference on Parallel and Distributed Systems | 2014

Session-based fault-tolerant design patterns

Naghmeh Ivaki; Filipe Araujo; Fernando J. Barros

Despite offering reliability against dropped and reordered packets, the widely adopted Transmission Control Protocol (TCP) provides nearly no recovery options for long-term network outages. When the network fails, developers must roll back the application to some coherent state on their own, using error-prone solutions. Overcoming this limitation is, therefore, a deeply investigated and challenging problem. Existing solutions range from transport-layer to application-layer protocols, including additions to TCP, usually transparent to the application. None of these solutions is perfect, because they all impact TCP's simplicity, performance or ubiquity, if not all of them. To avoid these shortcomings, we contain TCP connection crashes inside a single session layer exposed as a sockets interface. Based on this interface, we create a blocking and a non-blocking fault-tolerant design pattern. We explore the blocking design in an open source File Transfer Protocol (FTP) server and perform a thorough evaluation of performance, complexity and overhead of both designs. Our results show that using one of the patterns to tolerate TCP connection crashes, in new or existing applications, involves a very limited effort and negligible penalties.

IEEE International Conference on Cloud Computing Technology and Science | 2014

A Fault-Tolerant Session Layer with Reliable One-Way Messaging and Server Migration Facility

Naghmeh Ivaki; Serhiy Boychenko; Filipe Araujo

Despite being extremely successful, TCP has a number of shortcomings when network disruptions occur, or when peers do not follow a request-reply interaction: it does not handle connection crashes, event-driven communication or application migration. In many cases, programmers must engineer their own solutions to write reliable distributed applications. To overcome these limitations, we propose FTSL, a Fault-Tolerant Session Layer that works on top of TCP. Besides offering a full-duplex connection, FTSL has a number of distinctive features: it tolerates TCP connection crashes, it provides highly decoupled reliable patterns for one-way communication, and it enables server-side migration. While the first two greatly simplify distributed systems programming for a wide range of applications, the latter enables cloud systems managers to move a server application for load balancing or maintenance, without moving the entire virtual machine. We present the FTSL protocol and its implementation, and use a performance evaluation to show that FTSL imposes a reasonable overhead for the guarantees it provides.

Pacific Rim International Symposium on Dependable Computing | 2012

A Middleware for Exactly-Once Semantics in Request-Response Interactions

Naghmeh Ivaki; Filipe Araujo; Raul Barbosa

Although the need for the exactly-once request-response interaction pattern is ubiquitous in distributed systems, making it work in practice is anything but simple. Ensuring the at-most-once part of the invocation is relatively easy. Unfortunately, the same is not true for the at-least-once guarantee, which depends on the recovery from crashes of the client, the server and the network. This is what makes the exactly-once interaction so difficult in practice: client and server must log their actions into stable storage, and they must be able to restart the network connections. In this paper, we present a middleware that implements the exactly-once request-response pattern, in the presence of network and endpoint crashes. The main contribution of our work is to release the programmer from the complex tasks of recovering from message losses and network crashes.
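
The split between the two guarantees in the abstract above can be illustrated with a minimal sketch. It is not the middleware from the paper: the class and function names are hypothetical, and an in-memory dictionary stands in for the stable storage that a real implementation would need to survive endpoint crashes. The server side gives at-most-once execution by remembering replies per request ID, and the client side approximates at-least-once by retrying with the same ID.

```python
class AtMostOnceServer:
    """Runs the handler at most once per request ID by caching replies."""

    def __init__(self, handler):
        self._handler = handler
        # Assumption: a dict stands in for a log kept in stable storage.
        self._replies = {}

    def handle(self, request_id, payload):
        # A retransmitted request gets the cached reply, not a re-execution.
        if request_id in self._replies:
            return self._replies[request_id]
        reply = self._handler(payload)
        self._replies[request_id] = reply
        return reply


def call_with_retries(send, request_id, payload, max_attempts=3):
    """Client side: resend the same request ID until a reply arrives."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(request_id, payload)
        except ConnectionError as exc:
            last_error = exc  # reply may have been lost; try again
    raise last_error
```

If a reply is lost after the server has already executed the request, the retry carries the same request ID, so the server returns the cached reply instead of executing the operation a second time.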

Pacific Rim International Symposium on Dependable Computing | 2014

Design of Multi-threaded Fault-Tolerant Connection-Oriented Communication

Naghmeh Ivaki; Filipe Araujo; Fernando J. Barros

Fault-tolerance is vital for dependable distributed applications that can deliver service, even in the presence of faults. Over the last few decades, among all the protocols proposed to offer reliability and fault-tolerance, TCP grew to become one of the cornerstones of the Internet. However, despite emulating reliable communication in distributed environments, TCP does not handle connection failures when the connectivity is lost for some time, even if both endpoints are still running. When this occurs, developers must roll back the peers to some coherent state, often with error-prone, ad hoc, or custom application-level solutions. In this paper, we refine the Acceptor-Connector design pattern to tackle the TCP unreliability problem. The pattern decouples the failure-related processing from the connection and service processing, efficiently handling different connections and their possible crashes concurrently, thereby yielding more reusable, extensible, and efficient distributed communication. The solution we propose incorporates proven multi-threaded solutions and a buffering scheme that removes the need for an application-layer acknowledgment scheme. This simplifies the development of reliable connection-oriented applications using the ubiquitous TCP.

Dependable Systems and Networks | 2013

Towards evaluating the impact of data quality on service applications

Naghmeh Ivaki; Nuno Laranjeiro; Marco Vieira

Service applications frequently make use of a relational database to store and retrieve data and rely on the correctness of this data to deliver service to clients. Despite this, relational databases do not provide support for complex data integrity restrictions, which have to be controlled by the application. As such, bugs present in service applications can easily lead to the storage of incorrect data that, at random instants, can cause applications to fail and stop delivering service, which can severely impact clients, other applications, and even the reputation or finances of the service provider. The goal of this work is to set the basis for an approach that is able to assess how vulnerable a service application can be to incorrect data. We expect that the results can also be used to suggest solutions for applications showing failures in the presence of poor data and to define problem prevention techniques during the development of new applications.

Network Computing and Applications | 2016

Towards designing reliable messaging patterns

Naghmeh Ivaki; Nuno Laranjeiro; Filipe Araujo

Reliable communication is nowadays pervasively supported by TCP, which is poorly adapted for message-based communications, because it offers a streaming channel with no mechanisms to encapsulate messages. Moreover, TCP does not tolerate connection crashes. Thus, whenever reliable message-based communication is needed, developers either use heavy-weight middleware, like Java Message Service (JMS), or develop their own custom error-prone solutions for recovering from crashes. In this paper, we introduce two TCP-based design patterns that address these limitations, and facilitate the development of light-weight and reliable message-based applications. Our design solutions are modular, in the sense that they build on top of each other.
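
To make the "no mechanisms to encapsulate messages" point concrete, here is a minimal sketch of one common way to carry discrete messages over a byte stream: a length prefix in front of each payload. It is not one of the paper's design patterns; the 4-byte big-endian header and the function names are assumptions chosen for illustration, and the code does nothing about connection crashes.

```python
import io
import struct

# Assumed wire format: 4-byte big-endian length, then the payload bytes.
HEADER = struct.Struct("!I")


def write_message(stream, payload: bytes) -> None:
    """Write one message as a length prefix followed by its payload."""
    stream.write(HEADER.pack(len(payload)) + payload)


def _read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes; a stream may return fewer per read call."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream closed before a full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def read_message(stream) -> bytes:
    """Read one length-prefixed message and return its payload."""
    (length,) = HEADER.unpack(_read_exact(stream, HEADER.size))
    return _read_exact(stream, length)


if __name__ == "__main__":
    # io.BytesIO stands in for a socket file object in this demo.
    channel = io.BytesIO()
    write_message(channel, b"first")
    write_message(channel, b"second message")
    channel.seek(0)
    print(read_message(channel), read_message(channel))
```

With a real TCP connection the same functions could be used on the file object returned by `socket.makefile("rwb")`, with a `flush()` after each write.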

ACM Symposium on Applied Computing | 2015

A taxonomy of reliable request-response protocols

Naghmeh Ivaki; Nuno Laranjeiro; Filipe Araujo

Reliable request-response interactions, in which the server never executes a given request more than once, are being used to support business and safety-critical operations in diverse sectors, such as banking, e-commerce, or healthcare. This form of interaction can be quite difficult to implement, because the client, server, or communication channel may fail, potentially requiring diverse and complex recovery procedures, which may result in duplicate messages being processed at the server. In this paper we address the following question: could we provide a meaningful taxonomy of reliable request-response protocols? We generate valid sequences of client and server actions, organize the generated sequences into a prefix tree, and classify them according to their reliability semantics and memory requirements. The tree reveals three families of protocols matching common real-world implementations that try to deliver exactly-once or at-most-once. The strict organization of the protocols provides a solid foundation for creating correct services, and we show that it also serves to easily identify fallacies and pitfalls of existing implementations.
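
The step of organizing action sequences into a prefix tree can be sketched as follows. This only illustrates the data structure: the action names are invented for the example and are not the actions, sequences, or classification used in the paper.

```python
def build_prefix_tree(sequences):
    """Nest dicts so that sequences sharing a prefix share a path."""
    root = {}
    for sequence in sequences:
        node = root
        for action in sequence:
            # Reuse the child for this action if it exists, else create it.
            node = node.setdefault(action, {})
    return root


def count_leaves(tree):
    """Count the leaves, i.e. the distinct maximal sequences in the tree."""
    if not tree:
        return 1
    return sum(count_leaves(child) for child in tree.values())


if __name__ == "__main__":
    # Hypothetical server-side action sequences, for illustration only.
    sequences = [
        ("receive", "execute", "reply"),
        ("receive", "log", "execute", "reply"),
        ("receive", "log", "execute", "log", "reply"),
    ]
    tree = build_prefix_tree(sequences)
    print(tree)
    print(count_leaves(tree))
```

Sequences that start with the same actions end up under the same branch, which is what makes it possible to group related protocols into families by inspecting subtrees.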

IEEE International Conference on Services Computing | 2017

Design Patterns for Reliable One-Way Messaging

Naghmeh Ivaki; Nuno Laranjeiro; Filipe Araujo

The one-way messaging pattern, in which a message sender does not expect any response, is fast and convenient for many applications, but whenever reliable communication is needed, developers either use heavy-weight middleware, such as JMS, or implement request-response interactions, based on TCP. However, TCP is poorly adapted to one-way messaging, because it offers a streaming channel with no mechanisms to encapsulate or track messages. Moreover, TCP does not tolerate connection crashes, forcing developers to come up with their own custom, error-prone solutions, to recover from crashes. In this paper, we propose three TCP-based design patterns that address these limitations, and facilitate developing light-weight and reliable one-way message-based applications. Our solutions are correct, modular, and involve low programming complexity.

Journal of the Brazilian Computer Society | 2017

Testing data-centric services using poor quality data: from relational to NoSQL document databases

Nuno Laranjeiro; Seyma Nur Soydemir; Naghmeh Ivaki; Jorge Bernardino

Businesses are nowadays deploying their services online, reaching out to clients all around the world. Often deployed as web applications or web services, these business-critical systems typically perform large amounts of database operations; thus, they are dependent on the quality of the data to provide correct service to clients. Research and practice have shown that the quality of the data in an enterprise system gradually decreases over time, bringing in diverse reliability issues to the applications that are using the data to provide services. These issues range from simple incorrect operations to aborted operations or severe system failures. In this paper, we present an approach to test data-centric services in the presence of poor quality data. The approach has been designed to consider relational and NoSQL database nodes used by the system under test and is based on the injection of poor quality data on the database–application interface. The results indicate the effectiveness of the approach in discovering issues, not only at the application level, but also in the middleware being used, contributing to the development of more reliable services.

Journal of Systems and Software | 2017

A survey on reliable distributed communication

Naghmeh Ivaki; Nuno Laranjeiro; Filipe Araujo

From entertainment to personal communication, and from business to safety-critical applications, the world increasingly relies on distributed systems. Despite looking simple, distributed systems hide a major source of complexity: tolerating faults and component crashes is very difficult, due to the incompleteness of (remote) knowledge. The need to overcome this problem, and provide different guarantees to applications, sparked a huge research effort and resulted in a large body of communication protocols and middleware. Thus, it is worthwhile to survey the state of the art in distributed systems, with a particular emphasis on reliable communication. We discuss key concepts in reliable communication, such as interaction patterns (e.g., one-way vs. request-response, synchronous vs. asynchronous), reliability semantics (e.g., at-least-once, at-most-once), and reliability targets (e.g., message, conversation), and we analyze a wide set of current communication solutions, which map to the different concepts. Building on the concepts, we analyze applications that have different reliable communication needs. As a result, we observe that, in most cases, elaborate communication solutions offering superior guarantees are purely academic efforts that cannot compete with the popularity and maturity of established, albeit poorer solutions. Based on our analysis, we identify and discuss open research topics in this area.
