Elias Procópio Duarte
Federal University of Paraná
Publication
Featured research published by Elias Procópio Duarte.
IEEE Transactions on Computers | 1998
Elias Procópio Duarte; Takashi Nanya
Consider a system composed of N nodes that can be faulty or fault-free. The purpose of distributed system-level diagnosis is to have each fault-free node determine the state of all nodes of the system. This paper presents a Hierarchical Adaptive Distributed System-level Diagnosis (Hi-ADSD) algorithm, which is a fully distributed algorithm that allows every fault-free node to achieve diagnosis in, at most, (log₂ N)² testing rounds. Nodes are mapped into progressively larger logical clusters, so that tests are run in a hierarchical fashion. Each node executes its tests independently of the other nodes, i.e., tests are run asynchronously. All the information that nodes exchange is diagnostic information. The algorithm assumes no link faults, a fully-connected network and imposes no bounds on the number of faults. Both the worst-case diagnosis latency and correctness of the algorithm are formally proved. As an example application, the algorithm was implemented on a 37-node Ethernet LAN, integrated into a network management system based on SNMP (Simple Network Management Protocol). Experimental results of fault and repair diagnosis are presented. This implementation by itself is also a significant contribution, for, although fault management is a key functional area of network management systems, currently deployed applications often implement only rudimentary diagnosis mechanisms. Furthermore, experimental results are given through simulation of the algorithm for large systems of 64 nodes and 512 nodes.
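To make the latency bound and the idea of progressively larger clusters concrete, here is a minimal sketch. The bound (log₂ N)² comes from the abstract above. The `cluster` function is a hypothetical, XOR-based way of partitioning node identifiers into clusters of size 1, 2, 4, and so on; it is an assumption for illustration and is not claimed to be the exact cluster definition used by Hi-ADSD. The sketch also assumes N is a power of two.

```python
def worst_case_rounds(n: int) -> int:
    """Upper bound on diagnosis latency, in testing rounds: (log2 N)^2."""
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes N is a power of two")
    log_n = n.bit_length() - 1  # exact log2 for powers of two
    return log_n ** 2


def cluster(i: int, s: int) -> list[int]:
    """Hypothetical cluster of node i at level s (s >= 1).

    Returns the 2**(s-1) nodes whose identifiers first differ from i
    at bit s-1. This indexing is an assumption, not the paper's definition.
    """
    size = 1 << (s - 1)
    base = (i ^ size) & ~(size - 1)  # flip bit s-1, clear the lower bits
    return [base + k for k in range(size)]


if __name__ == "__main__":
    for n in (64, 512):  # the simulated system sizes mentioned in the abstract
        print(n, worst_case_rounds(n))
    for level in (1, 2, 3):
        print(level, cluster(0, level))
```

Under this assumed indexing, the clusters of a node at levels 1 through log₂ N are disjoint and together cover every other node, which is what lets a tester cover the whole system by working through the levels.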
ACM Computing Surveys | 2011
Elias Procópio Duarte; Roverli Pereira Ziwich; Luiz Carlos Pessoa Albini
The growing complexity and dependability requirements of hardware, software, and networks demand efficient techniques for discovering disruptive behavior in those systems. Comparison-based diagnosis is a realistic approach to detect faulty units based on the outputs of tasks executed by system units. This survey integrates the vast amount of research efforts that have been produced in this field, from the earliest theoretical models to new promising applications. Key results also include the quantitative evaluation of a relevant reliability metric—the diagnosability—of several popular interconnection network topologies. Relevant diagnosis algorithms are also described. The survey aims at clarifying and uncovering the potential of this technology, which can be applied to improve the dependability of diverse complex computer systems.
International Conference on Parallel and Distributed Systems | 2000
Elias Procópio Duarte; Alessandro Brawerman; Luiz Carlos Pessoa Albini
The components of a fault-tolerant distributed system must be able to accurately determine which components of the system are faulty and which are fault-free. In this paper, we present a new distributed algorithm for event diagnosis in fully-connected networks. An event is defined as a faulty node becoming fault-free, or vice versa. Previous hierarchical algorithms considered a static fault situation, in which an event can only occur after a previous event has been fully diagnosed. The new algorithm is capable of achieving the diagnosis of dynamic events as long as the nodes stay in a given state for a period of time long enough for all testers to detect that state. Each node running the algorithm keeps a timestamp for the state of each other node in the system. This timestamp is implemented as a counter, which is incremented every time a node changes its state. In this way, each tester may obtain information about a given node in the system from more than one tested node without causing any inconsistencies, i.e., without taking an older state for a newer one. Nodes run a hierarchical testing strategy, which is a hypercube when all nodes are fault-free. When a fault-free node is tested, the tester gets diagnostic information about N/2 nodes for a system of N nodes. In spite of the overhead of keeping and transferring timestamps, the new algorithm significantly reduces the average latency when compared to other similar approaches, presenting a new option for practical diagnosis implementation.
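The counter-based timestamps can be illustrated with a short sketch. It shows how a tester might merge diagnostic information received from several tested nodes while never replacing a newer state with an older one. The function names, the dictionary representation, and the rule that every node starts fault-free with counter 0 (so that parity encodes the state) are assumptions made for this example, not details taken from the paper.

```python
def merge_timestamps(local: dict[int, int], received: dict[int, int]) -> dict[int, int]:
    """Merge received per-node counters into the local view.

    A received entry is accepted only if its counter is larger than the
    local one, so older information never overwrites newer information.
    """
    merged = dict(local)
    for node, counter in received.items():
        if counter > merged.get(node, -1):
            merged[node] = counter
    return merged


def state(counter: int) -> str:
    """Derive a node's state from its counter.

    Assumption: nodes start fault-free with counter 0 and the counter is
    incremented on every state change, so even means fault-free.
    """
    return "fault-free" if counter % 2 == 0 else "faulty"


if __name__ == "__main__":
    local_view = {1: 2, 2: 1}
    from_tested_node = {1: 1, 2: 3, 3: 0}
    merged = merge_timestamps(local_view, from_tested_node)
    for node in sorted(merged):
        print(node, merged[node], state(merged[node]))
```

In the usage above, the stale counter for node 1 is ignored while the newer counter for node 2 is adopted, which is the consistency property the abstract describes.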
Intelligent Systems Design and Applications | 2007
Bogdan Tomoyuki Nassu; Takashi Nanya; Elias Procópio Duarte
Topology discovery is a key task for several computer network applications such as diagnosis, routing and network management. Traditional approaches for topology discovery cannot always be used in dynamic and decentralized networks, such as unstructured peer-to-peer networks and wireless ad hoc networks. This paper introduces a strategy based on mobile agents and swarm intelligence for topology discovery in such environments. The proposed strategy is inspired by ant colonies, employing simple agents that disseminate information about the topology and communicate through stigmergy. Experimental results show that the nodes obtain descriptions which are very close to the real network topology. It is also shown that the stigmergy-based method for the selection of agent destinations produces better results than a random selection, and that the number of agents can be dynamically adjusted as the size of the network changes.
International Conference on Distributed Computing Systems | 2001
Elias Procópio Duarte; A.L. dos Santos
A network management system must be fault-tolerant in order to provide the required fault management functionality. It is often useful to examine MIB objects of a faulty agent in order to determine why it is faulty. This paper presents a new framework for replicating SNMP management objects in local area networks. The framework is based on groups of agents that communicate with each other using reliable multicast. A group of agents provides fault-tolerant object functionality. An SNMP service is proposed that allows replicated MIB objects of a faulty agent of a given group to be accessed through fault-free agents of that group. The presented framework allows the dynamic definition of agent groups and of the management objects to be replicated in each group. A practical fault-tolerant tool for local area network fault management was implemented and is presented. The system employs SNMP agents that interact with a group communication tool. As an example, we show how the examination of TCP-related objects of faulty agents has been used in the fault diagnosis process. The impact of replication on network performance is evaluated, and a probabilistic analysis of replicated object consistency is presented.
IEEE Transactions on Parallel and Distributed Systems | 2012
Elias Procópio Duarte; Andréa Weber; Keiko Verônica Ono Fonseca
This work introduces the Distributed Network Reachability (DNR) algorithm, a distributed system-level diagnosis algorithm that allows every node of a partitionable arbitrary topology network to determine which portions of the network are reachable and unreachable. DNR is the first distributed diagnosis algorithm that works in the presence of network partitions and healings caused by dynamic fault and repair events. Both crash and timing faults are assumed, and a faulty node is indistinguishable from a network partition. Every link is alternately tested by one of its adjacent nodes at subsequent testing intervals. Upon the detection of a new event, the new diagnostic information is disseminated to reachable nodes. New events can occur before the dissemination completes. Any time a new event is detected or reported, a working node may compute the network reachability using local diagnostic information. The bounded correctness of DNR is proved, including the bounded diagnostic latency, bounded startup and accuracy. Simulation results are presented for several random and regular topologies, showing the performance of the algorithm under highly dynamic fault situations.
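The step in which a working node computes reachability from its local diagnostic information can be sketched as a graph traversal. The example below is a generic breadth-first search over the links that the node currently believes to be working; the function name and the representation of links as a set of node pairs are assumptions for illustration, and the sketch does not reproduce DNR's testing or dissemination phases.

```python
from collections import deque


def reachable(source: int, working_links: set[tuple[int, int]]) -> set[int]:
    """Return the nodes reachable from source over links believed to work.

    working_links holds undirected (u, v) pairs taken from the node's
    local diagnostic information (a hypothetical representation).
    """
    adjacency: dict[int, set[int]] = {}
    for u, v in working_links:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


if __name__ == "__main__":
    # Two partitions: {0, 1, 2} and {3, 4}
    links = {(0, 1), (1, 2), (3, 4)}
    print(sorted(reachable(0, links)))
    print(sorted(reachable(3, links)))
```

Every node outside the returned set is considered unreachable from the source, whether because it is faulty or because it lies in another partition; as the abstract notes, the two cases cannot be told apart.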
Computers & Security | 2010
Egon Hilgenstieler; Elias Procópio Duarte; Glenn Mansfield-Keeni; Norio Shiratori
IP traceback is used to determine the source and path traversed by a packet received from the Internet. In this work we first show that the Source Path Isolation Engine (SPIE), a classical log-based IP traceback system, can return misleading attack graphs in some particular situations, which may even make it impossible to determine the real attacker. We show that by unmasking the TTL field SPIE returns a correct attack graph that precisely identifies the route traversed by a given packet, allowing the correct identification of the attacker. Nevertheless, an unmasked TTL poses new challenges in order to preserve the confidentiality of the communication among the system's components. We solve this problem by presenting two distributed algorithms for searching across the network overlay formed by the packet log bases. Two other extensions to SPIE are proposed that improve the efficiency of source discovery: separate logs are kept for each router interface, improving the distributed search procedure; and an efficient dynamic log paging strategy is employed, which is based on the actual capacity factor instead of the fixed time interval originally employed by SPIE. The system was implemented and experimental results are presented.
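One way to picture the effect of masking or unmasking the TTL is with a toy packet digest. In the sketch below, a digest computed with the TTL masked is identical at every hop, whereas including the TTL yields a different digest at each hop, which is one reading of why the hop order becomes recoverable. The `Packet` fields, the use of SHA-256, and the choice of hashed fields are assumptions for illustration only and are not SPIE's actual digest construction.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    ident: int
    ttl: int
    payload: bytes


def digest(packet: Packet, include_ttl: bool) -> str:
    """Hash selected packet fields; the field choice here is hypothetical."""
    fields = [packet.src, packet.dst, str(packet.ident), packet.payload[:8].hex()]
    if include_ttl:
        fields.append(str(packet.ttl))  # unmasked TTL: digest changes per hop
    return hashlib.sha256("|".join(fields).encode()).hexdigest()


def forward(packet: Packet) -> Packet:
    """Model one router hop by decrementing the TTL."""
    return replace(packet, ttl=packet.ttl - 1)


if __name__ == "__main__":
    first_hop = Packet("10.0.0.1", "10.0.0.9", 4242, 64, b"example payload")
    second_hop = forward(first_hop)
    print(digest(first_hop, False) == digest(second_hop, False))
    print(digest(first_hop, True) == digest(second_hop, True))
```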
International Journal of Network Management | 2008
Elias Procópio Duarte; Martin A. Musicante; Henrique Denes H. Fernandes
This work presents ANEMONA: A language for programming NEtwork MONitoring Applications. The compilation of an ANEMONA program generates code for configuring a policy repository and the corresponding policy deployment and event monitoring. The language allows the definition of expressions of managed objects that are monitored, as well as triggers that, when fired, may indicate the occurrence of associated events, which are also defined by the language. A translator for the language was implemented that generates code for configuring both the policy repository and deployment. The current implementation of the language employs the Expression MIB and Event MIB. Experimental results are presented, including an ANEMONA program that detects TCP SYN flooding attacks, and a program for detecting steep variations in the utilization of monitored links.
Genetic and Evolutionary Computation Conference | 2005
Bogdan Tomoyuki Nassu; Elias Procópio Duarte; Aurora T. R. Pozo
The size and complexity of systems based on multiple processing units demand techniques for the automatic diagnosis of their state. System-level diagnosis consists in determining which units of a system are faulty and which are fault-free. Elhadef and Ayeb have proposed a specialized genetic algorithm (GA) that can be used to accomplish diagnosis. This work extends their approach, describing and comparing several evolutionary algorithms for system-level diagnosis. Implemented algorithms include a simple genetic algorithm, a specialized GA both with and without crossover and specialized versions of the compact GA and Population-Based Incremental Learning both with and without negative examples. These algorithms had their performance evaluated using four metrics: the average number of generations needed to find the solution, the average fitness after up to 500 generations, the percentage of tests that found the optimal solution and the average time until the solution was found. An analysis of experimental results shows that more sophisticated algorithms converge faster to the optimal solution.
Dependable Systems and Networks | 2002
Elias Procópio Duarte; L.C. Erpen De Bona
This work presents a dependable, fully distributed network management tool based on the Internet standard network management protocol, SNMP (Simple Network Management Protocol). Multiple SNMP agents running Hi-ADSD with Timestamps, a Hierarchical Distributed System-Level Diagnosis algorithm with Timestamps, monitor themselves and a configurable set of network services and devices, issuing controlling commands depending on the results. The system is dependable in the sense that it continues working even if only one agent is fault-free. A MIB (Management Information Base) allows the definition of test procedures specific to each managed entity. The system presents a configurable Web interface that allows the human manager to monitor the network from any agent. Practical results are presented, including the construction of a resilient Web server built on top of the tool.