Claudio Luis de Amorim

Publication

Featured publications by Claudio Luis de Amorim.

international parallel and distributed processing symposium | 2006

D1HT: a distributed one hop hash table

Luiz Rodolpho Monnerat; Claudio Luis de Amorim

Distributed hash tables (DHTs) have been used in a variety of applications, but most DHTs so far have opted to resolve lookups with multiple hops, which sacrifices performance in order to keep routing information small and minimize maintenance traffic. In this paper, we introduce D1HT, a novel single-hop DHT that is able to maximize performance with reasonable maintenance traffic overhead even for huge and dynamic peer-to-peer (P2P) systems. We formally define the algorithm we propose to detect and notify any membership change in the system, prove its correctness and performance properties, and present a quarantine-like mechanism to reduce the overhead caused by volatile peers. Our analyses show that D1HT has reasonable maintenance bandwidth requirements even for very large systems, while incurring at most half the bandwidth overhead of previous single-hop DHTs.
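A single-hop DHT resolves a lookup locally because every peer keeps the full membership table, so the only network message is the one sent to the key's owner. The sketch below illustrates that idea under assumptions not taken from the paper: SHA-1 identifiers on a 32-bit ring, the key's owner taken to be its successor on the ring (Chord-style consistent hashing), and the hypothetical names `ring_id` and `OneHopTable`. It deliberately omits D1HT's membership-change dissemination and quarantine mechanism, which are what keep such a table current in a dynamic system.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_id(name: str, bits: int = 32) -> int:
    """Hash a peer address or key onto a 2**bits identifier ring (SHA-1 is an assumption)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class OneHopTable:
    """Hypothetical full routing table: each peer stores every peer's ID and address."""

    def __init__(self) -> None:
        self.ids: List[int] = []        # sorted peer IDs
        self.addr: Dict[int, str] = {}  # peer ID -> address

    def join(self, peer: str) -> None:
        pid = ring_id(peer)
        if pid not in self.addr:
            bisect.insort(self.ids, pid)
            self.addr[pid] = peer

    def leave(self, peer: str) -> None:
        pid = ring_id(peer)
        if pid in self.addr:
            self.ids.remove(pid)
            del self.addr[pid]

    def lookup(self, key: str) -> str:
        """Find the key's successor with a local binary search; no routing hops are needed."""
        if not self.ids:
            raise LookupError("routing table is empty")
        kid = ring_id(key)
        i = bisect.bisect_left(self.ids, kid)
        # Wrap around the ring when the key lies past the largest peer ID.
        return self.addr[self.ids[i % len(self.ids)]]
```

A peer would call `lookup` and then send its request straight to the returned address; the cost of this design moves from lookup hops to keeping every table up to date, which is the maintenance traffic the abstract analyzes.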

architectural support for programming languages and operating systems | 1996

Hiding communication latency and coherence overhead in software DSMs

Ricardo Bianchini; Leonidas I. Kontothanassis; Raquel Pinto; M. De Maria; M. Abud; Claudio Luis de Amorim

In this paper we propose the use of a PCI-based programmable protocol controller for hiding communication and coherence overheads in software DSMs. Our protocol controller provides three different types of overhead tolerance: a) moving basic communication and coherence tasks away from computation processors; b) prefetching of diffs; and c) generating and applying diffs with hardware assistance. We evaluate the isolated and combined impact of these features on the performance of TreadMarks. We also compare performance against two versions of the Shrimp-based AURC protocol. Using detailed execution-driven simulations of a 16-node network of workstations, we show that the greatest performance benefits provided by our protocol controller come from our hardware-supported diffs. Reducing the burden of communication and coherence transactions on the computation processor is also beneficial, but to a smaller extent. Prefetching is not always profitable. Our results show that our protocol controller can improve running time by up to 50% for TreadMarks, which means that it can double the TreadMarks speedups. The overlapping implementation of TreadMarks performs as well as or better than AURC for 5 of our 6 applications. We conclude that the simple hardware support we propose allows for the implementation of high-performance software DSMs at low cost. Based on this conclusion, we are building the NCP2 parallel system at COPPE/UFRJ.
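Software DSMs such as TreadMarks typically find a page's modifications by comparing it against a "twin", a copy saved before the first write, and encode the changed ranges as a diff that other nodes apply to their copies; this is the work the proposed controller performs with hardware assistance. The sketch below shows twin/diff creation and application in software. The function names and the byte-granularity run encoding are illustrative assumptions (real systems usually compare at word granularity).

```python
from typing import List, Tuple

Diff = List[Tuple[int, bytes]]  # (offset, new bytes) runs


def make_twin(page: bytearray) -> bytes:
    """Save a pristine copy of the page before the first write to it."""
    return bytes(page)


def make_diff(twin: bytes, page: bytearray) -> Diff:
    """Encode every maximal run of bytes that differs between twin and page."""
    diff: Diff = []
    i, n = 0, len(page)
    while i < n:
        if page[i] != twin[i]:
            start = i
            while i < n and page[i] != twin[i]:
                i += 1
            diff.append((start, bytes(page[start:i])))
        else:
            i += 1
    return diff


def apply_diff(page: bytearray, diff: Diff) -> None:
    """Patch another node's copy of the page in place."""
    for offset, data in diff:
        page[offset:offset + len(data)] = data
```

Both loops touch every byte of the page, which suggests why offloading diff generation and application from the computation processor can pay off.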

international conference on supercomputing | 1998

Data prefetching for software DSMs

Ricardo Bianchini; Raquel Pinto; Claudio Luis de Amorim

In this paper we propose and evaluate the Adaptive++ technique, a novel runtime-only data prefetching strategy for software-based distributed shared-memory systems (software DSMs). Adaptive++ improves the performance of regular parallel applications running on software DSMs by using the past history of memory access faults to adapt between repeated-phase and repeated-stride prefetching modes. Adaptive++ does not issue prefetches during periods when the application is not exhibiting one of these two types of behavior and is thus behaving irregularly. Through detailed execution-driven simulations of several applications, we show that our prefetching technique is very successful at reducing the data access overheads of regular applications running on the TreadMarks software DSM. Adaptive++ also reduces the overhead of applications that are not strictly regular but that exhibit periods of regularity. In terms of overall performance, our results show that Adaptive++ can provide speedup improvements as significant as 34% on 16 processors. A direct comparison against two runtime-only prefetching techniques proposed thus far shows that Adaptive++ is consistently competitive in terms of performance, while being able to optimize a larger set of applications. Our main conclusion is that Adaptive++ should definitely be considered by software DSM designers as an effective way of tolerating the overhead of remote data accesses.
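To make the two prefetching modes concrete, the following toy predictor switches between a repeated-stride mode (recent faults are equally spaced) and a repeated-phase mode (the current phase is replaying the fault sequence of the previous phase), and issues nothing when neither pattern holds. It is a hypothetical sketch: the class name, the three-fault stride test, the prefetch `depth`, and the use of a synchronization point to end a phase are all assumptions, not the published Adaptive++ rules.

```python
from collections import deque
from typing import List


class AdaptivePrefetcher:
    """Hypothetical stride/phase prefetch predictor driven by page-fault history."""

    def __init__(self, depth: int = 2) -> None:
        self.depth = depth                  # pages to prefetch per fault (assumed)
        self.recent: deque = deque(maxlen=3)  # last three faulting pages
        self.prev_phase: List[int] = []     # fault sequence of the previous phase
        self.cur_phase: List[int] = []      # fault sequence of the current phase

    def end_phase(self) -> None:
        """Close the current phase, e.g. at a barrier."""
        self.prev_phase, self.cur_phase = self.cur_phase, []

    def on_fault(self, page: int) -> List[int]:
        """Record a fault and return the pages to prefetch."""
        self.recent.append(page)
        self.cur_phase.append(page)
        # Repeated-stride mode: the last three faults are equally spaced.
        if len(self.recent) == 3:
            a, b, c = self.recent
            if b - a == c - b != 0:
                stride = c - b
                return [c + stride * k for k in range(1, self.depth + 1)]
        # Repeated-phase mode: this fault matches the previous phase at the same position.
        pos = len(self.cur_phase) - 1
        if pos < len(self.prev_phase) and self.prev_phase[pos] == page:
            return self.prev_phase[pos + 1:pos + 1 + self.depth]
        # Irregular behavior: issue no prefetches.
        return []
```

For example, faults on pages 10, 14, and 18 trigger stride prefetches of 22 and 26; if the next phase starts by faulting on page 10 again, the phase mode predicts 14 and 18.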

Concurrency and Computation: Practice and Experience | 2002

Java for high‐performance network‐based computing: a survey

Marcelo Lobosco; Claudio Luis de Amorim; Orlando Loques

There has been increasing research interest in extending the use of Java towards demanding high-performance applications such as scalable Web servers, distributed multimedia applications, and large-scale scientific applications. However, extending Java to a multicomputer environment and improving the low performance of current Java implementations pose great challenges to both the systems developer and the application designer. In this survey, we describe and classify 14 relevant proposals and environments that tackle Java's performance bottlenecks in order to make the language an effective option for high-performance network-based computing. We further survey significant performance issues while exposing the potential benefits and limitations of current solutions in such a way that a framework for future research efforts can be established. Most of the proposed solutions can be classified according to some combination of three basic parameters: the model adopted for inter-process communication, language extensions, and the implementation strategy. In addition, where appropriate to each individual proposal, we examine other relevant issues, such as interoperability, portability, and garbage collection.

international conference on parallel processing | 1997

The affinity entry consistency protocol

Cristiana Bentes Seidel; Ricardo Bianchini; Claudio Luis de Amorim

In this paper we propose a novel software-only distributed shared memory system (SW-DSM), the Affinity Entry Consistency (AEC) protocol. The protocol is based on Entry Consistency but, unlike previous approaches, does not require the explicit association of shared data to synchronization variables, uses the page as its coherence unit, and eagerly generates the set of modifications (in the form of diffs) made to shared pages. The AEC protocol hides the overhead of generating and applying diffs behind synchronization delays, and uses a novel technique, Lock Acquirer Prediction (LAP), to tolerate the overhead of transferring diffs through the network. LAP attempts to predict the next acquirer of a lock at the time of the release, so that the acquirer can be updated even before requesting ownership of the lock. Using execution-driven simulation of real applications, we show that LAP performs very well under AEC; LAP predictions are 80-97% accurate. Our results also show that LAP improves performance by 7-28% for our applications. In addition, we find that most of the diff creation overhead in the AEC protocol can usually be overlapped with synchronization latencies. A comparison against simulated TreadMarks shows that AEC outperforms TreadMarks by as much as 47%. We conclude that LAP is a useful technique for improving the performance of update-based SW-DSMs, while AEC is an efficient implementation of the Entry Consistency model.
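The abstract does not say how LAP forms its predictions. Purely as a hypothetical illustration of predicting at release time, the sketch below counts, for each lock, which node acquired it right after each releaser, and predicts the most frequent successor; a releaser could then push its diffs to that node before the node asks for the lock. The class and method names and the frequency heuristic are assumptions, not the published technique.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Tuple


class AcquirerPredictor:
    """Hypothetical next-acquirer predictor based on releaser -> acquirer history."""

    def __init__(self) -> None:
        self.last_holder: Dict[str, int] = {}
        # (lock, releaser) -> Counter of nodes that acquired the lock next
        self.successors: Dict[Tuple[str, int], Counter] = defaultdict(Counter)

    def on_acquire(self, lock: str, node: int) -> None:
        """Record that `node` acquired `lock` after the previous holder."""
        prev = self.last_holder.get(lock)
        if prev is not None:
            self.successors[(lock, prev)][node] += 1
        self.last_holder[lock] = node

    def predict_on_release(self, lock: str, node: int) -> Optional[int]:
        """Guess who acquires `lock` after `node` releases it (None if no history)."""
        counts = self.successors.get((lock, node))
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

With a lock passed around in a fixed order (for example nodes 0, 1, 2, 0, 1, 2, ...), this heuristic quickly predicts each successor correctly; irregular acquisition orders would lower its accuracy.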

global communications conference | 2009

Peer-to-Peer Single Hop Distributed Hash Tables

Luiz Rodolpho Monnerat; Claudio Luis de Amorim

Efficiently locating information in large-scale distributed systems is a challenging problem to which Peer-to-Peer (P2P) Distributed Hash Tables (DHTs) can provide a highly scalable and cost-effective solution. However, there is very little experience with using DHTs in performance-sensitive environments such as High Performance Computing (HPC) datacenters, and there is no published experimental comparison among low-latency DHTs. To fill this gap, we conducted an in-depth performance comparison of three proposed low-latency single-hop DHTs, namely 1h-Calot, D1HT, and OneHop. Specifically, we experimentally compared the lookup latency and CPU use of D1HT with those of 1h-Calot by running each of them concurrently with the normal production workload on a subset of 1,800 nodes of a heavily loaded HPC datacenter. In addition, we carried out an analytical performance comparison among the three single-hop DHTs for system sizes of up to 10 million nodes. The results showed that D1HT consistently had the smallest overhead and in most cases required one order of magnitude less bandwidth than 1h-Calot and OneHop. Overall, the combination of our experimental and analytical results suggests that D1HT can provide a very effective solution for a broad range of environments, from large-scale HPC datacenters to widely deployed Internet P2P applications such as BitTorrent with up to one million peers. This ability to support such a wide range of environments may allow D1HT to be used as an inexpensive and scalable commodity software substrate for large-scale distributed applications.

symposium on computer architecture and high performance computing | 2002

GloVE: a distributed environment for low cost scalable VoD systems

Leonardo Bidese de Pinho; Claudio Luis de Amorim; Edison Ishikawa

In this paper we introduce a scalable Video-on-Demand (VoD) system called GloVE (Global Video Environment) in which active clients cooperate to create a shareable video cache that is used as the primary source of video content for subsequent client requests. In this way, the GloVE server's bandwidth does not limit the number of simultaneous clients that can watch a video since, once its content is in the cooperative video cache (CVC), it can be transmitted directly from the cache rather than from the VoD server. Also, GloVE follows the peer-to-peer approach, allowing the use of low-cost PCs as video servers. In addition, GloVE supports video servers without multicast capability and videos in any stored format. We analyze preliminary performance results of GloVE implemented in a PC server using a Fast Ethernet interconnect and small video buffers at the clients. Our results confirm that while the GloVE-based server uses only a single video channel to deliver a highly popular video simultaneously to N clients, conventional VoD servers require up to N times as many channels.

international conference on computational science | 2002

Performance Evaluation of Fast Ethernet, Giganet, and Myrinet on a Cluster

Marcelo Lobosco; Vítor Santos Costa; Claudio Luis de Amorim

This paper evaluates the performance of three popular technologies used to interconnect machines in clusters: Fast Ethernet, Myrinet, and Giganet. To this end, we used the NAS Parallel Benchmarks. Surprisingly, for the LU application, Fast Ethernet performed better than Myrinet. We also evaluate the performance gains provided by VIA, a user-level communication protocol, compared with TCP/IP, a traditional, stack-based communication protocol. The impact of using Remote DMA Write is also evaluated. The results show that Fast Ethernet, when combined with a high-performance communication protocol such as VIA, has a good cost-benefit ratio and can be a good choice for connecting machines in a small cluster environment where bandwidth is not crucial for applications.

international conference on networking | 2001

Cooperative Video Caching for Interactive and Scalable VoD Systems

Edison Ishikawa; Claudio Luis de Amorim

In this work, we introduce a novel technique called Cooperative Video Caching (CVC) that enables the development of scalable and interactive Video-on-Demand (SI-VoD) servers. The key insight of CVC is to exploit the client buffers as a large global distributed memory that can be randomly accessed over the communication network. CVC uses patching and chaining, but in a way that minimizes communication traffic, eliminates the chaining interruption problem, and, most importantly, enables VCR-like interaction. Detailed CVC simulations show that CVC-based servers are potentially scalable. In addition, we show that interactive CVC servers significantly outperform conventional interactive VoD servers. These preliminary results suggest that CVC is a promising technique for developing SI-VoD servers.
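CVC's central idea is that a video block still held in some client's playback buffer can be served to another client instead of being read again from the server; chaining is the case where a later viewer is fed from an earlier viewer's buffer. The sketch below models this with a hypothetical `CooperativeCache` directory in which each client buffers a sliding window of its most recently played blocks. The window model, the names, and picking the first matching peer are simplifying assumptions, and the sketch does not implement CVC's patching or its handling of chaining interruptions.

```python
from typing import Dict, List


class CooperativeCache:
    """Hypothetical directory for a cooperative video cache.

    Each client buffers the `buffer_blocks` most recent blocks it played,
    i.e. the window (position - buffer_blocks, position].
    """

    def __init__(self, buffer_blocks: int) -> None:
        self.buffer_blocks = buffer_blocks
        self.position: Dict[str, int] = {}  # client -> last block it played
        self.server_reads = 0               # blocks the VoD server had to send

    def holders(self, block: int) -> List[str]:
        """Clients whose buffer window currently contains `block`."""
        return [c for c, pos in self.position.items()
                if pos - self.buffer_blocks < block <= pos]

    def fetch(self, client: str, block: int) -> str:
        """Serve `block` to `client` from a peer buffer if possible, else from the server."""
        peers = [c for c in self.holders(block) if c != client]
        # A random-access jump (VCR-like seek) simply moves the client's window.
        self.position[client] = block
        if peers:
            return peers[0]
        self.server_reads += 1
        return "server"
```

For example, with a four-block buffer, a client that starts three blocks behind another viewer receives every block from that viewer's buffer, so the server only sends the leading viewer's stream.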

ieee international conference on high performance computing data and analytics | 2003

Glove: A Distributed Environment for Scalable Video-on-Demand Systems

Leonardo Bidese de Pinho; Edison Ishikawa; Claudio Luis de Amorim

In this paper, we introduce a scalable Video-on-Demand (VoD) system called GloVE (Global Video Environment) in which active clients cooperate to create a shareable video cache that is used as the primary source of video content for subsequent client requests. In this way, the GloVE server's bandwidth does not limit the number of simultaneous clients that can play back a given video since, as long as its content is stored in the cooperative video cache (CVC), new requests for the video can be serviced from the CVC, thus alleviating the demand on the VoD server. In addition, GloVE's peer-to-peer model allows the use of low-cost PCs as video servers, requires no multicast capability, and can transmit videos stored in any well-defined format. Our experimental results show that GloVE-based servers are capable of delivering popular videos at prime time in a scalable way to large audiences.
