Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Sisira Weeratunga is active.

Publication


Featured research published by Sisira Weeratunga.


Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91) | 1991

The NAS parallel benchmarks summary and preliminary results

David H. Bailey; Eric Barszcz; John T. Barton; D. S. Browning; Robert L. Carter; Leonardo Dagum; Rod Fatoohi; Paul O. Frederickson; T. A. Lasinski; Robert Schreiber; Horst D. Simon; V. Venkatakrishnan; Sisira Weeratunga

No abstract available


Conference on High Performance Computing (Supercomputing) | 1994

Performance evaluation of three distributed computing environments for scientific applications

Rod Fatoohi; Sisira Weeratunga

Presents performance results for three distributed computing environments using the three simulated computational fluid dynamics applications in the NAS Parallel Benchmark suite. These environments are the Distributed Computing Facility (DCF) cluster, the LACE cluster, and an Intel iPSC/860 machine. The DCF is a prototypic cluster of loosely-coupled SGI R3000 machines connected by Ethernet. The LACE cluster is a tightly-coupled cluster of 92 IBM RS6000/560 machines connected by Ethernet as well as by either FDDI or an IBM Allnode switch. Results of several parallel algorithms for the three simulated applications are presented and analyzed, based on the interplay between the communication requirements of an algorithm and the characteristics of the communication network of a distributed system.
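The abstract's central point is that the same algorithm can look fast or slow depending on how its message pattern matches the network. A minimal sketch of that interplay, assuming a first-order cost model t = latency + size/bandwidth, might look like the following; the Ethernet and FDDI figures are illustrative assumptions, and only the iPSC/860 latency and channel bandwidth are taken from the entries on this page.

```python
# First-order communication cost model: t = latency + bytes / bandwidth.
# All network figures below are illustrative; only the iPSC/860 numbers
# come from the Touchstone Gamma description elsewhere on this page.

def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Cost of one point-to-point message under the simple linear model."""
    return latency_s + nbytes / bandwidth_bytes_per_s

networks = {
    "Ethernet (assumed 10 Mbit/s)": (1.0e-3, 1.25e6),
    "FDDI (assumed 100 Mbit/s)":    (1.0e-3, 12.5e6),
    "iPSC/860 channel":             (90e-6, 2.8e6),
}

# Two contrasting communication patterns: many small messages vs. few large ones.
patterns = {
    "many small messages": (1000, 1_000),
    "few large messages":  (10, 100_000),
}

for net, (lat, bw) in networks.items():
    for pattern, (count, size) in patterns.items():
        total_ms = count * transfer_time(size, lat, bw) * 1e3
        print(f"{net:30s} {pattern:20s} {total_ms:8.1f} ms")
```

Under such a model, latency-bound patterns favor low-latency channels while bandwidth-bound patterns favor faster media; this is the trade-off between algorithm communication requirements and network characteristics that the abstract refers to.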


Conference on High Performance Computing (Supercomputing) | 1991

The NAS parallel benchmarks

David H. Bailey; Eric Barszcz; Horst D. Simon; V. Venkatakrishnan; Sisira Weeratunga; John T. Barton; D. S. Browning; Robert L. Carter; Leonardo Dagum; Rod Fatoohi; Paul O. Frederickson; T. A. Lasinski; Robert Schreiber

No abstract available


Distributed Memory Computing Conference | 1990

Performance Results on the Intel Touchstone Gamma Prototype

David H. Bailey; Eric Barszcz; Rod Fatoohi; Horst D. Simon; Sisira Weeratunga

This paper describes the Intel Touchstone Gamma Prototype, a distributed memory MIMD parallel computer based on the new Intel i860 floating point processor. With 128 nodes, this system has a theoretical peak performance of over seven GFLOPS. This paper presents some initial performance results on this system, including results for individual node computation, message passing and complete applications using multiple nodes. The highest rate achieved on a multiprocessor Fortran application program is 844 MFLOPS.

Overview of the Touchstone Gamma System

In spring of 1989 DARPA and Intel Scientific Computers announced the Touchstone project. This project calls for the development of a series of prototype machines by Intel Scientific Computers, based on hardware and software technologies being developed by Intel in collaboration with research teams at CalTech, MIT, UC Berkeley, Princeton, and the University of Illinois. The eventual goal of this project is the Sigma prototype, a 150 GFLOPS peak parallel supercomputer with 2000 processing nodes. One of the milestones towards the Sigma prototype is the Gamma prototype. At the end of December 1989, the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center took delivery of one of the first two Touchstone Gamma systems, and it became available for testing in January 1990.

The Touchstone Gamma system is based on the new 64 bit i860 microprocessor by Intel [4]. The i860 has over 1 million transistors and runs at 40 MHz (the initial Touchstone Gamma systems were delivered with 33 MHz processors, but these have since been upgraded to 40 MHz). The theoretical peak speed is 80 MFLOPS in 32 bit floating point and 60 MFLOPS for 64 bit floating point operations. The i860 features 32 integer address registers, with 32 bits each, and 16 floating point registers with 64 bits each (or 32 floating point registers with 32 bits each). It also features an 8 kilobyte on-chip data cache and a 4 kilobyte instruction cache. There is a 128 bit data path between cache and registers, and a 64 bit data path between main memory and registers.

The i860 has a number of advanced features to facilitate high execution rates. First of all, a number of important operations, including floating point add, multiply and fetch from main memory, are pipelined. This means that they are segmented into three stages, and in most cases a new operation can be initiated every 25 nanosecond clock period. Another advanced feature is that multiple instructions can be executed in a single clock period. For example, a memory fetch, a floating add and a floating multiply can all be initiated in a single clock period.

A single node of the Touchstone Gamma system consists of the i860, 8 megabytes (MB) of dynamic random access memory, and hardware for communication to other nodes. The Touchstone Gamma system at NASA Ames consists of 128 computational nodes. The theoretical peak performance of this system is thus approximately 7.5 GFLOPS on 64 bit data. The 128 nodes are arranged in a seven dimensional hypercube using the direct connect routing module and the hypercube interconnect technology of the iPSC/2. The point to point aggregate bandwidth of the interconnect system, which is 2.8 MB/sec per channel, is the same as on the iPSC/2. However, the latency for message passing is reduced from about 350 microseconds to about 90 microseconds. This reduction is mainly obtained through the increased speed of the i860 on the Touchstone Gamma machine, when compared to
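As a quick consistency check of the figures quoted above (nothing here beyond arithmetic on numbers already in the text), the clock period, the system peak, and the hypercube dimension follow directly:

```python
# Arithmetic check of the Touchstone Gamma figures quoted in the abstract.
import math

clock_mhz = 40                      # i860 clock rate stated above
clock_period_ns = 1e3 / clock_mhz   # = 25 ns, matching the stated pipeline clock period

nodes = 128
peak_64bit_mflops_per_node = 60     # 64-bit peak per i860, as stated above
system_peak_gflops = nodes * peak_64bit_mflops_per_node / 1000.0

hypercube_dim = int(math.log2(nodes))  # 128 = 2**7, hence a seven-dimensional hypercube

print(clock_period_ns, system_peak_gflops, hypercube_dim)  # 25.0 7.68 7
```

The computed 7.68 GFLOPS is consistent with the "over seven GFLOPS" peak claimed for the 128-node system.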


11th Computational Fluid Dynamics Conference | 1993

Dynamic overset grid communication on distributed memory parallel processors

Eric Barszcz; Sisira Weeratunga; Robert L. Meakin

A parallel distributed memory implementation of intergrid communication for dynamic overset grids is presented. Included are discussions of various options considered during development. Results are presented comparing an Intel iPSC/860 to a single processor Cray Y-MP. Results for grids in relative motion show the iPSC/860 implementation to be faster than the Cray implementation.
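For readers unfamiliar with overset grids, the intergrid communication the abstract refers to moves interpolated solution values from donor cells on one grid to receptor points on another grid, which may reside on a different processor. A minimal sketch of that transfer step, with entirely hypothetical arrays and the actual message passing omitted, could look like this; it is an illustration of the idea, not the paper's implementation.

```python
# Sketch of one overset-grid transfer: receptor points on one grid receive
# values interpolated from donor-cell stencils on another grid. Arrays and
# sizes are hypothetical; the send/receive between processors is omitted.
import numpy as np

def pack_donor_values(field, donor_indices):
    """Gather the donor-cell values a remote grid asked for (to be sent)."""
    return field[donor_indices]            # shape: (n_receptors, stencil_size)

def apply_interpolation(donor_values, weights):
    """Combine received donor values with precomputed stencil weights."""
    return np.sum(donor_values * weights, axis=1)

rng = np.random.default_rng(0)
donor_field = rng.random(1000)                        # solution on the donor grid
donor_indices = rng.integers(0, 1000, size=(50, 8))   # 8-point stencils for 50 receptors
weights = rng.random((50, 8))
weights /= weights.sum(axis=1, keepdims=True)         # interpolation weights sum to 1

packed = pack_donor_values(donor_field, donor_indices)   # would be sent over the network
receptor_values = apply_interpolation(packed, weights)   # applied on the receiving side
print(receptor_values[:5])
```

On a distributed-memory machine the packed array would be sent to the processor owning the receptor grid; when the grids are in relative motion the donor stencils and weights change from step to step, which is what makes the dynamic case communication-intensive.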


12th Computational Fluid Dynamics Conference | 1995

Moving body overset grid applications on distributed memory MIMD computers

Sisira Weeratunga; Eric Barszcz; Kalpana Chawla

The high fidelity analysis of the flow fields around modern aerospace vehicles often requires the accurate computation of unsteady, three-dimensional viscous flow fields. An added element of complexity is introduced when the unsteady flow field is triggered either wholly or partially by the relative motion of one or more components of the vehicular aggregate with respect to the others. In recent years, overset grid methods have proven to be an efficient and versatile approach for simulating such unsteady flow fields. Contemporaneously, some of the current generation of distributed memory MIMD computers have proven to be highly cost-effective alternatives to conventional vector supercomputers for a variety of computational aeroscience applications. This study investigates one approach for implementing such a moving body overset grid RANS flow solver on a distributed memory MIMD computer, a 160-node IBM SP2. Performance data for two realistic aircraft configurations representative of those encountered in present-day moving body aerodynamic simulations are presented and analyzed. The results confirm the feasibility of carrying out such complex, large-scale computational aeroscience simulations in a cost-effective manner using the IBM SP2.

Introduction

The analysis and detailed design of the next generation of high-performance aircraft, particularly in the critical regions of their flight envelopes, often requires the computation of unsteady, three-dimensional viscous flow fields around them. The task is further complicated when the unsteady flow field is induced by the relative motion of one or more components of the aircraft with respect to the others. When these designs are subject to strict requirements with regard to cost, fuel consumption and noise pollution, they demand the use of advanced time-accurate, Reynolds-Averaged Navier-Stokes (RANS) flow solvers to resolve complex phenomena such as aerodynamic interference, nonlinear acoustics, vortical wakes and viscous effects. However, the computation of these high physical fidelity unsteady flow fields around complete aircraft configurations is invariably accompanied by greatly increased demands for computational resources. These enormous computational costs tend to erode the utility of such large-scale simulations during the detailed design phase of the aircraft. On the other hand, recent advances in computer hardware technologies open up exciting new possibilities for providing this much needed, high quality computational resource in sufficiently large quantities, at an affordable cost. Notable among such technologies are the advent of mass-produced, 64-bit, high performance, Reduced Instruction Set Computing (RISC) microprocessor chip sets, high density Dynamic Random Access Memory (DRAM) chips, and high-speed interconnect networks that are readily scalable to hundreds of nodes. The current generation of high-speed, distributed-memory, multiple-instruction stream, multiple-data stream (DM-MIMD) computers such as the Cray T3D and the IBM SP2 are prime examples of highly parallel processor architecture designs that incorporate the very best in many of the above technologies.


Parallel Computing | 1990

Implementation of two projection methods on a shared memory multiprocessor: DEC VAX 6240

Chandrika Kamath; Sisira Weeratunga

In this paper, we compare the relative performance of two iterative schemes, based on projection techniques, on a shared memory multiprocessor, the DEC VAX 6240. We consider the CG-accelerated Block-SSOR method and the CG-accelerated Symmetric-Kaczmarz method for the solution of large non-symmetric systems of linear equations. We show that the regular structure of many matrices can be exploited by the CG-accelerated Block-SSOR method to provide good speedup in a multiprocessing environment. However, the CG-accelerated Symmetric-Kaczmarz method, while being a viable alternative on a scalar machine, is unable to benefit from multiprocessing.
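As background for readers who have not seen Kaczmarz-type row projection methods, the sketch below shows a basic, unaccelerated symmetric Kaczmarz sweep (a forward and a backward pass over the rows). This is only the underlying iteration, not the paper's CG-accelerated implementation or its Block-SSOR counterpart, and the test system is an illustrative assumption.

```python
# Basic symmetric Kaczmarz iteration: each row projection is
# x <- x + (b_i - a_i.x) / ||a_i||^2 * a_i, swept forward then backward.
import numpy as np

def symmetric_kaczmarz_sweep(A, b, x):
    """One forward plus one backward pass of row projections."""
    m = A.shape[0]
    for i in list(range(m)) + list(range(m - 1, -1, -1)):
        a_i = A[i]
        x = x + (b[i] - a_i @ x) / (a_i @ a_i) * a_i
    return x

# Small nonsymmetric, diagonally dominant test system (illustrative only).
rng = np.random.default_rng(1)
A = rng.random((20, 20)) + 20 * np.eye(20)
x_true = rng.random(20)
b = A @ x_true

x = np.zeros(20)
for _ in range(50):
    x = symmetric_kaczmarz_sweep(A, b, x)
print("error:", np.linalg.norm(x - x_true))
```

One plausible reading of the abstract's conclusion is that each row projection depends on the result of the previous one, so this strictly sequential recurrence offers little for a shared-memory multiprocessor to exploit, unlike the block structure used by Block-SSOR.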


Conference on High Performance Computing (Supercomputing) | 1991

The NAS parallel benchmarks—summary and preliminary results

David H. Bailey; Eric Barszcz; John T. Barton; D. S. Browning; Robert L. Carter; Leonardo Dagum; Rod Fatoohi; Paul O. Frederickson; T. A. Lasinski; Robert Schreiber; Horst D. Simon; V. Venkatakrishnan; Sisira Weeratunga

No abstract available


IEEE International Conference on High Performance Computing, Data, and Analytics | 1991

The NAS Parallel Benchmarks

David H. Bailey; Eric Barszcz; John T. Barton; D. S. Browning; Robert L. Carter; Leonardo Dagum; Rod Fatoohi; Paul O. Frederickson; T. A. Lasinski; Robert Schreiber; Horst D. Simon; V. Venkatakrishnan; Sisira Weeratunga


SC | 1991

The NAS parallel benchmarks—summary and preliminary results

David H. Bailey; Eric Barszcz; John T. Barton; Dave Browning; Robert L. Carter; Leonardo Dagum; Rod Fatoohi; Paul O. Frederickson; Tom Lasinski; Robert Schreiber; Horst D. Simon; V. Venkatakrishnan; Sisira Weeratunga

Collaboration


Dive into Sisira Weeratunga's collaboration.

Top Co-Authors

Rod Fatoohi

San Jose State University

David H. Bailey

Lawrence Berkeley National Laboratory

Horst D. Simon

Lawrence Berkeley National Laboratory
