Kalyan S. Perumalla | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Kalyan S. Perumalla is active.

Explore More

Publication

Featured researches published by Kalyan S. Perumalla.

modeling, analysis, and simulation on computer and telecommunication systems | 2003

Large-scale network simulation: how big? how fast?

Richard M. Fujimoto; Kalyan S. Perumalla; Alfred Park; Hao Wu; Mostafa H. Ammar; George F. Riley

Parallel and distributed simulation tools are emerging that offer the ability to perform detailed, packet-level simulations of large-scale computer networks on an unprecedented scale. The state-of-the-art in large-scale network simulation is characterized quantitatively. For this purpose, a metric based on the number of packet transmissions that can be processed by a simulator per second of wallclock time (PTS) is used as a means to quantitatively assess packet-level network simulator performance. An approach to realizing scalable network simulations that leverages existing sequential simulation models and software is described. Results from a recent performance study are presented concerning large-scale network simulation on a variety of platforms ranging from workstations to cluster computers to supercomputers. These experiments include runs utilizing as many as 1536 processors yielding performance as high as 106 million PTS. The performance of packet-level simulations of web and ftp traffic, and denial of service attacks on networks containing millions of network nodes are briefly described, including a run demonstrating the ability to simulate a million web traffic flows in near real-time. New opportunities and research challenges to fully exploit this capability are discussed.

ACM Transactions on Modeling and Computer Simulation | 1999

Efficient optimistic parallel simulations using reverse computation

Christopher D. Carothers; Kalyan S. Perumalla; Richard M. Fujimoto

In optimistic parallel simulations, state-saving techniques have traditionally been used to realize rollback. In this article, we propose reverse computation as an alternative approach, and compare its execution performance against that of state-saving. Using compiler techniques, we describe an approach to automatically generate reversible computations, and to optimize them to reap the performance benefits of reverse computation transparently. For certain fine-grain models, such as queuing network models, we show that reverse computation can yield significant improvement in execution speed coupled with significant reduction in memory utilization, as compared to traditional state-saving. On sample models using reverse computation, we observe as much as a six-fold improvement in execution speed over traditional state-saving.

workshop on parallel and distributed simulation | 2005

/spl mu/sik - a micro-kernel for parallel/distributed simulation systems

Kalyan S. Perumalla

A novel micro-kernel approach to building parallel/distributed simulation systems is presented. Using this approach, a unified system architecture is developed for incorporating multiple types of simulation processes. The processes hold potential to employ a variety of synchronization mechanisms, and could even alter their choice of mechanism dynamically. Supported mechanisms include traditional lookahead-based conservative and state saving-based optimistic execution approaches. Also supported are newer mechanisms such as reverse computation-based optimistic execution and aggregation-based event processing, all within a single parsimonious application programming interface. The internal implementation and a preliminary performance evaluation of this interface are presented in /spl mu/sik, which is an efficient parallel/distributed realization of the microkernel architecture in C/sup ++/.

winter simulation conference | 2006

Parallel and distributed simulation: traditional techniques and recent advances

Kalyan S. Perumalla

This tutorial on parallel and distributed simulation systems reviews some of the traditional synchronization techniques and presents some recent advances

ACM Transactions on Modeling and Computer Simulation | 2004

A federated approach to distributed network simulation

George F. Riley; Mostafa H. Ammar; Richard M. Fujimoto; Alfred Park; Kalyan S. Perumalla; Donghua Xu

We describe an approach and our experiences in applying federated simulation techniques to create large-scale parallel simulations of computer networks. Using the federated approach, the topology and the protocol stack of the simulated network is partitioned into a number of submodels, and a simulation process is instantiated for each one. Runtime infrastructure software provides services for interprocess communication and synchronization (time management). We first describe issues that arise in homogeneous federations where a sequential simulator is federated with itself to realize a parallel implementation. We then describe additional issues that must be addressed in heterogeneous federations composed of different network simulation packages, and describe a dynamic simulation backplane mechanism that facilitates interoperability among different network simulators. Specifically, the dynamic simulation backplane provides a means of addressing key issues that arise in federating different network simulators: differing packet representations, incomplete implementations of network protocol models, and differing levels of detail among the simulation processes. We discuss two different methods for using the backplane for interactions between heterogeneous simulators: the cross-protocol stack method and the split-protocol stack method. Finally, results from an experimental study are presented for both the homogeneous and heterogeneous cases that provide evidence of the scalability of our federated approach on two moderately sized computing clusters. Two different homogeneous implementations are described: Parallel/Distributed ns (pdns) and the Georgia Tech Network Simulator (GTNetS). Results of a heterogeneous implementation federating ns with GloMoSim are described. This research demonstrates that federated simulations are a viable approach to realizing efficient parallel network simulation tools.

distributed simulation and real time applications | 2000

Design of high performance RTI software

Richard M. Fujimoto; Thom McLean; Kalyan S. Perumalla; Ivan Tacic

This paper describes the implementation of RTI-Kit, a modular software package to realize runtime infrastructure (RTI) software for distributed simulations such as those for the High Level Architecture. RTI-Kit software spans a wide variety of computing platforms, ranging from tightly coupled machines such as shared memory multiprocessors and cluster computers to distributed workstations connected via a local area or wide area network. The time management, data distribution management, and underlying algorithms and software are described.

simulation tools and techniques for communications, networks and system | 2010

Efficient simulation of agent-based models on multi-GPU and multi-core clusters

Brandon G. Aaby; Kalyan S. Perumalla; Sudip K. Seal

An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.

IEEE Communications Magazine | 1998

Parallel simulation techniques for large-scale networks

Sandeep N. Bhatt; Richard M. Fujimoto; Andy Ogielski; Kalyan S. Perumalla

Simulation has always been an indispensable tool in the design and analysis of telecommunication networks. Due to performance limitations of the majority of simulators, usually network simulations have been done for rather small network models and for short timescales. In contrast, many difficult design problems facing todays network engineers concern the behavior of very large hierarchical multihop networks carrying millions of multiprotocol flows over long timescales. Examples include scalability and stability of routing protocols, packet losses in core routers, of long-lasting transient behavior due to observed self-similarity of traffic patterns. Simulation of such systems would greatly benefit from application of parallel computing technologies, especially now that multiprocessor workstations and servers have become commonly available. However, parallel simulation has not yet been widely embraced by the telecommunications community due to a number of difficulties. Based on our accumulated experience in parallel network simulation projects, we believe that parallel simulation technology has matured to the point that it is ready to be used in industrial practice of network simulation. This article highlights work in parallel simulations of networks and their promise.

workshop on parallel and distributed simulation | 2006

Discrete-event Execution Alternatives on General Purpose Graphical Processing Units (GPGPUs)

Kalyan S. Perumalla

Graphics cards, traditionally designed as accelerators for computer graphics, have evolved to support more general-purpose computation. General Purpose Graphical Processing Units (GPGPUs) are now being used as highly efficient, cost-effective platforms for executing certain simulation applications. While most of these applications belong to the category of timestepped simulations, little is known about the applicability of GPGPUs to discrete event simulation (DES). Here, we identify some of the issues & challenges that the GPGPU stream-based interface raises for DES, and present some possible approaches to moving DES to GPGPUs. Initial performance results on simulation of a diffusion process show that DES-style execution on GPGPU runs faster than DES on CPU and also significantly faster than time-stepped simulations on either CPU or GPGPU.

computing frontiers | 2007

Scaling time warp-based discrete event execution to 104 processors on a Blue Gene supercomputer

Kalyan S. Perumalla

Lately, important large-scale simulation applications, such as emergency/event planning and response, are emerging that are based on discrete event models. The applications are characterized by their scale (several millions of simulated entities), their fine-grained nature of computation (microseconds per event), and their highly dynamic inter-entity event interactions. The desired scale and speed together call for highly scalable parallel discrete event simulation (PDES) engines. However, few such parallel engines have been designed or tested on platforms with thousands of processors. Here an overview is given of a unique PDES engine that has been designed to support Time Warp-style optimistic parallel execution as well as a more generalized mixed, optimistic-conservative synchronization. The engine is designed to run on massively parallel architectures with minimal overheads. A performance study of the engine is presented, including the first results to date of PDES benchmarks demonstrating scalability to as many as 16,384 processors, on an IBM Blue Gene supercomputer. The results show, for the first time, the promise of effectively sustaining very large scale discrete event execution on up to 104 processors.

Explore More