Srikanth B. Yoginath | Researchain




Publication

Featured research published by Srikanth B. Yoginath.

Workshop on Parallel and Distributed Simulation | 2008

Parallel Vehicular Traffic Simulation using Reverse Computation-based Optimistic Execution

Srikanth B. Yoginath; Kalyan S. Perumalla

Vehicular traffic simulations are useful in applications such as emergency management and homeland security planning tools. High speed of traffic simulations translates directly to speed of response and level of resilience in those applications. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time for simulating emergency vehicular traffic scenarios. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation; (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation; and (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed equal to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system are presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance.
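The reverse computation idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical `RoadLink` state and hypothetical arrival and departure events; it is not the authors' simulator. Each forward event handler is paired with a reverse handler that undoes it, so an optimistic rollback can restore state without full checkpoints.

```python
from dataclasses import dataclass, field


@dataclass
class RoadLink:
    """Hypothetical state of one road link (an assumption for illustration)."""
    queue: list = field(default_factory=list)  # vehicle ids, head at index 0
    departures: int = 0                        # count of vehicles that left


def forward_arrive(link, vehicle_id):
    """Forward event: a vehicle joins the tail of the link's queue."""
    link.queue.append(vehicle_id)


def reverse_arrive(link, vehicle_id):
    """Reverse event: undo the arrival by removing the tail vehicle."""
    removed = link.queue.pop()
    assert removed == vehicle_id, "events must be reversed in LIFO order"


def forward_depart(link):
    """Forward event: the head vehicle leaves the link.

    The departing id is destroyed information, so it is returned and must be
    kept with the event to make the operation reversible.
    """
    vehicle_id = link.queue.pop(0)
    link.departures += 1
    return vehicle_id


def reverse_depart(link, vehicle_id):
    """Reverse event: put the saved vehicle back at the head of the queue."""
    link.departures -= 1
    link.queue.insert(0, vehicle_id)


def rollback_demo():
    """Run two events optimistically, roll them back, and compare states."""
    link = RoadLink()
    forward_arrive(link, 7)
    forward_arrive(link, 9)
    before = (list(link.queue), link.departures)

    saved = forward_depart(link)   # optimistic event 1
    forward_arrive(link, 11)       # optimistic event 2

    reverse_arrive(link, 11)       # undo in reverse order
    reverse_depart(link, saved)
    return (link.queue, link.departures) == before
```

Only the departing vehicle id needs to be saved per event in this sketch; the counter is restored by inverting the arithmetic.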

Workshop on Parallel and Distributed Simulation | 2009

GPU-based Real-Time Execution of Vehicular Mobility Models in Large-Scale Road Network Scenarios

Kalyan S. Perumalla; Brandon G. Aaby; Srikanth B. Yoginath; Sudip K. Seal

A methodology and its associated algorithms are presented for mapping a novel, field-based vehicular mobility model onto a graphical processing unit computational platform for simulating mobility in large-scale road networks. Of particular focus is the achievement of real-time execution, on desktop platforms, of vehicular mobility on road networks comprising millions of nodes and links, and multi-million counts of simultaneously active vehicles. The methodology is realized in a system called GARFIELD, whose implementation details and performance study are described. The runtime characteristics of a prototype implementation are presented that show real-time performance in simulations of networks at the scale of a few states of the US road networks.

ACM Transactions on Modeling and Computer Simulation | 2015

Efficient Parallel Discrete Event Simulation on Cloud/Virtual Machine Platforms

Srikanth B. Yoginath; Kalyan S. Perumalla

Cloud and Virtual Machine (VM) technologies present new challenges with respect to performance and monetary cost in executing parallel discrete event simulation (PDES) applications. Due to the introduction of overall cost as a metric, the traditional use of the highest-end computing configuration is no longer the most obvious choice. Moreover, the unique runtime dynamics and configuration choices of Cloud and VM platforms introduce new design considerations and runtime characteristics specific to PDES over Cloud/VMs. Here, an empirical study is presented to help understand the dynamics, trends, and trade-offs in executing PDES on Cloud/VM platforms. Performance and cost measures obtained from multiple PDES applications executed on the Amazon EC2 Cloud and on a high-end VM host machine reveal new, counterintuitive VM-PDES dynamics and guidelines. One of the critical aspects uncovered is the fundamental mismatch in hypervisor scheduler policies designed for general Cloud workloads versus the virtual time ordering needed for PDES workloads. This insight is supported by experimental data revealing the gross deterioration in PDES performance traceable to VM scheduling policy. To overcome this fundamental problem, the design and implementation of a new deadlock-free scheduler algorithm are presented, optimized specifically for PDES applications on VMs. The scalability of our scheduler has been tested with up to 128 VMs multiplexed on 32 cores, showing significant improvement in the runtime relative to the default Cloud/VM scheduler. The observations, algorithmic design, and results are timely for emerging Cloud/VM-based installations, highlighting the need for PDES-specific support in high-performance discrete event simulations on Cloud/VM platforms.

Principles of Advanced Discrete Simulation | 2013

Empirical evaluation of conservative and optimistic discrete event execution on cloud and VM platforms

Srikanth B. Yoginath; Kalyan S. Perumalla

Virtual machine (VM) technologies, especially those offered via Cloud platforms, present new dimensions with respect to performance and cost in executing parallel discrete event simulation (PDES) applications. Due to the introduction of overall cost as a metric, the choice of the highest-end computing configuration is no longer the most economical one. Moreover, runtime dynamics unique to VM platforms introduce new performance characteristics, and the variety of possible VM configurations gives rise to a range of choices for hosting a PDES run. Here, an empirical study of these issues is undertaken to guide an understanding of the dynamics, trends and trade-offs in executing PDES on VM/Cloud platforms. Performance results and cost measures are obtained from actual execution of a range of scenarios in two PDES benchmark applications on the Amazon Cloud offerings and on a high-end VM host machine. The data reveals interesting insights into the new VM-PDES dynamics that come into play and also leads to counter-intuitive guidelines with respect to choosing the best and second-best configurations when overall cost of execution is considered. In particular, it is found that choosing the highest-end VM configuration guarantees neither the best runtime nor the least cost. Interestingly, choosing a (suitably scaled) low-end VM configuration provides the least overall cost without adversely affecting the total runtime.

Modeling, Analysis, and Simulation on Computer and Telecommunication Systems | 2012

Taming Wild Horses: The Need for Virtual Time-Based Scheduling of VMs in Network Simulations

Srikanth B. Yoginath; Kalyan S. Perumalla; Brian J. Henz

The next generation of scalable network simulators employs virtual machines (VMs) to act as high-fidelity models of traffic producer/consumer nodes in simulated networks. However, network simulations could be inaccurate if VMs are not scheduled according to virtual time, especially when many VMs are hosted per simulator core in a multi-core simulator environment. Since VMs are by default free-running, at the outset, it is not clear if, and to what extent, their untamed execution affects the results in simulated scenarios. Here, we provide the first quantitative basis for establishing the need for generalized virtual time scheduling of VMs in network simulators, based on an actual prototyped implementation. To exercise breadth, our system is tested with disparate applications: (a) a set of message passing parallel programs, (b) a computer worm propagation phenomenon, and (c) a mobile ad-hoc wireless network simulation. We define and use error metrics and benchmarks in scaled tests to empirically report the poor match of traditional, fairness-based VM scheduling to VM-based network simulation, and also clearly show the better performance of our simulation-specific scheduler, with up to 64 VMs hosted on a 12-core simulator node.

Workshop on Parallel and Distributed Simulation | 2011

Efficiently Scheduling Multi-Core Guest Virtual Machines on Multi-Core Hosts in Network Simulation

Srikanth B. Yoginath; Kalyan S. Perumalla

Virtual machine (VM)-based simulation is a method used by network simulators to incorporate realistic application behaviors by executing actual VMs as high-fidelity surrogates for simulated end-hosts. A critical requirement in such a method is the simulation time-ordered scheduling and execution of the VMs. Prior approaches such as time dilation are less efficient due to the high degree of multiplexing possible when multiple multi-core VMs are simulated on multi-core host systems. We present a new simulation time-ordered scheduler to efficiently schedule multi-core VMs on multi-core real hosts, with a virtual clock realized on each virtual core. The distinguishing features of our approach are: (1) customizable granularity of the VM scheduling time unit on the simulation time axis, (2) ability to take arbitrary leaps in virtual time by VMs to maximize the utilization of host (real) cores when guest virtual cores idle, and (3) empirically determinable optimality in the tradeoff between total execution (real) time and time-ordering accuracy levels. Experiments show that it is possible to get nearly perfect time-ordered execution, with a slight cost in total run time, relative to optimized non-simulation VM schedulers. Interestingly, with our time-ordered scheduler, it is also possible to reduce the time-ordering error from over 50% with a non-simulation scheduler to less than 1% with our scheduler, with almost the same run time efficiency as that of the highly efficient non-simulation VM schedulers.
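The basic idea of simulation time-ordered scheduling can be sketched as a least-virtual-time-first dispatch loop. The sketch below is a heavily simplified illustration, not the paper's scheduler: it assumes a single host core, integer virtual time units, and hypothetical names (`schedule_by_virtual_time`, `work_per_vm`, `time_slice`), and it omits the idle-time leaps and per-virtual-core clocks described in the abstract.

```python
import heapq


def schedule_by_virtual_time(vm_ids, work_per_vm, time_slice):
    """Dispatch VMs in order of least virtual time (single-core sketch).

    vm_ids      -- iterable of hashable, mutually comparable VM identifiers
    work_per_vm -- dict mapping each VM id to its total virtual work units
    time_slice  -- scheduling granularity on the virtual time axis

    Returns the dispatch trace as a list of (vm_id, virtual_start_time).
    """
    if time_slice <= 0:
        raise ValueError("time_slice must be positive")

    # Every VM starts at virtual time 0; ties are broken by VM id.
    ready = [(0, vm_id) for vm_id in vm_ids]
    heapq.heapify(ready)
    remaining = dict(work_per_vm)
    trace = []

    while ready:
        # Always run the VM that is furthest behind in virtual time.
        vtime, vm_id = heapq.heappop(ready)
        run = min(time_slice, remaining[vm_id])
        trace.append((vm_id, vtime))
        remaining[vm_id] -= run
        if remaining[vm_id] > 0:
            # The VM's virtual clock advances by the amount it just ran.
            heapq.heappush(ready, (vtime + run, vm_id))

    return trace
```

A smaller `time_slice` keeps the VMs' virtual clocks closer together at the price of more scheduling decisions, which mirrors the accuracy versus run time tradeoff the abstract refers to.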

International Journal of Simulation and Process Modelling | 2009

Reversible discrete event formulation and optimistic parallel execution of vehicular traffic models

Srikanth B. Yoginath; Kalyan S. Perumalla

Vehicular traffic simulations are useful in applications such as emergency planning and traffic management, for rapid response and resilience. Here, a parallel traffic simulation approach is presented that reduces the time for simulating emergency vehicular traffic scenarios. We use a reverse computation-based optimistic execution approach to parallel execution of microscopic, vehicular-level models of traffic. The unique aspects of this effort are: exploration of optimistic simulation of vehicular traffic; addressing the related reverse computation challenges; and achieving absolute, as opposed to self-relative, speedup. The design, development and performance study of the parallel simulation system are presented, demonstrating excellent sequential and parallel performance. A speedup of nearly 20 on 32 processors is observed on a vehicular network of 65,000 intersections and 13 million vehicles.
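The distinction between absolute and self-relative speedup, and what the reported figure implies for parallel efficiency, can be made concrete with a short sketch. The function names are hypothetical, and the definitions used are the standard ones: absolute speedup divides the run time of the best sequential program by the parallel run time, whereas self-relative speedup divides the parallel program's own one-processor run time by its parallel run time.

```python
def absolute_speedup(best_sequential_time, parallel_time):
    """Speedup measured against the fastest available sequential simulator."""
    return best_sequential_time / parallel_time


def self_relative_speedup(parallel_code_on_one_processor_time, parallel_time):
    """Speedup measured against the parallel code run on a single processor."""
    return parallel_code_on_one_processor_time / parallel_time


def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup achieved on the given processor count."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# A speedup of 20 on 32 processors corresponds to an efficiency of 0.625;
# the abstract reports "nearly 20", so the actual figure is slightly lower.
efficiency_upper_bound = parallel_efficiency(20, 32)
```

Because the parallel code run on one processor is typically no faster than the best sequential simulator, self-relative speedup tends to overstate the benefit, which is why the abstract emphasizes the absolute measure.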

Journal of Physics: Conference Series | 2006

High performance statistical computing with parallel R: applications to biology and climate modelling

Nagiza F. Samatova; Marcia L. Branstetter; Auroop R. Ganguly; Robert L. Hettich; Shiraj Khan; Guruprasad Kora; Jiangtian Li; Xiaosong Ma; Chongle Pan; Arie Shoshani; Srikanth B. Yoginath

Ultrascale computing and high-throughput experimental technologies have enabled the production of scientific data about complex natural phenomena. With this opportunity comes a new problem: the massive quantities of data so produced. Answers to fundamental questions about the nature of those phenomena remain largely hidden in the produced data. The goal of this work is to provide a scalable high performance statistical data analysis framework to help scientists perform interactive analyses of these raw data to extract knowledge. Towards this goal we have been developing an open source parallel statistical analysis package, called Parallel R, that lets scientists employ a wide range of statistical analysis routines on high performance shared and distributed memory architectures without having to deal with the intricacies of parallelizing these routines.

Simulation | 2012

Interactive, graphical processing unit-based evaluation of evacuation scenarios at the state scale

Kalyan S. Perumalla; Brandon G. Aaby; Srikanth B. Yoginath; Sudip K. Seal

In large-scale scenarios, transportation modeling and simulation is severely constrained by simulation time. For example, few real-time simulators scale to evacuation traffic scenarios at the level of an entire state, such as Louisiana (approximately 1 million links) or Florida (2.5 million links). New simulation approaches are needed to overcome severe computational demands of conventional (microscopic or mesoscopic) modeling techniques. Here, a new modeling and execution methodology is explored that holds the potential to provide a tradeoff among the level of behavioral detail, the scale of transportation network, and real-time execution capabilities. A novel, field-based modeling technique and its implementation on graphical processing units are presented. Although additional research with input from domain experts is needed for refining and validating the models, the techniques reported here afford interactive experience at very large scales of multi-million road segments. Illustrative experiments on a few state-scale networks are described based on an implementation of this approach in a software system called GARFIELD. Current modeling capabilities and implementation limitations are described, along with possible use cases and future research.

High Performance Computing and Communications | 2016

Performance of Point and Range Queries for In-memory Databases Using Radix Trees on GPUs

Maksudul Alam; Srikanth B. Yoginath; Kalyan S. Perumalla

In in-memory database systems augmented by hardware accelerators, accelerating the index searching operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based adaptive radix tree (GRT) implementation over a variety of key distributions, synthetic benchmarks, and actual keys from music and book data sets. The performance is also compared with other index-searching schemes on the GPU. GRT on modern GPUs achieves some of the highest rates of index searches reported in the literature. For point queries, a throughput of up to 106 million and 130 million lookups per second is achieved for sparse and dense keys, respectively. For range queries, GRT yields 600 million and 1000 million lookups per second for sparse and dense keys, respectively, on a large dataset of 64 million 32-bit keys.
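To show what point and range queries on a radix tree index look like, here is a minimal CPU-only sketch. It is an assumption-laden illustration, not GRT or ART: the tree uses a fixed four-level, byte-wise layout over 32-bit keys with plain dictionaries as nodes (no adaptive node sizes, no GPU execution), and the class and method names are hypothetical.

```python
class RadixTree:
    """Byte-wise radix tree over 32-bit unsigned keys (fixed depth, not adaptive)."""

    LEVELS = 4  # one level per byte of a 32-bit key

    def __init__(self):
        self.root = {}

    @staticmethod
    def _key_bytes(key):
        if not 0 <= key < 2 ** 32:
            raise ValueError("key must fit in 32 bits")
        return key.to_bytes(4, "big")  # most significant byte first

    def insert(self, key, value):
        node = self.root
        key_bytes = self._key_bytes(key)
        for byte in key_bytes[:-1]:
            node = node.setdefault(byte, {})
        node[key_bytes[-1]] = value

    def lookup(self, key):
        """Point query: return the stored value, or None if the key is absent."""
        node = self.root
        key_bytes = self._key_bytes(key)
        for byte in key_bytes[:-1]:
            node = node.get(byte)
            if node is None:
                return None
        return node.get(key_bytes[-1])

    def range_query(self, low, high):
        """Range query: (key, value) pairs with low <= key <= high, ascending."""
        results = []
        self._collect(self.root, 0, 0, low, high, results)
        return results

    def _collect(self, node, depth, prefix, low, high, results):
        shift = 8 * (self.LEVELS - 1 - depth)
        for byte in sorted(node):
            key_prefix = prefix | (byte << shift)
            # Smallest and largest keys that can live under this child.
            span_low = key_prefix
            span_high = key_prefix | ((1 << shift) - 1)
            if span_high < low or span_low > high:
                continue  # prune subtrees entirely outside the range
            if depth == self.LEVELS - 1:
                results.append((key_prefix, node[byte]))
            else:
                self._collect(node[byte], depth + 1, key_prefix, low, high, results)
```

Big-endian byte order keeps keys sorted along an in-order traversal, which is what lets the range query prune whole subtrees by comparing their key spans with the requested bounds.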
