Nandakishore Santhi | Researchain

Publication

Featured research published by Nandakishore Santhi.

winter simulation conference | 2015

The simian concept: parallel discrete event simulation with interpreted languages and just-in-time compilation

Nandakishore Santhi; Stephan Eidenbenz; Jason Liu

We introduce Simian, a family of open-source Parallel Discrete Event Simulation (PDES) engines written in Lua and Python. Simian reaps the benefits of interpreted languages (ease of use, fast development time, enhanced readability, and a high degree of portability across platforms) and, through the optional use of Just-In-Time (JIT) compilation, achieves performance comparable to state-of-the-art PDES engines implemented in compiled languages such as C or C++. This paper describes the main design concepts of Simian and presents a benchmark performance study comparing four Simian implementations (written in Python and Lua, each with and without JIT) against MiniSSF, a traditionally compiled simulator written in C++. Our experiments show that Simian in Lua with JIT outperforms MiniSSF, sometimes by a factor of three under high computational workloads.
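To make the discrete-event idea concrete, here is a minimal sequential event loop in Python: a priority queue of timestamped events, processed in time order until an end time. This is a hypothetical sketch, not Simian's API; the `Engine` class and its `schedule`/`run` methods are illustrative names, and a parallel engine would additionally partition entities across processes and synchronize them, which is omitted here.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class Engine:
    """Minimal sequential discrete event loop (hypothetical, not Simian's API)."""

    def __init__(self, end_time: float) -> None:
        self.now = 0.0
        self.end_time = end_time
        self._queue: List[Tuple[float, int, Callable[..., None], tuple]] = []
        self._seq = itertools.count()  # tie-breaker: equal timestamps run in FIFO order

    def schedule(self, delay: float, handler: Callable[..., None], *args) -> None:
        """Enqueue handler(*args) to run `delay` time units from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), handler, args))

    def run(self) -> int:
        """Process events in timestamp order up to end_time; return how many ran."""
        processed = 0
        while self._queue and self._queue[0][0] <= self.end_time:
            self.now, _, handler, args = heapq.heappop(self._queue)
            handler(*args)
            processed += 1
        return processed


# Usage: two handlers that keep scheduling each other one time unit apart.
log: List[Tuple[str, float]] = []
engine = Engine(end_time=10.0)


def ping(n: int) -> None:
    log.append(("ping", engine.now))
    engine.schedule(1.0, pong, n + 1)


def pong(n: int) -> None:
    log.append(("pong", engine.now))
    engine.schedule(1.0, ping, n + 1)


engine.schedule(0.0, ping, 0)
count = engine.run()
```

The sequence-number tie-breaker keeps the heap from ever comparing handler functions and gives deterministic ordering for simultaneous events.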

international symposium on information theory | 2007

On Algebraic Decoding of q-ary Reed-Muller and Product Reed-Solomon Codes

Nandakishore Santhi

We consider a list decoding algorithm recently proposed by Pellikaan-Wu [8] for $q$-ary Reed-Muller codes $\mathrm{RM}_q(l, m, n)$ of length $n \le q^m$ when $l \le q$. A simple and easily accessible correctness proof is given, showing that this algorithm achieves a relative error-correction radius of $\tau \le 1 - \sqrt{l q^{m-1}/n}$. This is an improvement over the proof using one-point Algebraic-Geometric codes given in [8]. The described algorithm can be adapted to decode product Reed-Solomon codes. We then propose a new low-complexity recursive algebraic decoding algorithm for Reed-Muller and product Reed-Solomon codes. Our algorithm achieves a relative error-correction radius of $\tau < \prod_{i=1}^{m} \left(1 - \sqrt{k_i/q}\right)$. This technique is then proved to outperform the Pellikaan-Wu method in both complexity and error-correction radius over a wide range of code rates.
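The two radius bounds quoted above are easy to evaluate numerically. The sketch below only plugs numbers into those formulas; it does not implement either decoder. The parameter values in the checks are arbitrary illustrations, and because the abstract does not state how the component dimensions $k_i$ relate to $l$, the two evaluations should not be read as describing the same code.

```python
import math
from typing import Sequence


def pw_radius(l: int, q: int, m: int, n: int) -> float:
    """Relative radius 1 - sqrt(l * q^(m-1) / n) quoted for the Pellikaan-Wu decoder."""
    return 1.0 - math.sqrt(l * q ** (m - 1) / n)


def recursive_radius(ks: Sequence[int], q: int) -> float:
    """Relative radius prod_i (1 - sqrt(k_i / q)) quoted for the recursive decoder."""
    radius = 1.0
    for k in ks:
        radius *= 1.0 - math.sqrt(k / q)
    return radius
```

For example, with $q = 16$, $m = 2$, $n = 256$ and $l = 4$, the first bound gives $1 - \sqrt{64/256} = 0.5$; with $k_1 = k_2 = 4$ the product bound gives $(1 - \sqrt{4/16})^2 = 0.25$.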

principles of advanced discrete simulation | 2016

An Integrated Interconnection Network Model for Large-Scale Performance Prediction

Kishwar Ahmed; Mohammad Abu Obaida; Jason Liu; Stephan Eidenbenz; Nandakishore Santhi; Guillaume Chapuis

The interconnection network is a critical component of high-performance computing architecture and of application co-design. For many scientific applications, growing communication complexity is a serious concern, as it may hinder how well these applications scale on novel architectures. A scalable, efficient, and accurate interconnect model is therefore essential for performance evaluation studies. In this paper, we present an interconnect model for predicting the performance of large-scale applications on high-performance architectures. In particular, we present a sufficiently detailed interconnect model for Cray's Gemini 3-D torus network. The model has been integrated with an implementation of the Message-Passing Interface (MPI) that can mimic most of its functions with packet-level accuracy on the target platform. Extensive experiments show that our integrated model predicts network behavior with good accuracy while also achieving good parallel scaling performance.
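One geometric ingredient of any torus model is the minimal hop count between two routers, taking the shorter way around each ring. The sketch below computes that and adds a deliberately crude latency estimate (per-hop delay plus serialization time). The functions and the `per_hop_ns` and `bytes_per_ns` parameters are hypothetical; they are not Gemini's actual figures, and the paper's packet-level model is far more detailed.

```python
from typing import Tuple

Coord = Tuple[int, int, int]


def torus_hops(src: Coord, dst: Coord, dims: Coord) -> int:
    """Minimal hop count between two routers in a 3-D torus with wraparound links."""
    hops = 0
    for s, d, size in zip(src, dst, dims):
        delta = abs(s - d)
        hops += min(delta, size - delta)  # take the shorter direction around each ring
    return hops


def latency_estimate(src: Coord, dst: Coord, dims: Coord,
                     per_hop_ns: float, nbytes: int, bytes_per_ns: float) -> float:
    """Crude, hypothetical model: hop latency plus message serialization time (ns)."""
    return torus_hops(src, dst, dims) * per_hop_ns + nbytes / bytes_per_ns
```

In an 8x8x8 torus, routers (0,0,0) and (7,4,1) are 1 + 4 + 1 = 6 hops apart, because the x-distance of 7 wraps around to 1.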

Simulation | 2016

Discrete event performance prediction of speculatively parallel temperature-accelerated dynamics

Richard J. Zamora; Arthur F. Voter; Danny Perez; Nandakishore Santhi; Susan M. Mniszewski; Sunil Thulasidasan; Stephan Eidenbenz

Due to its unrivaled ability to predict the dynamical evolution of interacting atoms, molecular dynamics (MD) is a widely used computational method in theoretical chemistry, physics, biology, and engineering. Despite its success, MD can only model timescales within several orders of magnitude of thermal vibrations, leaving out many important phenomena that occur at slower rates. The temperature-accelerated dynamics (TAD) method overcomes this limitation by thermally accelerating the state-to-state evolution captured by MD. Because the serial TAD procedure is algorithmically complex, existing implementations have not yet improved performance by exploring multiple states concurrently. Here we use a discrete-event-based application simulator to introduce and explore a new speculatively parallel TAD (SpecTAD) method. We investigate the SpecTAD algorithm without a full-scale implementation by constructing an application simulator proxy (SpecTADSim). Using this approach, we find that a non-trivial relationship exists between the optimal SpecTAD parameter set and the number of CPU cores available at run time. Furthermore, we find that most of the available SpecTAD boost can be achieved within an existing TAD application using relatively simple algorithm modifications.
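The core of thermal acceleration in TAD is extrapolating an escape time observed at a high temperature back to the low temperature of interest. Under harmonic transition state theory, a transition with barrier $E_a$ seen at time $t_{\text{high}}$ maps to $t_{\text{low}} = t_{\text{high}} \exp[E_a(\beta_{\text{low}} - \beta_{\text{high}})]$ with $\beta = 1/(k_B T)$. The sketch below evaluates that relation. It is background on TAD in general, not the SpecTAD algorithm, and the numbers in the checks are arbitrary.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def extrapolate_time(t_high: float, barrier_ev: float, t_low_k: float, t_high_k: float) -> float:
    """Map an escape time observed at t_high_k (kelvin) to the low temperature t_low_k."""
    beta_low = 1.0 / (K_B_EV * t_low_k)
    beta_high = 1.0 / (K_B_EV * t_high_k)
    return t_high * math.exp(barrier_ev * (beta_low - beta_high))
```

With a 0.5 eV barrier, an event seen after 1 ns at 600 K corresponds to roughly $1.6 \times 10^{-5}$ s at 300 K. That gap of about four orders of magnitude is the boost TAD exploits.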

winter simulation conference | 2015

Parameterized benchmarking of parallel discrete event simulation systems: communication, computation, and memory

Eunjung Park; Stephan Eidenbenz; Nandakishore Santhi; Guillaume Chapuis; Bradley W. Settlemyer

We introduce La-pdes, a parameterized benchmark application for measuring parallel and serial discrete event simulation (PDES) performance. Taking a holistic view of PDES system performance, La-pdes tests the performance factors of (i) the (P)DES engine, in terms of event queue efficiency, synchronization mechanism, and load-balancing schemes; (ii) the available hardware, in terms of handling computationally intensive loads, memory size, cache hierarchy, and clock speed; and (iii) interaction with communication middleware (often MPI) through message buffering. La-pdes consists of seven scenarios that isolate individual performance factors, plus an agglomerative stress-evaluation scenario. The scenarios are implemented as concrete values of La-pdes's input parameters, which include the number of entities and events, end time, inter-send time distributions, computational and event load distributions, memory use distributions, cache-friendliness, and event queue sizes. We demonstrate through instrumentation that La-pdes's assumptions about these distributions are realistic, and we present results of all eight scenarios on the PDES engine Simian.
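A parameterized benchmark fixes a scenario as a set of input values and derives the workload from them. The sketch below shows one way to express a small subset of such parameters and to generate each entity's send times from an inter-send-time distribution. The field names and the choice of an exponential distribution are assumptions for illustration; they are not La-pdes's actual parameter names or defaults.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scenario:
    """Hypothetical subset of La-pdes-style input parameters (names are illustrative)."""
    n_entities: int
    end_time: float
    mean_inter_send: float     # mean of an assumed exponential inter-send-time distribution
    ops_per_event: int         # synthetic computational load per event (unused below)
    memory_per_entity_kb: int  # memory footprint per entity (unused below)


def send_times(s: Scenario, seed: int = 1) -> List[List[float]]:
    """For each entity, the times at which it sends events before end_time."""
    rng = random.Random(seed)  # fixed seed makes a scenario reproducible
    schedule: List[List[float]] = []
    for _ in range(s.n_entities):
        t, times = 0.0, []
        while True:
            t += rng.expovariate(1.0 / s.mean_inter_send)
            if t >= s.end_time:
                break
            times.append(t)
        schedule.append(times)
    return schedule


# Two illustrative scenarios: communication-heavy versus computation-heavy.
comm_heavy = Scenario(n_entities=4, end_time=100.0, mean_inter_send=1.0,
                      ops_per_event=10, memory_per_entity_kb=64)
compute_heavy = Scenario(n_entities=4, end_time=100.0, mean_inter_send=20.0,
                         ops_per_event=100_000, memory_per_entity_kb=64)
```

Keeping scenarios as immutable values makes it easy to sweep one factor while holding the others fixed, which is the point of isolating performance factors.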

winter simulation conference | 2010

Cybersim: geographic, temporal, and organizational dynamics of malware propagation

Nandakishore Santhi; Guanhua Yan; Stephan Eidenbenz

Cyber-infractions into a nation's strategic security envelope pose a constant and daunting challenge. We present the modular CyberSim tool, developed in response to the need to realistically simulate, at a national level, software vulnerabilities and the resulting malware propagation in online social networks. The CyberSim suite (a) can generate realistic scale-free networks from a database of geocoordinated computers to closely model the social networks arising from personal and business email contacts and online communities; (b) maintains for each host a list of installed software, along with the latest published vulnerabilities; (c) allows the analyst to designate the initial nodes where malware is introduced; (d) uses distributed discrete event-driven simulation to model the spread of malware exploiting a specific vulnerability, with packet delay and user online behavior models; and (e) provides the analyst with a graphical visualization of the spread of infection, its severity, the businesses affected, and so on. We present sample simulations on a national-level network with millions of computers.
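The two core steps, building a scale-free contact network and spreading malware only among hosts running vulnerable software, can be sketched compactly. This sketch substitutes Barabási-Albert preferential attachment for CyberSim's database-driven generator and uses a simple step-based susceptible-infected model. It has no geography, packet delay, or online-behavior model, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Set


def preferential_attachment(n: int, m: int, rng: random.Random) -> Dict[int, Set[int]]:
    """Barabasi-Albert-style scale-free graph: each new node links to m distinct
    existing nodes chosen with probability proportional to their degree (requires n > m)."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    targets = list(range(m))
    repeated: List[int] = []  # each node appears once per incident edge
    for source in range(m, n):
        for t in targets:
            adj[source].add(t)
            adj[t].add(source)
        repeated.extend(targets)
        repeated.extend([source] * m)
        chosen: Set[int] = set()
        while len(chosen) < m:  # sample m distinct nodes, degree-weighted
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return adj


def spread(adj: Dict[int, Set[int]], vulnerable: Set[int], seeds: Set[int],
           p_infect: float, steps: int, rng: random.Random) -> List[int]:
    """Step-based SI spread: each step, every infected host infects each vulnerable,
    uninfected neighbor with probability p_infect. Returns infected counts per step."""
    infected = {s for s in seeds if s in vulnerable}
    history = [len(infected)]
    for _ in range(steps):
        new: Set[int] = set()
        for host in infected:
            for nb in adj[host]:
                if nb in vulnerable and nb not in infected and rng.random() < p_infect:
                    new.add(nb)
        infected |= new
        history.append(len(infected))
    return history
```

Restricting infection to the `vulnerable` set mirrors item (b): whether a host can be compromised depends on the software it has installed, not just on its position in the network.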

international conference on cluster computing | 2017

AMM: Scalable Memory Reuse Model to Predict the Performance of Physics Codes

Gopinath Chennupati; Nandakishore Santhi; Stephan Eidenbenz; Sunil Thulasidasan

As the US Department of Energy (DOE) invests in exascale computing, scalable performance modeling of physics codes on CPUs remains a hard challenge in computational codesign, owing to advanced processor design features such as the memory hierarchy, instruction pipelining, and speculative execution. Reuse distance is a powerful (but unscalable) metric that helps predict cache hit rates. We propose the Analytical Memory Model (AMM), a novel hardware model based on cache memory hierarchies. AMM efficiently computes close approximations of reuse distance distributions by combining static analysis of basic code blocks with sampling from very small code instances. The results show that AMM accurately predicts the reuse profiles of scientific mini-applications (for example, matrix multiplication). Coupling AMM with the Performance Prediction Toolkit (PPT), we further show scalable runtime prediction of scientific codes on Intel Xeon.
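The link between reuse distance and cache hit rate can be shown directly. For each access, the reuse distance is the number of distinct addresses touched since the previous access to the same address. In a fully associative LRU cache with $C$ lines, an access hits exactly when that distance is below $C$. The sketch below is the naive exact computation from a full trace, i.e. the expensive baseline that AMM approximates; it is not AMM itself.

```python
from typing import List, Optional, Sequence


def reuse_distances(trace: Sequence[int]) -> List[Optional[int]]:
    """Distinct addresses accessed since the previous access to the same address
    (None on first access). Naive O(N * M) LRU-stack implementation."""
    stack: List[int] = []  # LRU stack, most recently used address at the end
    out: List[Optional[int]] = []
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            out.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            out.append(None)  # cold miss
        stack.append(addr)
    return out


def lru_hit_rate(trace: Sequence[int], cache_lines: int) -> float:
    """Hit rate of a fully associative LRU cache with the given number of lines."""
    dists = reuse_distances(trace)
    hits = sum(1 for d in dists if d is not None and d < cache_lines)
    return hits / len(trace)
```

For the cyclic trace 0, 1, 2, 0, 1, 2, every reuse has distance 2, so a 3-line cache hits on the second pass (hit rate 0.5) while a 2-line cache never hits. The full distribution of distances therefore predicts hit rates for every cache size at once.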

performance evaluation methodologies and tools | 2016

GPU Performance Prediction Through Parallel Discrete Event Simulation and Common Sense

Guillaume Chapuis; Stephan Eidenbenz; Nandakishore Santhi

We present the GPU Module of a Performance Prediction Toolkit developed at Los Alamos National Laboratory, which enables code developers to efficiently test novel algorithmic ideas, particularly for large-scale computational physics codes. The GPU Module is a heavily parameterized model of the GPU hardware. Its input is a sequence of abstracted instructions representing the application, which the user either provides directly or reads in from PTX, the GPU intermediate representation. These instructions are then executed in a discrete event simulation of the entire computing infrastructure, which can include multi-GPU and multi-node components as typically found in high-performance computing applications. Our GPU Module aims at a trade-off between the cycle accuracy of GPU simulators and the fast execution times of analytical models. This trade-off is achieved by simulating only a portion of the computations at cycle level and using this partial runtime to analytically predict the total execution time of the modeled application. We validate our GPU models against three benchmark applications that span the range from bandwidth-limited to cycle-limited. Our runtime predictions are within 20% error. We then predict the performance of a next-generation GPU (Nvidia's Pascal) for the same benchmark applications.
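The partial-simulation trade-off can be illustrated with a simple extrapolation: simulate one "wave" of thread blocks in detail, then scale its cycle count by the number of waves the whole kernel needs. This scaling rule is an assumed, simplified stand-in; the abstract does not specify the GPU Module's analytical model, and the functions and hardware numbers below are hypothetical.

```python
import math


def waves(total_blocks: int, num_sms: int, blocks_per_sm: int) -> int:
    """Rounds needed if each SM runs blocks_per_sm thread blocks at a time."""
    return math.ceil(total_blocks / (num_sms * blocks_per_sm))


def predict_runtime_s(sampled_wave_cycles: int, total_blocks: int, num_sms: int,
                      blocks_per_sm: int, clock_hz: float) -> float:
    """Scale the cycle count of one simulated wave to the whole kernel (assumed model)."""
    return sampled_wave_cycles * waves(total_blocks, num_sms, blocks_per_sm) / clock_hz
```

With 1,000 blocks on a hypothetical GPU with 56 SMs running 2 blocks each, the kernel needs 9 waves (112 blocks per wave). A sampled wave of 10,000 cycles then predicts 90,000 cycles in total, about 64 µs at 1.4 GHz. Only the sampled wave pays the cost of cycle-level simulation.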

international conference on cluster computing | 2017

A Probabilistic Monte Carlo Framework for Branch Prediction

Bhargava Kalla; Nandakishore Santhi; Abdel-Hameed A. Badawy; Gopinath Chennupati; Stephan Eidenbenz

Branch prediction is crucial to improving microprocessor throughput. It reduces branching stalls in the pipeline, which helps maintain the flow of instruction execution. Conditional branches in particular play a non-trivial role in determining microprocessor performance and throughput. Modern microprocessors predict branches accurately using advanced branch prediction techniques. Accurately estimating branch mispredictions helps improve an application's overall performance by saving CPU cycles. In general, collecting branch prediction statistics with state-of-the-art simulators is time consuming and does not scale. We present a novel Monte Carlo simulation framework that predicts the branch misprediction rate. Our results suggest that the misprediction rates our framework predicts for three scientific applications are similar (with an average difference of 0.3%) to those of a Markov model of a 2-bit saturating branch predictor.

winter simulation conference | 2015

Simian integrated framework for parallel discrete event simulation on GPUs

Guillaume Chapuis; Stephan Eidenbenz; Nandakishore Santhi; Eunjung Park

Discrete Event Simulation (DES) allows the modelling of ever more complex systems in domains ranging from biological systems to road networks. The need to model ever larger systems increases the demand for efficient parallel implementations of DES engines. Graphics Processing Units (GPUs) have recently emerged as an efficient alternative to Central Processing Units (CPUs) for some classes of problems. Although GPUs can deliver substantial speedups, writing an efficient implementation, even for a suitable problem, often requires in-depth knowledge of the architecture. We present a new framework integrated into the Simian engine that allows modellers to make efficient use of GPUs for computationally intensive sections of code. The framework lets modellers offload some or all event handlers to the GPU by efficiently grouping and scheduling these handlers. As a case study, we implement a population activity simulation that takes evolving traffic conditions in a simulated urban area into account.
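The grouping idea can be sketched on the CPU. Pending events that share a timestamp and a handler are collected into one batch, and each batch is handed to a single batched handler call, which is where a GPU backend would launch one kernel over the whole batch. All names below are hypothetical and not the Simian framework's API. The sketch also assumes that same-time events for the same handler are independent of each other, which is what makes batching them safe.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Tuple[float, str, float]  # (timestamp, handler name, payload)
BatchHandler = Callable[[List[float]], List[float]]


def group_events(events: List[Event]) -> Dict[Tuple[float, str], List[float]]:
    """Collect payloads of events that share a timestamp and a handler."""
    groups: Dict[Tuple[float, str], List[float]] = defaultdict(list)
    for t, name, payload in events:
        groups[(t, name)].append(payload)
    return dict(groups)


def dispatch(events: List[Event],
             handlers: Dict[str, BatchHandler]) -> List[Tuple[float, str, List[float]]]:
    """Run each group with one batched call, in timestamp order. On a GPU backend,
    each call would correspond to one kernel launch over the whole batch."""
    results = []
    for (t, name), payloads in sorted(group_events(events).items()):
        results.append((t, name, handlers[name](payloads)))
    return results


# Usage: a hypothetical travel-time handler applied to trip distances at 10 units/time.
handlers = {"travel_time": lambda dists: [d / 10.0 for d in dists]}
events = [(1.0, "travel_time", 100.0), (1.0, "travel_time", 50.0), (2.0, "travel_time", 30.0)]
out = dispatch(events, handlers)
```

Larger batches amortize launch and transfer overhead, so the payoff grows with the number of simultaneous events per handler.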
