
Alok Prakash

Nanyang Technological University

Publication

Featured research published by Alok Prakash.

design automation conference | 2014

Integrated CPU-GPU Power Management for 3D Mobile Games

Anuj Pathania; Qing Jiao; Alok Prakash; Tulika Mitra

Modern system-on-chips (SoCs) integrate a CPU and a GPU to deliver an immersive 3D gaming experience. These games require the CPU and GPU to work in tandem, resulting in high power consumption. In the past, Dynamic Voltage and Frequency Scaling (DVFS) has been exploited on embedded CPUs to save power during game play, but only recently have embedded GPUs gained DVFS capabilities that provide additional opportunities. In this paper, we propose a power management approach that takes a unified view of CPU-GPU DVFS, reducing power consumption for the latest 3D mobile games compared to an approach that manages CPU and GPU power independently.
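
The benefit of a unified view can be sketched with a toy model. The frequency levels, the per-frame work figures, the assumption that frame time is set by the slower of the CPU and GPU stages, and the cubic power model below are all illustrative assumptions, not the model used in the paper. The sketch searches CPU and GPU frequency pairs jointly and keeps the lowest-power pair that still meets a frame-rate target.

```python
from itertools import product

# Hypothetical DVFS levels in GHz; not taken from any specific SoC.
CPU_FREQS = [0.8, 1.2, 1.6, 2.0]
GPU_FREQS = [0.2, 0.35, 0.5, 0.6]


def frame_rate(f_cpu, f_gpu, cpu_work, gpu_work):
    """Toy model: work is in giga-cycles per frame, and the slower of the
    CPU and GPU stages bounds the frame time."""
    frame_time = max(cpu_work / f_cpu, gpu_work / f_gpu)
    return 1.0 / frame_time


def power(f_cpu, f_gpu, k_cpu=1.0, k_gpu=2.0):
    """Toy dynamic-power model, P = k * f^3, assuming voltage scales with f."""
    return k_cpu * f_cpu ** 3 + k_gpu * f_gpu ** 3


def joint_dvfs(target_fps, cpu_work, gpu_work):
    """Return the lowest-power (f_cpu, f_gpu) pair meeting target_fps, or None."""
    feasible = [
        (power(fc, fg), fc, fg)
        for fc, fg in product(CPU_FREQS, GPU_FREQS)
        if frame_rate(fc, fg, cpu_work, gpu_work) >= target_fps
    ]
    if not feasible:
        return None
    _, f_cpu, f_gpu = min(feasible)
    return f_cpu, f_gpu


if __name__ == "__main__":
    # 20 M CPU cycles and 10 M GPU cycles per frame, 30 FPS target.
    print(joint_dvfs(30, 0.02, 0.01))  # (0.8, 0.35)
```

In this toy model, raising one unit's frequency buys nothing once the other unit is the bottleneck, so choosing both frequencies together avoids paying for headroom that cannot raise the frame rate.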

design automation conference | 2015

Power-Performance Modelling of Mobile Gaming Workloads on Heterogeneous MPSoCs

Anuj Pathania; Alexandru Eugen Irimiea; Alok Prakash; Tulika Mitra

Games have emerged as one of the most popular applications on mobile platforms. Recent platforms are equipped with Heterogeneous Multiprocessor System-on-Chips (HMPSoCs) that tightly integrate CPUs and GPUs on the same chip. This configuration enables high-end gaming on the platform, but at the cost of high power consumption that rapidly drains the limited-capacity battery. HMPSoCs support independent Dynamic Voltage and Frequency Scaling (DVFS) of CPUs and GPUs to reduce platform power consumption. The state-of-the-art power manager for mobile games on HMPSoCs, however, oversimplifies the complex CPU-GPU interplay. In this paper, we develop power-performance models that predict the impact of DVFS on mobile gaming workloads. Based on these models, we propose an efficient power management strategy and implement it on an Odroid-XU+E mobile platform. Measurements on the platform show that our power manager provides an average 20% increase in performance per watt compared to the state-of-the-art.

rapid system prototyping | 2009

Efficient Heuristics for Minimizing Communication Overhead in NoC-based Heterogeneous MPSoC Platforms

Amit Kumar Singh; Wu Jigang; Alok Prakash; Thambipillai Srikanthan

The number of tasks executing on an MPSoC platform can exceed the available resources, requiring efficient run-time mapping strategies to meet the real-time constraints of the applications. This paper describes two new run-time heuristics for mapping applications onto NoC-based Heterogeneous Multiprocessor Systems-on-Chip (MPSoCs). The proposed heuristics attempt to map the tasks of an application close to each other so as to minimize communication overhead. In addition, they are shown to alleviate NoC congestion bottlenecks and thereby maximize overall computation performance. Evaluations that map applications with varying numbers of tasks onto an 8×8 platform demonstrate that the new heuristics reduce the total execution time, channel load and latency of applications compared to state-of-the-art run-time mapping heuristics reported in the literature. Moreover, we show that the proposed heuristics are highly scalable and allow high-speed realization, justifying their applicability to complex MPSoC platforms.
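
The idea of mapping communicating tasks close to each other can be illustrated with a generic greedy placement on a 2D mesh. This is not one of the paper's two heuristics; the task-graph format, the XY-routing hop-count metric, and the tie-breaking rules are assumptions made for the sketch.

```python
from typing import Dict, List, Set, Tuple

Tile = Tuple[int, int]
Edges = Dict[Tuple[str, str], int]  # (src, dst) -> communication volume


def hops(a: Tile, b: Tile) -> int:
    """Hop count between two tiles of a 2D mesh under XY routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_map(tasks: List[str], edges: Edges, mesh: int,
               occupied: Set[Tile], start: Tile) -> Dict[str, Tile]:
    """Place tasks in order, each on the free tile that minimizes the
    volume-weighted hop count to its already-placed communication partners.
    Ties go to the tile nearest `start`, then to the smallest coordinates."""
    free = {(x, y) for x in range(mesh) for y in range(mesh)} - occupied
    placement: Dict[str, Tile] = {}

    def cost(task: str, tile: Tile) -> int:
        total = 0
        for (src, dst), volume in edges.items():
            if src == task and dst in placement:
                total += volume * hops(tile, placement[dst])
            elif dst == task and src in placement:
                total += volume * hops(tile, placement[src])
        return total

    for task in tasks:
        if not free:
            raise RuntimeError("not enough free tiles for the application")
        best = min(free, key=lambda t: (cost(task, t), hops(t, start), t))
        placement[task] = best
        free.remove(best)
    return placement


def comm_cost(placement: Dict[str, Tile], edges: Edges) -> int:
    """Total volume-weighted hop count of a mapping."""
    return sum(v * hops(placement[s], placement[d]) for (s, d), v in edges.items())


if __name__ == "__main__":
    edges = {("a", "b"): 10, ("b", "c"): 5, ("a", "c"): 1}
    mapping = greedy_map(["a", "b", "c"], edges, mesh=8, occupied=set(), start=(0, 0))
    print(mapping, comm_cost(mapping, edges))
```

The heaviest edge (a to b) is satisfied first with a single hop, and c then lands one hop from b, giving a total cost of 17 for this example.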

design automation conference | 2016

Lin-analyzer: a high-level performance analysis tool for FPGA-based accelerators

Guanwen Zhong; Alok Prakash; Yun Liang; Tulika Mitra; Smail Niar

The increasing complexity of FPGA-based accelerators, coupled with time-to-market pressure, makes high-level synthesis (HLS) an attractive way to improve designer productivity by raising the programming effort above the register-transfer level (RTL). HLS offers various architectural design options with different trade-offs via pragmas (loop unrolling, loop pipelining, array partitioning). However, the non-negligible runtime of HLS makes exhaustive manual or automated HLS-based architectural exploration practically infeasible. To address this challenge, we present Lin-Analyzer, a high-level, accurate performance analysis tool that enables rapid design space exploration with various pragmas for FPGA-based accelerators without requiring RTL implementations.
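
To see how performance can be estimated without generating RTL, consider a common back-of-the-envelope model for a pipelined loop: latency is roughly the pipeline depth plus II × (iterations - 1), where the initiation interval II is bounded by memory-port pressure. This textbook-style approximation is an assumption for illustration and is much simpler than Lin-Analyzer's actual analysis; the two-ports-per-bank figure is likewise hypothetical.

```python
import math
from itertools import product


def pipelined_loop_latency(trip_count, unroll, depth, accesses_per_iter,
                           partitions, ports_per_bank=2):
    """Estimate cycles for a pipelined loop after unrolling and array partitioning.

    Unrolling by `unroll` leaves ceil(trip_count / unroll) iterations, each
    issuing unroll * accesses_per_iter accesses to one array.  Partitioning the
    array into `partitions` banks with `ports_per_bank` ports each bounds the
    initiation interval (II) from below.
    """
    iterations = math.ceil(trip_count / unroll)
    accesses = unroll * accesses_per_iter
    ii = max(1, math.ceil(accesses / (partitions * ports_per_bank)))
    return depth + ii * (iterations - 1)


def explore(trip_count, depth, accesses_per_iter, unrolls, partitions):
    """Score every (unroll, partition) pragma combination analytically,
    returning (latency, unroll, partitions) tuples sorted by latency."""
    return sorted(
        (pipelined_loop_latency(trip_count, u, depth, accesses_per_iter, p), u, p)
        for u, p in product(unrolls, partitions)
    )


if __name__ == "__main__":
    for point in explore(1024, 10, 2, [1, 2, 4, 8], [1, 2, 4, 8])[:3]:
        print(point)
```

Because each estimate is a handful of arithmetic operations, sweeping all 16 pragma combinations is effectively instant, whereas invoking HLS on each point would incur the tool's full runtime sixteen times.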

international conference on computer design | 2015

Energy-efficient execution of data-parallel applications on heterogeneous mobile platforms

Alok Prakash; Siqi Wang; Alexandru Eugen Irimiea; Tulika Mitra

State-of-the-art mobile system-on-chips (SoCs) include heterogeneity in various forms for accelerated and energy-efficient execution of a diverse range of applications. Modern SoCs include programmable cores, such as CPUs and GPUs, with very different functionality. They also integrate performance-heterogeneous cores that share the same instruction-set architecture but have different power-performance characteristics, such as ARM big.LITTLE. In this paper, we first explore and establish the combined benefits of functional heterogeneity and performance heterogeneity in improving the power-performance behavior of data-parallel applications. Next, given an application specified in OpenCL, we present a static partitioning strategy that executes the application kernel across CPU and GPU cores and selects the voltage-frequency setting of individual cores to obtain the best power-performance trade-off. We achieve over 19% runtime improvement by exploiting functional and performance heterogeneity concurrently. In addition, we achieve 36% energy savings by using an appropriate voltage-frequency setting, without significantly degrading the runtime improvement from concurrent execution.
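
The core of a static CPU-GPU split can be sketched as throughput-proportional partitioning: each device receives a share of the work-items proportional to its throughput so that both finish at about the same time. The throughput figures are assumed to be profiled offline at fixed voltage-frequency settings, and the sketch omits the voltage-frequency selection that the paper also performs.

```python
def static_partition(n_items, cpu_rate, gpu_rate):
    """Split n_items work-items between CPU and GPU in proportion to their
    throughputs (work-items per ms), so both devices finish at about the same time.

    Returns (cpu_items, gpu_items, makespan_ms).
    """
    if cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("throughputs must be positive")
    cpu_items = round(n_items * cpu_rate / (cpu_rate + gpu_rate))
    gpu_items = n_items - cpu_items
    makespan = max(cpu_items / cpu_rate, gpu_items / gpu_rate)
    return cpu_items, gpu_items, makespan


if __name__ == "__main__":
    cpu_items, gpu_items, t_split = static_partition(1000, cpu_rate=100, gpu_rate=300)
    t_gpu_only = 1000 / 300
    print(cpu_items, gpu_items, t_split, round(t_gpu_only / t_split, 2))  # 250 750 2.5 1.33
```

In a real OpenCL split, the partition boundary would also need to align with the kernel's work-group size, which this sketch ignores.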

design automation conference | 2016

Improving mobile gaming performance through cooperative CPU-GPU thermal management

Alok Prakash; Hussam Amrouch; Muhammad Shafique; Tulika Mitra; Jörg Henkel

State-of-the-art thermal management techniques independently throttle the frequencies of high-performance multi-core CPUs and powerful graphics processing units (GPUs) on the heterogeneous multiprocessor system-on-chips deployed in the latest mobile devices. For graphics-intensive gaming applications, this approach is inadequate because both the CPU and the GPU contribute to the overall application performance (frames per second, or FPS) as well as to the on-chip temperature. The lack of coordination between CPU and GPU induces recurrent frequency throttling to keep the on-chip temperature below the permissible limit. This leads to significantly degraded application performance and large temperature variation over time. We propose a control-theory-based dynamic thermal management technique that cooperatively scales CPU and GPU frequencies to meet the thermal constraint while achieving high performance for mobile gaming. Experimental results with six popular Android games on a commercial mobile platform show an average 19% performance improvement and over 90% reduction in temperature variance compared to the original Linux approach.
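
The contrast between independent throttling and coordinated scaling can be sketched with a generic discrete proportional-integral (PI) controller whose single output caps both the CPU and GPU frequencies. The gains, the anti-windup bound, the frequency levels, and the choice to apply one common cap to both units are assumptions for illustration; the paper's controller design is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PIThermalController:
    """Discrete PI controller that maps temperature error to a frequency cap in [0, 1]."""
    t_limit: float           # permissible on-chip temperature, degrees C
    kp: float = 0.05         # proportional gain (hypothetical)
    ki: float = 0.01         # integral gain (hypothetical)
    i_min: float = -50.0     # anti-windup bound on the integral term
    integral: float = 0.0

    def step(self, temp: float) -> float:
        error = self.t_limit - temp  # positive means thermal headroom
        # Clamp the integral to [i_min, 0] so headroom cannot accumulate (anti-windup).
        self.integral = min(max(self.integral + error, self.i_min), 0.0)
        u = 1.0 + self.kp * error + self.ki * self.integral
        return min(max(u, 0.0), 1.0)


def snap(levels: List[float], cap: float) -> float:
    """Highest available frequency level not above cap * max(levels)."""
    target = cap * max(levels)
    allowed = [f for f in levels if f <= target]
    return max(allowed) if allowed else min(levels)


def cooperative_step(ctrl: PIThermalController, temp: float,
                     cpu_levels: List[float], gpu_levels: List[float]) -> Tuple[float, float]:
    """Apply one shared cap to both CPU and GPU instead of throttling them separately."""
    cap = ctrl.step(temp)
    return snap(cpu_levels, cap), snap(gpu_levels, cap)


if __name__ == "__main__":
    ctrl = PIThermalController(t_limit=80.0)
    cpu_levels, gpu_levels = [0.8, 1.2, 1.6, 2.0], [0.2, 0.35, 0.5, 0.6]
    for temp in (70.0, 84.0, 84.0):
        print(temp, cooperative_step(ctrl, temp, cpu_levels, gpu_levels))
```

The integral term keeps tightening the cap while the chip stays above the limit, and the proportional term reacts immediately to each reading, so both units are scaled back together rather than one being repeatedly throttled while the other runs at full speed.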

design, automation, and test in europe | 2017

Secure Cyber-Physical Systems: Current trends, tools and open research problems

Anupam Chattopadhyay; Alok Prakash; Muhammad Shafique

Understanding and identifying the attack surfaces of a Cyber-Physical System (CPS) is an essential step towards ensuring its security. The growing complexity of the cybernetics and the interaction of independent domains such as avionics, robotics and automotive are a major hindrance to a holistic view of CPS. Furthermore, the proliferation of communication networks has extended the reach of CPS from a user-centric single platform to a widely distributed network, often connected to critical infrastructure, e.g., through smart energy initiatives. In this manuscript, we reflect on this perspective and review current security trends and tools for secure CPS. We emphasize both the design and execution flows and particularly highlight the need for efficient attack surface detection. We provide a detailed characterization of attacks reported on different cyber-physical systems, grouped according to their application domain, attack complexity, attack source and impact. Finally, we review the current tools, point out their inadequacies and present a roadmap for future research.

design, automation, and test in europe | 2017

Design Space exploration of FPGA-based accelerators with multi-level parallelism

Guanwen Zhong; Alok Prakash; Siqi Wang; Yun Liang; Tulika Mitra; Smail Niar

Applications containing compute-intensive kernels with nested loops can effectively leverage FPGAs to exploit fine- and coarse-grained parallelism. However, HLS tools used to translate these kernels from high-level languages (e.g., C/C++) are inefficient at exploiting multiple levels of parallelism automatically, and thus produce sub-optimal accelerators. Moreover, the large design space resulting from the various combinations of fine- and coarse-grained parallelism options makes exhaustive design space exploration with HLS tools prohibitively time-consuming. Hence, we propose MPSeeker, a rapid estimation framework that evaluates the performance/area metrics of various accelerator options for an application at an early design phase. Experimental results show that MPSeeker can rapidly (in minutes) explore the complex design space and accurately estimate the performance/area of various design points to identify the near-optimal combination of parallelism options (on average, 95.7% of the optimal performance).
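
Early-phase estimation over fine- and coarse-grained parallelism can be sketched with a deliberately simple analytical model: unrolling inside a processing element (fine-grained) and replicating processing elements (coarse-grained) both divide the loop's iterations, while area grows with both. The model, its parameters, and the option ranges are hypothetical and are not MPSeeker's model.

```python
from itertools import product


def estimate(u, c, trip_count, cycles_per_iter, pe_area, lane_area):
    """Hypothetical model of one design point.

    u: unroll factor inside each processing element (fine-grained parallelism)
    c: number of replicated processing elements (coarse-grained parallelism)
    Returns (latency in cycles, area in arbitrary units).
    """
    iters_per_pe = -(-trip_count // c)              # ceiling division
    latency = -(-iters_per_pe // u) * cycles_per_iter
    area = c * (pe_area + u * lane_area)
    return latency, area


def best_design(trip_count, cycles_per_iter, pe_area, lane_area, budget,
                unrolls=(1, 2, 4, 8), pes=(1, 2, 4, 8)):
    """Lowest-latency (latency, area, u, c) point within the area budget;
    ties are broken by smaller area.  Returns None if nothing fits."""
    points = []
    for u, c in product(unrolls, pes):
        latency, area = estimate(u, c, trip_count, cycles_per_iter, pe_area, lane_area)
        if area <= budget:
            points.append((latency, area, u, c))
    return min(points) if points else None


if __name__ == "__main__":
    print(best_design(1024, 4, pe_area=100, lane_area=50, budget=1000))  # (256, 1000, 8, 2)
```

With a budget of 1000 the model favours two wide processing elements over four narrower ones, because replicating a processing element also replicates its fixed area.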

Microprocessors and Microsystems | 2013

FPGA-aware techniques for rapid generation of profitable custom instructions

Alok Prakash; Siew Kei Lam; Christopher T. Clarke; Thambipillai Srikanthan

Instruction set extension of FPGA-based reconfigurable processors provides an effective means to meet the increasingly strict design constraints of embedded systems. We have shown in our previous works [20,21] that using FPGA architectural constraints to prune the design space during enumeration of custom instructions/patterns not only notably reduces the time taken to identify custom instructions but can also lead to the selection of profitable custom instructions when area is highly constrained. However, when the area constraint is relaxed, the previously proposed methods fail to perform better than traditional methods. In this paper, we propose a heuristic to identify profitable custom instructions for designs with arbitrary area constraints. The proposed heuristic relies on a new pruning criterion to enumerate patterns with a high size-to-hardware-area ratio. We also propose a suitable algorithm to select profitable custom instructions from the enumerated patterns. The proposed template selection algorithm takes advantage of the FPGA area-time measures of the enumerated patterns, which can be easily inferred from the FPGA-aware enumeration strategy. Experimental results show that the proposed methods yield custom instructions that achieve an average performance gain of 76.23% over current state-of-the-art approaches.
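
Selecting custom instructions under an area budget can be sketched as a greedy pass that ranks candidate patterns by estimated gain per unit of FPGA area and skips patterns that overlap already-chosen ones. This ratio-greedy rule and the `Pattern` fields below are assumptions for illustration, not the paper's template selection algorithm.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Pattern(NamedTuple):
    name: str
    nodes: FrozenSet[int]  # data-flow-graph nodes covered by the pattern
    gain: int              # estimated cycles saved over the whole run
    area: int              # estimated FPGA area, e.g. in LUTs


def select_patterns(patterns: List[Pattern], area_budget: int) -> List[Pattern]:
    """Greedy selection by gain per unit area, skipping overlapping patterns."""
    chosen: List[Pattern] = []
    covered: Set[int] = set()
    used = 0
    for p in sorted(patterns, key=lambda p: p.gain / p.area, reverse=True):
        if used + p.area <= area_budget and covered.isdisjoint(p.nodes):
            chosen.append(p)
            covered |= p.nodes
            used += p.area
    return chosen


if __name__ == "__main__":
    candidates = [
        Pattern("A", frozenset({1, 2, 3}), gain=300, area=100),
        Pattern("B", frozenset({3, 4}), gain=250, area=50),
        Pattern("C", frozenset({5, 6}), gain=120, area=60),
        Pattern("D", frozenset({7}), gain=10, area=40),
    ]
    print([p.name for p in select_patterns(candidates, area_budget=150)])  # ['B', 'C', 'D']
```

Pattern A is rejected even though it has the largest absolute gain, because it shares node 3 with B, which ranks higher by gain per unit area.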

field-programmable logic and applications | 2009

Rapid design exploration framework for application-aware customization of soft core processors

Alok Prakash; Siew Kei Lam; Amit Kumar Singh; Thambipillai Srikanthan

Off-the-shelf soft core processors are becoming increasingly popular in embedded systems design as they allow application-specific customization, in particular through instruction subsetting. However, choosing the right processor configuration remains a challenge, as the search space becomes prohibitively large as the number of configurable options increases. In this paper, we propose a framework to rapidly explore the processor configuration design space for a given application. Unlike existing approaches that require a time-consuming synthesis process, the proposed method relies only on a single-pass output of the LLVM compiler infrastructure. Experimental results based on widely used benchmarks show that the proposed framework can reliably predict the actual performance and area trends of various configurable options.
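
A compiler-driven estimate of this kind can be sketched from an opcode histogram alone: each optional hardware unit either executes its opcode natively or falls back to a slower software routine, and every subset of units yields an estimated cycle count and area. The histogram is assumed to have already been extracted from LLVM output (`mul`, `sdiv` and `shl` are LLVM IR opcodes); the cycle costs and areas are hypothetical, and the sketch does not reproduce the paper's prediction model.

```python
from itertools import combinations

# Hypothetical optional units: the LLVM IR opcode each accelerates, its cycle
# cost in hardware vs. software emulation, and the area it adds.
OPTIONS = {
    "div":   {"opcode": "sdiv", "hw": 34, "sw": 120, "area": 300},
    "mul":   {"opcode": "mul",  "hw": 1,  "sw": 30,  "area": 400},
    "shift": {"opcode": "shl",  "hw": 1,  "sw": 8,   "area": 200},
}
BASE_CYCLES = 1  # assumed cost of every other opcode


def estimate(histogram, enabled):
    """Estimated (cycles, area) for one configuration of optional units."""
    by_opcode = {opt["opcode"]: (name, opt) for name, opt in OPTIONS.items()}
    cycles = 0
    for opcode, count in histogram.items():
        if opcode in by_opcode:
            name, opt = by_opcode[opcode]
            cycles += count * (opt["hw"] if name in enabled else opt["sw"])
        else:
            cycles += count * BASE_CYCLES
    area = sum(OPTIONS[name]["area"] for name in enabled)
    return cycles, area


def pareto_configs(histogram):
    """All configurations not dominated in (area, cycles), cheapest first."""
    names = sorted(OPTIONS)
    points = []
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            cycles, area = estimate(histogram, set(subset))
            points.append((area, cycles, subset))
    front, best_cycles = [], float("inf")
    for area, cycles, subset in sorted(points):
        if cycles < best_cycles:  # strictly faster than every cheaper config
            front.append((subset, cycles, area))
            best_cycles = cycles
    return front


if __name__ == "__main__":
    hist = {"add": 1000, "mul": 200, "sdiv": 5, "shl": 50, "load": 400}
    for subset, cycles, area in pareto_configs(hist):
        print(subset, cycles, area)
```

For this histogram, adding a divider together with a barrel shifter but no multiplier is dominated and drops off the front, since the multiplier alone removes far more cycles for less extra area.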