Hiren D. Patel | Researchain

Publication

Featured research published by Hiren D. Patel.

compilers, architecture, and synthesis for embedded systems | 2008

Predictable programming on a precision timed architecture

Ben Lickly; Isaac Liu; Sungjun Kim; Hiren D. Patel; Stephen A. Edwards; Edward A. Lee

In a hard real-time embedded system, the time at which a result is computed is as important as the result itself. Modern processors go to extreme lengths to ensure their function is predictable, but have abandoned predictable timing in favor of average-case performance. Real-time operating systems provide timing-aware scheduling policies, but without precise worst-case execution time bounds they cannot provide guarantees. We describe an alternative in this paper: a SPARC-based processor with predictable timing and instruction-set extensions that provide precise timing control. Its pipeline executes multiple, independent hardware threads to avoid costly, unpredictable bypassing, and its exposed memory hierarchy provides predictable latency. We demonstrate the effectiveness of this precision-timed (PRET) architecture through example applications running in simulation.

international conference on hardware/software codesign and system synthesis | 2011

PRET DRAM controller: bank privatization for predictability and temporal isolation

Jan Reineke; Isaac Liu; Hiren D. Patel; Sungjun Kim; Edward A. Lee

Hard real-time embedded systems employ high-capacity memories such as Dynamic RAMs (DRAMs) to cope with increasing data and code sizes of modern designs. However, memory controller design has so far largely focused on improving average-case performance. As a consequence, the latency of memory accesses is unpredictable, which complicates the worst-case execution time analysis necessary for hard real-time embedded systems. Our work introduces a novel DRAM controller design that is predictable and that significantly reduces worst-case access latencies. Instead of viewing the DRAM device as one resource that can only be shared as a whole, our approach views it as multiple resources that can be shared between one or more clients individually. We partition the physical address space following the internal structure of the DRAM device, i.e., its ranks and banks, and interleave accesses to the blocks of this partition. This eliminates contention for shared resources within the device, making accesses temporally predictable and temporally isolated. This paper describes our DRAM controller design and its integration with a precision-timed (PRET) architecture called PTARM. We present analytical bounds on the latency and throughput of the proposed controller, and confirm these via simulation.
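The partitioning idea in this abstract can be illustrated with a minimal sketch. It is not the PTARM controller's actual mapping: the `Geometry` sizes, the one-bank-per-client assignment, and the names `map_private` and `slot_owner` are all hypothetical, chosen only to show how private banks and a fixed interleaving keep one client's accesses from disturbing another's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    """Hypothetical DRAM geometry (assumed sizes, not taken from the paper)."""
    banks: int = 8
    rows: int = 1 << 13
    cols: int = 1 << 10


def map_private(client: int, local_addr: int, geo: Geometry) -> tuple[int, int, int]:
    """Map a client-local address to (bank, row, col).

    Assumption: each client owns exactly one bank, so two clients can
    never conflict on a bank's row buffer.
    """
    if not 0 <= client < geo.banks:
        raise ValueError("client has no private bank")
    if not 0 <= local_addr < geo.rows * geo.cols:
        raise ValueError("address outside the client's partition")
    row, col = divmod(local_addr, geo.cols)
    return client, row, col


def slot_owner(slot: int, n_clients: int) -> int:
    """Fixed round-robin interleaving: slot k belongs to client k mod n."""
    return slot % n_clients


if __name__ == "__main__":
    geo = Geometry()
    print(map_private(2, 1025, geo))              # bank 2, row 1, column 1
    print([slot_owner(s, 4) for s in range(8)])   # periodic slot ownership
```

Because the bank index depends only on the client and the slot owner depends only on the slot number, the time at which a client's request is served does not depend on what the other clients do, which is the temporal-isolation property the abstract describes.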

asia and south pacific design automation conference | 2010

SCGPSim: a fast SystemC simulator on GPUs

Mahesh Nanjundappa; Hiren D. Patel; Bijoy A. Jose; Sandeep K. Shukla

The main objective of this paper is to speed up the simulation performance of SystemC designs at the RTL abstraction level by exploiting the high degree of parallelism afforded by today's general-purpose graphics processors (GPGPUs). Our approach parallelizes SystemC's discrete-event simulation (DES) on GPGPUs by transforming the model of computation of DES into a model of concurrent threads that synchronize as and when necessary. Unlike the cooperative threading model employed in the SystemC reference implementation, our threading model is capable of executing in parallel on the large number of simple processing units available on GPUs. Our simulation infrastructure is called SCGPSim and it includes a source-to-source (S2S) translator to transform synthesizable SystemC models into parallelly executable programs targeting an NVIDIA GPU. The translator retains the simulation semantics of the original designs by applying semantics-preserving transformations. The resulting transformed models, mapped onto the massively parallel architecture of GPUs, improve simulation efficiency substantially. Preliminary experiments with varying-sized examples such as AES, ALU, and FIR have shown simulation speed-ups ranging from 30x to 100x. Considering that our transformations are not yet optimized, we believe that optimizing them will improve the simulation performance even further.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2005

Towards a heterogeneous simulation kernel for system-level models: a SystemC kernel for synchronous data flow models

Hiren D. Patel; Sandeep K. Shukla

As SystemC gains popularity as a modeling language of choice for system-on-chip (SoC) designs, heterogeneous modeling in SystemC and efficient simulation become increasingly important. However, in the current reference implementation, all SystemC models are simulated through a nondeterministic discrete-event (DE) simulation kernel that schedules events at run time mimicking other models of computation (MoCs) using DE, which may get cumbersome. This sometimes results in too many delta cycles hindering the simulation performance of the model. SystemC also uses this simulation kernel as the target simulation engine. This makes it difficult to express different MoCs naturally in SystemC. In an SoC model, different components may need to be naturally expressible in different MoCs. These components may be amenable to static scheduling-based simulation or other presimulation optimization techniques. The goal is to create a simulation framework for heterogeneous SystemC models and to gain efficiency and ease of use within the framework of SystemC reference implementation. In this paper, a synchronous data flow (SDF) kernel extension for SystemC is introduced. Experimental results showing improvement in simulation time are also presented.
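The static scheduling that makes SDF models attractive starts from the balance equations: for every edge, the tokens produced per schedule iteration must equal the tokens consumed. The sketch below is a generic textbook computation of the repetition vector, not the kernel extension described in the paper; the function name and the edge tuple layout are assumptions made for illustration, and `math.lcm` requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * produce == q[dst] * consume.

    actors: list of actor names (graph assumed connected).
    edges:  list of (src, produce_rate, dst, consume_rate) tuples.
    Returns a dict mapping each actor to its firing count per iteration.
    """
    adj = {a: [] for a in actors}
    for src, produce, dst, consume in edges:
        adj[src].append((dst, Fraction(produce, consume)))
        adj[dst].append((src, Fraction(consume, produce)))

    # Propagate relative firing rates from an arbitrary starting actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in adj[a]:
            r = rate[a] * ratio
            if b not in rate:
                rate[b] = r
                stack.append(b)
            elif rate[b] != r:
                raise ValueError("inconsistent rates: no static schedule exists")

    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the fractional rates to integers.
    scale = lcm(*(f.denominator for f in rate.values()))
    return {a: int(f * scale) for a, f in rate.items()}


if __name__ == "__main__":
    # A produces 2 tokens, B consumes 3; B produces 1 token, C consumes 2.
    print(repetition_vector(["A", "B", "C"], [("A", 2, "B", 3), ("B", 1, "C", 2)]))
```

With the firing counts known before simulation begins, a scheduler can fix the firing order once instead of resolving events and delta cycles at run time, which is the kind of presimulation optimization the abstract refers to.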

asia and south pacific design automation conference | 2008

Exploring power management in multi-core systems

Reinaldo A. Bergamaschi; Guoling Han; Alper Buyuktosunoglu; Hiren D. Patel; Indira Nair; Gero Dittmann; Geert Janssen; Nagu R. Dhanwada; Zhigang Hu; Pradip Bose; John A. Darringer

Power dissipation has become a critical design metric in microprocessor-based system design. In a multi-core system running multiple applications, power and performance can be dynamically traded off using an integrated power management (PM) unit. This PM unit monitors the performance and power of each core and dynamically adjusts the individual voltages and frequencies in order to maximize system performance under a given power budget (usually set by the operating system). This paper presents a performance and power analysis methodology, featuring a simulation model for multi-core systems that can be easily reconfigured for different scenarios and a PM infrastructure for the exploration and analysis of PM algorithms. Two algorithms have been implemented: one for discrete and one for continuous power modes based on non-linear programming. Extensive experiments are reported, illustrating the effect of power management both at the core and the chip level.

asia and south pacific design automation conference | 2012

Parallel simulation of mixed-abstraction SystemC models on GPUs and multicore CPUs

Rohit Sinha; Aayush Prakash; Hiren D. Patel

This work presents a methodology that parallelizes the simulation of mixed-abstraction-level SystemC models across multicore CPUs and graphics processing units (GPUs) for improved simulation performance. Given a SystemC model, we partition it into processes suitable for GPU execution and CPU execution. We convert the processes identified for GPU execution into GPU kernels with additional SystemC wrapper processes that invoke these kernels. The wrappers enable seamless communication of events in all directions between the GPUs and CPUs. We alter the OSCI SystemC simulation kernel to allow parallel execution of processes. Hence, we co-simulate in parallel the SystemC processes on multiple CPUs and the GPU kernels on the GPUs, exploiting both CPUs and GPUs for faster simulation. We experiment with synthetic benchmarks and a set-top box case study.

international conference on computer design | 2009

A disruptive computer design idea: Architectures with repeatable timing

Stephen A. Edwards; Sungjun Kim; Edward A. Lee; Isaac Liu; Hiren D. Patel; Martin Schoeberl

This paper argues that repeatable timing is more important and more achievable than predictable timing. It describes microarchitecture approaches to pipelining and memory hierarchy that deliver repeatable timing and promise comparable or better performance compared to established techniques. Specifically, threads are interleaved in a pipeline to eliminate pipeline hazards, and a hierarchical memory architecture is outlined that hides memory latencies.

real time technology and applications symposium | 2015

A framework for scheduling DRAM memory accesses for multi-core mixed-time critical systems

Mohamed Hassan; Hiren D. Patel; Rodolfo Pellizzoni

Mixed-time critical systems are real-time systems that accommodate both hard real-time (HRT) and soft real-time (SRT) tasks. HRT tasks mandate a guarantee on the worst-case latency, while SRT tasks have average-case bandwidth (BW) demands. Memory requests in mixed-time critical systems usually have different transaction sizes based on whether the issuer task is HRT or SRT. For example, HRT tasks often issue requests with a cache line size. On the other hand, SRT tasks may issue requests with a size of KBs. Requests from multimedia cores, cores controlling network interfaces, and direct memory accesses (DMAs) are obvious examples of these large-size requests. Based on these observations, we promote in this work a new approach to schedule memory requests. This approach retains locality within large-size requests to minimize the worst-case latency, while maintaining the average-case BW as high as required. To achieve this target, we introduce a novel and compact time-division-multiplexing scheduler that is adequate for mixed-time critical systems. We also present a novel framework that constructs optimal off-chip DRAM memory controller schedules for multi-core mixed-time critical systems. These schedules are loaded into the memory controller at boot time. Based on the proposed schedule, we provide a detailed static analysis that guarantees predictability. We compare the proposed controller against state-of-the-art real-time memory controllers using synthetic experiments as well as a practical use case from multimedia systems.

design automation conference | 2007

Model-driven validation of SystemC designs

Hiren D. Patel; Sandeep K. Shukla

Functional test generation for dynamic validation of current system-level designs is a challenging task. Manual test writing or automated random test generation techniques are often used for such validation practices. However, directing tests to particular reachable states of a SystemC model is often difficult, especially when these models are large and complex. In this work, we present a model-driven methodology for generating directed tests that take the SystemC model under validation to specific reachable states. This allows the validation to uncover very specific scenarios which lead to different corner cases. Our formal modeling is done entirely within the Microsoft SpecExplorer tool to describe the specification of the system under validation in the notation of AsmL. We also exploit SpecExplorer's abilities for state space exploration for our test generation, and its APIs for connecting the model to implementation programs to drive the validation of SystemC models with the generated test cases.

great lakes symposium on vlsi | 2004