Devendra Rai | Researchain

Publication

Featured research published by Devendra Rai.

compilers, architecture, and synthesis for embedded systems | 2012

Scenario-based design flow for mapping streaming applications onto on-chip many-core systems

Lars Schor; Iuliana Bacivarov; Devendra Rai; Hoeseok Yang; Shin-Haeng Kang; Lothar Thiele

The next generation of embedded software has high performance requirements and is increasingly dynamic. Multiple applications typically share the system, running in parallel in different combinations and starting and stopping at different moments in time. The different combinations of applications form system execution scenarios. In this paper, we present the distributed application layer, a scenario-based design flow for mapping a set of applications onto heterogeneous on-chip many-core systems. Applications are specified as Kahn process networks and the execution scenarios are combined into a finite state machine. Transitions between scenarios are triggered by behavioral events generated by either running applications or the run-time system. A set of optimal mappings is precalculated during design-time analysis. Later, at run-time, hierarchically organized controllers monitor behavioral events and apply the precalculated mappings when starting new applications. To handle architectural failures, spare cores are allocated at design-time. At run-time, the controllers have the ability to move all processes assigned to a faulty physical core to a spare core. Finally, we apply the proposed design flow to design and optimize picture-in-picture software.

design, automation, and test in europe | 2011

Worst-case temperature analysis for real-time systems

Devendra Rai; Hoeseok Yang; Iuliana Bacivarov; Jian-Jia Chen; Lothar Thiele

With the evolution of today's semiconductor technology, chip temperature increases rapidly, mainly due to the growth in power density. For modern embedded real-time systems, it is crucial to estimate maximal temperatures in order to make mapping or other design decisions that avoid burnout while still guaranteeing that real-time constraints are met. This paper provides answers to the question: When work-conserving scheduling algorithms, such as earliest-deadline-first (EDF), rate-monotonic (RM), and deadline-monotonic (DM), are applied, what is the worst-case peak temperature of a real-time embedded system under all possible scenarios of task executions? We propose an analytic framework, which considers a general event model based on network and real-time calculus. This analysis framework has the capability to handle a broad range of uncertainties in terms of task execution times, task invocation periods, and jitter in task arrivals. Simulations show that our framework is a cornerstone to design real-time systems that have guarantees on both schedulability and maximal temperatures.

compilers, architecture, and synthesis for embedded systems | 2012

Power agnostic technique for efficient temperature estimation of multicore embedded systems

Devendra Rai; Hoeseok Yang; Iuliana Bacivarov; Lothar Thiele

Temperature plays an increasingly important role in the overall performance and reliability of a computing system. Multi- and many-core systems provide an opportunity to manage the overall temperature profile by cleverly designing the application-to-core mapping and the associated scheduling policies. An uncontrolled temperature profile may lead to an unplanned performance loss, since the system activates protective mechanisms such as voltage and/or frequency scaling to cool itself. Similarly, deep thermal cycles with high frequency lead to severe deterioration in the overall reliability of the system. Design space exploration tools are often used to optimize binding and scheduling choices based on a given set of constraints and objectives, thus motivating the need for fast and accurate temperature estimation techniques. We argue that the currently available techniques are not an ideal fit to design space exploration tools, and suggest a system level technique which is based on application fingerprinting. It does not need any information about the processor floorplan, the physical and thermal structure, or about power consumption. Instead, its temperature estimation is based on a set of application-specific calibration runs and associated temperature measurements using available built-in sensors. We show that a given application possesses a unique thermal signature on the system it executes on, which provides a computationally fast method to calculate accurate temperature traces. Extensive experimental studies show that our technique can estimate temperature on all cores of a system to within 5 °C, and is three orders of magnitude faster than state-of-the-art numerical simulators such as HotSpot.

Journal of Systems Architecture | 2016

Dynamic many-process applications on many-tile embedded systems and HPC clusters

Pier Stanislao Paolucci; Andrea Biagioni; Luis Gabriel Murillo; Frédéric Rousseau; Lars Schor; Laura Tosoratto; Iuliana Bacivarov; Robert Lajos Buecs; Clément Deschamps; Ashraf El-Antably; Roberto Ammendola; Nicolas Fournel; Ottorino Frezza; Rainer Leupers; Francesca Lo Cicero; Alessandro Lonardo; Michele Martinelli; Elena Pastorelli; Devendra Rai; Davide Rossetti; Francesco Simula; Lothar Thiele; P. Vicini; Jan Henrik Weinstock

In the next decade, a growing number of scientific and industrial applications will require power-efficient systems providing unprecedented computation, memory, and communication resources. A promising paradigm foresees the use of heterogeneous many-tile architectures. The resulting computing systems are complex: they must be protected against several sources of faults and critical events, and application programmers must be provided with programming paradigms, software environments and debugging tools adequate to manage such complexity. The EURETILE (European Reference Tiled Architecture Experiment) consortium conceived, designed, and implemented: 1- an innovative many-tile, many-process dynamic fault-tolerant programming paradigm and software environment, grounded onto a lightweight operating system generated by an automated software synthesis mechanism that takes into account the architecture and application specificities; 2- a many-tile heterogeneous hardware system, equipped with a high-bandwidth, low-latency, point-to-point 3D-toroidal interconnect. The inter-tile interconnect processor is equipped with an experimental mechanism for systemic fault-awareness; 3- a full-system simulation environment, supported by innovative parallel technologies and equipped with debugging facilities. We also designed and coded a set of application benchmarks representative of requirements of future HPC and Embedded Systems, including: 4- a set of dynamic multimedia applications and 5- a large scale simulator of neural activity and synaptic plasticity. The application benchmarks, compiled through the EURETILE software tool-chain, have been efficiently executed on both the many-tile hardware platform and on the software simulator, up to a complexity of a few hundreds of software processes and hardware cores.

international symposium on parallel and distributed processing and applications | 2014

EURETILE Design Flow: Dynamic and Fault Tolerant Mapping of Multiple Applications Onto Many-Tile Systems

Lars Schor; Iuliana Bacivarov; Luis Gabriel Murillo; Pier Stanislao Paolucci; Frédéric Rousseau; Ashraf El Antably; Robert Lajos Buecs; Nicolas Fournel; Rainer Leupers; Devendra Rai; Lothar Thiele; Laura Tosoratto; P. Vicini; Jan Henrik Weinstock

EURETILE investigates foundational innovations in the design of massively parallel tiled computing systems by introducing a novel parallel programming paradigm and a multi-tile hardware architecture. Each tile includes multiple general-purpose processors, specialized accelerators, and a fault-tolerant distributed network processor, which connects the tile to the inter-tile communication network. This paper focuses on the EURETILE software design flow, which provides a novel programming environment to map multiple dynamic applications onto a many-tile architecture. The elaborated high-level programming model specifies each application as a network of autonomous processes, enabling the automatic generation and optimization of the architecture-specific implementation. Behavioral and architectural dynamism is handled by a hierarchically organized runtime-manager running on top of a lightweight operating system. To evaluate, debug, and profile the generated binaries, a scalable many-tile simulator has been developed. High system dependability is achieved by combining hardware-based fault awareness strategies with software-based fault reactivity strategies. We demonstrate the capability of the design flow to exploit the parallelism of many-tile architectures with various embedded and high performance computing benchmarks targeting the virtual EURETILE platform with up to 192 tiles.

design automation conference | 2013

Distributed stable states for process networks: algorithm, analysis, and experiments on intel SCC

Devendra Rai; Lars Schor; Nikolay Stoimenov; Lothar Thiele

Technology scaling is a common trend in current embedded systems. It has promoted the use of multi-core, multiprocessor, and distributed platforms. Such systems usually require run-time migration of distributed applications between the different nodes of the platform in order to balance the workload or to tolerate faults. Before an application can be migrated, it needs to be brought to a stable state such that restarting the application after migration does not violate its functional correctness. An application in a stable state does not change its context any further, and therefore, stabilization is a prerequisite for any application migration. Process networks are a common model of computation for specifying distributed applications. However, most results on the migration of process networks do not provide an algorithm to put a general process network into a stable state, suitable for migration. This paper proposes a technique which efficiently and correctly brings a process network executing on a distributed system to a known stable state. The correctness of the technique is independent of the temporal characteristics of the system and the topology of the process network. The required modifications of a process network are lightweight and preserve its original functionality. A model characterizing the timing properties of the technique is provided. The feasibility and efficiency of the proposed approach and the respective model are validated with experimental results on Intel's SCC platform.

design automation conference | 2014

An Efficient Real Time Fault Detection and Tolerance Framework Validated on the Intel SCC Processor

Devendra Rai; Pengcheng Huang; Nikolay Stoimenov; Lothar Thiele

We present a new framework that efficiently detects and tolerates timing faults in real time systems. Timing faults are observed when the inputs and/or outputs of a given system fail to meet their desired timing properties, such as I/O rates. Most current approaches either rely on heartbeat monitoring which is too restrictive; or on statistical or inexact methods which are not suitable for embedded real time systems. Current approaches based on the abstract real time model of the given application are resource intensive, and may not be suitable for embedded systems. Our framework utilizes active replication, and is based on already existing timing models for real time applications to develop fault detection and tolerance strategies. The approach does not require any timekeeping at runtime, and is efficient in terms of computational resources used. Experiments using three realistic applications on the Intel Baremetal SCC demonstrate the efficiency of our framework, both in memory and computational resources used.

design, automation, and test in europe | 2015

A calibration based thermal modeling technique for complex multicore systems

Devendra Rai; Lothar Thiele

A calibration based method to construct fast and accurate thermal models of the state-of-the-art multicore systems is presented. Such models are usually required during Design Space Exploration (DSE) exercises to evaluate various task-to-core mapping, associated scheduling and processor speed-scaling options for their overall impact on the system temperature. Current approaches require modeling the thermal characteristics of the target processor using numerical simulators, which assume accurate information about several critical parameters (e.g., the processor floorplan). Such parameters are not readily available, forcing the system designers to use time and cost intensive, and possibly error-prone techniques such as using heat maps for reverse-engineering such parameters. Additionally, advanced power and temperature management algorithms commonly found in the state-of-the-art processors must also be accurately modeled. This paper proposes a calibration based method for constructing the complete system thermal model of a target processor without requiring any hard-to-get information such as the detailed processor floorplan or system power traces. Taking an example of a sufficiently complex Intel Xeon 8-core processor, we show that our approach yields an accurate thermal model, which is also lightweight both in terms of memory and compute requirements to be practically feasible for DSE over current processors.

european conference on parallel processing | 2013

Reliable and Efficient Execution of Multiple Streaming Applications on Intel’s SCC Processor

Lars Schor; Devendra Rai; Hoeseok Yang; Iuliana Bacivarov; Lothar Thiele

european conference on parallel processing | 2013

Designing Applications with Predictable Runtime Characteristics for the Baremetal Intel SCC

Devendra Rai; Lars Schor; Nikolay Stoimenov; Iuliana Bacivarov; Lothar Thiele
