
Aryabartta Sahu

Indian Institute of Technology Guwahati

Publication

Featured research published by Aryabartta Sahu.

Computing | 2017

Scheduling chained multiprocessor tasks onto large multiprocessor system

Tarun K. Agrawal; Aryabartta Sahu; Manojit Ghose; R. Sharma

In this paper, we propose an effective approach for scheduling unit-time multiprocessor tasks with chain precedence onto a large multiprocessor system. We consider both splittable and non-splittable multiprocessor tasks, which adds a new and interesting dimension to the generalized scheduling problem. The proposed longest chain maximum processor scheduling algorithm is proved to be optimal for uniform chains and monotone (non-increasing or non-decreasing) chains, for both splittable and non-splittable unit-time multiprocessor task chains. Scheduling arbitrary chains of non-splittable unit-time multiprocessor tasks is proved to be NP-complete. Whether scheduling arbitrary chains of splittable unit-time multiprocessor tasks is NP-complete or solvable in polynomial time remains an open problem. We use three heuristics for scheduling arbitrary chains: (a) maximum criticality first, (b) longest chain maximum criticality first and (c) longest chain maximum processor first. We compare the performance of the three heuristics and find that the proposed longest chain maximum processor first heuristic performs better in most cases. We also evaluate the heuristics by scheduling scientific workflows on a real multiprocessor server platform and analyze the power-performance trade-off of the same scheduling policies.
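
As a rough illustration of the kind of heuristic named in this abstract, the sketch below greedily schedules chains of unit-time, non-splittable multiprocessor tasks on m processors. It gives priority to the longest remaining chain and breaks ties by the larger processor requirement. The function name `schedule_chains`, the input encoding (each chain is a list of processor counts) and the tie-breaking rule are assumptions made for this example; the abstract does not describe the paper's algorithm at this level of detail.

```python
def schedule_chains(chains, m):
    """Greedy list scheduling of chains of unit-time multiprocessor tasks.

    chains: list of chains; each chain is a list of processor requirements,
            one per unit-time task, to be executed in order (assumed encoding).
    m:      number of processors.
    Returns a list of time steps; each step is a list of (chain_index, processors).
    """
    for chain in chains:
        for need in chain:
            if need < 1 or need > m:
                raise ValueError("each task must need between 1 and m processors")

    pos = [0] * len(chains)  # index of the next unscheduled task in each chain
    schedule = []
    while any(p < len(c) for p, c in zip(pos, chains)):
        ready = [i for i, c in enumerate(chains) if pos[i] < len(c)]
        # Assumed priority: longest remaining chain first, then larger requirement.
        ready.sort(key=lambda i: (-(len(chains[i]) - pos[i]), -chains[i][pos[i]]))
        free = m
        step = []
        for i in ready:
            need = chains[i][pos[i]]
            if need <= free:  # non-splittable: the task runs only if it fits whole
                free -= need
                step.append((i, need))
        for i, _ in step:
            pos[i] += 1
        schedule.append(step)
    return schedule
```

The number of time steps in the returned list is the makespan of this greedy schedule. The sketch makes no optimality claim for arbitrary chains.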

IEEE India Conference | 2014

Scheduling of multi-phase applications on to mesh multicore architecture

Desai Suresh Kumar Amrutlal; Aryabartta Sahu

In this paper, we propose a new approach for off-line scheduling and binding of multi-phase applications onto a multicore whose cores are connected in a mesh topology. A multi-phase application can be easily modeled as a chain of multiprocessor tasks. Our scheduling strategy uses a critical-tasks-first heuristic along with dynamic programming to schedule the chain of multiprocessor tasks onto a multiprocessor system. We model the binding (phases to processors) problem using an information-theoretic model, a geometric model and a sequence alignment model. Finally, we use chain creation and an efficient chain mapping strategy (based on a mix of all three models) to bind scheduled phases onto the processor grid and reduce the overall data movement overhead. Experimental results show that our scheduling strategy achieves up to 31% lower makespan than the simple critical-first strategy when scheduling all applications. The binding strategy reduces data movement overhead by up to 49% compared to naive and other binding strategies.

Journal of Electronic Testing | 2018

Automation of Test Program Synthesis for Processor Post-silicon Validation

Vasudevan Madampu Suryasarman; Santosh Biswas; Aryabartta Sahu

Software-based self-testing (SBST) was introduced for at-speed testing of processors, which is difficult with external testing techniques. Evolutionary approaches are used for the automatic synthesis of SBST programs. However, a number of hard-to-detect faults remain unidentified by these autogenerated test programs. Also, these approaches have considered fault models that have low correlation with gate-level fault models. This paper presents a greed-based strategy in which the instruction sequences that detect freshly identified faults are preserved throughout the evolutionary process to identify the hard-to-test faults of the processor. As a result, the overall coverage is also improved. A selection probability is estimated from the testability properties of the processor components and assigned to every instruction to accelerate test synthesis. Performance and scalability are comprehensively evaluated on a configurable MIPS processor and a full-fledged 7-stage pipeline SPARC V8 Leon3 soft processor using behavioral fault models. The efficacy of our approach is demonstrated by showing the correlation between behavioral faults and gate-level faults of the MIPS processor for the proposed scheme. Experimental results show improved coverages of 96.32% for the MIPS processor and 95.8% for the Leon3 processor, compared with conventional methods, which achieve about 90% coverage on average.

Parallel and Distributed Computing: Applications and Technologies | 2016

Energy Efficient Scheduling of Real Time Tasks on Large Systems

Manojit Ghose; Aryabartta Sahu; Sushanta Karmakar

The high processing capabilities of today's large systems are also used for real-time applications, where executing tasks before their deadlines is essential. On the other hand, energy consumption increases with processing capability in such systems. Energy-efficient execution of real-time tasks in such large systems has therefore become a promising research area in recent times. Scheduling tasks in such large systems using only low-level power constructs like DVFS is not efficient. In this paper, we exploit the power consumption pattern of recent commercial processors and derive a simple power model at a higher granularity for systems that have a large number of processors, each with a multi-threading feature. We then propose an energy-efficient scheduling technique, namely the smart allocation policy, for executing a set of aperiodic independent real-time tasks on a large system such that no task misses its deadline. We analyze the instantaneous power consumption and the overall energy consumption of the proposed policy along with five other baseline policies for a wide variety of synthetic data sets and real trace data. As the execution time of tasks has a significant impact on scheduling and on the overall performance of the system, we consider six different task execution time models in our experiments. Experimental evaluation reveals that our proposed policy performs significantly better than the baseline policies for all the variations of synthetic data and for real trace data.

IEEE India Conference | 2016

Energy efficient online scheduling of aperiodic real time task on large multi-threaded multiprocessor systems

Manojit Ghose; Aryabartta Sahu; Sushanta Karmakar

In recent times, reducing energy consumption has become an important issue relative to minimizing execution time, especially in large multi-threaded multiprocessor systems where compute capability is sufficiently high. In such large systems, energy-aware scheduling using only low-level power constructs like DVFS may not be suitable, so it becomes essential to design energy-efficient scheduling techniques that use power constructs at a higher granularity. In this paper, we derive a simple power model at a higher granularity for such large systems with multi-threaded processors. We propose an online task scheduling policy, namely the smart allocation policy, for scheduling aperiodic real-time tasks onto large multi-threaded multiprocessor systems to reduce the overall energy consumption of the system without missing the deadline of any task. We analyze the instantaneous power consumption and the overall energy consumption of the proposed task allocation policy along with five other baseline policies for a wide variety of synthetic data sets and real trace data. Experimental results show that our proposed policy achieves an average energy reduction of 60% (up to a maximum of 92%) for synthetic data sets and 30% (up to a maximum of 45%) for real data sets compared to the baseline policies.

Design, Automation, and Test in Europe | 2016

Thermal aware scheduling and mapping of multiphase applications onto chip multiprocessor

Aryabartta Sahu

Thermal hot spots and high temperature gradients degrade the reliability and performance of chip multiprocessors. This is an important issue in today's high transistor density chip multiprocessors. In this paper, we explore the benefits of different temperature-aware approaches for scheduling and mapping applications onto a chip multiprocessor to reduce the peak temperature. As most applications exhibit phase-wise behavior at run time, we exploit their run-time phase-wise power consumption behavior to schedule and map them onto a multicore chip and reduce the peak temperature. We evaluate five scheduling approaches (critical path, modified critical path, energy capped critical path, naive load balancing, and task partitioning and scheduling (TPS)) and five mapping approaches (random, greedy, row-col, checker board and boundary fix checker board) for both synthetic data and real benchmarks on an assumed 8 × 8 chip multiprocessor. We take advantage of both (a) optimal scheduling of a tree or chain of unit-time tasks on a multiprocessor using critical path heuristics and (b) the phase-wise behavior of applications. Results show that the greedy mapping approach performs poorly compared to simple, low-overhead location-exchange-based approaches (which incur no extra cost for temperature sensing or prediction) when the effect of the temperature of neighboring processors is significant. The boundary fix checker board mapping approach achieves up to a 40% reduction in peak temperature compared to the costly greedy mapping approach. Our results also show that critical-path-based scheduling combined with location-based mapping can significantly reduce the peak temperature of the chip without much increase in execution time when running phase-wise applications on a chip multiprocessor.
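
To make the idea of a location-based mapping more concrete, here is a minimal sketch of a checkerboard-style placement. It assumes that each task has a known power estimate and that the goal is to keep the highest-power tasks from sitting next to each other on an n × n grid. The function name `checkerboard_map`, the power-sorted ordering and the alternating-cell rule are assumptions for illustration only; the abstract does not define the paper's checker board or boundary fix checker board mappings.

```python
def checkerboard_map(powers, n):
    """Place n*n tasks on an n x n grid in a checkerboard pattern.

    powers: list of per-task power estimates (hypothetical input), length n*n.
    Returns grid[r][c] = task index.
    The hottest tasks fill cells where (r + c) is even; the remaining, cooler
    tasks fill cells where (r + c) is odd, so no two cells from the hot half
    share an edge.
    """
    if len(powers) != n * n:
        raise ValueError("expected exactly n*n tasks")

    # Task indices ordered from highest to lowest power.
    order = sorted(range(len(powers)), key=lambda t: -powers[t])

    even_cells = [(r, c) for r in range(n) for c in range(n) if (r + c) % 2 == 0]
    odd_cells = [(r, c) for r in range(n) for c in range(n) if (r + c) % 2 == 1]

    grid = [[None] * n for _ in range(n)]
    for task, (r, c) in zip(order, even_cells + odd_cells):
        grid[r][c] = task
    return grid
```

For an 8 × 8 chip this would be called as `checkerboard_map(powers, 8)` with 64 power estimates. Because the placement depends only on task order and cell position, it needs no temperature sensing or prediction.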

Parallel and Distributed Computing: Applications and Technologies | 2014

Online Scheduling of Applications on 3D Stacked Large Chip Multiprocessor

Bhoopendra Kumar; Aryabartta Sahu

The performance of 3D stacked memory in multicore systems is impressive. In a three-dimensional stacked large chip multiprocessor (3D LCMP), the memory and memory network are integrated on top of the processors and processor network. In this paper, we propose an online strategy for mapping application tasks and data onto 3D LCMP platforms. To meet its performance constraints, every application demands a set of resources (for example, a number of processors and an amount of memory). An important criterion in allocation is to place all required resources as close together as possible to reduce communication overhead. We propose an approximate nearest approach to allocate resources of one layer with respect to the other resource layer (either processor or memory). We compare three strategies for allocating resources to applications: in the first, processor allocation is preferred over memory allocation; in the second, memory allocation is preferred over processor allocation; and in the third, the priority depends on the requirements of the application. Our experimental analysis shows an improvement of 21% on average and up to 30% over state-of-the-art one-layer resource allocation. On-demand priority-based resource allocation improves by 26% over simple processor-priority or memory-priority resource allocation.

Parallel and Distributed Computing: Applications and Technologies | 2014

Benchmarking and Analysis of Variations of Work Stealing Scheduler on Clustered System

Saurav Kumar; Aryabartta Sahu

Classical work stealing is an efficient dynamic load-balancing technique for shared memory multiprocessor or multicore systems. However, the performance of the same classical work stealing on clustered chip multicores is not appreciable, so modifications are necessary to improve performance. In this paper, we discuss many previously proposed modifications and also propose some simple modifications to suit the targeted clustered environment. We describe a methodology to evaluate all the variations of work stealing analytically and experimentally, both on a multiprocessor simulator and on a real platform. Our evaluation methodology includes the design of a novel parametric synthetic benchmark, which can be used to mimic the behavior (or profile) of many real-life benchmarks. The synthetic benchmark covers a wide range of application profiles to evaluate the design space of both the variations of work stealing algorithms and the clustered chip multiprocessor. We find that if the available parallelism of the targeted application is high and data sharing between tasks is high, then one of the proposed modifications of work stealing (probabilistic victim search with a threshold on the size of a migratable task) outperforms the rest of the modifications.
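
The variant highlighted at the end of this abstract combines two ideas: choosing a victim probabilistically and refusing to migrate tasks below a size threshold. The sketch below shows one way these could fit together, assuming each worker's queue is a deque of task sizes, victims are drawn with probability proportional to queue length, and the oldest task is the steal candidate. The names `pick_victim` and `try_steal` and all of these policy details are assumptions for this example, not the paper's design.

```python
import random
from collections import deque


def pick_victim(deques, thief, rng):
    """Pick a victim with probability proportional to its queue length (assumed rule)."""
    candidates = [i for i in range(len(deques)) if i != thief and deques[i]]
    if not candidates:
        return None
    weights = [len(deques[i]) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def try_steal(deques, thief, min_size, rng):
    """Try to move one task to the thief's queue.

    Tasks are represented by their sizes. The oldest task of the victim is the
    steal candidate, and it migrates only if its size is at least min_size.
    Returns the stolen task size, or None if nothing was stolen.
    """
    victim = pick_victim(deques, thief, rng)
    if victim is None:
        return None
    task = deques[victim][0]  # oldest task sits at the left end
    if task < min_size:
        return None           # too small to be worth migrating
    deques[victim].popleft()
    deques[thief].append(task)
    return task
```

A worker that runs out of local work would call `try_steal(deques, worker_id, min_size, random.Random())` and fall back to idling or retrying when it returns `None`.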

Parallel and Distributed Computing: Applications and Technologies | 2014

Comparison of Binding Approaches of Scheduled Multiphase Application onto Linear Multicore Architecture

Sahil Kumar; Nitesh Singal; Aryabartta Sahu

Almost all applications exhibit time-varying phase behavior in their run-time characteristics, so a scheduling and binding strategy that considers this behavior plays an important role in achieving high throughput and low power consumption. In this paper, we consider the binding of an already scheduled multiphase application onto a linear multicore architecture. These approaches bind the scheduled applications to nearby cores and hence reduce the overall data movement. We model the overall data communication overhead of an application on a linear architecture and use this model in binding. We also propose and evaluate four approaches for binding multi-phase applications onto a linear multicore architecture: (a) random iterative refinement, (b) biggest block left-right approach, (c) biggest block center-center approach and (d) hierarchical binding using minimum cost perfect matching. Results show that hierarchical binding using minimum cost perfect matching outperforms the rest of the approaches.

IEEE India Conference | 2014

DDGSim: GPU based simulator for large multicore with bufferless NoC

Navin Kumar; Aryabartta Sahu

In large-scale chip multicores, last-level cache management and the core interconnection network play important roles in performance and power consumption. Mesh interconnects are widely used in such multicores due to their scalability and simplicity of design. As the interconnection network occupies a significant area and consumes a significant percentage of system power, a bufferless network is an appealing alternative design to reduce power consumption and hardware cost. We have designed and implemented a simulator for distributed cache management of a large chip multicore whose cores are connected by a bufferless interconnection network. We have also redesigned and implemented DDGSim, a GPU-compatible parallel version of the same simulator that uses the CUDA programming model. We have simulated a target large chip multicore with up to 43,000 cores and achieved up to 25 times speedup on an NVIDIA GeForce GTX 690 GPU over serial simulation.
