Anuj Pathania

Karlsruhe Institute of Technology

Publication

Featured research published by Anuj Pathania.

design automation conference | 2014

Integrated CPU-GPU Power Management for 3D Mobile Games

Anuj Pathania; Qing Jiao; Alok Prakash; Tulika Mitra

Modern system-on-chips (SoCs) integrate CPU and GPU for an immersive 3D gaming experience. These games require both the CPU and GPU to work in tandem, resulting in high power consumption. In the past, Dynamic Voltage and Frequency Scaling (DVFS) has been exploited for embedded CPUs to save power during game play, but only recently have embedded GPUs attained DVFS capabilities that provide additional opportunities. In this paper, we propose a power management approach that takes a unified view of CPU-GPU DVFS, resulting in reduced power consumption for the latest 3D mobile games compared to an independent CPU-GPU power management approach.
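To illustrate what a unified view of CPU-GPU DVFS can mean, here is a minimal sketch that searches CPU and GPU frequency pairs jointly for the lowest-power pair that still meets a frame-rate target. It is not the approach from the paper: the function name `pick_joint_dvfs`, the serial frame-time model, and the cubic power model are all assumptions chosen only to keep the example small.

```python
from itertools import product

def pick_joint_dvfs(cpu_freqs, gpu_freqs, cpu_work, gpu_work, target_fps):
    """Return (power, cpu_freq, gpu_freq) for the cheapest pair meeting target_fps.

    Assumed toy models (not from the paper):
      frame time = cpu_work / cpu_freq + gpu_work / gpu_freq
      power      = cpu_freq**3 + gpu_freq**3   (arbitrary units)
    Returns None if no pair reaches the target.
    """
    best = None
    for fc, fg in product(cpu_freqs, gpu_freqs):
        frame_time = cpu_work / fc + gpu_work / fg
        fps = 1.0 / frame_time
        power = fc ** 3 + fg ** 3
        # Keep the lowest-power pair among those that satisfy the FPS target.
        if fps >= target_fps and (best is None or power < best[0]):
            best = (power, fc, fg)
    return best

# Hypothetical workload: frequencies in GHz, work in giga-cycles per frame.
result = pick_joint_dvfs([1.0, 2.0], [1.0, 2.0], 0.01, 0.02, 45)
```

In this toy workload the GPU is the bottleneck, so the joint search raises only the GPU frequency and leaves the CPU at its lowest setting. A manager that tunes each unit without considering the other cannot make that trade-off directly.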

architectural support for programming languages and operating systems | 2014

Price theory based power management for heterogeneous multi-cores

Thannirmalai Somu Muthukaruppan; Anuj Pathania; Tulika Mitra

Heterogeneous multi-cores that integrate cores with different power-performance characteristics are promising alternatives to homogeneous systems in energy- and thermally constrained environments. However, the heterogeneity imposes significant challenges to power-aware scheduling. We present a price theory-based dynamic power management framework for heterogeneous multi-cores that co-ordinates various energy savings opportunities, such as dynamic voltage/frequency scaling, load balancing, and task migration in tandem, to achieve the best power-performance characteristics. Unlike existing centralized power management frameworks, ours is distributed and hence scalable with minimal runtime overhead. We design and implement the framework within the Linux operating system on an ARM big.LITTLE heterogeneous multi-core platform. Experimental evaluation confirms the advantages of our approach compared to the state-of-the-art techniques for power management in heterogeneous multi-cores.

design automation conference | 2015

Power-Performance Modelling of Mobile Gaming Workloads on Heterogeneous MPSoCs

Anuj Pathania; Alexandru Eugen Irimiea; Alok Prakash; Tulika Mitra

Games have emerged as one of the most popular applications on mobile platforms. Recent platforms are now equipped with Heterogeneous Multiprocessor System-on-Chips (HMPSoCs) tightly integrating CPUs and GPUs on the same chip. This configuration enables high-end gaming on the platform, but at the cost of high power consumption that rapidly drains the underlying limited-capacity battery. The HMPSoCs are capable of independent Dynamic Voltage and Frequency Scaling (DVFS) for CPUs and GPUs to reduce the platform's power consumption. The state-of-the-art power manager for mobile games on HMPSoCs oversimplifies the complex CPU-GPU interplay. In this paper, we develop power-performance models predicting the impact of DVFS on mobile gaming workloads. Based on our models, we propose an efficient power management strategy and implement it on an Odroid-XU+E mobile platform. Measurements on the platform show that our power manager provides, on average, a 20% increase in performance per watt when compared to the state-of-the-art.

IEEE Transactions on Computers | 2017

Power Density-Aware Resource Management for Heterogeneous Tiled Multicores

Heba Khdr; Santiago Pagani; Ericles Rodrigues Sousa; Vahid Lari; Anuj Pathania; Frank Hannig; Muhammad Shafique; Jürgen Teich; Jörg Henkel

Increasing power densities have led to the dark silicon era, for which heterogeneous multicores with different power and performance characteristics are promising architectures. This paper focuses on maximizing the overall system performance under a critical temperature constraint for heterogeneous tiled multicores, where all cores or accelerators inside a tile share the same voltage and frequency levels. For such architectures, we present a resource management technique that introduces power density as a novel system-level constraint, in order to avoid thermal violations. The proposed technique then assigns applications to tiles by choosing their degree of parallelism and the voltage/frequency levels of each tile, such that the power density constraint is satisfied. Moreover, our technique adapts the power density constraint at runtime according to the characteristics of the executed applications, and reacts to workload changes at runtime. Thus, the available thermal headroom is exploited to maximize the overall system performance.

international symposium on low power electronics and design | 2015

Power management for mobile games on asymmetric multi-cores

Anuj Pathania; Santiago Pagani; Muhammad Shafique; Jörg Henkel

Gaming on mobile platforms is highly power hungry and rapidly drains the limited-capacity battery. In multi-threaded gaming, each thread has different processing requirements and even a single slow thread may lead to Quality of Service (QoS) violations. Further, modern mobile platforms are equipped with asymmetric multi-core processors, so that different cores exhibit diverse power and performance properties. These asymmetric cores along with different Dynamic Power Management (DPM) techniques enable a high degree of power efficiency in mobile gaming. The default Linux power manager (i.e. “Governor”) for asymmetric multi-cores is power-inefficient for mobile games because it over-allocates resources for processing threads by being oblivious to the QoS. The state-of-the-art Governor for mobile gaming does not account for multi-threaded gaming workloads, which are mainstream in mobile gaming. In this work, we present a power-performance characterization of multi-threaded mobile games by executing them on a real-world mobile platform with an asymmetric multi-core. This analysis is leveraged to propose a QoS-aware Governor running a lightweight online heuristic that holistically accounts for thread-to-core mapping and DPM. This solution, when integrated into the platform's Operating System (OS), provides 12% improved power efficiency on average.

design, automation, and test in europe | 2016

Distributed fair scheduling for many-cores

Anuj Pathania; Vanchinathan Venkataramani; Muhammad Shafique; Tulika Mitra; Jörg Henkel

The transition of embedded processors from multi-cores to many-cores continues unabated. Many-cores execute tens of tasks in parallel and, in some contexts, it is crucial that the processing cores are distributed fairly amongst the tasks. Traditional queue-based centralized fair schedulers designed for multi-cores will have excessive overhead on many-cores due to the enlarged optimization search-space. Further, the processing requirements of executing tasks may vary across different phases of their execution, necessitating lightweight dynamic fair schedulers that regularly perform partial reallocation of the cores. We introduce a distributed dynamic fair scheduler that can scale up with the increase in the number of cores because it disburses the processing overhead of scheduling amongst all the cores. Based on observations made for task executions on many-cores, we propose an optimal solution under certain constraints for the fair scheduling problem, which in general is NP-Hard.

design automation conference | 2016

Distributed scheduling for many-cores using cooperative game theory

Anuj Pathania; Vanchinathan Venkataramani; Muhammad Shafique; Tulika Mitra; Jörg Henkel

Many-cores are envisaged to include hundreds of processing cores etched onto a single die and will execute tens of multithreaded tasks in parallel to exploit their massive parallel processing potential. A task can be sped up by assigning it to more than one core. Moreover, the processing requirements of tasks are in a constant state of flux, and some of the cores assigned to a task entering a low-requirement phase can be transferred to a task entering a high-requirement phase, maximizing overall performance of the system. This scheduling problem of partial core reallocations can be solved optimally in polynomial time using a dynamic programming based scheduler. Dynamic programming is an inherently centralized algorithm that uses only one of the available cores for scheduling-related computations and hence is not scalable. In this work, we introduce a distributed scheduler that disburses all scheduling-related computations throughout the many-core, allowing it to scale up. We prove that our proposed scheduler is optimal and hence converges to the same solution as the centralized optimal scheduler. Our simulations show that the proposed distributed scheduler can result in a 1000x reduction in per-core processing overhead in comparison to the centralized scheduler and hence is more suited for scheduling on many-cores.
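The centralized baseline mentioned in this abstract can be pictured with a small dynamic program over tasks and cores. The sketch below is only an illustration under stated assumptions, not the scheduler from the paper: the function name `dp_allocate` and the per-task throughput tables are hypothetical, and each table entry `table[k]` is assumed to hold the task's throughput when it is given `k` cores.

```python
def dp_allocate(ipc_tables, total_cores):
    """Maximize summed throughput when splitting total_cores among tasks.

    ipc_tables[t][k] is the (assumed, hypothetical) throughput of task t
    on k cores, for k = 0 .. len(ipc_tables[t]) - 1.
    Returns (best_total, allocation), in O(tasks * cores^2) time.
    """
    best = [0.0] * (total_cores + 1)   # best[c]: optimum using at most c cores
    choices = []                       # choices[t][c]: cores given to task t
    for table in ipc_tables:
        new_best = [0.0] * (total_cores + 1)
        pick = [0] * (total_cores + 1)
        for c in range(total_cores + 1):
            best_val, best_k = float("-inf"), 0
            # Try every core count k for this task; the rest go to earlier tasks.
            for k in range(min(c, len(table) - 1) + 1):
                val = best[c - k] + table[k]
                if val > best_val:
                    best_val, best_k = val, k
            new_best[c], pick[c] = best_val, best_k
        best = new_best
        choices.append(pick)
    # Walk the recorded choices backwards to recover the allocation.
    alloc = [0] * len(ipc_tables)
    c = total_cores
    for t in range(len(ipc_tables) - 1, -1, -1):
        alloc[t] = choices[t][c]
        c -= alloc[t]
    return best[total_cores], alloc

# Hypothetical throughput tables for two tasks on 0..3 cores.
total, alloc = dp_allocate([[0, 4, 6, 7], [0, 3, 5, 6.5]], 4)
```

All of the table filling happens in one place, which is the centralization the abstract points to: a single core does the whole computation, and the work grows with both the number of tasks and the number of cores.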

design, automation, and test in europe | 2017

Scalable probabilistic power budgeting for many-cores

Anuj Pathania; Heba Khdr; Muhammad Shafique; Tulika Mitra; Jörg Henkel

Many-core processors exhibit hundreds to thousands of cores, which can execute many multi-threaded tasks in parallel. The restrictive power dissipation capacity of a many-core prevents all its executing tasks from operating at their peak performance together. Furthermore, the ability of a task to exploit part of the power budget allocated to it depends upon its current execution phase. This mandates careful rationing of the power budget amongst the tasks for full exploitation of the many-core. Past research proposed power budgeting techniques that redistribute power budget amongst tasks based on up-to-date information about their current phases. This phase information needs to be constantly propagated throughout the system and processed, inhibiting scalability. In this work, we propose a novel probabilistic technique for power budgeting which requires no exchange of phase information yet provides mathematical guarantees on judicious use of the Thermal Design Power (TDP). The proposed probabilistic technique reduces the power budgeting overheads by 97.13% in comparison to a non-probabilistic approach, while providing almost equal performance on a simulated thousand-core system.

IEEE Transactions on Parallel and Distributed Systems | 2017

Energy Efficiency for Clustered Heterogeneous Multicores

Santiago Pagani; Anuj Pathania; Muhammad Shafique; Jian-Jia Chen; Jörg Henkel

Heterogeneous multicore systems clustered in multiple Voltage Frequency Islands (VFIs) are the next-generation solution for power- and energy-efficient computing systems. Due to the heterogeneity, the power consumption and execution time of a task change not only with Dynamic Voltage and Frequency Scaling (DVFS), but also according to the task-to-island assignment, presenting major challenges for power management and energy minimization techniques. This paper focuses on energy minimization of periodic real-time tasks (or performance-constrained tasks) on such systems, in which the cores in an island are homogeneous and share the same voltage and frequency, but different islands have different types and numbers of cores and can run at different voltages and frequencies. We present an efficient algorithm to minimize the total energy consumption while satisfying the timing constraints of all tasks. Our technique consists of the coordinated selection of the voltage and frequency levels for each island, together with a task partitioning strategy that considers the energy consumption of the task executing on different islands and at different frequencies, as well as the impact of the frequency and the underlying core architecture on the resulting execution time. Every task is then mapped to the most energy-efficient island for the selected voltage and frequency levels, and to a core inside the island such that the workloads of the cores in a VFI are balanced. We experimentally evaluate our technique and compare it to state-of-the-art solutions, resulting on average in 25 percent less energy consumption (and up to 87 percent in some cases), while guaranteeing that all tasks meet their deadlines.
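The final mapping step described in this abstract (most energy-efficient island first, then a balanced core inside it) can be sketched as follows. This is a simplified illustration, not the algorithm from the paper: it assumes the voltage and frequency levels are already fixed, ignores deadline checks, and uses hypothetical names (`map_tasks`, `energy`, `utilization`) and a least-loaded-core rule for balancing.

```python
def map_tasks(energy, utilization, cores_per_island):
    """Map each task to its cheapest island, then to that island's least-loaded core.

    energy[t][i]      : assumed energy of task t on island i at the island's fixed V/f level
    utilization[t][i] : assumed core utilization of task t on island i
    cores_per_island  : number of cores in each island
    Returns (mapping, loads), where mapping[t] = (island, core).
    """
    loads = [[0.0] * n for n in cores_per_island]
    mapping = []
    for t, row in enumerate(energy):
        # Island with the lowest energy for this task.
        island = min(range(len(row)), key=row.__getitem__)
        # Least-loaded core in that island keeps per-core workloads balanced.
        core = min(range(cores_per_island[island]), key=loads[island].__getitem__)
        loads[island][core] += utilization[t][island]
        mapping.append((island, core))
    return mapping, loads

# Hypothetical numbers for three tasks and two islands with two cores each.
mapping, loads = map_tasks(
    energy=[[5, 3], [2, 4], [6, 1]],
    utilization=[[0.5, 0.4], [0.3, 0.6], [0.7, 0.2]],
    cores_per_island=[2, 2],
)
```

A complete solution would also have to reject mappings that overload a core or violate a deadline, which this sketch deliberately leaves out.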

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2017

Optimal Greedy Algorithm for Many-Core Scheduling

Anuj Pathania; Vanchinathan Venkataramani; Muhammad Shafique; Tulika Mitra; Jörg Henkel

In this paper, we propose an optimal greedy algorithm for the problem of run-time many-core scheduling. The previously best known centralized optimal algorithm proposed for the problem is based on dynamic programming. A dynamic programming-based scheduler has high overheads, which grow fast with increases in both the number of cores in the many-core and the number of tasks independently executing on them. We show in this paper that the inherent concavity of extractable instructions per cycle in tasks with an increase in the number of allocated cores allows for an alternative greedy algorithm. The proposed algorithm significantly reduces the run-time scheduling overheads, while maintaining theoretical optimality. In practice, it reduces the problem-solving time by a factor of 10,000 to provide near-optimal solutions.
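The role of concavity can be seen in a minimal sketch of marginal-gain greedy allocation. It is an illustration under stated assumptions rather than the algorithm from the paper: the function name `greedy_allocate` and the throughput tables are hypothetical, and each table is assumed to be concave, so the gain from one extra core never increases as a task receives more cores.

```python
import heapq

def greedy_allocate(ipc_tables, total_cores):
    """Hand out cores one at a time to the task with the largest marginal gain.

    ipc_tables[t][k] is the (assumed, hypothetical) throughput of task t on
    k cores, concave in k. Returns the number of cores given to each task.
    """
    alloc = [0] * len(ipc_tables)
    heap = []  # max-heap via negated gains: (-gain of next core, task index)
    for t, table in enumerate(ipc_tables):
        if len(table) > 1:
            heapq.heappush(heap, (-(table[1] - table[0]), t))
    for _ in range(total_cores):
        if not heap:
            break  # every task already holds its maximum core count
        _, t = heapq.heappop(heap)
        alloc[t] += 1
        k = alloc[t]
        table = ipc_tables[t]
        if k + 1 < len(table):
            # Queue the gain this task would get from one more core.
            heapq.heappush(heap, (-(table[k + 1] - table[k]), t))
    return alloc

# Hypothetical concave throughput tables for two tasks on 0..3 cores.
alloc = greedy_allocate([[0, 4, 6, 7], [0, 3, 5, 6.5]], 4)
```

Because each task's marginal gains are non-increasing under the concavity assumption, always taking the largest remaining gain never blocks a better choice later. Each core is assigned with a single heap operation instead of a scan over all possible splits.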
