Narayan Ranganathan | Researchain

Publication

Featured research published by Narayan Ranganathan.

Architectural Support for Programming Languages and Operating Systems | 1998

An empirical study of decentralized ILP execution models

Narayan Ranganathan; Manoj Franklin

Recent fascination with dynamic scheduling as a means of exploiting instruction-level parallelism has generated significant interest in the scalability of dynamic scheduling hardware. To overcome the scalability problems of centralized hardware schedulers, many decentralized execution models have recently been proposed and investigated. The crux of all these models is to split the instruction window across multiple processing elements (PEs) that schedule instructions independently. The decentralized execution models proposed so far can be grouped into three categories, based on the criterion used for assigning an instruction to a particular PE: (i) execution unit dependence based decentralization (EDD), (ii) control dependence based decentralization (CDD), and (iii) data dependence based decentralization (DDD). This paper investigates the performance aspects of these three decentralization approaches. Using a suite of important benchmarks and realistic system parameters, we examine performance differences resulting from the type of partitioning as well as from specific implementation issues such as the type of PE interconnect. We found that with a ring-type PE interconnect, the DDD approach performs best when the number of PEs is moderate, and the CDD approach performs best when the number of PEs is large. The currently used approach, EDD, does not perform well for any configuration. With a realistic crossbar, performance does not increase with the number of PEs for any of the partitioning approaches. The results give insight into the best way to use the transistor budget available for implementing the instruction window.
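To make the three assignment criteria concrete, here is a minimal sketch of how an instruction might be mapped to a PE under each one. It is only an illustration, not the paper's simulator: the `Instr` fields, the three `assign_*` functions, the round-robin mapping of control-flow regions, and the "follow the first producer, else pick the least-loaded PE" rule are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    idx: int      # position in program order
    unit: str     # execution unit class, e.g. "alu" or "mem"
    block: int    # id of the control-flow region the instruction belongs to
    producers: list = field(default_factory=list)  # idx of instructions it reads from


def assign_edd(instrs, num_pes):
    """EDD (assumed policy): the PE is chosen by the execution unit class."""
    units = sorted({i.unit for i in instrs})
    unit_to_pe = {u: k % num_pes for k, u in enumerate(units)}
    return {i.idx: unit_to_pe[i.unit] for i in instrs}


def assign_cdd(instrs, num_pes):
    """CDD (assumed policy): each control-flow region goes to one PE, round-robin."""
    return {i.idx: i.block % num_pes for i in instrs}


def assign_ddd(instrs, num_pes):
    """DDD (assumed policy): follow the first already-placed producer,
    otherwise start a new dependence chain on the least-loaded PE."""
    placement = {}
    load = [0] * num_pes
    for i in instrs:  # instructions are visited in program order
        placed = [placement[p] for p in i.producers if p in placement]
        if placed:
            pe = placed[0]
        else:
            pe = min(range(num_pes), key=lambda p: (load[p], p))
        placement[i.idx] = pe
        load[pe] += 1
    return placement


# A tiny made-up program: two load/compute chains that merge at instruction 4.
program = [
    Instr(0, "mem", 0),
    Instr(1, "alu", 0, [0]),
    Instr(2, "mem", 1),
    Instr(3, "alu", 1, [2]),
    Instr(4, "alu", 1, [1, 3]),
]
edd = assign_edd(program, 2)
cdd = assign_cdd(program, 2)
ddd = assign_ddd(program, 2)
```

In this toy program, EDD separates every load from the ALU operation that consumes it, while CDD and DDD keep each load/compute chain on one PE and differ only in where the merging instruction lands. This says nothing about the measured results, which depend on the interconnect and the number of PEs as the abstract describes.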

High Performance Parallelism Pearls, Volume 2: Multicore and Many-core Programming Approaches | 2016

Visual Search Optimization

Prashanth Thinakaran; Diana Guttman; Mahmut T. Kandemir; Meenakshi Arunachalam; Rahul Khanna; Praveen Yedlapalli; Narayan Ranganathan

This chapter presents an image-matching application that can take advantage of many-core architectures. It explores parallelization strategies that exploit inter- and intra-image parallelism. The two main metrics that determine application performance, tree creation time and search time, are studied in the context of scalability. Insights from a profiler-based analysis help identify the challenges in scaling DB threads. Increasing DBThreads with optimal KD-trees is shown to yield a 5.8× speedup in tree creation time and a 2.8× speedup in search time with 120 threads, compared to single-threaded Xeon Phi performance.
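One way to read the reported speedups is through parallel efficiency, taken here as speedup divided by thread count. The short sketch below applies that standard definition to the two figures quoted above. The chapter itself is not stated to report efficiency, so the helper `parallel_efficiency` and the resulting values are a derived illustration only.

```python
def parallel_efficiency(speedup, threads):
    """Standard definition (an assumption here): speedup per thread used."""
    return speedup / threads


# Speedups quoted in the abstract for 120 threads vs. a single thread.
create_eff = parallel_efficiency(5.8, 120)  # tree creation
search_eff = parallel_efficiency(2.8, 120)  # search

print(f"creation efficiency: {create_eff:.1%}")
print(f"search efficiency:   {search_eff:.1%}")
```

Under this definition both phases sit well below ideal linear scaling, which is consistent with the abstract's emphasis on scalability challenges.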

Archive | 2013