Karthi Duraisamy | Researchain

Publication

Featured research published by Karthi Duraisamy.

International Symposium on Quality Electronic Design | 2015

Enhancing performance of wireless NoCs with distributed MAC protocols

Karthi Duraisamy; Ryan Gary Kim; Partha Pratim Pande

Wireless NoC is an emerging paradigm for designing high-bandwidth and energy-efficient communication backbones for massive multicore chips. The achievable performance of this type of on-chip interconnect infrastructure depends on the efficiency of the Media Access Control (MAC) protocol that arbitrates between the competing wireless nodes. In this work, we propose the design of a distributed MAC protocol suitable for wireless NoC architectures. Compared to the widely used token passing scheme, a distributed MAC protocol improves scalability, provides better performance, and lowers overall energy dissipation. Depending on the traffic pattern, the proposed distributed MAC provides up to 23% improvement in energy-delay product (EDP) when compared to the existing token passing scheme.

Compilers, Architecture, and Synthesis for Embedded Systems | 2016

Hybrid network-on-chip architectures for accelerating deep learning kernels on heterogeneous manycore platforms

Wonje Choi; Karthi Duraisamy; Ryan Gary Kim; Janardhan Rao Doppa; Partha Pratim Pande; Radu Marculescu; Diana Marculescu

In recent years, designing specialized manycore heterogeneous architectures for deep learning kernels has become an area of great interest. However, the typical on-chip communication infrastructures employed on conventional manycore platforms are unable to handle both CPU and GPU communication requirements efficiently. Hence, in this paper, our aim is to enhance the performance of heterogeneous manycore architectures through the design of a hybrid NoC consisting of both wireline and wireless links. To this end, we specifically target the resource-intensive backpropagation algorithm commonly used as the training method in deep learning. For backpropagation, the proposed hybrid NoC achieves 1.9× reduction in network latency and improves the network throughput by a factor of 2 with respect to a highly optimized mesh NoC. These network level improvements translate into 25% savings in full system energy-delay-product (EDP). This demonstrates the capability of the proposed hybrid and heterogeneous manycore architecture in accelerating deep learning kernels in an energy-efficient manner.

IEEE Transactions on Very Large Scale Integration Systems | 2017

Multicast-Aware High-Performance Wireless Network-on-Chip Architectures

Karthi Duraisamy; Yuankun Xue; Paul Bogdan; Partha Pratim Pande

Today’s multiprocessor platforms employ the network-on-chip (NoC) architecture as the preferable communication backbone. Conventional NoCs are designed predominantly for unicast data exchanges. In such NoCs, the multicast traffic is generally handled by converting each multicast message to multiple unicast transmissions. Hence, applications dominated by multicast traffic experience high queuing latencies and significant performance penalties when running on systems designed with unicast-based NoC architectures. Various multicast mechanisms such as XY-tree multicast and path multicast have already been proposed to enhance the performance of the traditional wireline mesh NoC incorporating multicast traffic. However, even with such added features, the multihop nature of the wireline mesh NoC leads to high network latencies and thus limits the achievable system performance. In this paper, to sustain the high-bandwidth and high-throughput requirements of emerging applications, we propose the design of a wireless NoC (WiNoC) architecture incorporating necessary multicast support. By integrating congestion-aware multicast routing with network coding, the WiNoC is able to efficiently handle heavy multicast injections. For applications running with a broadcast-heavy Hammer cache coherence protocol, the proposed multicast-aware WiNoC achieves an average of 47% reduction in message latency compared with the XY-tree-based multicast-aware mesh NoC. This network level improvement translates into a 26% saving in full-system energy delay product.
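The cost of converting a multicast message into multiple unicast transmissions can be illustrated with a small sketch. It counts link traversals on a 2D mesh under XY routing, first when every destination receives its own unicast packet, and then when one packet follows a tree formed by the union of the XY paths. The function names, the coordinate convention, and the example destinations are assumptions made for illustration; this is not the routing or network-coding scheme from the paper.

```python
def xy_path(src, dst):
    """Directed links visited by XY routing: move along X first, then along Y."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def unicast_link_traversals(src, dests):
    """Each destination gets a separate packet, so shared links are paid for repeatedly."""
    return sum(len(xy_path(src, d)) for d in dests)


def xy_tree_link_traversals(src, dests):
    """One packet branches along the union of the XY paths, so shared links count once."""
    tree = set()
    for d in dests:
        tree.update(xy_path(src, d))
    return len(tree)


if __name__ == "__main__":
    source = (0, 0)                              # hypothetical source router
    destinations = [(3, 0), (3, 1), (3, 2)]      # hypothetical multicast group
    print(unicast_link_traversals(source, destinations))
    print(xy_tree_link_traversals(source, destinations))
```

For this hypothetical group the unicast conversion uses 12 link traversals and the tree uses 5, which hints at why unicast-only NoCs queue up under multicast-heavy traffic. Both variants remain multihop, which is the limitation the abstract attributes to the wireline mesh.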

ACM Transactions on Embedded Computing Systems | 2016

High-Performance and Energy-Efficient Network-on-Chip Architectures for Graph Analytics

Karthi Duraisamy; Hao Lu; Partha Pratim Pande; Ananth Kalyanaraman

With its applicability spanning numerous data-driven fields, the implementation of graph analytics on multicore platforms is gaining momentum. One of the most important components of a multicore chip is its communication backbone. Due to inherent irregularities in data movements manifested by graph-based applications, it is essential to design efficient on-chip interconnection architectures for multicore chips performing graph analytics. In this article, we present a detailed analysis of the traffic patterns generated by graph-based applications when mapped to multicore chips. Based on this analysis, we explore the design-space for the Network-on-Chip (NoC) architecture to enable an efficient implementation of graph analytics. We principally consider three types of NoC architectures, viz., traditional mesh, small-world, and high-radix networks. We demonstrate that the small-world-network-enabled wireless NoC (WiNoC) is the most suitable platform for executing the considered graph applications. The WiNoC achieves an average of 38% and 18% full-system Energy Delay Product savings compared to wireline-mesh and high-radix NoCs, respectively.
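To give a feel for why a small-world topology can help, the sketch below compares the average hop count of a plain mesh with the same mesh after one long-range shortcut link is added. The mesh size, the shortcut placement, and all function names are assumptions for illustration only; the paper's actual topologies, wireless link placement, and evaluation method are not described in this abstract.

```python
from collections import deque
from itertools import product


def mesh_links(n):
    """Adjacency sets for an n x n mesh with bidirectional nearest-neighbour links."""
    adj = {node: set() for node in product(range(n), repeat=2)}
    for x, y in adj:
        for neighbour in ((x + 1, y), (x, y + 1)):
            if neighbour in adj:
                adj[(x, y)].add(neighbour)
                adj[neighbour].add((x, y))
    return adj


def add_shortcut(adj, a, b):
    """Add one bidirectional long-range link (a stand-in for a wireless shortcut)."""
    adj[a].add(b)
    adj[b].add(a)


def average_hops(adj):
    """Mean shortest-path hop count over all ordered pairs of distinct nodes (BFS)."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    mesh = mesh_links(4)                         # hypothetical 16-core mesh
    shortcut = mesh_links(4)
    add_shortcut(shortcut, (0, 0), (3, 3))       # hypothetical corner-to-corner link
    print(average_hops(mesh), average_hops(shortcut))
```

Hop count is only a proxy: energy and latency also depend on link technology, router design, and traffic, which is why the abstract reports full-system Energy Delay Product rather than hops.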

IEEE Transactions on Very Large Scale Integration Systems | 2016

Network-on-Chip-Enabled Multicore Platforms for Parallel Model Predictive Control

Xian Li; Karthi Duraisamy; Paul Bogdan; Turbo Majumder; Partha Pratim Pande

Internet-of-Things architecture aims to provide smart connectivity not only with existing computers, but also with new context-aware computing resources, extending soon beyond von Neumann devices for the purpose of mining, prediction, and control of cyber and physical components. These cyber-physical systems (CPSs) not only lead to the accumulation of large amounts of data that can be used to build comprehensive mathematical models, but also raise the quest for real-time analysis and control in diverse application domains, such as environment, healthcare, avionics, smart interconnected automobiles, and smart buildings. Endowing the CPS with a higher degree of distributed smartness and cognition (adaptation) to process massive amounts of data requires efficient control modules. In addition, the prohibitive nature of power consumption, data movement, and memory bandwidth issues calls for a shift of processing the decision-making strategies from within large supercomputing centers closer to the actual sensing site via many distributed networks-on-chip (NoCs)-based multicore platforms. Toward this end, in this paper, we propose an efficient NoC-based multicore architecture capable of solving large-scale nonlinear model predictive control (NMPC) problems. By carefully analyzing the spatiotemporal workload characteristics of the NMPC problems, we propose the design of an efficient NoC architecture. Our proposed NoC architecture achieves up to 29% improvement in latency and 28% improvement in energy dissipation over the conventional mesh NoC-based counterpart.

Compilers, Architecture, and Synthesis for Embedded Systems | 2015

High performance and energy efficient wireless NoC-enabled multicore architectures for graph analytics

Karthi Duraisamy; Hao Lu; Partha Pratim Pande; Ananth Kalyanaraman

With its applicability spanning numerous data-driven fields, the implementation of graph analytics on multicore platforms is gaining momentum. The most important component of a multicore chip is its communication backbone. Due to the inherent irregularities in data movements manifested by graph-based applications, it is essential to design an efficient on-chip interconnect for multicore chips performing graph analytics. In this paper, we present a detailed analysis of the traffic patterns generated by graph-based applications when mapped to multicore chips. Based on this analysis, we present the design of wireless Network-on-Chip (WiNoC)-enabled multicore platforms for efficient implementation of graph analytics. When compared to traditional wireline mesh architecture, WiNoC enables a faster data exchange among the computing cores, leading to reduced execution times and lower energy dissipation. We demonstrate that, depending on the particular graph application, the WiNoC reduces the execution time by up to 35% and lowers the energy dissipation by up to 40% when compared to traditional wireline mesh.

Foundations and Trends in Electronic Design Automation | 2016

Fast Uncovering of Graph Communities on a Chip: Toward Scalable Community Detection on Multicore and Manycore Platforms

Ananth Kalyanaraman; Mahantesh Halappanavar; Daniel G. Chavarría-Miranda; Hao Lu; Karthi Duraisamy; Partha Pratim Pande

Graph representations are pervasive in scientific and social computing. They serve as vital tools to model the interplay among different interacting entities. In this paper, we visit the problem of community detection, which is one of the most widely used graph operations toward scientific discovery. Community detection refers to the process of identifying tightly knit subgroups of vertices in a large graph. These sub-groups or communities represent vertices that are tied together through common structure or function. Identification of communities could help in understanding the modular organization of complex networks. However, owing to large data sizes and high computational costs, performing community detection at scale has become increasingly challenging. Here, we present a detailed review and analysis of some of the leading computational methods and implementations developed for executing community detection on modern day multicore and manycore architectures. Our goals are to: (a) define the problem of community detection and highlight its scientific significance; (b) relate to challenges in parallelizing the operation on modern day architectures; (c) provide a detailed report and logical organization of the approaches that have been designed for various architectures; and (d) finally, provide insights into the strengths and suitability of different architectures for community detection, and a preview into the future trends of the area. It is our hope that this detailed treatment of community detection on parallel architectures can serve as an exemplar study for extending the application of modern day multicore and manycore architectures to other complex graph applications.

IEEE Transactions on Computers | 2017

A Reconfigurable Wireless NoC for Large Scale Microbiome Community Analysis

Xian Li; Karthi Duraisamy; Joe Baylon; Turbo Majumder; Guopeng Wei; Paul Bogdan; Deukhyoun Heo; Partha Pratim Pande

Understanding the role of competition and cooperation among multiple interacting species of microorganisms that constitute the microbiome, and deciphering how they enforce homeostasis or trigger diseases, requires the development of multi-scale computational models capable of capturing both intra-cell processing (i.e., gene-to-protein interactions) and inter-cell interactions. The multi-scale interdependency that governs the interactions from genes to proteins within a cell and from molecular messengers to cells to microbial communities within the environment raises numerous computation and communication challenges. Internal cell processing cannot be simulated without knowledge of the surroundings. Similarly, cell-cell communication cannot be fully abstracted without the state of internal processing and diffusion effects of molecular messengers. To address the compute- and communication-intensive nature of modeling microbial communities, in this paper, we propose a novel reconfigurable NoC-based manycore architecture capable of simulating a large scale microbial community. The reconfiguration of the NoC topology is achieved through the fractal analysis of NoC traffic and use of the on-chip wireless interfaces. More precisely, we analyze the computational and communication workloads and exploit the observed fractal characteristics for proposing a mathematical strategy for NoC reconfiguration. Experimental results demonstrate that the proposed NoC architecture achieves 56.6 and 62.8 percent improvement in energy delay product over the conventional wireline mesh and flattened butterfly-based high radix NoC architectures, respectively.

Design Automation Conference | 2015

Energy efficient MapReduce with VFI-enabled multicore platforms

Karthi Duraisamy; Ryan Gary Kim; Wonje Choi; Guangshuo Liu; Partha Pratim Pande; Radu Marculescu; Diana Marculescu

In an era when power constraints and data movement are proving to be significant barriers for high-end computing, multicore architectures offer a low-power and highly scalable platform suitable for both data- and compute-intensive applications. MapReduce is a popular framework to facilitate the management and development of big-data workloads. In this work, we demonstrate that by using a wireless NoC-enabled Voltage Frequency Island (VFI)-based multicore platform it is possible to enhance the energy efficiency of MapReduce implementations without paying significant execution time penalties. Our experimental results show that for the benchmarks considered, the designed VFI system can achieve an average of 33.7% energy-delay product (EDP) savings over the standard baseline non-VFI mesh-based system while paying a maximum of 3.22% execution time penalty.
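Since several of these abstracts report results as energy-delay product (EDP) savings, a minimal sketch of the metric may help. EDP is the product of energy consumed and execution time, so a design can accept a small execution time penalty and still come out ahead if it saves enough energy. The function names and the numbers in the usage example are hypothetical and are not taken from the paper's measurements.

```python
def edp(energy_joules, delay_seconds):
    """Energy-delay product: energy consumed multiplied by execution time."""
    return energy_joules * delay_seconds


def edp_savings(base_energy, base_delay, new_energy, new_delay):
    """Fractional EDP reduction of a new design relative to a baseline."""
    baseline = edp(base_energy, base_delay)
    return 1.0 - edp(new_energy, new_delay) / baseline


if __name__ == "__main__":
    # Hypothetical, normalized values: the new design uses 70% of the
    # baseline energy but runs 3% longer.
    saving = edp_savings(1.0, 1.0, 0.70, 1.03)
    print(f"EDP savings: {saving:.1%}")
```

With these made-up inputs the energy reduction outweighs the 3% slowdown, which mirrors the kind of trade-off described above without reproducing its reported figures.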

IEEE Transactions on Computers | 2018

On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems

Wonje Choi; Karthi Duraisamy; Ryan Gary Kim; Janardhan Rao Doppa; Partha Pratim Pande; Diana Marculescu; Radu Marculescu

Convolutional Neural Networks (CNNs) have shown a great deal of success in diverse application domains including computer vision, speech recognition, and natural language processing. However, as the size of datasets and the depth of neural network architectures continue to grow, it is imperative to design high-performance and energy-efficient computing hardware for training CNNs. In this paper, we consider the problem of designing specialized CPU-GPU based heterogeneous manycore systems for energy-efficient training of CNNs. It has already been shown that the typical on-chip communication infrastructures employed in conventional CPU-GPU based heterogeneous manycore platforms are unable to handle both CPU and GPU communication requirements efficiently. To address this issue, we first analyze the on-chip traffic patterns that arise from the computational processes associated with training two deep CNN architectures, namely, LeNet and CDBNet, to perform image classification. By leveraging this knowledge, we design a hybrid Network-on-Chip (NoC) architecture, which consists of both wireline and wireless links, to improve the performance of CPU-GPU based heterogeneous manycore platforms running the above-mentioned CNN training workloads. The proposed NoC achieves 1.8× reduction in network latency and improves the network throughput by a factor of 2.2 for training CNNs, when compared to a highly-optimized wireline mesh NoC. For the considered CNN workloads, these network-level improvements translate into 25 percent savings in full-system energy-delay-product (EDP). This demonstrates that the proposed hybrid NoC for heterogeneous manycore architectures is capable of significantly accelerating training of CNNs while remaining energy-efficient.
