Turbo Majumder | Researchain

Publication

Featured research published by Turbo Majumder.

International Symposium on Circuits and Systems | 2010

Hardware accelerators for biocomputing: A survey

Souradip Sarkar; Turbo Majumder; Ananth Kalyanaraman; Partha Pratim Pande

Computing research has become a vital cog in the machinery required to drive biological discovery. Computing has made possible significant achievements over the last decade, especially in genomics. An emerging area is the investigation of hardware accelerators for speeding up the massive scale of computation needed in large-scale biocomputing applications. Various hardware platforms, such as FPGAs, Graphics Processing Units (GPUs), the Cell Broadband Engine (CBE) and multi-core processors, are being explored. In this paper, we present a survey of hardware accelerators for biocomputing, choosing a representative set of each type of platform.

IEEE Transactions on Computers | 2012

NoC-Based Hardware Accelerator for Breakpoint Phylogeny

Turbo Majumder; Souradip Sarkar; Partha Pratim Pande; Ananth Kalyanaraman

Maximum Parsimony phylogenetic tree reconstruction is based on finding the breakpoint median of a given set of species, a problem represented by a bounded edge-weight graph model. This reduces the breakpoint median problem to solving multiple instances of the Traveling Salesman Problem (TSP), a classical NP-complete problem in graph theory. TSP is solved with exponential-time algorithms that apply efficient runtime heuristics, such as branch-and-bound, to dynamically prune the search space. In this paper, we present the design and performance evaluation of a network-on-chip (NoC)-based implementation for solving TSP under the bounded edge-weight model, as used in the computation of breakpoint phylogeny. Our approach takes advantage of fine-grain parallelism from the multiple processing elements (PEs) and uses an efficient NoC architecture for inter-PE communication. To accelerate the application in hardware, our PE design optimizes a particular lower bound calculation operation, which typically tends to be the serial bottleneck in computing a TSP solution. We also explore two representative NoC architectures, mesh and quad-tree, and show that the latter is more energy-efficient for this application domain. Experimental results show that this new implementation achieves speedups of up to three orders of magnitude over state-of-the-art multithreaded software implementations.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2012

On-Chip Network-Enabled Multicore Platforms Targeting Maximum Likelihood Phylogeny Reconstruction

Turbo Majumder; Michael Edward Borgens; Partha Pratim Pande; Ananth Kalyanaraman

In phylogenetic inference, which aims at finding a phylogenetic tree that best explains the evolutionary relationship among a given set of species, statistical estimation approaches such as maximum likelihood (ML) and Bayesian inference provide more accurate estimates than nonstatistical approaches. However, the improved quality comes at a higher computational cost, as these approaches, even though heuristic driven, involve optimization over a multidimensional continuous real space. The number of possible trees in the ML search space grows at least exponentially with the number of species, so runtimes on even modest-sized datasets can reach several million CPU hours. Evaluation of these trees, involving node-level likelihood vector computation and branch-length optimization, can be partitioned into tasks (or kernels), giving the application the potential to benefit from hardware acceleration. The hardware acceleration architectures tried so far offer a limited degree of fine-grain parallelism. Network-on-chip (NoC) is an emerging paradigm that can efficiently support the integration of a massive number of cores on a chip. In this paper, we explore the design and performance evaluation of 2-D and 3-D NoC architectures for RAxML, one of the most widely used ML software suites. Specifically, we implement the computation kernels of the top three functions, which together consume more than 85% of the total software runtime. Simulations show that, through an appropriate choice of NoC architecture and novel core design, allocation and placement strategies, our NoC-based implementation achieves individual function-level speedups of 390x to 847x, speeds up the targeted kernels by more than 6500x, and provides end-to-end runtime reductions of up to 5x over state-of-the-art multithreaded software.
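"Node-level likelihood vector computation" refers to the recursive step of Felsenstein's pruning algorithm, in which a node's per-state conditional likelihoods are built from those of its two children. The sketch below assumes the simple Jukes-Cantor (JC69) substitution model, a single alignment site and equal base frequencies; RAxML itself supports richer models, and the function names here are hypothetical, not RAxML's API.

```python
import math

BASES = "ACGT"


def jc69_prob(same: bool, t: float) -> float:
    """JC69 transition probability along a branch of length t
    (expected substitutions per site). Assumed model, for illustration only."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def leaf_vector(base: str) -> list[float]:
    """Conditional likelihood vector at a tip: 1 for the observed base, 0 otherwise."""
    return [1.0 if b == base else 0.0 for b in BASES]


def combine(left: list[float], t_left: float,
            right: list[float], t_right: float) -> list[float]:
    """One pruning step: likelihood vector of an internal node from its two children."""
    out = []
    for x in range(4):
        # Probability of the left/right subtree data given state x at this node.
        l = sum(jc69_prob(x == y, t_left) * left[y] for y in range(4))
        r = sum(jc69_prob(x == y, t_right) * right[y] for y in range(4))
        out.append(l * r)
    return out


def site_likelihood(root: list[float]) -> float:
    """Site likelihood at the root, assuming equal base frequencies (0.25 each)."""
    return sum(0.25 * v for v in root)


# Hypothetical three-taxon tree ((A:0.1, C:0.2):0.05, G:0.3) for one site.
ab = combine(leaf_vector("A"), 0.1, leaf_vector("C"), 0.2)
root = combine(ab, 0.05, leaf_vector("G"), 0.3)
print(site_likelihood(root))
```

Each `combine` call is independent per site and per state, which is the kind of fine-grain parallelism a many-core NoC platform can exploit.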

Design, Automation and Test in Europe | 2015

NoC-enabled multicore architectures for stochastic analysis of biomolecular reactions

Turbo Majumder; Xian Li; Paul Bogdan; Partha Pratim Pande

Recent medical challenges such as cancer, drug-resistant microbes or diabetes crucially affect human health. To tackle these, modern medicine must analyze molecular interactions and rely on powerful computational platforms for the design and performance evaluation of medical therapies. Towards this end, we propose a Network-on-Chip (NoC)-based multicore platform enabling the efficient analysis of stochastic molecular interactions among biological entities. Our in-depth analysis of the stochastic interactions among biological components and the characterization of their computational and communication requirements allows us to design a high-performance NoC architecture sustaining a throughput of over 1.36E5 events/ms, while consuming only 15 mJ per 1E5 stochastic events. Our proposed NoC-based multicore can offer a throughput improvement of 23% over a regular mesh-based NoC, while consuming 20% less energy.
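The abstract does not name the simulation algorithm, but a standard way to generate "stochastic events" for biomolecular reactions is Gillespie's direct-method stochastic simulation algorithm (SSA). The sketch below is that textbook method, not the authors' kernel; the reaction system and rate constants in the usage lines are hypothetical.

```python
import math
import random


def gillespie_ssa(counts, reactions, t_end, seed=0):
    """Gillespie direct-method stochastic simulation.

    counts:    dict mapping species name -> initial molecule count
    reactions: list of (rate_constant, reactants, products), where reactants and
               products are dicts mapping species name -> stoichiometric count
    Returns a list of (time, state) pairs: the initial state plus one per event.
    """
    rng = random.Random(seed)
    t = 0.0
    state = dict(counts)
    trace = [(t, dict(state))]
    while True:
        # Mass-action propensity: k times the number of distinct reactant combinations.
        props = []
        for k, reactants, _ in reactions:
            a = k
            for species, n in reactants.items():
                a *= math.comb(state.get(species, 0), n)
            props.append(a)
        a0 = sum(props)
        if a0 == 0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate a0.
        tau = -math.log(1.0 - rng.random()) / a0
        if t + tau > t_end:
            break
        t += tau
        # Choose reaction j with probability props[j] / a0.
        r = rng.random() * a0
        j, acc = 0, props[0]
        while acc <= r and j < len(props) - 1:
            j += 1
            acc += props[j]
        _, reactants, products = reactions[j]
        for species, n in reactants.items():
            state[species] -= n
        for species, n in products.items():
            state[species] = state.get(species, 0) + n
        trace.append((t, dict(state)))
    return trace


# Hypothetical binding/unbinding toy system: A + B -> C and C -> A + B.
rxns = [(0.01, {"A": 1, "B": 1}, {"C": 1}), (0.1, {"C": 1}, {"A": 1, "B": 1})]
trace = gillespie_ssa({"A": 50, "B": 40, "C": 0}, rxns, t_end=10.0)
print(len(trace) - 1, "events; final state:", trace[-1][1])
```

Every event requires recomputing propensities and drawing two random numbers, which is why throughput is naturally quoted in events per unit time and energy per event.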

IEEE Design & Test of Computers | 2014

Wireless NoC Platforms With Dynamic Task Allocation for Maximum Likelihood Phylogeny Reconstruction

Turbo Majumder; Partha Pratim Pande; Ananth Kalyanaraman

Maximum likelihood (ML) phylogeny is an important statistical approach in computational biology that estimates the most likely evolutionary relationship among a given set of species. This paper demonstrates how wireless network-on-chip (WiNoC)-based multicore platforms can be employed to achieve faster time-to-solution for ML phylogeny reconstruction.

IEEE Design & Test of Computers | 2014

Hardware Accelerators in Computational Biology: Application, Potential, and Challenges

Turbo Majumder; Partha Pratim Pande; Ananth Kalyanaraman

Computational biology increasingly relies on hardware accelerators to keep data processing in step with the growing volume of data generated by biological applications. This paper gives an introduction to the area of hardware accelerators for computational biology and a comparative study of a set of biological applications.

Symposium on Computer Architecture and High Performance Computing | 2011

Accelerating Maximum Likelihood Based Phylogenetic Kernels Using Network-on-Chip

Turbo Majumder; Partha Pratim Pande; Ananth Kalyanaraman

Probability-based approaches to phylogenetic inference, such as Maximum Likelihood (ML) and Bayesian Inference, provide the most accurate estimates of evolutionary relationships among species. However, they come at a high algorithmic and computational cost. Network-on-chip (NoC), an emerging paradigm, has not yet been explored for achieving fine-grained parallelism in these applications. In this paper, we present the design and performance evaluation of an NoC architecture for RAxML, one of the most widely used ML software suites. Specifically, we implement the top three function kernels, which account for more than 85% of the total runtime. Simulations show that, through novel core design, allocation and placement strategies, our NoC-based implementation achieves function-level speedups of 388x to 786x and system-level speedups in excess of 5000x over state-of-the-art multithreaded software.
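Because only the kernels covering more than 85% of the runtime are accelerated, Amdahl's law bounds the end-to-end gain. Taking f = 0.85 as an illustrative accelerated fraction and s as the speedup of that fraction (standard arithmetic, not a result from the paper):

```latex
S(f, s) = \frac{1}{(1 - f) + \dfrac{f}{s}}, \qquad
S(0.85,\ 5000) = \frac{1}{0.15 + 0.00017} \approx 6.66, \qquad
\lim_{s \to \infty} S(0.85,\ s) = \frac{1}{0.15} \approx 6.67
```

This is consistent with the end-to-end runtime reductions of up to 5x reported in the 2012 IEEE TCAD abstract above: once the kernels are thousands of times faster, the remaining unaccelerated fraction dominates.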

Application-Specific Systems, Architectures and Processors | 2010

An optimized NoC architecture for accelerating TSP kernels in breakpoint median problem

Turbo Majumder; Souradip Sarkar; Partha Pratim Pande; Ananth Kalyanaraman

The Traveling Salesman Problem (TSP) is a classical NP-complete problem in graph theory. It aims at finding a least-cost Hamiltonian cycle that traverses all vertices of an input edge-weighted graph. One application of TSP is in breakpoint median-based Maximum Parsimony phylogenetic tree reconstruction, wherein a bounded edge-weight model is used. Exponential-time algorithms that apply efficient heuristics, such as branch-and-bound, to dynamically prune the search space are used. We adopted this approach in an NoC-based TSP solver targeted at phylogenetics, taking advantage of fine-grained parallelism and an efficient communication network. The largest fraction of the TSP solution time is accounted for by a particular lower bound calculation operation that uses the graph's adjacency matrix. In this paper, we present the design and implementation of processing elements with a highly optimized lower bound computation kernel and evaluate their performance. Additionally, we explore two major NoC architectures, mesh and quad-tree, and show that the latter is more suitable for this application domain.
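The branch-and-bound scheme described above can be sketched as a depth-first search over partial tours that discards a subtree whenever a lower bound on any completion is no better than the best tour found so far. The bound used here (cost so far plus the cheapest outgoing edge of every vertex that still needs one) is a simple assumed choice computed from the adjacency matrix; it is not the optimized lower bound kernel from the paper, and the function name is hypothetical.

```python
import math


def tsp_branch_and_bound(w):
    """Exact TSP by depth-first branch-and-bound.

    w: square cost (adjacency) matrix; w[i][j] is the cost of edge i -> j.
    Returns (best_cost, best_tour) with the tour starting and ending at vertex 0.
    """
    n = len(w)
    if n == 1:
        return 0, [0, 0]
    # Cheapest outgoing edge of every vertex, precomputed for the lower bound.
    min_out = [min(w[v][u] for u in range(n) if u != v) for v in range(n)]
    best_cost = math.inf
    best_tour = None

    def lower_bound(cost, current, unvisited):
        # The current vertex and each unvisited vertex still need one outgoing edge.
        return cost + min_out[current] + sum(min_out[u] for u in unvisited)

    def search(path, cost, unvisited):
        nonlocal best_cost, best_tour
        current = path[-1]
        if not unvisited:
            total = cost + w[current][0]  # close the cycle back to the start
            if total < best_cost:
                best_cost, best_tour = total, path + [0]
            return
        if lower_bound(cost, current, unvisited) >= best_cost:
            return  # prune: no completion of this partial tour can beat the incumbent
        # Try cheaper edges first so good incumbents are found early.
        for nxt in sorted(unvisited, key=lambda u: w[current][u]):
            search(path + [nxt], cost + w[current][nxt], unvisited - {nxt})

    search([0], 0, frozenset(range(1, n)))
    return best_cost, best_tour


# Small symmetric example graph (hypothetical data).
w = [[0, 10, 15, 20], [10, 0, 35, 25], [15, 35, 0, 30], [20, 25, 30, 0]]
print(tsp_branch_and_bound(w))
```

The bound is evaluated at every node of the search tree, so in software it sits on the critical path; this matches the abstract's observation that the lower bound computation dominates solution time.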

International Symposium on Circuits and Systems | 2015

NoC router using STT-MRAM based hybrid buffers with error correction and limited flit retransmission

Turbo Majumder; Manan Suri; Vinay Shekhar

In this paper, we present a unique methodology for implementing deep I/O buffers on a Network-on-Chip (NoC) platform, based on a hybrid design combining conventional SRAM with emerging Spin-Transfer-Torque Magnetic Random Access Memory (STT-MRAM) technology. We focus on the system-level impact of the probabilistic switching of STT-MRAM devices, which arises when the write latency of STT-MRAM is reduced through conservative programming and aggressive scaling. We incorporate STT-MRAM-specific error detection and correction schemes at the input buffers, and propose a new limited flit retransmission scheme to reduce flit errors due to probabilistic switching. Our hybrid STT-MRAM buffers, along with the additional logic, occupy less than 80% of the area of SRAM-only FIFOs of the same depth. We demonstrate optimum NoC throughput at moderate injection rates on a mesh NoC.
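A back-of-the-envelope model shows how error correction and limited retransmission interact. It assumes, purely for illustration, that each bit write fails independently with probability `p_bit` (probabilistic switching), that the ECC corrects up to `t_correct` bit errors per flit, that any flit with more errors is detected and retransmitted, and that at most `max_retx` retransmissions are allowed. Real codes such as SECDED can miscorrect some multi-bit patterns, which this sketch ignores; all names and parameters are hypothetical.

```python
import math


def uncorrectable_prob(n_bits: int, p_bit: float, t_correct: int) -> float:
    """P(a flit has more than t_correct bit errors), assuming independent bit failures.
    Summed directly (not as 1 - P(ok)) to keep precision when the result is tiny."""
    return sum(math.comb(n_bits, k) * p_bit**k * (1 - p_bit)**(n_bits - k)
               for k in range(t_correct + 1, n_bits + 1))


def residual_flit_error(n_bits, p_bit, t_correct, max_retx):
    """P(a flit is still uncorrectable after the first write plus max_retx retransmissions)."""
    q = uncorrectable_prob(n_bits, p_bit, t_correct)
    return q ** (max_retx + 1)


def expected_writes(n_bits, p_bit, t_correct, max_retx):
    """Expected buffer writes per flit; each retry happens only after a failed attempt."""
    q = uncorrectable_prob(n_bits, p_bit, t_correct)
    return sum(q**i for i in range(max_retx + 1))


# Hypothetical parameters: 64-bit flits, 1e-4 bit-write failure rate, single-error correction.
for retx in range(3):
    print(retx, residual_flit_error(64, 1e-4, 1, retx), expected_writes(64, 1e-4, 1, retx))
```

Under these assumptions each extra retransmission multiplies the residual error by another factor of q while adding very few expected writes, which illustrates why a small retransmission limit can be enough.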

IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013

Network-on-Chip with Long-Range Wireless Links for High-Throughput Scientific Computation

Turbo Majumder; Partha Pratim Pande; Ananth Kalyanaraman

Several emerging application domains in scientific computing demand high computation throughput to achieve terascale or higher performance. Dedicated centers hosting scientific computing tools on a few high-end servers could rely on hardware accelerator co-processors that contain multiple lightweight custom cores interconnected through an on-chip network. While network-on-chip (NoC)-driven platforms have been studied in the context of accelerating individual applications, this work studies the efficacy of NoC-based platforms in enhancing overall computation throughput in the presence of several concurrently executing jobs. Long-range links have been shown to reduce network diameter, and we use this property in conjunction with different resource allocation strategies to deliver high throughput. Our experiments, using a computational biology application suite as a demonstration study, show that long-range wireless shortcuts coupled with an appropriate resource allocation strategy deliver a computation throughput of over 10^11 operations per second while consuming ~0.5 nJ per operation.
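The claim that long-range links reduce network diameter can be checked on a toy topology. The sketch below computes the hop-count diameter of a k x k mesh by breadth-first search from every node, then adds two hypothetical shortcuts joining opposite corners. The shortcut placement is an arbitrary assumption for illustration, not the wireless link placement used in the paper.

```python
from collections import deque


def mesh_edges(k):
    """Undirected edges of a k x k 2-D mesh; node id = row * k + col."""
    edges = []
    for r in range(k):
        for c in range(k):
            v = r * k + c
            if c + 1 < k:
                edges.append((v, v + 1))  # east neighbour
            if r + 1 < k:
                edges.append((v, v + k))  # south neighbour
    return edges


def diameter(n, edges):
    """Largest shortest-path hop count over all node pairs (graph assumed connected)."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    worst = 0
    for src in range(n):
        dist = [-1] * n
        dist[src] = 0
        queue = deque([src])
        while queue:  # breadth-first search gives hop distances from src
            v = queue.popleft()
            for u in adj[v]:
                if dist[u] < 0:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        worst = max(worst, max(dist))
    return worst


k = 8
base = mesh_edges(k)
# Hypothetical long-range shortcuts joining opposite corners of the mesh.
shortcuts = [(0, k * k - 1), (k - 1, k * (k - 1))]
print("mesh diameter:", diameter(k * k, base))
print("with shortcuts:", diameter(k * k, base + shortcuts))
```

A plain k x k mesh has diameter 2(k - 1), so worst-case latency grows linearly with the mesh side; even two shortcuts remove the corner-to-corner worst case.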
