Anantharaman Kalyanaraman

Publication

Featured research published by Anantharaman Kalyanaraman.

Molecular Genetics and Genomics | 2004

A survey of SL1-spliced transcripts from the root-lesion nematode Pratylenchus penetrans

Makedonka Mitreva; A. A. Elling; Mike Dante; Andrew P. Kloek; Anantharaman Kalyanaraman; Srinivas Aluru; Sandra W. Clifton; D. McK. Bird; Thomas J. Baum; James P. McCarter

Plant-parasitic nematodes are important and cosmopolitan pathogens of crops. Here, we describe the generation and analysis of 1928 expressed sequence tags (ESTs) of a splice-leader 1 (SL1) library from mixed life stages of the root-lesion nematode Pratylenchus penetrans. The ESTs were grouped into 420 clusters and classified by function using the Gene Ontology (GO) hierarchy and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Approximately 80% of all translated clusters show homology to Caenorhabditis elegans proteins, and 37% of the C. elegans gene homologs had confirmed phenotypes as assessed by RNA interference tests. Use of an SL1-PCR approach, while ensuring the cloning of the 5′ ends of mRNAs, has demonstrated bias toward short transcripts. Putative nematode-specific and Pratylenchus-specific genes were identified, and their implications for nematode control strategies are discussed.

IEEE Transactions on Parallel and Distributed Systems | 2003

Space and time efficient parallel algorithms and software for EST clustering

Anantharaman Kalyanaraman; Srinivas Aluru; Volker Brendel; Suresh C. Kothari

Expressed sequence tags, abbreviated as ESTs, are DNA molecules experimentally derived from expressed portions of genes. Clustering of ESTs is essential for gene recognition and for understanding important genetic variations such as those resulting in diseases. We present the algorithmic foundations and implementation of PaCE, a parallel software system we developed for large-scale EST clustering. The novel features of our approach include 1) design of space-efficient algorithms to limit the space required to linear in the size of the input data set, 2) a combination of algorithmic techniques to reduce the total work without sacrificing the quality of EST clustering, and 3) use of parallel processing to reduce runtime and facilitate clustering of large data sets. Using a combination of these techniques, we report the clustering of 327,632 rat ESTs in 47 minutes, and 420,694 Triticum aestivum ESTs in 3 hours and 15 minutes, using a 60-processor IBM xSeries cluster. These problems are well beyond the capabilities of state-of-the-art sequential software. We also present thorough experimental evaluation of our software including quality assessment using benchmark Arabidopsis EST data.

Journal of Bioinformatics and Computational Biology | 2006

Efficient algorithms and software for detection of full-length LTR retrotransposons

Anantharaman Kalyanaraman; Srinivas Aluru

LTR retrotransposons constitute one of the most abundant classes of repetitive elements in eukaryotic genomes. In this paper, we present a new algorithm for detection of full-length LTR retrotransposons in genomic sequences. The algorithm identifies regions in a genomic sequence that show structural characteristics of LTR retrotransposons. Three key components distinguish our algorithm from that of current software: (i) a novel method that preprocesses the entire genomic sequence in linear time and produces high quality pairs of LTR candidates in running time that is constant per pair, (ii) a thorough alignment-based evaluation of candidate pairs to ensure high quality prediction, and (iii) a robust parameter set encompassing both structural constraints and quality controls providing users with a high degree of flexibility. Validation of both our serial and parallel implementations of the algorithm against the yeast genome indicates both superior quality and performance results when compared to existing software.

IEEE Transactions on Parallel and Distributed Systems | 2012

pGraph: Efficient Parallel Construction of Large-Scale Protein Sequence Homology Graphs

Changjun Wu; Anantharaman Kalyanaraman; William R. Cannon

Detecting sequence homology between protein sequences is a fundamental problem in computational molecular biology, with a pervasive application in nearly all analyses that aim to structurally and functionally characterize protein molecules. While detecting the homology between two protein sequences is relatively inexpensive, detecting pairwise homology for a large number of protein sequences can become computationally prohibitive for modern inputs, often requiring millions of CPU hours. Yet, there is currently no robust support to parallelize this kernel. In this paper, we identify the key characteristics that make this problem particularly hard to parallelize, and then propose a new parallel algorithm that is suited for detecting homology on large data sets using distributed memory parallel computers. Our method, called pGraph, is a novel hybrid between the hierarchical multiple-master/worker model and producer-consumer model, and is designed to break the irregularities imposed by alignment computation and work generation. Experimental results show that pGraph achieves linear scaling on a 2,048 processor distributed memory cluster for a wide range of inputs ranging from as small as 20,000 sequences to 2,560,000 sequences. In addition to demonstrating strong scaling, we present an extensive report on the performance of the various system components and related parametric studies.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

Scaling graph community detection on the Tilera many-core architecture

Daniel G. Chavarría-Miranda; Mahantesh Halappanavar; Anantharaman Kalyanaraman

In an era when power constraints and data movement are proving to be significant barriers for the application of high-end computing, the Tilera many-core architecture offers a low-power platform exhibiting many important characteristics of future systems, including a large number of simple cores, a sophisticated network-on-chip, and fine-grained control over memory and caching policies. While this emerging architecture has been previously studied for structured compute-intensive kernels, benchmarking the platform for data-bound, irregular applications presents significant challenges that have remained unexplored. Community detection is an advanced prototypical graph-theoretic operation with applications in numerous scientific domains including life sciences, cyber security, and power systems. In this work, we explore multiple design strategies toward developing a scalable tool for community detection on the Tilera platform. Using several memory layout and work scheduling techniques, we demonstrate speedups of up to 47× on 36 cores of the Tilera TileGX36 platform over the best serial implementation, and also show results that have comparable quality and performance to mainstream x86 platforms. To the best of our knowledge, this is the first work addressing graph algorithms on the Tilera platform. This study demonstrates that through careful design space exploration, low-power many-core platforms like Tilera can be effectively exploited for graph algorithms that embody all the essential characteristics of an irregular application.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Towards scalable optimal sequence homology detection

Jeffrey A. Daily; Sriram Krishnamoorthy; Anantharaman Kalyanaraman

The field of bioinformatics and computational biology is experiencing a data revolution: experimental techniques to procure data have increased in throughput, improved in accuracy, and reduced in costs. This has spurred an array of high profile sequencing and data generation projects. While the data repositories represent untapped reservoirs of rich information critical for scientific breakthroughs, the analytical software tools that are needed to analyze large volumes of such sequence data have significantly lagged behind in their capacity to scale. In this paper, we address homology detection, which is a fundamental problem in large-scale sequence analysis with numerous applications. We present a scalable framework to conduct large-scale optimal homology detection on massively parallel supercomputing platforms. Our approach employs distributed memory work stealing to effectively parallelize optimal pairwise alignment computation tasks. Results on 120,000 cores of the Hopper Cray XE6 supercomputer demonstrate strong scaling and up to 2.42 × 10^7 optimal pairwise sequence alignments computed per second (PSAPS), the highest reported in the literature.

International Conference on Parallel Processing | 2010

A Scalable Parallel Algorithm for Large-Scale Protein Sequence Homology Detection

Changjun Wu; Anantharaman Kalyanaraman; William R. Cannon

Protein sequence homology detection is a fundamental problem in computational molecular biology, with a pervasive application in nearly all analyses that aim to structurally and functionally characterize protein molecules. While detecting homology between two protein sequences is computationally inexpensive, detecting pairwise homology at a large scale becomes prohibitive, requiring millions of CPU hours. Yet, there is currently no efficient method available to parallelize this kernel. In this paper, we present the key characteristics that make this problem particularly hard to parallelize, and then propose a new parallel algorithm that is suited for large-scale protein sequence data. Our method, called pGraph, is designed using a hierarchical multiple-master multiple-worker model, where the processor space is partitioned into subgroups and the hierarchy helps ensure that the workload is distributed in a load-balanced fashion despite the inherent irregularity that may originate in the input. Experimental evaluation demonstrates that our method scales linearly on all input sizes tested (up to 640K sequences) on a 1,024-node supercomputer. In addition to demonstrating strong scaling, we present an extensive study of the various components of the system and related parametric studies.

International Parallel and Distributed Processing Symposium | 2006

Assembling genomes on large-scale parallel computers

Anantharaman Kalyanaraman; Scott J. Emrich; Srinivas Aluru

Assembly of large genomes from tens of millions of short genomic fragments is computationally demanding, requiring hundreds of gigabytes of memory and tens of thousands of CPU hours. New gene-enrichment sequencing strategies are expected to further exacerbate this situation. In this paper, we present a massively parallel genome assembly framework. The unique features of our approach include space-efficient and on-demand algorithms that consume only linear space, and heuristic strategies that reduce the number of expensive pairwise sequence alignments while maintaining assembly quality. As part of the ongoing efforts in maize genome sequencing, we applied our assembly framework to the largest available collection of maize genomic data. We report the partitioning of more than 1.6 million fragments of over 1.25 billion nucleotides total size into genomic islands in 2 hours on 1,024 processors of an IBM BlueGene/L supercomputer.

Computational Systems Bioinformatics | 2005

Efficient algorithms and software for detection of full-length LTR retrotransposons

Anantharaman Kalyanaraman; Srinivas Aluru

LTR retrotransposons constitute one of the most abundant classes of repetitive elements in eukaryotic genomes. In this paper, we present a new algorithm for detection of full-length LTR retrotransposons in genomic sequences. The algorithm identifies regions in a genomic sequence that show structural characteristics of LTR retrotransposons. Three key components distinguish our algorithm from that of current software: (i) a novel method that preprocesses the entire genomic sequence in linear time and produces high quality pairs of LTR candidates in run-time that is constant per pair, (ii) a thorough alignment-based evaluation of candidate pairs to ensure high quality prediction, and (iii) a robust parameter set encompassing both structural constraints and quality controls providing users with a high degree of flexibility. We implemented our algorithm in a software program called LTR_par, which can be run on both serial and parallel computers. Validation of our software against the yeast genome indicates superior results in both quality and performance when compared to existing software. Additional validations are presented on rice BACs and the chimpanzee genome.

International Parallel and Distributed Processing Symposium | 2002

Parallel EST clustering

Anantharaman Kalyanaraman; Srinivas Aluru; Suresh C. Kothari

Expressed sequence tags, abbreviated ESTs, are DNA fragments experimentally derived from expressed portions of genes. Clustering of ESTs is essential for gene recognition and understanding important genetic variations such as those resulting in diseases. In this paper, we present the design and development of a parallel software system for EST clustering. The novel features of our approach include 1) space efficient algorithms to keep the space requirement linear in the size of the input data set, 2) a combination of algorithmic techniques to reduce the total work without sacrificing the quality of EST clustering, and 3) use of parallel processing to reduce the run-time and facilitate the clustering of large data sets. Using a combination of these techniques, we report the clustering of 50,000 maize ESTs in 16 minutes on a 32-processor IBM SP. To our knowledge, this is the first effort in building a parallel software system for EST clustering.
