Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Vipin Sachdeva is active.

Publication


Featured research published by Vipin Sachdeva.


IEEE International Symposium on Workload Characterization | 2005

BioPerf: a benchmark suite to evaluate high-performance computer architecture on bioinformatics applications

David A. Bader; Yue Li; Tao Li; Vipin Sachdeva

The exponential growth in the amount of genomic data has spurred growing interest in large scale analysis of genetic information. Bioinformatics applications, which explore computational methods to allow researchers to sift through the massive biological data and extract useful information, are becoming increasingly important computer workloads. This paper presents BioPerf, a benchmark suite of representative bioinformatics applications to facilitate the design and evaluation of high-performance computer architectures for these emerging workloads. Currently, the BioPerf suite contains codes from 10 highly popular bioinformatics packages and covers the major fields of study in computational biology, such as sequence comparison, phylogenetic reconstruction, protein structure prediction, and sequence homology & gene finding. We demonstrate the use of BioPerf by providing simulation points for pre-compiled Alpha binaries and a performance study on IBM Power using IBM Mambo simulations cross-compared with Apple G5 executions. The BioPerf suite (available from www.bioperf.org) includes benchmark source code, input datasets of various sizes, and information for compiling and using the benchmarks. Our benchmark suite includes parallel codes where available.


International Parallel and Distributed Processing Symposium | 2007

Exploring the Viability of the Cell Broadband Engine for Bioinformatics Applications

Vipin Sachdeva; Michael Kistler; Evan Speight; Tzy-Hwa Kathy Tzeng

This paper evaluates the performance of bioinformatics applications on the Cell Broadband Engine recently developed at IBM. In particular, we focus on two highly popular bioinformatics applications: FASTA and ClustalW. The characteristics of these bioinformatics applications, such as small critical time-consuming code size, regular memory accesses, existing vectorized code, and embarrassingly parallel computation, make them uniquely suitable for the Cell processing platform. The price and power advantages afforded by the Cell processor also make it an attractive alternative to general-purpose processors. We report preliminary performance results for these applications, and contrast these results with the state-of-the-art hardware.
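
The "small critical time-consuming code" the abstract refers to is essentially the pairwise-alignment inner loop. Below is a minimal scalar sketch of a Smith-Waterman-style recurrence of the kind at the heart of FASTA and ClustalW (illustrative only, not the paper's Cell/B.E. implementation; the Cell port applies the same max operations across SPU vector lanes):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Minimal scalar Smith-Waterman sketch with a linear gap penalty
// (illustrative only; scores are arbitrary example values).
int smith_waterman(const std::string& a, const std::string& b,
                   int match = 2, int mismatch = -1, int gap = -2) {
    const size_t n = a.size(), m = b.size();
    std::vector<int> prev(m + 1, 0), cur(m + 1, 0);
    int best = 0;
    for (size_t i = 1; i <= n; ++i) {
        for (size_t j = 1; j <= m; ++j) {
            int s = (a[i - 1] == b[j - 1]) ? match : mismatch;
            // This three-way max in the inner loop is the hot spot that
            // vector units accelerate.
            cur[j] = std::max({0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap});
            best = std::max(best, cur[j]);
        }
        std::swap(prev, cur);
        cur.assign(m + 1, 0);  // reset the working row for the next iteration
    }
    return best;
}
```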


Parallel Computing | 2008

Exploring the viability of the Cell Broadband Engine for bioinformatics applications

Vipin Sachdeva; Michael Kistler; Evan Speight; Tzy-Hwa Kathy Tzeng

This paper evaluates the performance of bioinformatics applications on the Cell Broadband Engine recently developed at IBM. In particular, we focus on two highly popular bioinformatics applications: FASTA and ClustalW. The characteristics of these bioinformatics applications, such as small critical time-consuming code size, regular memory accesses, existing vectorized code, and embarrassingly parallel computation, make them uniquely suitable for the Cell processing platform. The price and power advantages afforded by the Cell processor also make it an attractive alternative to general-purpose processors. We report preliminary performance results for these applications, and contrast these results with the state-of-the-art hardware.


International Parallel and Distributed Processing Symposium | 2012

Mesh Interface Resolution and Ghost Exchange in a Parallel Mesh Representation

Timothy J. Tautges; Jason A. Kraftcheck; Nathan Bertram; Vipin Sachdeva; John Harold Magerlein

Algorithms are described for the resolution of shared vertices and higher-dimensional interfaces on domain-decomposed parallel meshes, and for ghost exchange between neighboring processors. Performance data are given for large (up to 64M tet and 32M hex element) meshes on up to 16k processors. Shared interface resolution for structured mesh is also described. Only small modifications are required to enable the algorithm to match vertices based on geometric location, which is useful for joining multi-piece meshes; this capability is also demonstrated.
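
As a rough sketch of the ghost-exchange pattern the abstract describes (a generic MPI illustration, not MOAB's actual interface; the function and variable names are hypothetical), each rank exchanges its interface-adjacent elements with every neighbor:

```cpp
#include <mpi.h>
#include <vector>

// Generic one-layer ghost-exchange sketch. Each rank sends the IDs of its
// elements adjacent to a neighbor's interface and receives that neighbor's
// boundary elements as its local ghost layer. For simplicity this uses
// blocking pairwise exchanges and assumes small messages; a production
// version would use nonblocking calls.
void exchange_ghosts(const std::vector<int>& neighbors,
                     const std::vector<std::vector<long>>& boundary,
                     std::vector<std::vector<long>>& ghosts,
                     MPI_Comm comm) {
    ghosts.assign(neighbors.size(), {});
    for (size_t i = 0; i < neighbors.size(); ++i) {
        // Exchange counts first so the receive buffer can be sized.
        long nsend = static_cast<long>(boundary[i].size()), nrecv = 0;
        MPI_Sendrecv(&nsend, 1, MPI_LONG, neighbors[i], 0,
                     &nrecv, 1, MPI_LONG, neighbors[i], 0,
                     comm, MPI_STATUS_IGNORE);
        ghosts[i].resize(nrecv);
        // Then exchange the element IDs themselves.
        MPI_Sendrecv(boundary[i].data(), static_cast<int>(nsend), MPI_LONG,
                     neighbors[i], 1,
                     ghosts[i].data(), static_cast<int>(nrecv), MPI_LONG,
                     neighbors[i], 1,
                     comm, MPI_STATUS_IGNORE);
    }
}
```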


International Conference on Computational Science | 2009

Evaluating the Jaccard-Tanimoto Index on Multi-core Architectures

Vipin Sachdeva; Douglas M. Freimuth; Chris Mueller

The Jaccard/Tanimoto coefficient is an important workload, used in a large variety of problems including drug design fingerprinting, clustering analysis, similarity web searching, and image segmentation. This paper evaluates the Jaccard coefficient on three platforms: the Cell Broadband Engine processor, an Intel Xeon dual-core platform, and an Nvidia 8800 GTX GPU. In our work, we have developed a novel parallel algorithm, specially suited to the Cell/B.E. architecture, for all-to-all Jaccard comparisons that minimizes DMA transfers and reuses data in the local store. We show that our implementation on the Cell/B.E. outperforms the implementations on comparable Intel platforms by 6-20X with full accuracy, and by 10-50X in reduced-accuracy mode, depending on the size of the data, and by more than 60X compared to the Nvidia 8800 GTX. In addition to performance, we also discuss in detail our efforts to optimize this workload on these architectures and explain how the avenues for optimization differ from one architecture to another. Our work shows that the algorithms or kernels employed for the Jaccard coefficient calculation depend heavily on the traits of the target hardware.
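
For reference, the Jaccard/Tanimoto coefficient of two sets A and B is J(A, B) = |A ∩ B| / |A ∪ B|; for binary fingerprints it reduces to population counts over packed bit vectors. A minimal scalar sketch follows (the paper's Cell/B.E., Xeon, and GPU kernels specialize this same computation):

```cpp
#include <bit>      // std::popcount (C++20)
#include <cstdint>
#include <vector>

// Jaccard/Tanimoto coefficient for binary fingerprints packed into
// 64-bit words: J = popcount(A AND B) / popcount(A OR B).
// A generic scalar sketch, not the paper's platform-specific kernels.
double jaccard(const std::vector<uint64_t>& a, const std::vector<uint64_t>& b) {
    uint64_t inter = 0, uni = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        inter += std::popcount(a[i] & b[i]);
        uni   += std::popcount(a[i] | b[i]);
    }
    return uni ? static_cast<double>(inter) / uni : 0.0;
}
```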


International Parallel and Distributed Processing Symposium | 2014

Parallelization of the Trinity Pipeline for De Novo Transcriptome Assembly

Vipin Sachdeva; Chang Sik Kim; Kirk E. Jordan; Martyn Winn

This paper details a distributed-memory implementation of Chrysalis, part of the popular Trinity workflow used for de novo transcriptome assembly. We have modified Chrysalis, which was previously multi-threaded for shared-memory architectures, into a hybrid implementation that uses both MPI and OpenMP. With the new hybrid implementation, we report speedups of about a factor of twenty for both GraphFromFasta and ReadsToTranscripts on an iDataPlex cluster for a sugar beet dataset containing around 130 million reads. Along with the hybrid implementation, we also use PyFasta to speed up by a factor of three the execution of Bowtie, which is also part of the Trinity workflow. Overall, we reduce the runtime of the Chrysalis step of the Trinity workflow from over 50 hours to less than 5 hours for the sugar beet dataset. By enabling the use of multi-node clusters, this implementation is a significant step towards making de novo transcriptome assembly feasible for ever bigger transcriptome datasets.
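
A minimal sketch of the hybrid MPI+OpenMP structure described here (illustrative only, not the actual Chrysalis code): MPI ranks split the work across nodes, and OpenMP threads split each rank's share across cores.

```cpp
#include <mpi.h>
#include <omp.h>

int main(int argc, char** argv) {
    int provided, rank, size;
    // FUNNELED: only the main thread makes MPI calls; OpenMP does the rest.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long total_items = 1000000;              // e.g. reads or graph components
    const long begin = rank * total_items / size;  // this rank's block
    const long end   = (rank + 1) * total_items / size;

    #pragma omp parallel for
    for (long i = begin; i < end; ++i) {
        // process_item(i);  // per-read / per-component work goes here
    }

    MPI_Finalize();
    return 0;
}
```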


IEEE International Symposium on Workload Characterization | 2007

Characterizing and Improving the Performance of Bioinformatics Workloads on the POWER5 Architecture

Vipin Sachdeva; Evan Speight; Mark W. Stephenson; Lei Chen

This paper examines several mechanisms to improve the performance of life science applications on high-performance computer architectures typically designed for more traditional supercomputing tasks. In particular, we look at the detailed performance characteristics of some of the most popular sequence alignment and homology applications on the POWER5 architecture offering from IBM. Through detailed analysis of performance counter information collected from the hardware, we identify that the main performance bottleneck in the current POWER5 architecture for these applications is the high branch-misprediction penalty of the most time-consuming kernels of these codes. Utilizing our PowerPC full system simulation environment, we show the performance improvement afforded by adding conditional assignments to the PowerPC ISA. We also show the impact of changing the number of functional units to a more appropriate mix for the characteristics of bioinformatics applications. Finally, we examine the benefit of removing the two-cycle penalty currently in the POWER5 architecture for taken branches due to the lack of a branch target buffer. Addressing these three performance-limiting aspects provides an average 64% improvement in application performance.
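
A conditional assignment (a conditional move/select instruction) replaces a data-dependent branch with a branch-free operation, which is exactly what helps the alignment inner loops profiled here. A minimal sketch of the contrast (generic C++, not the paper's simulated ISA extension):

```cpp
// Branchy vs. branch-free max. The data-dependent branch in the first
// form is nearly unpredictable on random alignment scores; the second
// form is typically compiled to a conditional move (select) with no
// branch to mispredict.
int max_branchy(int x, int y) {
    if (x > y) return x;  // taken roughly half the time on random inputs
    return y;
}

int max_branchless(int x, int y) {
    return x > y ? x : y;  // compilers usually emit cmov/isel here
}
```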


International Conference on Conceptual Structures | 2012

Coupling a Basin Modeling and a Seismic Code using MOAB

Mi Yan; Kirk E. Jordan; Dinesh K. Kaushik; Michael P. Perrone; Vipin Sachdeva; Timothy J. Tautges; John Harold Magerlein

We report on a demonstration of loose multiphysics coupling between a basin modeling code and a seismic code running on a large parallel machine. Multiphysics coupling, which is one critical capability for a high-performance computing (HPC) framework, was implemented using MOAB, an open-source mesh and field database. MOAB provides for code coupling by storing mesh data and input and output field data for the coupled analysis codes, and by interpolating field values between the different meshes used by the coupled codes. We found it straightforward to use MOAB to couple the PBSM basin modeling code and the FWI3D seismic code on an IBM Blue Gene/P system. We describe how the coupling was implemented and present benchmarking results for up to 8 racks of Blue Gene/P (8192 nodes). The coupling code is fast compared to the analysis codes and scales well up to at least 8192 nodes, indicating that a mesh and field database is an efficient way to implement loose multiphysics coupling on large parallel machines.
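
As a toy illustration of the field transfer at the heart of such loose coupling (a 1-D linear-interpolation sketch under simplified assumptions; MOAB itself handles general 3-D meshes and parallel data):

```cpp
#include <vector>

// Interpolate a field defined at source-mesh node positions src_x onto
// target-mesh node positions tgt_x. Assumes both coordinate arrays are
// sorted and the targets lie within the source range. Illustrative only.
std::vector<double> interpolate(const std::vector<double>& src_x,
                                const std::vector<double>& src_f,
                                const std::vector<double>& tgt_x) {
    std::vector<double> tgt_f(tgt_x.size());
    for (size_t i = 0; i < tgt_x.size(); ++i) {
        // Find the source interval [j-1, j] containing tgt_x[i].
        size_t j = 1;
        while (j < src_x.size() - 1 && src_x[j] < tgt_x[i]) ++j;
        double t = (tgt_x[i] - src_x[j - 1]) / (src_x[j] - src_x[j - 1]);
        tgt_f[i] = (1.0 - t) * src_f[j - 1] + t * src_f[j];
    }
    return tgt_f;
}
```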


BMC Bioinformatics | 2017

K-mer clustering algorithm using a MapReduce framework: application to the parallelization of the Inchworm module of Trinity

Chang Sik Kim; Martyn Winn; Vipin Sachdeva; Kirk E. Jordan

Background: De novo transcriptome assembly is an important technique for understanding gene expression in non-model organisms. Many de novo assemblers using the de Bruijn graph of a set of RNA sequences rely on an in-memory representation of this graph. However, current methods analyse the complete set of read-derived k-mer sequences at once, resulting in the need for computer hardware with large shared memory.

Results: We introduce a novel approach that clusters k-mers as the first step. The clusters correspond to small sets of gene products, which can be processed quickly to give candidate transcripts. We implement the clustering step using the MapReduce approach for parallelising the analysis of large datasets, which enables the use of compute clusters. The computational task is distributed across the compute system using the industry-standard MPI protocol, and no specialised hardware is required. Using this approach, we have re-implemented the Inchworm module from the widely used Trinity pipeline, and tested the method in the context of the full Trinity pipeline. Validation tests on a range of real datasets show large reductions in runtime and per-node memory requirements when making use of a compute cluster.

Conclusions: Our study shows that MapReduce-based clustering has great potential for distributing challenging sequencing problems without loss of accuracy. Although we have focussed on the Trinity package, we propose that such clustering is a useful initial step for other assembly pipelines.
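
A single-process sketch of the map and reduce stages for k-mer processing (illustrative only; the paper distributes both stages across MPI ranks and clusters the k-mers rather than merely counting them):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Map: emit every k-mer of each read. Reduce: aggregate counts per k-mer.
// Here both stages are fused into one local pass; in a MapReduce setting
// the emitted k-mers would be shuffled to the rank that owns each key.
std::unordered_map<std::string, long>
count_kmers(const std::vector<std::string>& reads, size_t k) {
    std::unordered_map<std::string, long> counts;
    for (const auto& read : reads) {
        if (read.size() < k) continue;
        for (size_t i = 0; i + k <= read.size(); ++i)
            ++counts[read.substr(i, k)];  // map + local reduce in one step
    }
    return counts;
}
```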


International Parallel and Distributed Processing Symposium | 2016

A Memory and Time Scalable Parallelization of the Reptile Error-Correction Code

Vipin Sachdeva; Srinivas Aluru; David A. Bader

This paper details a distributed-memory implementation of Reptile, a scalable and accurate spectrum-based error-correction method. Reptile uses both k-mer and adjoining k-mer (tile) information, along with the quality scores of bases, to correct substitution errors from next-generation sequencing machines. Previous approaches to parallelizing Reptile have replicated the spectrums on each node, which can be prohibitive in terms of the memory needed for huge datasets. Our approach distributes both the k-mer and the tile spectrum among the processing ranks, relying on message passing for error correction. This allows hardware with any memory size per node to be employed for error correction using Reptile's algorithm, irrespective of the size of the dataset. As part of our implementation, we have also implemented several heuristics that can be used to run the algorithm optimally based on the advantages of the hardware used. We present our results on IBM's Blue Gene/Q architecture for the E. coli, Drosophila, and human datasets, showing excellent scalability with increasing numbers of nodes. Using 256 nodes of Blue Gene/Q, we are able to error-correct the E. coli and Drosophila datasets in less than 200 seconds and 600 seconds, respectively. The human dataset, consisting of 1.55 billion reads, is corrected in a little more than two hours using 1024 nodes of Blue Gene/Q. All three datasets are corrected with Reptile's memory-intensive algorithm using less than 512 MB per process.
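
A sketch of the distribution idea as described (the hashing scheme below is an assumption for illustration, not Reptile's exact code): each k-mer is owned by exactly one rank, so spectrum lookups become messages to the owner instead of probes of a replicated in-memory table.

```cpp
#include <functional>
#include <string>

// Hypothetical owner computation for a distributed k-mer/tile spectrum:
// a k-mer's entry lives only on rank hash(kmer) % nranks, so per-node
// memory shrinks as ranks are added, at the cost of message passing.
int owner_rank(const std::string& kmer, int nranks) {
    return static_cast<int>(std::hash<std::string>{}(kmer) % nranks);
}
```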

Collaboration


Dive into Vipin Sachdeva's collaboration.

Top Co-Authors


David A. Bader

Georgia Institute of Technology


Martyn Winn

Science and Technology Facilities Council
