Dinanath Sulakhe
Argonne National Laboratory
Publications
Featured research published by Dinanath Sulakhe.
Nucleic Acids Research | 2006
Natalia Maltsev; Elizabeth M. Glass; Dinanath Sulakhe; Alexis Rodriguez; Mustafa Syed; Tanuja Bompada; Yi Zhang; Mark D'Souza
The PUMA2 system is an interactive, integrated bioinformatics environment for high-throughput genetic sequence analysis and metabolic reconstructions from sequence data. PUMA2 provides a framework for comparative and evolutionary analysis of genomic data and metabolic networks in the context of taxonomic and phenotypic information. Grid infrastructure is used to perform computationally intensive tasks. PUMA2 currently contains precomputed analysis of 213 prokaryotic, 22 eukaryotic, 650 mitochondrial and 1493 viral genomes and automated metabolic reconstructions for >200 organisms. Genomic data are annotated with information integrated from >20 sequence, structural and metabolic databases and ontologies. PUMA2 supports both automated and interactive expert-driven annotation of genomes, using a variety of publicly available bioinformatics tools. It also contains a suite of unique PUMA2 tools for automated assignment of gene function, evolutionary analysis of protein families and comparative analysis of metabolic pathways. PUMA2 allows users to submit batch sequence data for automated functional analysis and construction of metabolic models. The results of these analyses are made available to the users in the PUMA2 environment for further interactive sequence analysis and annotation.
BMC Bioinformatics | 2010
Wei Tan; Ravi K. Madduri; Aleksandra Nenadic; Stian Soiland-Reyes; Dinanath Sulakhe; Ian T. Foster; Carole A. Goble
Background: In the biological and medical domains, web services make data and computational functionality accessible in a unified manner, helping to automate data pipelines that were previously assembled manually. Workflow technology is widely used to orchestrate multiple services in support of in-silico research. The Cancer Biomedical Informatics Grid (caBIG) is an information network enabling the sharing of cancer research resources, and caGrid is its underlying service-based computation infrastructure. caBIG requires that services be composed and orchestrated in a given sequence to realize data pipelines, which are often called scientific workflows.
Results: caGrid selected Taverna as its workflow execution system of choice owing to its integration with web service technology, its support for a wide range of web services, and its plug-in architecture, which eases the integration of third-party extensions. The caGrid Workflow Toolkit (or the toolkit for short), an extension to the Taverna workflow system, is designed and implemented to ease building and running caGrid workflows. It supports the phases of workflow use that the caGrid community has identified as challenging in a multi-institutional, cross-disciplinary setting: service discovery, composition and orchestration, data access, and secure service invocation.
Conclusions: By extending the Taverna Workbench, the caGrid Workflow Toolkit provides a comprehensive solution for composing and coordinating services in caGrid that would otherwise remain isolated and disconnected from each other. Using it, users can access more than 140 services and are offered a rich set of features, including discovery of data and analytical services, query and transfer of data, security protection for service invocations, state management in service interactions, and sharing of workflows, experiences, and best practices. The proposed solution is general enough to be applicable and reusable within other service-computing infrastructures that leverage a similar technology stack.
Concurrency and Computation: Practice and Experience | 2014
Ravi K. Madduri; Dinanath Sulakhe; Lukasz Lacinski; Bo Liu; Alex Rodriguez; Kyle Chard; Utpal J. Dave; Ian T. Foster
We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis, including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multistep processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler). The system allows biomedical researchers to perform rapid analysis of large next-generation sequencing datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads.
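The end-to-end automation described above can be illustrated with a minimal sketch: each sample flows through a fixed sequence of stages, and independent per-sample pipelines fan out over a worker pool. The stage names (align, call_variants) are illustrative placeholders, not the actual Globus Genomics tool set; the real system runs Galaxy pipelines scheduled by HTCondor.

```python
from concurrent.futures import ThreadPoolExecutor

def align(sample):
    # Placeholder stage: pretend to produce an alignment file name.
    return f"{sample}.bam"

def call_variants(bam):
    # Placeholder stage: pretend to produce a variant-call file name.
    return f"{bam}.vcf"

# The pipeline is an ordered list of stages, each feeding the next.
PIPELINE = [align, call_variants]

def run_pipeline(sample):
    data = sample
    for stage in PIPELINE:
        data = stage(data)
    return data

def run_all(samples, workers=4):
    # Fan one pipeline per sample out over a worker pool: a toy-scale
    # analogue of HTCondor scheduling pipelines over many processors.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_pipeline, samples))
```

Because samples are independent, throughput scales with the number of workers; `pool.map` preserves input order, so results line up with the submitted samples.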
Concurrency and Computation: Practice and Experience | 2015
Ravi K. Madduri; Kyle Chard; Ryan Chard; Lukasz Lacinski; Alex Rodriguez; Dinanath Sulakhe; David Kelly; Utpal J. Dave; Ian T. Foster
The use of public cloud computers to host sophisticated scientific data and software is transforming scientific practice by enabling broad access to capabilities previously available only to the few. The primary obstacle to more widespread use of public clouds to host scientific software ('cloud-based science gateways') has thus far been the considerable gap between the specialized needs of science applications and the capabilities provided by cloud infrastructures. We describe here a domain-independent, cloud-based science gateway platform, the Globus Galaxies platform, which overcomes this gap by providing a set of hosted services that directly address the needs of science gateway developers. The design and implementation of this platform leverages our several years of experience with Globus Genomics, a cloud-based science gateway that has served more than 200 genomics researchers across 30 institutions. Building on that foundation, we have implemented a platform that leverages the popular Galaxy system for application hosting and workflow execution; Globus services for data transfer, user and group management, and authentication; and a cost-aware elastic provisioning model specialized for public cloud resources. We describe the capabilities and architecture of this platform, present six scientific domains in which we have successfully applied it, report on user experiences, and analyze the economics of our deployments.
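A cost-aware elastic provisioner of the kind mentioned above must repeatedly answer two questions: which instance type to buy, and how many workers to run for the current queue depth. The sketch below shows one simple policy; the instance names, prices, and scaling rule are illustrative assumptions, not the platform's actual provisioning logic or real EC2 pricing.

```python
# Hypothetical instance catalogue: name, memory, hourly price.
INSTANCE_TYPES = [
    {"name": "small", "mem_gb": 4, "price": 0.10},
    {"name": "large", "mem_gb": 32, "price": 0.50},
]

def choose_instance(mem_needed_gb):
    # Cost-aware choice: cheapest type that satisfies the memory need.
    eligible = [t for t in INSTANCE_TYPES if t["mem_gb"] >= mem_needed_gb]
    return min(eligible, key=lambda t: t["price"])

def scale_decision(queued_jobs, running_workers, jobs_per_worker=4):
    # Elastic sizing: target enough workers to cover the queue,
    # returning how many to add (positive) or retire (negative).
    wanted = -(-queued_jobs // jobs_per_worker)  # ceiling division
    return wanted - running_workers
```

For example, a job needing 16 GB selects the "large" type, and a queue of 10 jobs with 1 running worker (at 4 jobs per worker) asks for 2 more workers.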
Journal of Clinical Monitoring and Computing | 2005
Dinanath Sulakhe; Alex Rodriguez; Mark D'Souza; Michael Wilde; Veronika Nefedova; Ian T. Foster; Natalia Maltsev
Recent progress in genomics and experimental biology has brought exponential growth of the biological information available for computational analysis in public genomics databases. However, applying the potentially enormous scientific value of this information to the understanding of biological systems requires computing and data storage technology of an unprecedented scale. The Grid, with its aggregated and distributed computational and storage infrastructure, offers an ideal platform for high-throughput bioinformatics analysis. To leverage this we have developed the Genome Analysis Research Environment (GNARE), a scalable computational system for the high-throughput analysis of genomes, which provides an integrated database and computational backend for data-driven bioinformatics applications. GNARE efficiently automates the major steps of genome analysis, including acquisition of data from multiple genomic databases; data analysis by a diverse set of bioinformatics tools; and storage of results and annotations. High-throughput computations in GNARE are performed using distributed heterogeneous Grid computing resources such as Grid2003, TeraGrid, and the DOE Science Grid. Multi-step genome analysis workflows involving massive data processing, the use of application-specific tools and algorithms, and updating of an integrated database to provide interactive web access to results are all expressed and controlled by a "virtual data" model which transparently maps computational workflows to distributed Grid resources. This paper describes how Grid technologies such as Globus, Condor, and the GriPhyN Virtual Data System were applied in the development of GNARE. It focuses on our approach to Grid resource allocation and on the use of GNARE as a computational framework for the development of bioinformatics applications.
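The "virtual data" idea above can be sketched in a few lines: each derived dataset declares the transformation and inputs that produce it, and a runner resolves dependencies recursively, so the workflow is expressed as data relationships rather than an explicit job script. The dataset and tool names below are illustrative; the actual GNARE system expressed these recipes in the GriPhyN Virtual Data System, which also mapped them onto Grid resources.

```python
# Recipe table: target -> (tool, inputs), or None for raw input data.
RECIPES = {
    "raw.fa": None,  # input data, assumed already present
    "hits.blast": ("blast", ["raw.fa"]),
    "annotations.db": ("annotate", ["hits.blast"]),
}

def materialize(target, produced=None):
    """Return the build order needed to produce `target`,
    materializing each dependency before its consumer."""
    produced = produced if produced is not None else []
    recipe = RECIPES[target]
    if recipe is None:          # raw input: nothing to derive
        return produced
    _tool, inputs = recipe
    for dep in inputs:          # derive prerequisites first
        materialize(dep, produced)
    if target not in produced:
        produced.append(target)
    return produced
```

Asking for `annotations.db` yields the order `["hits.blast", "annotations.db"]`; in a real virtual-data system each entry in that order becomes a Grid job, and already-materialized datasets are skipped.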
Nucleic Acids Research | 2014
Dinanath Sulakhe; Sandhya Balasubramanian; Bingqing Xie; Bo Feng; Andrew Taylor; Sheng Wang; Eduardo Berrocal; Utpal J. Dave; Jinbo Xu; Daniela Börnigen; T. Conrad Gilliam; Natalia Maltsev
We have developed Lynx (http://lynx.ci.uchicago.edu)—a web-based database and a knowledge extraction engine, supporting annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Its underlying knowledge base (LynxKB) integrates various classes of information from >35 public databases and private collections, as well as manually curated data from our group and collaborators. Lynx provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization to assist the user in extracting meaningful knowledge from LynxKB and experimental data, whereas its service-oriented architecture provides public access to LynxKB and its analytical tools via user-friendly web services and interfaces.
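Enrichment analysis, one of the Lynx capabilities named above, commonly reduces to an over-representation test: given a gene universe of size N containing K genes from some annotation category, how surprising is it that a study set of n genes contains k of them? The hypergeometric upper tail below is a generic sketch of that calculation, not Lynx's specific algorithm or API.

```python
from math import comb

def enrichment_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(universe N, category K, draws n):
    the probability of seeing at least k category genes by chance."""
    total = comb(N, n)
    return sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    ) / total
```

A small p-value (e.g. for a 20-gene study set hitting 5 genes of a 50-gene pathway in a 1000-gene universe) flags the category as enriched; tools typically also correct these p-values for multiple testing across categories.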
Computational and Structural Biotechnology Journal | 2015
Krithika Bhuvaneshwar; Dinanath Sulakhe; Robinder Gauba; Alex Rodriguez; Ravi K. Madduri; Utpal J. Dave; Lukasz Lacinski; Ian T. Foster; Yuriy Gusev; Subha Madhavan
Next generation sequencing (NGS) technologies produce massive amounts of data, requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte-scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the "Globus Genomics" system, an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably, and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel, and it also helps meet the scale-out analysis needs of modern translational genomics research.
Extreme Science and Engineering Discovery Environment | 2013
Ravi K. Madduri; Paul Dave; Dinanath Sulakhe; Lukasz Lacinski; Bo Liu; Ian T. Foster
We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system is notable for its high degree of end-to-end automation, which encompasses every stage of the data analysis pipeline from initial data access (from remote sequencing center or database, by the Globus Online file transfer system) to on-demand resource acquisition (on Amazon EC2, via the Globus Provision cloud manager); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); and efficient scheduling of these pipelines over many processors (via the Condor scheduler). The system allows biomedical researchers to perform rapid analysis of large NGS datasets using just a web browser in a fully automated manner, without software installation.
International Conference on Web Services | 2009
Wei Tan; Kyle Chard; Dinanath Sulakhe; Ravi K. Madduri; Ian T. Foster; Stian Soiland-Reyes; Carole A. Goble
In scientific collaboration platforms such as caGrid, workflow-as-a-service is a useful concept for various reasons, such as easy reuse of workflows, access to remote resources, security concerns, and improved execution performance. We propose a solution for facilitating workflow-as-a-service based on Taverna as the workflow engine and gRAVI as a service wrapping tool. We provide both a generic service to execute all Taverna workflows, and an easy-to-use tool (gRAVI-t) for users to wrap their workflows as workflow-specific services, without developing service code. The signature of a workflow-specific service is identical to the corresponding workflow's input/output definition and is therefore more self-explanatory to workflow users. These two categories of services are useful in different scenarios. We use a tumor analysis workflow as an example to demonstrate how the workflow-as-a-service approach benefits execution performance. Finally, conclusions are drawn and future research opportunities are discussed.
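The two service styles described above can be sketched as follows: a generic executor that takes any workflow plus an input dictionary, and a factory that wraps one workflow as a service which checks for exactly that workflow's declared inputs (as gRAVI-t does for Taverna workflows). The workflow representation and the tumor-analysis stub here are illustrative assumptions, not the actual caGrid interfaces.

```python
def run_generic(workflow, inputs):
    # Generic service: one entry point for all workflows,
    # inputs passed as an untyped dictionary.
    return workflow["run"](**inputs)

def wrap_specific(workflow):
    # Workflow-specific service: validates against the workflow's own
    # declared inputs, so its "signature" mirrors the workflow I/O.
    def service(**kwargs):
        missing = set(workflow["inputs"]) - set(kwargs)
        if missing:
            raise TypeError(f"missing inputs: {sorted(missing)}")
        return workflow["run"](**kwargs)
    service.__name__ = workflow["name"]
    return service

# Hypothetical stand-in for the tumor analysis workflow in the paper.
tumor = {
    "name": "tumor_analysis",
    "inputs": ["sample_id", "threshold"],
    "run": lambda sample_id, threshold: f"{sample_id}:{threshold}",
}
tumor_service = wrap_specific(tumor)
```

The generic executor favors platform simplicity (one service to deploy), while the wrapped service favors usability: callers see meaningful parameter names and get an immediate error for missing inputs instead of a mid-run failure.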
International Conference of the IEEE Engineering in Medicine and Biology Society | 2008
Dinanath Sulakhe; Alex Rodriguez; Michael Wilde; Ian T. Foster; Natalia Maltsev
Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing volume of data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The Genome Analysis and Database Update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated, scalable system such as GADU that can run jobs on different Grids. It describes the resource-independent configuration of GADU using the Pegasus-based virtual data system, which makes high-throughput computational tools interoperable on heterogeneous Grid resources, and highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper does not revisit the problems encountered or the lessons learned in using individual Grid resources, which have already been published in our paper on the Genome Analysis Research Environment (GNARE); it focuses primarily on the architecture that makes GADU resource independent and interoperable across heterogeneous Grid resources.