Ravi K. Madduri
Argonne National Laboratory
Publication
Featured research published by Ravi K. Madduri.
Journal of the American Medical Informatics Association | 2008
Scott Oster; Stephen Langella; Shannon Hastings; David Ervin; Ravi K. Madduri; Joshua Phillips; Tahsin M. Kurç; Frank Siebenlist; Peter A. Covitz; Krishnakant Shanbhag; Ian T. Foster; Joel H. Saltz
OBJECTIVE To develop software infrastructure that will provide support for discovery, characterization, integrated access, and management of diverse and disparate collections of information sources, analysis methods, and applications in biomedical research. DESIGN An enterprise Grid software infrastructure, called caGrid version 1.0 (caGrid 1.0), has been developed as the core Grid architecture of the NCI-sponsored cancer Biomedical Informatics Grid (caBIG) program. It is designed to support a wide range of use cases in basic, translational, and clinical research, including 1) discovery, 2) integrated and large-scale data analysis, and 3) coordinated study. MEASUREMENTS The caGrid is built as a Grid software infrastructure and leverages Grid computing technologies and the Web Services Resource Framework standards. It provides a set of core services, toolkits for the development and deployment of new community provided services, and application programming interfaces for building client applications. RESULTS The caGrid 1.0 was released to the caBIG community in December 2006. It is built on open source components and caGrid source code is publicly and freely available under a liberal open source license. The core software, associated tools, and documentation can be downloaded from the following URL: https://cabig.nci.nih.gov/workspaces/Architecture/caGrid. CONCLUSIONS While caGrid 1.0 is designed to address use cases in cancer research, the requirements associated with discovery, analysis and integration of large scale data, and coordinated studies are common in other biomedical fields. In this respect, caGrid 1.0 is the realization of a framework that can benefit the entire biomedical community.
IEEE International Conference on Services Computing | 2011
Jia Zhang; Wei Tan; John Alexander; Ian T. Foster; Ravi K. Madduri
Services computing technology enables scientists to expose data and computational resources wrapped as publicly accessible Web services. However, our study indicates that scientific services are currently poorly reused, and only in an ad hoc manner. This project aims to help domain scientists find services of interest and reuse successful processes, in the form of workflows, to attain their research purposes. In contrast to existing interface-based service discovery approaches, this paper proposes a novel approach that proactively recommends services during workflow composition, based on service usage history. The underpinning is a People-Service-Workflow (PSW) network that models existing scientific artifacts, services and workflows, and their past usage relationships as a social network. Various social network analysis techniques are applied to discover the hidden knowledge accrued. A prototype search engine has been developed as a proof of concept and is seamlessly integrated as a plug-in into the Taverna workbench, a widely used scientific workflow management tool.
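The idea of recommending services from usage history can be illustrated with a minimal sketch. It is not the paper's PSW network or its analysis techniques; it only counts how often services appeared together in past workflows and ranks candidates by that count. The workflow names, service names, and the functions `build_co_usage` and `recommend` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical usage history: each workflow maps to the set of services it used.
WORKFLOWS = {
    "wf1": {"blast", "clustalw", "phylip"},
    "wf2": {"blast", "clustalw"},
    "wf3": {"blast", "emboss"},
}


def build_co_usage(workflows):
    """Count how often each ordered pair of services shares a workflow."""
    co_usage = Counter()
    for services in workflows.values():
        for a, b in combinations(sorted(services), 2):
            co_usage[(a, b)] += 1
            co_usage[(b, a)] += 1
    return co_usage


def recommend(current_services, co_usage, top_n=3):
    """Rank services not yet in the workflow by co-usage with those that are."""
    scores = Counter()
    for (a, b), count in co_usage.items():
        if a in current_services and b not in current_services:
            scores[b] += count
    return [service for service, _ in scores.most_common(top_n)]


CO_USAGE = build_co_usage(WORKFLOWS)
```

Calling `recommend({"blast", "clustalw"}, CO_USAGE)` ranks `phylip` ahead of `emboss`, because `phylip` was used alongside both services already in the workflow. A fuller treatment would also weight by the people who authored the workflows, which this sketch leaves out.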
Journal of Biomedical Informatics | 2014
Bo Liu; Ravi K. Madduri; Borja Sotomayor; Kyle Chard; Lukasz Lacinski; Utpal J. Dave; Jianqiang Li; Chunchen Liu; Ian T. Foster
The coming deluge of genome data presents significant challenges in storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and sharing and retrieving data efficiently. Because data volumes vary, so do computing and storage requirements; biomedical researchers are therefore pursuing more reliable, dynamic, and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases and a performance evaluation are presented to validate the feasibility of the proposed approach.
BMC Bioinformatics | 2010
Wei Tan; Ravi K. Madduri; Aleksandra Nenadic; Stian Soiland-Reyes; Dinanath Sulakhe; Ian T. Foster; Carole A. Goble
BACKGROUND In the biological and medical domains, the use of web services has made data and computation functionality accessible in a unified manner, which has helped automate data pipelines that were previously performed manually. Workflow technology is widely used in the orchestration of multiple services to facilitate in-silico research. The Cancer Biomedical Informatics Grid (caBIG) is an information network enabling the sharing of cancer research related resources, and caGrid is its underlying service-based computation infrastructure. caBIG requires that services are composed and orchestrated in a given sequence to realize data pipelines, which are often called scientific workflows. RESULTS caGrid selected Taverna as its workflow execution system of choice due to its integration with web service technology, its support for a wide range of web services, its plug-in architecture that eases integration of third party extensions, etc. The caGrid Workflow Toolkit (or the toolkit for short), an extension to the Taverna workflow system, is designed and implemented to ease building and running caGrid workflows. It provides users with support for various phases in using workflows: service discovery, composition and orchestration, data access, and secure service invocation, which have been identified by the caGrid community as challenging in a multi-institutional and cross-discipline domain. CONCLUSIONS By extending the Taverna Workbench, the caGrid Workflow Toolkit provides a comprehensive solution to compose and coordinate services in caGrid, which would otherwise remain isolated and disconnected from each other. With it, users can access more than 140 services and are offered a rich set of features including discovery of data and analytical services, query and transfer of data, security protections for service invocations, state management in service interactions, and sharing of workflows, experiences and best practices. The proposed solution is general enough to be applicable and reusable within other service-computing infrastructures that leverage a similar technology stack.
Concurrency and Computation: Practice and Experience | 2014
Ravi K. Madduri; Dinanath Sulakhe; Lukasz Lacinski; Bo Liu; Alex Rodriguez; Kyle Chard; Utpal J. Dave; Ian T. Foster
We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next‐generation sequencing genomic data. This system achieves a high degree of end‐to‐end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multistep processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on‐demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler). The system allows biomedical researchers to perform rapid analysis of large next‐generation sequencing datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads.
Concurrency and Computation: Practice and Experience | 2015
Ravi K. Madduri; Kyle Chard; Ryan Chard; Lukasz Lacinski; Alex Rodriguez; Dinanath Sulakhe; David Kelly; Utpal J. Dave; Ian T. Foster
The use of public cloud computers to host sophisticated scientific data and software is transforming scientific practice by enabling broad access to capabilities previously available only to the few. The primary obstacle to more widespread use of public clouds to host scientific software (‘cloud‐based science gateways’) has thus far been the considerable gap between the specialized needs of science applications and the capabilities provided by cloud infrastructures. We describe here a domain‐independent, cloud‐based science gateway platform, the Globus Galaxies platform, which overcomes this gap by providing a set of hosted services that directly address the needs of science gateway developers. The design and implementation of this platform leverages our several years of experience with Globus Genomics, a cloud‐based science gateway that has served more than 200 genomics researchers across 30 institutions. Building on that foundation, we have implemented a platform that leverages the popular Galaxy system for application hosting and workflow execution; Globus services for data transfer, user and group management, and authentication; and a cost‐aware elastic provisioning model specialized for public cloud resources. We describe here the capabilities and architecture of this platform, present six scientific domains in which we have successfully applied it, report on user experiences, and analyze the economics of our deployments. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
International Conference on Web Services | 2011
Jia Zhang; Ravi K. Madduri; Wei Tan; Kevin Deichl; John Alexander; Ian T. Foster
caGrid has accumulated a repository of biomedical services; however, how a cancer researcher can find the proper services in caGrid when needed remains a big challenge. This research aims to enhance the cyberinfrastructure of caGrid by developing a mechanism that turns caGrid services into semantic-aware interoperable services. We propose a service semantics model and have developed a technique that automatically extracts semantic metadata from static WSDL service descriptions. Such semantic information is stored as loosely coupled annotations that can be queried using semantic Web techniques to enhance service discovery and composition. We also propose a two-phase discovery technique that helps users quickly identify service operations of interest. This paper also reports our examination of available techniques and recommends a feasible infrastructure for biomedical service reuse. A prototype system has been developed as a proof of concept.
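As a rough illustration of what reading metadata from a static WSDL description involves, the sketch below lists the operations declared in a WSDL 1.1 document together with any inline documentation. It covers only this first, purely syntactic step and is not the paper's semantics model or extraction technique. The sample service, the operation names, and the function `extract_operations` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 documents.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical, heavily trimmed WSDL used only for illustration.
SAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="GeneService">
  <portType name="GeneQueryPortType">
    <operation name="getGeneBySymbol">
      <documentation>Look up a gene record by symbol.</documentation>
    </operation>
    <operation name="listPathways"/>
  </portType>
</definitions>
"""


def extract_operations(wsdl_text):
    """Return one annotation dict per operation declared in a portType."""
    root = ET.fromstring(wsdl_text.strip())
    annotations = []
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        for operation in port_type.findall("wsdl:operation", WSDL_NS):
            doc = operation.find("wsdl:documentation", WSDL_NS)
            has_text = doc is not None and doc.text
            annotations.append({
                "port_type": port_type.get("name"),
                "operation": operation.get("name"),
                "documentation": doc.text.strip() if has_text else "",
            })
    return annotations


OPERATIONS = extract_operations(SAMPLE_WSDL)
```

Each resulting dictionary could then be stored separately from the service itself, in the spirit of the loosely coupled annotations the abstract describes; how those annotations are given semantic meaning is outside this sketch.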
Computational and Structural Biotechnology Journal | 2015
Krithika Bhuvaneshwar; Dinanath Sulakhe; Robinder Gauba; Alex Rodriguez; Ravi K. Madduri; Utpal J. Dave; Lukasz Lacinski; Ian T. Foster; Yuriy Gusev; Subha Madhavan
Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.
Extreme Science and Engineering Discovery Environment | 2013
Ravi K. Madduri; Paul Dave; Dinanath Sulakhe; Lukasz Lacinski; Bo Liu; Ian T. Foster
We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system is notable for its high degree of end-to-end automation, which encompasses every stage of the data analysis pipeline from initial data access (from remote sequencing center or database, by the Globus Online file transfer system) to on-demand resource acquisition (on Amazon EC2, via the Globus Provision cloud manager); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); and efficient scheduling of these pipelines over many processors (via the Condor scheduler). The system allows biomedical researchers to perform rapid analysis of large NGS datasets using just a web browser in a fully automated manner, without software installation.
IEEE Computer | 2008
Joel H. Saltz; Tahsin M. Kurç; Shannon Hastings; Stephen Langella; Scott Oster; David Ervin; Ashish Sharma; Tony Pan; Metin N. Gurcan; Justin Permar; Renato Ferreira; Philip R. O. Payne; E. Caserta; G. Leone; M.C. Ostrowski; Ravi K. Madduri; Ian T. Foster; Subha Madhavan
Translational research projects target a wide variety of diseases, test many different kinds of biomedical hypotheses, and employ a large assortment of experimental methodologies. Diverse data, complex execution environments, and demanding security and reliability requirements make the implementation of these projects extremely challenging and require novel e-Science technologies.