
Publication


Featured research published by Kyle Chard.


International Conference on Cloud Computing | 2010

Social Cloud: Cloud Computing in Social Networks

Kyle Chard; Simon Caton; Omer Farooq Rana; Kris Bubendorfer

With the increasingly ubiquitous nature of Social networks and Cloud computing, users are starting to explore new ways to interact with, and exploit, these developing paradigms. Social networks are used to reflect real world relationships that allow users to share information and form connections between one another, essentially creating dynamic Virtual Organizations. We propose leveraging the pre-established trust formed through friend relationships within a Social network to form a dynamic “Social Cloud”, enabling friends to share resources within the context of a Social network. We believe that combining trust relationships with suitable incentive mechanisms (through financial payments or bartering) could provide much more sustainable resource sharing mechanisms. This paper outlines our vision of, and experiences with, creating a Social Storage Cloud, looking specifically at possible market mechanisms that could be used to create a dynamic Cloud infrastructure in a Social network environment.


Journal of Biomedical Informatics | 2014

Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses

Bo Liu; Ravi K. Madduri; Borja Sotomayor; Kyle Chard; Lukasz Lacinski; Utpal J. Dave; Jianqiang Li; Chunchen Liu; Ian T. Foster

Due to the upcoming deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analysis tools, and efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach.


Concurrency and Computation: Practice and Experience | 2014

Experiences building Globus Genomics: a next-generation sequencing analysis service using Galaxy, Globus, and Amazon Web Services

Ravi K. Madduri; Dinanath Sulakhe; Lukasz Lacinski; Bo Liu; Alex Rodriguez; Kyle Chard; Utpal J. Dave; Ian T. Foster

We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next‐generation sequencing genomic data. This system achieves a high degree of end‐to‐end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multistep processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on‐demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler). The system allows biomedical researchers to perform rapid analysis of large next‐generation sequencing datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads.


IEEE Cloud Computing | 2014

Efficient and Secure Transfer, Synchronization, and Sharing of Big Data

Kyle Chard; Steven Tuecke; Ian T. Foster

Cloud computing provides a scalable computing platform through which large datasets can be stored and analyzed. However, because of the number of storage models used and rapidly increasing data sizes, it is often difficult to efficiently and securely access, transfer, synchronize, and share data. The authors describe the approaches taken by Globus to create standard data interfaces and common security models for performing these actions on large quantities of data. These approaches are general, allowing users to access different types of cloud storage with the same ease with which they access local storage. Through an existing network of more than 8,000 active storage endpoints and support for direct access to cloud storage, Globus has demonstrated both the effectiveness and scalability of the approaches presented.


IEEE Transactions on Parallel and Distributed Systems | 2013

High Performance Resource Allocation Strategies for Computational Economies

Kyle Chard; Kris Bubendorfer

Utility computing models have long been the focus of academic research, and with the recent success of commercial cloud providers, computation and storage are finally being realized as the fifth utility. Computational economies are often proposed as an efficient means of resource allocation; however, adoption has been limited due to a lack of performance and high overheads. In this paper, we address the performance limitations of existing economic allocation models by defining strategies to reduce the failure and reallocation rate, increase occupancy, and thereby increase the obtainable utilization of the system. The high-performance resource utilization strategies presented can be used by market participants without requiring dramatic changes to the allocation protocol. The strategies considered include overbooking, advanced reservation, just-in-time bidding, and using substitute providers for service delivery. The proposed strategies have been implemented in a distributed metascheduler and evaluated with respect to Grid and cloud deployments. Several diverse synthetic workloads have been used to quantify both the performance benefits and economic implications of these strategies.


International Conference on e-Science | 2015

Globus Data Publication as a Service: Lowering Barriers to Reproducible Science

Kyle Chard; Jim Pruyne; Ben Blaiszik; Rachana Ananthakrishnan; Steven Tuecke; Ian T. Foster

Broad access to the data on which scientific results are based is essential for verification, reproducibility, and extension. Scholarly publication has long been the means to this end. But as data volumes grow, new methods beyond traditional publications are needed for communicating, discovering, and accessing scientific data. We describe data publication capabilities within the Globus research data management service, which supports publication of large datasets, with customizable policies for different institutions and researchers, the ability to publish data directly from both locally owned storage and cloud storage, extensible metadata that can be customized to describe specific attributes of different research domains, flexible publication and curation workflows that can be easily tailored to meet institutional requirements, and public and restricted collections that give complete control over who may access published data. We describe the architecture and implementation of these new capabilities and review early results from pilot projects involving nine research communities that span a range of data sizes, data types, disciplines, and publication policies.


Concurrency and Computation: Practice and Experience | 2015

The Globus Galaxies platform: delivering science gateways as a service

Ravi K. Madduri; Kyle Chard; Ryan Chard; Lukasz Lacinski; Alex Rodriguez; Dinanath Sulakhe; David Kelly; Utpal J. Dave; Ian T. Foster

The use of public cloud computers to host sophisticated scientific data and software is transforming scientific practice by enabling broad access to capabilities previously available only to the few. The primary obstacle to more widespread use of public clouds to host scientific software (‘cloud‐based science gateways’) has thus far been the considerable gap between the specialized needs of science applications and the capabilities provided by cloud infrastructures. We describe here a domain‐independent, cloud‐based science gateway platform, the Globus Galaxies platform, which overcomes this gap by providing a set of hosted services that directly address the needs of science gateway developers. The design and implementation of this platform leverages our several years of experience with Globus Genomics, a cloud‐based science gateway that has served more than 200 genomics researchers across 30 institutions. Building on that foundation, we have implemented a platform that leverages the popular Galaxy system for application hosting and workflow execution; Globus services for data transfer, user and group management, and authentication; and a cost‐aware elastic provisioning model specialized for public cloud resources. We describe here the capabilities and architecture of this platform, present six scientific domains in which we have successfully applied it, report on user experiences, and analyze the economics of our deployments.


PLOS ONE | 2016

Predictive Big Data Analytics: A Study of Parkinson's Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations.

Ivo D. Dinov; Ben Heavner; Ming Tang; Gustavo Glusman; Kyle Chard; Mike D'Arcy; Ravi K. Madduri; Judy Pa; Cathie Spino; Carl Kesselman; Ian T. Foster; Eric W. Deutsch; Nathan D. Price; John D. Van Horn; Joseph Ames; Kristi A. Clark; Leroy Hood; Benjamin M. Hampstead; William T. Dauer; Arthur W. Toga

Background: A unique archive of Big Data on Parkinson’s Disease is collected, managed and disseminated by the Parkinson’s Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson’s disease (PD) risk to trauma, genetics, environment, co-morbidities, or lifestyle. The defining characteristics of Big Data (large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources) all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data.

Methods and Findings: Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson’s disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson’s Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting.

Conclusions: Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson’s disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer’s, Huntington’s, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Deploying Bioinformatics Workflows on Clouds with Galaxy and Globus Provision

Bo Liu; Borja Sotomayor; Ravi K. Madduri; Kyle Chard; Ian T. Foster

Cloud computing is attracting increasing attention as a means of providing users with fast provisioning of computational and storage resources, elastic scaling, and pay-as-you-go pricing. The integration of scientific workflows and Cloud computing has the potential to significantly improve resource utilization, processing speed, and user experience. This paper proposes a novel approach for deploying bioinformatics workflows in Cloud environments using Galaxy, a platform for scientific workflows, and Globus Provision, a tool for deploying distributed computing clusters on Amazon EC2. Collectively, this combination of tools provides an easy-to-use, high-performance, and scalable workflow environment that addresses the needs of data-intensive applications through dynamic cluster configuration, automatic user-defined node provisioning, high speed data transfer, and automated deployment and configuration of domain-specific software. To demonstrate how this approach can be used in practice, we present a domain-specific workflow use case and performance evaluation.


IEEE International Conference on eScience | 2011

Collaborative eResearch in a Social Cloud

Ashfag M. Thaufeeg; Kris Bubendorfer; Kyle Chard

Social networks provide a useful basis for enabling collaboration among groups of individuals. This is applicable not only to social communities but also to the scientific community. Already scientists are leveraging social networking concepts in projects to form groups, share information and communicate with their peers. For scientific projects which require large computing resources, one useful aspect of collaboration is the sharing of computing resources among project members. A social network provides an ideal platform to share these resources. This paper introduces a framework for Social Cloud computing with a view towards collaboration and resource sharing within a scientific community. The architecture of a Social Cloud, where individuals or institutions contribute the capacity of their computing resources by means of Virtual Machines leased through the social network, is outlined. Members of the Social Cloud can contribute, request, and use Virtual Machines from other members, as well as form Virtual Organizations among groups of members.

Collaboration


Dive into Kyle Chard's collaborations.

Top Co-Authors

Ian T. Foster (Argonne National Laboratory)
Kris Bubendorfer (Victoria University of Wellington)
Ravi K. Madduri (Argonne National Laboratory)
Simon Caton (National College of Ireland)
Ryan Chard (Argonne National Laboratory)
Arthur W. Toga (University of Southern California)
Carl Kesselman (University of Southern California)
Ben Blaiszik (Argonne National Laboratory)