Dan Fay | Researchain

Publication

Featured research published by Dan Fay.

International Journal of Digital Earth | 2011

Spatial cloud computing: how can the geospatial sciences use and help shape cloud computing?

Chaowei Phil Yang; Michael F. Goodchild; Qunying Huang; Doug Nebert; Robert Raskin; Yan Xu; Myra Bambacus; Dan Fay

The geospatial sciences face grand information technology (IT) challenges in the twenty-first century: data intensity, computing intensity, concurrent access intensity and spatiotemporal intensity. Meeting these challenges requires a computing infrastructure that can: (1) better support the discovery, access and utilization of data and data processing, relieving scientists and engineers of IT tasks so they can focus on scientific discovery; (2) provide real-time IT resources for real-time applications such as emergency response; (3) handle access spikes; and (4) provide more reliable and scalable service to massive numbers of concurrent users to advance public knowledge. Cloud computing offers a potential solution: an elastic, on-demand computing platform that can integrate observation systems, parameter extraction algorithms, phenomena simulations, analytical visualization and decision support, and that can provide social impact and user feedback, all essential elements of the geospatial sciences. We discuss how cloud computing can support the intensities of the geospatial sciences, reporting on our investigations into how cloud computing could enable the geospatial sciences and how spatiotemporal principles, the kernel of the geospatial sciences, could be used to secure the benefits of cloud computing. Four research examples analyze how to: (1) search, access and utilize geospatial data; (2) configure computing infrastructure to make intensive simulation models computable; (3) disseminate and utilize research results for massive numbers of concurrent users; and (4) adopt spatiotemporal principles to support spatiotemporally intensive applications. The paper concludes with a discussion of opportunities and challenges for spatial cloud computing (SCC).

Hawaii International Conference on System Sciences | 2009

GrayWulf: Scalable Clustered Architecture for Data Intensive Computing

Alexander S. Szalay; Gordon Bell; Jan Vandenberg; Alainna Wonders; Randal C. Burns; Dan Fay; J. N. Heasley; Tony Hey; Maria A. Nieto-Santisteban; Ani Thakar; Richard Wilton

Data-intensive computing presents a significant challenge for traditional supercomputing architectures that maximize FLOPS, since CPU speed has outpaced the I/O capabilities of HPC systems and Beowulf clusters. We present GrayWulf, the architecture of a three-tier cluster built from commodity components and designed for a range of data-intensive computations on petascale data sets. The design goal is a system balanced in I/O performance and memory size according to Amdahl's laws. The hardware currently installed at JHU exceeds one petabyte of storage and has 0.5 bytes/sec of I/O and 1 byte of memory for each CPU cycle. GrayWulf provides almost an order of magnitude better balance than existing systems. The paper covers its architecture and reference applications; the software design is presented in a companion paper.
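The balance criterion above rests on Amdahl's rules of thumb for a balanced system, as they are commonly stated: roughly one bit of I/O per second, and roughly one byte of memory, for each instruction executed per second. The block below is a minimal sketch of the two ratios; the symbols are illustrative assumptions, not notation from the paper.

```latex
% Amdahl balance ratios (symbols are illustrative, not taken from the paper)
% B_{IO}: aggregate I/O bandwidth, bits per second
% M:      memory size, bytes
% R:      instruction rate, instructions per second
A_{\mathrm{IO}} = \frac{B_{\mathrm{IO}}}{R}, \qquad
A_{\mathrm{mem}} = \frac{M}{R}, \qquad
\text{balanced system: } A_{\mathrm{IO}} \approx 1,\; A_{\mathrm{mem}} \approx 1
```

Ratios well below 1 describe a system whose I/O or memory cannot keep up with its compute rate. Because the abstract quotes its figures per CPU cycle, mapping them onto these ratios depends on how many instructions each cycle retires, which the abstract does not state.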

Challenges of Large Applications in Distributed Environments | 2008

Efficient scheduling of scientific workflows in a high performance computing cluster

Roger S. Barga; Dan Fay; Dean Guo; Steven Newhouse; Yogesh Simmhan; Alexander S. Szalay

The scientific computing community, especially academia, clearly needs technology to handle and organize the 1-100+ terabyte datasets produced by computer simulations and scientific instruments. In this paper we briefly describe GrayWulf, an exemplar cluster for data-intensive applications built on SQL Server and HPC clusters. One of GrayWulf's key software components is Trident, a scientific workflow workbench that automatically schedules workflows across the cluster. We examine the challenges of scheduling workflows on GrayWulf and algorithms to improve performance, and we present early results from applying Trident to schedule data-loading workflows on GrayWulf for an actual e-Science project.

Scientific Cloud Computing | 2014

Science in the cloud: lessons from three years of research projects on Microsoft Azure

Dennis Gannon; Dan Fay; Daron Green; Kenji Takeda; Wenming Yi

Microsoft Research is now in its fourth year of awarding Windows Azure cloud resources to the academic community. As of April 2014, over 200 research projects have started. In this paper we review the results of this effort to date. We characterize the computational paradigms that work well in public cloud environments and those that are usually disappointing, and we discuss many of the barriers to successfully using commercial cloud platforms in research, along with ways to overcome them.

Annals of GIS: Geographic Information Sciences | 2013

A visualization-enhanced graphical user interface for geospatial resource discovery

Zhipeng Gui; Chaowei Yang; Jizhe Xia; Jing Li; Abdelmounaam Rezgui; Min Sun; Yan Xu; Dan Fay

Information visualization and user interaction play critical roles in geospatial resource discovery. Well-designed graphical user interfaces improve the user experience and help convey important information to support decision-making. Existing approaches have visualization problems that reduce the efficiency of geospatial resource discovery: (1) search portals lack intuitive and diverse information visualization methods for presenting search results; (2) functions to sort, filter, explore and analyse search results are inadequate and inefficient; and (3) value-added information to help users choose among resources is missing. To address these problems, we propose a visualization- and interaction-enhanced discovery workflow and implement it in our search portal, GeoSearch, using the latest Rich Internet Application (RIA) technologies from Microsoft. Specifically, (1) based on the Pivot Viewer, a multiple sorting, filtering and multi-level-zoom-enabled histogram clustering function is implemented to assist in exploring records. (2) A Bing Maps Viewer is built on the Bing Maps Control to show the geo-location of resources (e.g. BoundingBox and server location) and to support map-based interactions. (3) Multiple data visualization tools are integrated to provide data preview and animation functions. (4) A service quality viewer is developed to help users select resources based on non-functional properties. Results show that the proposed visualization and interaction technologies improve the user experience and help users obtain the geospatial resources they need effectively and efficiently.

International Conference on Computing for Geospatial Research Applications | 2011

A service visualization tool for spatial web portal

Chen Xu; Chaowei Yang; Jing Li; Jizhe Xia; Xin Qu; Min Sun; Yan Xu; Dan Fay; Myra Bambacus

This paper introduces the design and development of a client-side graphical user interface (GUI) for the NASA Spatial Web Portal. A spatial web portal is an entry point to spatial data and services within spatial platforms, and the usability and efficiency of that entry point determine the success of the entire system. This paper describes the visualization and GUI designs and their implementation for improving client performance. The abundant information about each web service is grouped compactly into floating windows for better display. Asynchronous JavaScript and XML (AJAX) is used to improve visualization performance. Mirror-world tools such as Google Maps, Bing Maps, and NASA World Wind provide new mechanisms for spatial data discovery and visualization. We test how these methods, used together with Bing Maps, improve the performance of the NASA Spatial Web Portal.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

Data-intensive science: The Terapixel and MODISAzure projects

Deborah A. Agarwal; You-Wei Cheah; Dan Fay; Jonathan Edgar Fay; Dean Guo; Tony Hey; Marty Humphrey; Keith Jackson; Jie Li; Christophe Poulain; Youngryel Ryu; Catharine van Ingen

We live in an era in which scientific discovery is increasingly driven by the exploration of massive datasets. Scientists today envision diverse data analyses and computations that scale from the desktop to supercomputers, yet often have difficulty designing and building software architectures that accommodate heterogeneous and often inconsistent data at scale. Moreover, scientific data and computational resource needs can vary widely over time. The needs grow as a science collaboration broadens or as additional data accumulates, and computational demand can show large transients in response to seasonal field campaigns or new instrumentation breakthroughs. Cloud computing can offer a scalable, economical, on-demand model that is well matched to some of these evolving science needs. This paper presents two of our experiences over the last year: the Terapixel Project, which used workflow, high-performance computing and non-structured query language data processing to render the largest astronomical image for the WorldWide Telescope, and MODISAzure, a science pipeline for image processing deployed on the Azure cloud infrastructure.

Archive | 2011

Environmental Informatics: Advancing Data Intensive Sciences to Solve Environmental Problems

Chaowei Yang; Yan Xu; Dan Fay

The 21st century has witnessed the emergence of geospatial cyberinfrastructure and other relevant geospatial technologies (Yang et al., 2010) for collecting data, extracting information, simulating phenomena scenarios, and supporting decision making (Caragea et al., 2005; Stadler et al., 2006). Advances in geospatial technologies not only provide great opportunities to better understand environmental issues and to solve environmental problems from global to local scales (Pecar-Ilic and Ruzic, 2006), but also pose great challenges in handling terabytes to petabytes of heterogeneous environmental data. Environmental informatics (Green and Klomp, 1998; Hilty, Page and Hřebíček, 2006) should be revisited to efficiently and effectively manage, integrate, and mine information and knowledge from this vast amount of data to support environmental decisions (Hey, Tansley and Tolle, 2008).

International Conference on Geoinformatics | 2011