Publication


Featured research published by Steven Timm.


IEEE International Conference on Cloud Computing Technology and Science | 2017

Understanding the Performance and Potential of Cloud Computing for Scientific Applications

Iman Sadooghi; Jesus Hernandez Martin; Tonglin Li; Kevin Brandstatter; Ketan Maheshwari; Tiago Pais Pitta de Lacerda Ruivo; G. Garzoglio; Steven Timm; Yong Zhao; Ioan Raicu

Commercial clouds bring a great opportunity to the scientific computing area. Scientific applications usually require significant resources; however, not all scientists have access to sufficient high-end computing systems. Cloud computing has gained the attention of scientists as a competitive resource for running HPC applications at a potentially lower cost. But because the cloud is a different kind of infrastructure, it is unclear whether it can run scientific applications with reasonable performance per dollar spent. This work provides a comprehensive evaluation of the EC2 cloud from several angles. We first analyze the potential of the cloud by evaluating the raw performance of different AWS services, such as compute, memory, network, and I/O. Based on the findings on raw performance, we then evaluate the performance of scientific applications running in the cloud. Finally, we compare the performance of AWS with a private cloud in order to find the root causes of its limitations when running scientific applications. This paper aims to assess how well the cloud performs and to evaluate its cost in terms of both raw performance and scientific application performance. Furthermore, we evaluate other AWS services, including S3, EBS, and DynamoDB, to assess their suitability for scientific applications and frameworks. We also evaluate a real scientific computing application at scale through the Swift parallel scripting system. Armed with both detailed benchmarks to gauge expected performance and a detailed monetary cost analysis, we expect this paper to serve as a practical guide to help scientists decide whether to deploy and run their scientific applications on public, private, or hybrid clouds.
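The abstract frames the evaluation around performance per dollar. As a purely illustrative sketch of that metric, and not the paper's methodology or data, the Python snippet below ranks hypothetical instance types by benchmark score per dollar-hour; every name and number in it is a placeholder.

```python
# Minimal sketch of a performance-per-dollar comparison, in the spirit of the
# paper's cost analysis. Instance names, scores, and prices are hypothetical
# placeholders, not results from the paper.

def cost_efficiency(score: float, hourly_price_usd: float) -> float:
    """Benchmark score delivered per dollar spent in one hour."""
    return score / hourly_price_usd

# Hypothetical benchmark results: {instance_type: (score, $/hour)}
candidates = {
    "small-instance": (100.0, 0.05),
    "large-instance": (700.0, 0.40),
    "hpc-instance": (1500.0, 1.10),
}

ranked = sorted(candidates.items(),
                key=lambda kv: cost_efficiency(*kv[1]),
                reverse=True)

for name, (score, price) in ranked:
    print(f"{name}: {cost_efficiency(score, price):.0f} score units per dollar-hour")
```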


Scientific Cloud Computing | 2015

High-Performance Storage Support for Scientific Applications on the Cloud

Dongfang Zhao; Xu Yang; Iman Sadooghi; G. Garzoglio; Steven Timm; Ioan Raicu

Although cloud computing has become one of the most popular paradigms for executing data-intensive applications (for example, Hadoop), its storage subsystem is not optimized for scientific applications. We believe that when executing scientific applications in the cloud, a node-local distributed storage architecture is a key approach to overcoming the challenges of conventional shared and parallel storage systems. We analyze and evaluate four representative file systems (S3FS, HDFS, Ceph, and FusionFS) on three platforms (the Kodiak cluster, Amazon EC2, and FermiCloud) with a variety of benchmarks to explore how well these storage systems handle metadata-intensive, write-intensive, and read-intensive workloads.
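The paper's actual benchmarks are not reproduced here; as a rough illustration of two of the workload categories it evaluates, the hypothetical sketch below times many small file creations against one large sequential write on whatever file system backs the chosen directory. Paths and sizes are arbitrary placeholders.

```python
# Rough sketch of two micro-workloads from the categories the paper evaluates:
# metadata-intensive (many tiny files) and write-intensive (one large stream).
# Directory and sizes are placeholders, not the paper's benchmark setup.
import os
import time
import tempfile

def metadata_workload(root: str, n_files: int = 1000) -> float:
    """Create many empty files; stresses metadata operations."""
    start = time.time()
    for i in range(n_files):
        open(os.path.join(root, f"f{i}"), "w").close()
    return time.time() - start

def write_workload(root: str, total_mb: int = 256) -> float:
    """Stream a large file in 1 MB chunks; stresses write bandwidth."""
    chunk = b"\0" * (1 << 20)
    start = time.time()
    with open(os.path.join(root, "big.bin"), "wb") as f:
        for _ in range(total_mb):
            f.write(chunk)
    return time.time() - start

with tempfile.TemporaryDirectory() as d:
    print(f"metadata: {metadata_workload(d):.2f} s, write: {write_workload(d):.2f} s")
```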


Cluster Computing and the Grid | 2014

Exploring Infiniband Hardware Virtualization in OpenNebula towards Efficient High-Performance Computing

Tiago Pais Pitta de Lacerda Ruivo; Gerard Bernabeu Altayo; G. Garzoglio; Steven Timm; Hyunwoo Kim; Seo-Young Noh; Ioan Raicu

It has been widely accepted that software virtualization has a significant negative impact on high-performance computing (HPC) application performance. This work explores the potential use of Infiniband hardware virtualization in an OpenNebula cloud toward the efficient support of MPI-based workloads. We have implemented, deployed, and tested an Infiniband network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To minimize virtualization overhead, we avoided software virtualization by employing a technique called Single Root Input/Output Virtualization (SR-IOV). Our solution required modifications to the Linux hypervisor as well as to the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56 virtual machines connected by up to 8 DDR Infiniband network links, with micro-benchmarks (latency and bandwidth) as well as with an MPI-intensive application (the HPL Linpack benchmark).
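The deployment details are in the paper itself; as a minimal sketch of the host-side step that SR-IOV requires on Linux, the snippet below asks the kernel (via the standard sriov_numvfs sysfs interface) to expose virtual functions on an adapter so they can be passed through to VMs. The interface name "ib0" is an assumption, and this is not the paper's actual configuration.

```python
# Minimal sketch of enabling SR-IOV virtual functions (VFs) on a Linux host so
# an Infiniband adapter can be shared with VMs via PCI passthrough. Requires
# root and SR-IOV support enabled in firmware/BIOS; "ib0" is a placeholder.
from pathlib import Path

def enable_vfs(iface: str, num_vfs: int) -> None:
    dev = Path(f"/sys/class/net/{iface}/device")
    total = int((dev / "sriov_totalvfs").read_text())
    if num_vfs > total:
        raise ValueError(f"{iface} supports at most {total} VFs")
    # Writing to sriov_numvfs instructs the driver to create that many VFs,
    # which then appear as separate PCI functions available for passthrough.
    (dev / "sriov_numvfs").write_text(str(num_vfs))

if __name__ == "__main__":
    enable_vfs("ib0", 4)
```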


IEEE International Conference on Cloud Computing Technology and Science | 2016

A Reference Model for Virtual Machine Launching Overhead

Hao Wu; Shangping Ren; G. Garzoglio; Steven Timm; Gerard Bernabeu; Keith Chadwick; Seo-Young Noh

Cloud bursting is one of the key research topics in the cloud computing community. A well-designed cloud bursting module enables private clouds to automatically launch virtual machines (VMs) on public clouds when more resources are needed. One of the main challenges in developing a cloud bursting module is deciding when and where to launch a VM so that all resources are utilized most effectively and efficiently and system performance is optimized. However, based on system operational data obtained from FermiCloud, a private cloud developed by the Fermi National Accelerator Laboratory for scientific workflows, the VM launching overhead is not constant. It varies with physical resource utilization, such as CPU and I/O device utilization, at the time a VM is launched. Hence, to make judicious decisions as to when and where a VM should be launched, a VM launching overhead reference model is needed. In this paper, we first develop a VM launching overhead reference model based on operational data we have obtained on FermiCloud. Second, we apply the developed reference model on FermiCloud and compare VM launching overhead values calculated from the model with overhead values measured on FermiCloud. Our empirical results on FermiCloud indicate that the developed reference model is accurate. We believe that, with the guidance of this reference model, efficient resource allocation algorithms can be developed for the cloud bursting process to minimize operational cost and resource waste.
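The abstract does not give the model's functional form. Purely as an illustration of the kind of relationship it describes, the sketch below fits a simple least-squares model that predicts launch overhead from host CPU and I/O utilization at launch time; the observations and coefficients are placeholders, not the paper's model or data.

```python
# Illustrative sketch of a launch-overhead reference model: predict VM launch
# time from host CPU and I/O utilization at launch. The least-squares fit and
# the sample observations are placeholders, not the paper's actual model.
import numpy as np

# Hypothetical observations: (cpu_util, io_util, observed launch overhead in seconds)
observations = np.array([
    [0.10, 0.05, 22.0],
    [0.40, 0.20, 35.0],
    [0.70, 0.50, 61.0],
    [0.90, 0.80, 95.0],
])

X = np.column_stack([np.ones(len(observations)), observations[:, 0], observations[:, 1]])
y = observations[:, 2]
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)  # [intercept, cpu weight, io weight]

def predicted_overhead(cpu_util: float, io_util: float) -> float:
    return float(coeffs @ np.array([1.0, cpu_util, io_util]))

print(f"predicted overhead at 50% CPU, 30% I/O: {predicted_overhead(0.5, 0.3):.1f} s")
```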


Cluster Computing and the Grid | 2014

Modeling the Virtual Machine Launching Overhead under Fermicloud

Hao Wu; Shangping Ren; G. Garzoglio; Steven Timm; Gerard Bernabeu; Seo-Young Noh

FermiCloud is a private cloud developed by the Fermi National Accelerator Laboratory for scientific workflows. The cloud bursting module of FermiCloud enables it, when more computational resources are needed, to automatically launch virtual machines on available resources such as public clouds. One of the main challenges in developing the cloud bursting module is deciding when and where to launch a VM so that all resources are utilized most effectively and efficiently and system performance is optimized. However, based on FermiCloud's operational data, the VM launching overhead is not constant. It varies with physical resource utilization (CPU, memory, I/O devices) at the time a VM is launched. Hence, to make judicious decisions as to when and where a VM should be launched, a VM launch overhead reference model is needed. This paper develops a VM launch overhead reference model based on operational data we have obtained on FermiCloud and uses the reference model to guide the cloud bursting process.


Cluster Computing | 2014

vcluster: a framework for auto scalable virtual cluster system in heterogeneous clouds

Seo-Young Noh; Steven Timm; H. Jang

Cloud computing is an emerging technology and is being widely considered for resource utilization in various research areas. One of its main advantages is flexibility in computing resource allocation: many computing cycles can be made ready in a very short time and can be smoothly reallocated between tasks. Because of this, many private companies are entering the new business of reselling their idle computing cycles, and research institutes have started building their own cloud systems for their various research purposes. In this paper, we introduce a framework for virtual cluster systems called vcluster, which is capable of utilizing computing resources from heterogeneous clouds and provides a uniform view of computing resource management. vcluster is an IaaS (Infrastructure as a Service) based cloud resource management system. It distributes batch jobs to multiple clouds depending on the status of the queue and the system pool. The main design philosophy behind vcluster is to be cloud and batch system agnostic, which is achieved through plugins; this feature mitigates the complexity of integrating heterogeneous clouds. In the pilot system development, we use FermiCloud and Amazon EC2, which are a private and a public cloud system, respectively. We also discuss the features and functionalities that must be considered in virtual cluster systems.
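vcluster's own code is not shown in the abstract. The sketch below only illustrates what a cloud- and batch-system-agnostic plugin interface of that style might look like; every class and method name here is hypothetical and is not vcluster's API.

```python
# Sketch of a plugin-style abstraction of the kind the vcluster design
# philosophy implies: the core manager talks to any cloud through one
# interface. Class and method names are hypothetical, not vcluster's API.
from abc import ABC, abstractmethod

class CloudPlugin(ABC):
    """Uniform view of one cloud provider for the virtual cluster manager."""

    @abstractmethod
    def launch_workers(self, count: int) -> list[str]:
        """Start worker VMs and return their identifiers."""

    @abstractmethod
    def terminate_workers(self, ids: list[str]) -> None:
        """Shut down the given worker VMs."""

class DemoCloud(CloudPlugin):
    def __init__(self) -> None:
        self._next = 0

    def launch_workers(self, count: int) -> list[str]:
        ids = [f"vm-{self._next + i}" for i in range(count)]
        self._next += count
        return ids

    def terminate_workers(self, ids: list[str]) -> None:
        print(f"terminating {ids}")

def scale_for_queue(plugin: CloudPlugin, queued_jobs: int, jobs_per_vm: int = 4) -> list[str]:
    """Launch enough workers in whichever cloud the plugin wraps."""
    needed = -(-queued_jobs // jobs_per_vm)  # ceiling division
    return plugin.launch_workers(needed)

print(scale_for_queue(DemoCloud(), queued_jobs=10))
```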


Journal of Physics: Conference Series | 2014

Grids, virtualization, and clouds at Fermilab

Steven Timm; Keith Chadwick; G. Garzoglio; Seo-Young Noh

Fermilab supports a scientific program that includes experiments and scientists located across the globe. To better serve this community, in 2004 the (then) Computing Division adopted the strategy of placing all of its High Throughput Computing (HTC) resources in a Campus Grid known as FermiGrid, supported by common shared services. In 2007, the FermiGrid Services group deployed a service infrastructure that utilized Xen virtualization, LVS network routing, and MySQL circular replication to deliver highly available services with significant performance, reliability, and serviceability improvements. This deployment was further enhanced through a distributed, redundant network core architecture and the physical distribution of the systems that host the virtual machines across multiple buildings on the Fermilab campus. In 2010, building on the experience pioneered by FermiGrid in delivering production services on a virtual infrastructure, the Computing Sector commissioned the FermiCloud, General Physics Computing Facility, and Virtual Services projects to serve as platforms for support of scientific computing (FermiCloud and GPCF) and core computing (Virtual Services). Finally, this work presents the evolution of the Fermilab Campus Grid, virtualization, and cloud computing infrastructure, together with plans for the future.


Journal of Physics: Conference Series | 2015

Cloud Services for the Fermilab Scientific Stakeholders

Steven Timm; G. Garzoglio; Parag Mhashilkar; J Boyd; G Bernabeu; N Sharma; N Peregonow; Hyunwoo Kim; Seo-Young Noh; S Palur; Ioan Raicu

As part of the Fermilab/KISTI cooperative research project, Fermilab has successfully run an experimental simulation workflow at scale on a federation of Amazon Web Services (AWS), FermiCloud, and local FermiGrid resources. We used the CernVM-FS (CVMFS) file system to deliver the application software. We also established Squid caching servers in AWS, using the Shoal system to let each individual virtual machine find the closest Squid server. We additionally developed an automatic virtual machine conversion system so that we could transition virtual machines built on FermiCloud to Amazon Web Services. We used this system to successfully run a cosmic ray simulation of the NOvA detector at Fermilab, making use of both AWS spot pricing and network bandwidth discounts to minimize cost. On FermiCloud we were also able to run the workflow at the scale of 1000 virtual machines, using a private network routable inside Fermilab. We present in detail the technological improvements that were used to make this work a reality.
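The paper's conversion system is not reproduced here. One standard way to register a converted image in AWS is the EC2 VM Import/Export API; the hedged sketch below shows that step via boto3, with a placeholder bucket, key, and format, and assumes the image has already been uploaded to S3 in a supported format.

```python
# Hedged sketch of importing a converted VM image into AWS as an AMI using the
# standard EC2 VM Import/Export API via boto3. This is not the paper's
# conversion system; bucket, key, and format are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.import_image(
    Description="FermiCloud image converted for AWS (example)",
    DiskContainers=[{
        "Description": "converted root disk",
        "Format": "vmdk",  # placeholder; raw and vhd are also supported
        "UserBucket": {
            "S3Bucket": "example-image-bucket",        # placeholder bucket
            "S3Key": "images/fermicloud-node.vmdk",    # placeholder key
        },
    }],
)
print("import task:", response["ImportTaskId"])
```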


IEEE/ACM International Conference on Utility and Cloud Computing | 2014

X.509 Authentication and Authorization in Fermi Cloud

Hyunwoo Kim; Steven Timm

We present a summary of how X.509 authentication and authorization are used with OpenNebula in FermiCloud. We also describe the history of why X.509 authentication was needed in FermiCloud, and review X.509 authorization options, both internal and external to OpenNebula. We show how these options can be and have been used to successfully run scientific workflows on federated clouds, which include OpenNebula on FermiCloud and Amazon Web Services as well as other community clouds. We also outline federation options being used by other commercial and open-source clouds and cloud research projects.
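As a minimal, generic illustration of the kind of identity check X.509 authorization rests on (not the paper's actual mechanism), the sketch below extracts a certificate's subject DN with the cryptography library and compares it against an allowed list. The file path and DN are placeholders; a real deployment also involves proxy certificates, attribute authorities, and CA validation, none of which are handled here.

```python
# Minimal sketch of DN-based authorization: read a certificate, extract the
# subject DN, and check it against an allowed list. Path and DN are
# placeholders; this is not FermiCloud's authorization code.
from cryptography import x509

ALLOWED_DNS = {
    "CN=Example User,OU=People,O=Example Lab,C=US",  # placeholder DN
}

def subject_dn(cert_path: str) -> str:
    with open(cert_path, "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())
    return cert.subject.rfc4514_string()

def is_authorized(cert_path: str) -> bool:
    return subject_dn(cert_path) in ALLOWED_DNS

if __name__ == "__main__":
    print(is_authorized("/tmp/user-cert.pem"))  # placeholder path
```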


Journal of Physics: Conference Series | 2017

Virtual machine provisioning, code management, and data movement design for the Fermilab HEPCloud Facility

Steven Timm; G Cooper; S Fuess; G. Garzoglio; B Holzman; R Kennedy; D Grassano; A Tiradani; R Krishnamurthy; S Vinayagam; I Raicu; Hao Wu; Shangping Ren; S-Y Noh

The goal of the Fermilab HEPCloud Facility project is to extend the current Fermilab facility interface to provide transparent access to disparate resources, including commercial and community clouds, grid federations, and HPC centers. This facility enables experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction. We have evaluated the use of the commercial cloud to provide elasticity, responding to peaks of demand without overprovisioning local resources. Full-scale data-intensive workflows have been successfully completed on Amazon Web Services for two High Energy Physics experiments, CMS and NOvA, at the scale of 58000 simultaneous cores. This paper describes the significant improvements that were made to the virtual machine provisioning system, the code caching system, and the data movement system to accomplish this work. The virtual image provisioning and contextualization service was extended to multiple AWS regions and to support experiment-specific data configurations. A prototype Decision Engine was written to determine the optimal availability zone and instance type to run on, minimizing cost and job interruptions. We have deployed a scalable on-demand caching service to deliver code and database information to jobs running on the commercial cloud. It uses the frontier-squid server and CERN VM File System (CVMFS) clients on EC2 instances and utilizes various services provided by AWS to build the infrastructure (stack). We discuss the architecture and load-testing benchmarks on the Squid servers. We also describe various approaches that were evaluated to transport experimental data to and from the cloud, and the optimal solutions that were used for the bulk of the data transport. Finally, we summarize lessons learned from this scale test and our future plans to expand and improve the Fermilab HEPCloud Facility.
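The prototype Decision Engine itself is described in the paper; purely as an illustration of the kind of choice it makes, the sketch below selects the availability zone and instance type with the lowest current spot price under a bid ceiling. The price table, zone names, and instance types are hypothetical placeholders.

```python
# Illustrative sketch of a placement decision of the kind a cloud decision
# engine makes: pick the (availability zone, instance type) pair with the
# lowest spot price under a cap. All prices and names are placeholders.

def choose_placement(spot_prices: dict[tuple[str, str], float],
                     max_price: float) -> tuple[str, str]:
    """Return (availability_zone, instance_type) with the lowest price under the cap."""
    affordable = {k: v for k, v in spot_prices.items() if v <= max_price}
    if not affordable:
        raise RuntimeError("no placement under the price cap")
    return min(affordable, key=affordable.get)

prices = {
    ("us-east-1a", "m4.xlarge"): 0.09,   # hypothetical spot prices, $/hour
    ("us-east-1b", "m4.xlarge"): 0.07,
    ("us-west-2a", "c4.2xlarge"): 0.12,
}

print(choose_placement(prices, max_price=0.10))
```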

Collaboration


Dive into Steven Timm's collaborations.

Top Co-Authors

Seo-Young Noh (Korea Institute of Science and Technology Information)
Hao Wu (Illinois Institute of Technology)
Shangping Ren (Illinois Institute of Technology)
Ioan Raicu (Illinois Institute of Technology)