
Publication


Featured research published by Karsten Schwan.


High Performance Distributed Computing | 2007

High performance and scalable I/O virtualization via self-virtualized devices

Himanshu Raj; Karsten Schwan

While industry is making rapid advances in system virtualization, for server consolidation and for improving system maintenance and management, it has not yet become clear how virtualization can contribute to the performance of high end systems. In this context, this paper addresses a key issue in system virtualization: how to efficiently virtualize I/O subsystems and peripheral devices. We have developed a novel approach to I/O virtualization, termed self-virtualized devices, which improves I/O performance by offloading select virtualization functionality onto the device. This permits guest virtual machines to interact with the virtualized device more efficiently, i.e., with less overhead and reduced latency. The concrete instance of such a device developed and evaluated in this paper is a self-virtualized network interface (SV-NIC), targeting the high end NICs used in the high performance domain. The SV-NIC (1) provides virtual interfaces (VIFs) to guest virtual machines for an underlying physical device, the network interface, (2) manages the way in which the device's physical resources are used by guest operating systems, and (3) provides high performance, low overhead network access to guest domains. Experimental results are attained in a prototyping environment using an IXP2400-based Ethernet board as a programmable network device. The SV-NIC scales to large numbers of VIFs and guests, and offers VIFs with 77% higher throughput and 53% less latency compared to the current standard virtualized device implementations on hypervisor-based platforms.
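
The VIF mechanism the abstract describes can be pictured as a pair of producer/consumer descriptor rings shared between a guest and the device, so the hypervisor stays off the data path. The C sketch below is a minimal illustration of that shape only; all names and structures are hypothetical, not the paper's:

```c
/* Minimal sketch of a virtual interface (VIF) as a pair of
 * single-producer/single-consumer descriptor rings shared between a
 * guest VM and a self-virtualized NIC. Hypothetical illustration,
 * not the SV-NIC's actual data structures. */
#include <stdint.h>

#define RING_SLOTS 256              /* power of two for cheap masking */

struct desc {                       /* one packet buffer descriptor */
    uint64_t guest_paddr;           /* guest-physical buffer address */
    uint32_t len;
};

struct ring {                       /* SPSC ring: guest produces, NIC consumes (TX) */
    volatile uint32_t head;         /* next slot the producer will fill */
    volatile uint32_t tail;         /* next slot the consumer will drain */
    struct desc slot[RING_SLOTS];
};

struct vif {
    struct ring tx;                 /* guest -> device */
    struct ring rx;                 /* device -> guest */
    uint16_t id;                    /* identifies this guest's VIF on the NIC */
};

/* Guest-side transmit: enqueue a buffer without hypervisor involvement. */
static int vif_send(struct vif *v, uint64_t paddr, uint32_t len)
{
    struct ring *r = &v->tx;
    uint32_t head = r->head;
    if (head - r->tail == RING_SLOTS)
        return -1;                  /* ring full: caller backs off */
    r->slot[head & (RING_SLOTS - 1)] = (struct desc){ paddr, len };
    r->head = head + 1;             /* publish the descriptor to the device */
    return 0;
}
```

Because the guest enqueues descriptors itself and the device drains them, no hypervisor call sits on the common path, which is consistent with the overhead and latency savings the paper reports.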


Challenges of Large Applications in Distributed Environments | 2008

Flexible IO and integration for scientific codes through the adaptable IO system (ADIOS)

Jay F. Lofstead; Scott Klasky; Karsten Schwan; Norbert Podhorszki; Chen Jin

Scientific codes are all subject to variation in performance depending on the runtime platform and/or configuration, the output writing API employed, and the file system for output. Since changing the IO routines to match the optimal or desired configuration for a given system can be costly in terms of human time and machine resources, the Adaptable IO System provides an API nearly as simple as POSIX IO that also gives developers the flexibility of selecting the optimal IO routines for a given platform, without recompilation. As a side effect, we also gain the ability to transparently integrate more tightly with workflow systems like Kepler and Pegasus and visualization systems like VisIt with no runtime impact. We achieve this through our library of highly tuned IO routines and other transport methods selected and configured in an XML file read only at startup. ADIOS-based IO has demonstrated high levels of performance and scalability. For example, we have achieved 20 GB/sec write performance using GTC on the Jaguar Cray XT4 system at Oak Ridge National Laboratory (about 50% of peak performance). We can change GTC output among MPI-IO synchronous, MPI-IO collective, POSIX IO, no IO (for baseline testing), asynchronous IO using the Georgia Tech DataTap system, and VisIt directly for in situ visualization, all with no changes to the source code. We designed this initial version of ADIOS based on the data requirements of 7 major scientific codes (GTC, Chimera, GTS, XGC1, XGC0, FLASH, and S3D) and have successfully adapted all of them to use ADIOS for all of their IO needs.
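
To make the "nearly as simple as POSIX" claim concrete, here is a hedged sketch of the ADIOS 1.x write path. The group and variable names ("restart", "NX", "temperature") and the config.xml contents are hypothetical, and exact signatures vary across ADIOS versions, so treat this as approximate rather than authoritative:

```c
/* Hedged sketch of an ADIOS 1.x write: the IO method (POSIX, MPI-IO,
 * DataTap, ...) is chosen in an XML file read at startup, so this
 * code does not change when the transport does. Signatures follow
 * the ADIOS 1.x user manual; consult the version actually in use. */
#include <stdint.h>
#include <mpi.h>
#include "adios.h"

void write_restart(double *t, int NX, MPI_Comm comm, int rank)
{
    int64_t fd;
    uint64_t groupsize = sizeof(int) + (uint64_t)NX * sizeof(double);
    uint64_t totalsize;

    adios_init("config.xml", comm);      /* parse transport choice once */
    adios_open(&fd, "restart", "restart.bp", "w", comm);
    adios_group_size(fd, groupsize, &totalsize);
    adios_write(fd, "NX", &NX);          /* names are declared in the XML */
    adios_write(fd, "temperature", t);
    adios_close(fd);                     /* transport may flush asynchronously */
    adios_finalize(rank);
}
```

Switching such output between, say, POSIX IO and collective MPI-IO then amounts to editing the method declaration in the XML file, not recompiling the application.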


IEEE International Conference on High Performance Computing, Data, and Analytics | 2009

GViM: GPU-accelerated virtual machines

Vishakha Gupta; Ada Gavrilovska; Karsten Schwan; Harshvardhan Kharche; Niraj Tolia; Vanish Talwar; Parthasarathy Ranganathan

The use of virtualization to abstract underlying hardware can aid in sharing such resources and in efficiently managing their use by high performance applications. Unfortunately, virtualization also prevents efficient access to accelerators, such as Graphics Processing Units (GPUs), that have become critical components in the design and architecture of HPC systems. Supporting General Purpose computing on GPUs (GPGPU) with accelerators from different vendors presents significant challenges due to proprietary programming models, heterogeneity, and the need to share accelerator resources between different Virtual Machines (VMs). To address this problem, this paper presents GViM, a system designed for virtualizing and managing the resources of a general purpose system accelerated by graphics processors. Using the NVIDIA GPU as an example, we discuss how such accelerators can be virtualized without additional hardware support and describe the basic extensions needed for resource management. Our evaluation with a Xen-based implementation of GViM demonstrates efficiency and flexibility in system usage coupled with only small performance penalties for the virtualized vs. non-virtualized solutions.
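
A common way to virtualize an accelerator without hardware support, and the general shape of the interposition approach described above, is a split driver: a guest-side stub library marshals each API call into shared memory, and a backend with real driver access replays it on the GPU. The C sketch below is a hypothetical illustration, not GViM's actual protocol:

```c
/* Sketch of split-driver GPU virtualization: the guest links against
 * an interposer with the same call shapes as the CUDA runtime; each
 * call is marshalled into a shared slot and completed by a backend
 * in the management domain. All structures are hypothetical. */
#include <stdint.h>
#include <stddef.h>

enum gpu_op { OP_MALLOC, OP_MEMCPY_H2D, OP_LAUNCH, OP_FREE };

struct gpu_call {                 /* one marshalled call record */
    enum gpu_op op;
    uint64_t    dev_ptr;          /* device pointer (filled in for OP_MALLOC) */
    uint64_t    size;
    uint64_t    src_grant;        /* grant reference for guest data pages */
    volatile int done;            /* backend sets this when the call completes */
};

/* Guest-side stub with roughly the shape of cudaMalloc(). */
int vgpu_malloc(struct gpu_call *slot, void **dev_ptr, size_t size)
{
    slot->op = OP_MALLOC;
    slot->size = size;
    slot->done = 0;
    /* ...ring a doorbell to the backend domain here... */
    while (!slot->done)
        ;                         /* real code would block, not spin */
    *dev_ptr = (void *)(uintptr_t)slot->dev_ptr;
    return 0;
}
```

The design choice this illustrates is interposition at the API boundary rather than at the device level, which is what lets the scheme work across proprietary driver stacks.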


Symposium on Cloud Computing | 2010

Robust and flexible power-proportional storage

Hrishikesh Amur; James Cipar; Varun Gupta; Gregory R. Ganger; Michael Kozuch; Karsten Schwan

Power-proportional cluster-based storage is an important component of an overall cloud computing infrastructure. With it, substantial subsets of nodes in the storage cluster can be turned off to save power during periods of low utilization. Rabbit is a distributed file system that arranges its data layout to provide ideal power-proportionality down to a very low minimum number of powered-up nodes (enough to store a primary replica of available datasets). Rabbit addresses the node failure rates of large-scale clusters with data layouts that minimize the number of nodes that must be powered up if a primary fails. Rabbit also allows different datasets to use different subsets of nodes as a building block for interference avoidance when the infrastructure is shared by multiple tenants. Experiments with a Rabbit prototype demonstrate its power-proportionality, and simulation experiments demonstrate its properties at scale.
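
The essence of such a layout is that the primary replica of every block fits on a small always-on subset of nodes, while additional replicas fan out across expanding "gears" of nodes, so available bandwidth grows as nodes power up. The sketch below illustrates only that shape; Rabbit's actual equal-work layout is more refined, balancing load so any powered-up prefix of nodes serves reads evenly:

```c
/* Hypothetical sketch of a gear-structured, power-proportional block
 * layout in the spirit of Rabbit (not its actual algorithm). Assumes
 * the cluster is large enough to hold every gear. */
#include <stdio.h>

/* Node storing replica r (0 = primary) of block b, given p always-on
 * primary nodes in a cluster of n nodes. */
static int layout_node(unsigned b, unsigned r, unsigned p, unsigned n)
{
    unsigned lo = p, range = p;
    if (r == 0)
        return b % p;               /* primaries: always-on nodes 0..p-1 */
    while (--r) {                   /* replica r targets nodes [lo, lo+range) */
        lo += range;
        range *= 2;                 /* each gear doubles the node range */
    }
    if (lo + range > n)
        range = n - lo;             /* clamp the last gear to the cluster */
    return lo + (b * 2654435761u) % range;  /* spread blocks in the gear */
}

int main(void)
{
    for (unsigned b = 0; b < 4; b++)
        printf("block %u: replicas on nodes %d, %d, %d\n", b,
               layout_node(b, 0, 4, 64),
               layout_node(b, 1, 4, 64),
               layout_node(b, 2, 4, 64));
    return 0;
}
```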


Symposium on Frontiers of Massively Parallel Computation | 1995

Falcon: on-line monitoring and steering of large-scale parallel programs

Weiming Gu; Greg Eisenhauer; Eileen Kraemer; Karsten Schwan; John T. Stasko; Jeffrey S. Vetter; Nirupama Mallavarupu

Falcon is a system for on-line monitoring and steering of large-scale parallel programs. The purpose of such program steering is to improve the application's performance or to affect its execution behavior. This paper presents the framework of the Falcon system and its implementation, and then evaluates the performance of the system. A complex sample application, a molecular dynamics simulation program (MD), is used to motivate the research as well as to measure the performance of the Falcon system.
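
The monitoring-and-steering loop described above has three moving parts: probes embedded in the application, a monitor that aggregates probe events, and steering actions that adjust application parameters while the program runs. A minimal hypothetical sketch of that loop, using an MD-style parameter (not Falcon's actual API):

```c
/* Sketch of on-line monitoring and steering: a sensor probe feeds a
 * smoothed metric; a steering hook trades accuracy for speed when
 * the metric degrades. All names are hypothetical illustrations. */
#include <stdio.h>

static double avg_step_ms;                 /* aggregated sensor value */
static int    cutoff_radius = 12;          /* steerable MD parameter */

/* Sensor probe: called from the application's timestep loop. */
static void probe_timestep(double elapsed_ms)
{
    avg_step_ms = 0.9 * avg_step_ms + 0.1 * elapsed_ms;  /* EWMA smoothing */
}

/* Steering decision: invoked periodically by a monitor thread. */
static void steer(void)
{
    if (avg_step_ms > 50.0 && cutoff_radius > 8) {
        cutoff_radius--;                   /* trade accuracy for speed */
        printf("steering: cutoff_radius -> %d\n", cutoff_radius);
    }
}
```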


International Parallel and Distributed Processing Symposium | 2009

Adaptable, metadata rich IO methods for portable high performance IO

Jay F. Lofstead; Fang Zheng; Scott Klasky; Karsten Schwan

Since IO performance on HPC machines strongly depends on machine characteristics and configuration, it is important to carefully tune IO libraries and make good use of appropriate library APIs. For instance, on current petascale machines, independent IO tends to outperform collective IO, in part due to bottlenecks at the metadata server. The problem is exacerbated by scaling issues, since each IO library scales differently on each machine, and typically operates efficiently to different levels of scaling on different machines. With scientific codes being run on a variety of HPC resources, efficient code execution requires us to address three important issues: (1) end users should be able to select the most efficient IO methods for their codes, with minimal effort in terms of code updates or alterations; (2) such performance-driven choices should not prevent data from being stored in the desired file formats, since those are crucial for later data analysis; and (3) it is important to have efficient ways of identifying and selecting certain data for analysis, to help end users cope with the flood of data produced by high end codes. This paper employs ADIOS, the ADaptable IO System, as an IO API to address (1)–(3) above. Concerning (1), ADIOS makes it possible to independently select the IO methods used by each grouping of data in an application, so that end users can use those IO methods that exhibit the best performance based on both IO patterns and the underlying hardware. In this paper, we also use this facility of ADIOS to experimentally evaluate alternative methods for high performance IO on petascale machines. Specific examples studied include methods that use strong file consistency, such as that provided by MPI-IO or POSIX IO, vs. delayed parallel data consistency. Concerning (2), to avoid linking IO methods to specific file formats and attain high IO performance, ADIOS introduces an efficient intermediate file format, termed BP, which can be converted, at small cost, to the standard file formats used by analysis tools, such as NetCDF and HDF5. Concerning (3), associated with BP are efficient methods for data characterization, which compute attributes that can be used to identify data sets without having to inspect or analyze the entire data contents of large files.
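
Point (3), data characterization, can be made concrete with a small example: compute cheap per-block summary statistics at write time and store them alongside the metadata, so later queries can discard non-matching blocks without reading their payload. The sketch below is illustrative only, not the BP format's actual attribute set:

```c
/* Sketch of write-time data characterization: per-block min/max
 * attributes let a reader rule out blocks without touching the data. */
#include <stdio.h>

struct block_stats { double min, max; };

/* Computed once as each process writes its block (n >= 1 assumed). */
static struct block_stats characterize(const double *data, int n)
{
    struct block_stats s = { data[0], data[0] };
    for (int i = 1; i < n; i++) {
        if (data[i] < s.min) s.min = data[i];
        if (data[i] > s.max) s.max = data[i];
    }
    return s;
}

/* Query time: "might this block contain values above the threshold?"
 * answered from the stats alone, without reading the block. */
static int block_may_match(struct block_stats s, double threshold)
{
    return s.max >= threshold;
}
```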


International Conference on Autonomic Computing | 2009

vManage: loosely coupled platform and virtualization management in data centers

Sanjay Kumar; Vanish Talwar; Vibhore Kumar; Parthasarathy Ranganathan; Karsten Schwan

Management is an important challenge for future enterprises. Previous work has addressed platform management (e.g., power and thermal management) separately from virtualization management (e.g., virtual machine (VM) provisioning, application performance). Coordinating the actions taken by these different management layers is important and beneficial, for reasons of performance, stability, and efficiency. Such coordination, in addition to working well with existing multi-vendor solutions, also needs to be extensible to support future management solutions potentially operating on different sensors and actuators. In response to these requirements, this paper proposes vManage, a solution to loosely couple platform and virtualization management and facilitate coordination between them in data centers. Our solution comprises registry and proxy mechanisms that provide unified monitoring and actuation across platform and virtualization domains, and coordinators that provide policy execution for better VM placement and runtime management, including a formal approach to ensuring system stability in the face of inefficient management actions. The solution is instantiated in a Xen environment through a platform-aware virtualization manager at a cluster management node, and a virtualization-aware platform manager on each server. Experimental evaluations using enterprise benchmarks show that compared to traditional solutions, vManage can achieve additional power savings (10% lower power) with significantly improved service-level guarantees (71% fewer violations) and stability (54% fewer VM migrations), at low overhead.
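
The registry/proxy idea can be sketched as a single table in which both platform sensors and virtualization actuators register by name, with a coordinator policy that reads in one domain and actuates in the other. Everything below is hypothetical structure, not vManage's implementation:

```c
/* Sketch of a unified sensor/actuator registry with a cross-layer
 * coordinator policy. Names and thresholds are hypothetical; no
 * bounds checks, since this is an illustration. */
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16

struct entry {
    char name[32];
    double (*read_sensor)(void);       /* NULL if this entry is an actuator */
    void   (*actuate)(int arg);        /* NULL if this entry is a sensor */
};

static struct entry registry[MAX_ENTRIES];
static int n_entries;

static void reg(const char *name, double (*rd)(void), void (*act)(int))
{
    struct entry *e = &registry[n_entries++];
    strncpy(e->name, name, sizeof e->name - 1);
    e->read_sensor = rd;
    e->actuate = act;
}

static struct entry *lookup(const char *name)
{
    for (int i = 0; i < n_entries; i++)
        if (!strcmp(registry[i].name, name))
            return &registry[i];
    return NULL;
}

/* Coordinator policy: if the platform layer reports high power, ask
 * the virtualization layer to migrate a VM away. */
static void coordinate(void)
{
    struct entry *power = lookup("server0.power_watts");
    struct entry *mig   = lookup("cluster.migrate_vm");
    if (power && mig && power->read_sensor() > 350.0)
        mig->actuate(0);               /* argument: which VM, hypothetically */
}
```

The loose coupling comes from the table itself: new vendors' sensors or actuators join by registering, and coordinators never call management layers directly.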


IEEE Transactions on Software Engineering | 1992

Dynamic scheduling of hard real-time tasks and real-time threads

Karsten Schwan; Hongyi Zhou

The authors investigate the dynamic scheduling of tasks with well-defined timing constraints. They present a dynamic uniprocessor scheduling algorithm with an O(n log n) worst-case complexity. The preemptive scheduling performed by the algorithm is shown to be of higher efficiency than that of other known algorithms. Furthermore, tasks may be related by precedence constraints, and they may have arbitrary deadlines and start times (which need not equal their arrival times). An experimental evaluation of the algorithm compares its average case behavior to the worst case. An analytic model used to explain the experimental results is validated with actual system measurements. The dynamic scheduling algorithm is the basis of a real-time multiprocessor operating system kernel developed in conjunction with this research. Specifically, this algorithm is used at the lowest, threads-based layer of the kernel whenever threads are created.
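
An O(n log n) bound is the signature of keeping ready tasks in a deadline-ordered priority queue, where each scheduling event costs O(log n). The sketch below is plain earliest-deadline-first dispatch over a binary min-heap; the paper's algorithm additionally handles precedence constraints and distinct start times, which this sketch omits:

```c
/* EDF dispatch over a binary min-heap keyed by deadline: push and
 * pop are O(log n), so scheduling n tasks costs O(n log n).
 * Illustrative sketch, not the paper's full algorithm. */
#include <stdio.h>

#define MAX_TASKS 64

struct task { int id; long deadline; };

static struct task heap[MAX_TASKS];
static int n;

static void push(struct task t)            /* O(log n) sift-up */
{
    int i = n++;
    while (i && t.deadline < heap[(i - 1) / 2].deadline) {
        heap[i] = heap[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    heap[i] = t;
}

static struct task pop(void)               /* O(log n) sift-down */
{
    struct task top = heap[0], last = heap[--n];
    int i = 0;
    for (;;) {
        int c = 2 * i + 1;                  /* left child */
        if (c >= n) break;
        if (c + 1 < n && heap[c + 1].deadline < heap[c].deadline) c++;
        if (last.deadline <= heap[c].deadline) break;
        heap[i] = heap[c];
        i = c;
    }
    heap[i] = last;
    return top;
}

int main(void)
{
    push((struct task){ 1, 300 });
    push((struct task){ 2, 100 });
    push((struct task){ 3, 200 });
    while (n) {                             /* dispatch earliest deadline first */
        struct task t = pop();
        printf("run task %d (deadline %ld)\n", t.id, t.deadline);
    }
    return 0;
}
```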


Computing in Science and Engineering | 2011

Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community

Jeffrey S. Vetter; Richard Glassbrook; Jack J. Dongarra; Karsten Schwan; Bruce Loftis; Stephen McNally; Jeremy S. Meredith; James H. Rogers; Philip C. Roth; Kyle Spafford; Sudhakar Yalamanchili

The Keeneland project's goal is to develop and deploy an innovative, GPU-based high-performance computing system for the NSF computational science community.


ACM Transactions on Computer Systems | 1991

Dynamic adaptation of real-time software

Thomas E. Bihari; Karsten Schwan

In large, dynamic, real-time computer systems, it is frequently most cost effective to employ different software performance and reliability techniques at different levels of granularity, at different times, or within different subsystems. These techniques may include regulation of redundancy and resource allocation, multiversion and multipath execution, and adjustment of program attributes such as time-out periods. The management of software in such systems is a difficult task. Software that may be adapted to meet varying performance and reliability requirements offers a solution. A REal-time Software Adaptation System (RESAS) includes a uniform model of adaptable software and provides the tools necessary for programmers to implement algorithms that choose and enact adaptations in real time. RESAS has been implemented on a testbed consisting of a multiprocessor and an attached workstation, and adaptation algorithms have been developed that address the problem of adapting software to achieve two goals: software execution within specified time constraints and software resiliency with respect to computer hardware failures.
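
One of the adaptations named above, adjusting time-out periods, reduces to a small runtime rule: relax the attribute when deadline misses accumulate and tighten it when the system is healthy. The sketch below is a hypothetical instance of such a rule; RESAS itself provides a general adaptation model rather than this specific policy:

```c
/* Hypothetical runtime adaptation of a time-out attribute, driven by
 * deadline misses observed over a monitoring window. */
#include <stdio.h>

static long timeout_us = 10000;     /* current value of the time-out attribute */
static int  misses;                 /* deadline misses seen in this window */

static void end_of_window(void)
{
    if (misses > 3 && timeout_us < 40000) {
        timeout_us *= 2;            /* relax the constraint under overload */
        printf("adapt: timeout_us -> %ld\n", timeout_us);
    } else if (misses == 0 && timeout_us > 10000) {
        timeout_us /= 2;            /* tighten back once behavior recovers */
    }
    misses = 0;                     /* start the next monitoring window */
}
```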

Collaboration


Dive into Karsten Schwan's collaborations.

Top Co-Authors

Ada Gavrilovska
Georgia Institute of Technology

Matthew Wolf
Georgia Institute of Technology

Scott Klasky
Oak Ridge National Laboratory

Hasan Abbasi
Georgia Institute of Technology

Jay F. Lofstead
Sandia National Laboratories