D. Martin Swany | Researchain

Publication

Featured research published by D. Martin Swany.

international conference on service oriented computing | 2005

PerfSONAR: a service oriented architecture for multi-domain network monitoring

Andreas Hanemann; Jeff W. Boote; Eric L. Boyd; Jérôme Durand; Loukik Kudarimoti; Roman Łapacz; D. Martin Swany; Szymon Trocha; Jason Zurawski

In the area of network monitoring a lot of tools are already available to measure a variety of metrics. However, these tools are often limited to a single administrative domain so that no established methodology for the monitoring of network connections spanning over multiple domains currently exists. In addition, these tools only monitor the network from a technical point of view without providing meaningful network performance indicators for different user groups. These indicators should be derived from the measured basic metrics. In this paper a Service Oriented Architecture is presented which is able to perform multi-domain measurements without being limited to specific kinds of metrics. A Service Oriented Architecture has been chosen as it allows for increased flexibility and scalability in comparison to traditional software engineering techniques. The resulting measurement framework will be applied for measurements in the European Research Network (GEANT) and connected National Research and Education Networks in Europe as well as in the United States.

conference on high performance computing (supercomputing) | 2002

Multivariate Resource Performance Forecasting in the Network Weather Service

D. Martin Swany; Richard Wolski

This paper describes a new technique in the Network Weather Service for producing multivariate forecasts. The new technique uses the NWS’s univariate forecasters and empirically gathered Cumulative Distribution Functions (CDFs) to make predictions from correlated measurement streams. Experimental results are shown in which throughput is predicted for long TCP/IP transfers from short NWS network probes.
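
One way to read the CDF-based step in this abstract is as quantile mapping between two correlated measurement streams. The following is a minimal sketch under that assumption, not the NWS implementation: the function names, the nearest-rank quantile rule, and the sample values are all hypothetical.

```python
import math
from bisect import bisect_right


def empirical_cdf(sorted_samples, x):
    """Fraction of samples that are <= x."""
    return bisect_right(sorted_samples, x) / len(sorted_samples)


def quantile(sorted_samples, q):
    """Nearest-rank quantile: smallest sample whose CDF value is >= q."""
    n = len(sorted_samples)
    idx = min(n - 1, max(0, math.ceil(q * n) - 1))
    return sorted_samples[idx]


def forecast_target(probe_history, target_history, probe_forecast):
    """Map a univariate probe forecast onto the target stream.

    Assumption: the position of the forecast in the probe CDF is
    carried over unchanged to the CDF of the target measurements.
    """
    probes = sorted(probe_history)
    targets = sorted(target_history)
    q = empirical_cdf(probes, probe_forecast)
    return quantile(targets, q)


# Hypothetical data: short probe bandwidths and long-transfer throughputs.
probe_history = [1.0, 2.0, 3.0, 4.0]
target_history = [10.0, 20.0, 30.0, 40.0]
predicted = forecast_target(probe_history, target_history, 3.0)
```

Here the probe forecast of 3.0 sits at the 0.75 quantile of the probe history, so the sketch returns the 0.75 quantile of the long-transfer history.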

conference on high performance computing (supercomputing) | 2005

Transformations to Parallel Codes for Communication-Computation Overlap

Anthony Danalis; Ki-Yong Kim; Lori L. Pollock; D. Martin Swany

This paper presents program transformations directed toward improving communication-computation overlap in parallel programs that use MPI’s collective operations. Our transformations target a wide variety of applications focusing on scientific codes with computation loops that exhibit limited dependence among iterations. We include guidance for developers for transforming an application code in order to exploit the communication-computation overlap available in the underlying cluster, as well as a discussion of the performance improvements achieved by our transformations. We present results from a detailed study of the effect of the problem and message size, level of communication-computation overlap, and amount of communication aggregation on runtime performance in a cluster environment based on an RDMA-enabled network. The targets of our study are two scientific codes written by domain scientists, but the applicability of our work extends far beyond the scope of these two applications.

Future Generation Computer Systems | 2003

The internet backplane protocol: a study in resource sharing

Alessandro Bassi; Micah Beck; Terry Moore; James S. Plank; D. Martin Swany; Richard Wolski; Graham E. Fagg

In this work we present the Internet Backplane Protocol (IBP), a middleware created to allow the sharing of storage resources, implemented as part of the network fabric. IBP allows an application to control intermediate data staging operations explicitly. As IBP follows a very simple philosophy, very similar to the Internet Protocol, and the resulting semantics might be too weak for some applications, we introduce the exNode, a data structure that aggregates storage allocations on the Internet.

workshop on parallel and distributed simulation | 2005

Distributed Worm Simulation with a Realistic Internet Model

Songjie Wei; Jelena Mirkovic; D. Martin Swany

Internet worm spread is a phenomenon involving millions of hosts that interact in a complex and diverse environment. The scanning speed of each infected host depends on its resources and the defenses at work in its network. Aggressive worms further interact with the underlying Internet topology: the dynamics of the spread are constrained by the limited bandwidth of network links, and high-volume scan traffic leads to BGP router failure, thus affecting global routing. Worm traffic also interacts with legitimate background traffic, competing for (and often winning) the limited bandwidth resources. To faithfully simulate worm spread and other Internet-wide events such as DDoS, flash crowds and spam, we need a detailed Internet model, a packet-level simulation of relevant event features, and a realistic model of background traffic on the whole Internet. The memory and CPU requirements of such a simulation exceed a single machine's resources, creating a need for distributed simulation. We propose a design and present an implementation of a distributed worm simulator, called PAWS. PAWS runs on the Emulab testbed, which facilitates its use by other researchers. We validate PAWS in a variety of scenarios, and evaluate the costs and benefits of distributed worm simulation.

cluster computing and the grid | 2002

Representing Dynamic Performance Information in Grid Environments with the Network Weather Service

D. Martin Swany; Richard Wolski

In this paper, we discuss requirements for integrating dynamic performance information from the Network Weather Service (NWS) into the Grid Information Service infrastructure (GIS). We describe the object model that NWS uses internally and provide some rationale for its structure. Finally, we present the NWS's implementation of a caching LDAP daemon that integrates NWS information into the reference GIS, the Globus MDS.

cluster computing and the grid | 2002

The Internet Backplane Protocol: A Study in Resource Sharing

Alessandro Bassi; Micah Beck; Graham E. Fagg; Terry Moore; James S. Plank; D. Martin Swany; Richard Wolski

international conference on supercomputing | 2009

MPI-aware compiler optimizations for improving communication-computation overlap

Anthony Danalis; Lori L. Pollock; D. Martin Swany; John Cavazos

Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as a black box with unknown side effects and thus miss potential optimizations. This paper's contributions enable the development of an MPI-aware optimizing compiler that can perform transformations exploiting knowledge of MPI call effects to increase communication-computation overlap. We formulate a set of data flow equations and rules to describe the side effects of key MPI functions so an MPI-aware compiler can automatically assess the safety of transformations. After categorizing existing compiler transformations based on their effect on the application code, we present an optimization algorithm that specifies when and how to apply these optimizing transformations to achieve improved communication-computation overlap. By manually applying the optimization algorithm to kernels extracted from HYCOM and the NAS benchmarks, we show that even when transforming these highly optimized codes, execution time can be decreased by an average of over 30%.

international conference on cluster computing | 2011

CULZSS: LZSS Lossless Data Compression on CUDA

Adnan Ozsoy; D. Martin Swany

Increasing needs in efficient storage management and better utilization of network bandwidth with less data transfer have led the computing community to consider data compression as a solution. However, compression introduces extra overhead and performance can suffer. The key elements in making the decision to use compression are execution time and compression ratio. Due to negative performance impact, compression is often neglected. General purpose computing on graphic processing units (GPUs) introduces new opportunities where parallelism is available. Our work targets the use of opportunities in GPU based systems by exploiting parallelism in compression algorithms. In this paper we present an implementation of the Lempel-Ziv-Storer-Szymanski (LZSS) lossless data compression algorithm by using NVIDIA GPUs' Compute Unified Device Architecture (CUDA) framework. Our implementation of the LZSS algorithm on GPUs significantly improves the performance of the compression process compared to CPU based implementation without any loss in compression ratio. This can support GPU based clusters in solving application bandwidth problems. Our system outperforms the serial CPU LZSS implementation by up to 18x, the parallel threaded version up to 3x and the BZIP2 program by up to 6x in terms of compression time, showing the promise of CUDA systems in lossless data compression. To give the programmers an easy-to-use tool, our work also provides an API for in-memory compression without the need for reading from and writing to files, in addition to the version involving I/O.
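
For readers unfamiliar with LZSS, the sketch below shows the basic serial idea that the abstract builds on: each position is emitted either as a literal byte or as a back-reference (offset, length) into a sliding window. It is an illustration only, not the CULZSS code or its API. The token representation (Python ints and tuples rather than a packed bit stream) and the window and match-length parameters are assumptions chosen for readability.

```python
def lzss_compress(data: bytes, window: int = 4096,
                  min_match: int = 3, max_match: int = 18):
    """Encode data as a list of tokens.

    An int token is a literal byte; a (offset, length) tuple is a
    back-reference to bytes already emitted. Brute-force search.
    """
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Scan every candidate start inside the sliding window.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lzss_decompress(tokens) -> bytes:
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            # Copy byte by byte so overlapping references work.
            for _ in range(length):
                out.append(out[-offset])
        else:
            out.append(token)
    return bytes(out)


sample = b"abcabcabcabc"
encoded = lzss_compress(sample)
restored = lzss_decompress(encoded)
```

The inner window search at each position is the expensive part of this serial form, which is why the algorithm is a natural candidate for the kind of parallel treatment the abstract describes.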

international conference on e-science | 2012

Efficient data transfer protocols for big data

Brian Tierney; Ezra Kissel; D. Martin Swany; Eric Pouyoul

Data set sizes are growing exponentially, so it is important to use data movement protocols that are the most efficient available. Most data movement tools today rely on TCP over sockets, which limits flows to around 20Gbps on today's hardware. RDMA over Converged Ethernet (RoCE) is a promising new technology for high-performance network data movement with minimal CPU impact over circuit-based infrastructures. We compare the performance of TCP, UDP, UDT, and RoCE over high latency 10Gbps and 40Gbps network paths, and show that RoCE-based data transfers can fill a 40Gbps path using much less CPU than other protocols. We also show that the Linux zero-copy system calls can improve TCP performance considerably, especially on current Intel “Sandy Bridge”-based PCI Express 3.0 (Gen3) hosts.
