
Publication


Featured research published by Thomas J. Hacker.


International Parallel and Distributed Processing Symposium | 2002

The end-to-end performance effects of parallel TCP sockets on a lossy wide-area network

Thomas J. Hacker; Brian D. Athey; Brian D. Noble

This paper examines the effects of using parallel TCP flows to improve end-to-end network performance for distributed, data-intensive applications. A series of transmission experiments was conducted over a wide-area network to assess how parallel flows improve throughput, and to understand the number of flows necessary to improve throughput while avoiding congestion. An empirical throughput expression for parallel flows based on experimental data is presented, and guidelines for the use of parallel flows are discussed.
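
The empirical expression itself comes from the paper's measurements, but the intuition can be sketched with the standard Mathis et al. steady-state model, in which per-flow throughput scales as MSS/(RTT·√p). The model choice, constants, and example parameters below are assumptions for illustration, not the paper's fitted formula.

```python
# A minimal sketch, assuming the Mathis et al. model rather than the paper's
# fitted expression. Parameter values are illustrative.

from math import sqrt

def flow_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Mathis model: per-flow TCP throughput ~ (MSS / RTT) * (C / sqrt(p))."""
    C = sqrt(3.0 / 2.0)  # constant for periodic loss
    return (mss_bytes * 8 / rtt_s) * (C / sqrt(loss_rate))

def aggregate_throughput_bps(n_flows, mss_bytes, rtt_s, loss_rate, link_bps):
    """n parallel flows scale roughly linearly on an underutilized lossy path,
    capped by the bottleneck link capacity."""
    return min(n_flows * flow_throughput_bps(mss_bytes, rtt_s, loss_rate), link_bps)

# 1460-byte MSS, 70 ms RTT, 0.5% systemic loss, 1 Gb/s bottleneck
for n in (1, 2, 4, 8):
    mbps = aggregate_throughput_bps(n, 1460, 0.070, 0.005, 1e9) / 1e6
    print(f"{n} flows: {mbps:.1f} Mb/s")
```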


International Conference on Computer Communications | 2004

Improving throughput and maintaining fairness using parallel TCP

Thomas J. Hacker; Brian D. Noble; Brian D. Athey

Applications that require good network performance often use parallel TCP streams and TCP modifications to improve the effectiveness of TCP. If the network bottleneck is fully utilized, this approach boosts throughput by unfairly stealing bandwidth from competing TCP streams. Improving the effectiveness of TCP is easy, but improving effectiveness while maintaining fairness is difficult. In this paper, we describe an approach we implemented that uses a long virtual round trip time in combination with parallel TCP streams to improve effectiveness on underutilized networks. Our approach prioritizes fairness at the expense of effectiveness when the network is fully utilized. We compared our approach with standard parallel TCP over a wide-area network, and found that our approach preserves effectiveness and is fairer to competing traffic than standard parallel TCP.
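
Because steady-state TCP throughput scales inversely with RTT, giving each of n streams a virtual RTT roughly n times the real one makes the ensemble about as aggressive as a single standard stream once the bottleneck fills. A minimal sketch of that arithmetic, again using the Mathis model as a stand-in for the paper's analysis:

```python
# Illustration: with throughput ~ 1/RTT (Mathis model), n streams that each
# assume a virtual RTT of n * RTT are collectively about as aggressive as one
# standard stream. Numbers are illustrative, not from the paper.

from math import sqrt

def mathis_bps(mss_bytes, rtt_s, loss_rate):
    return (mss_bytes * 8 / rtt_s) * (sqrt(1.5) / sqrt(loss_rate))

n, mss, rtt, p = 8, 1460, 0.070, 0.005
standard = n * mathis_bps(mss, rtt, p)     # n standard streams: ~n x one flow
virtual = n * mathis_bps(mss, n * rtt, p)  # long virtual RTT: ~ one flow total
single = mathis_bps(mss, rtt, p)

print(f"standard parallel: {standard/1e6:.1f} Mb/s")
print(f"virtual-RTT parallel: {virtual/1e6:.1f} Mb/s (single flow: {single/1e6:.1f})")
```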


Journal of Parallel and Distributed Computing | 2009

An analysis of clustered failures on large supercomputing systems

Thomas J. Hacker; Fabian Romero; Christopher D. Carothers

Large supercomputers are built today using thousands of commodity components, and suffer from poor reliability due to frequent component failures. The failure characteristics observed on large-scale systems differ from those of the smaller-scale systems studied in the past. One striking difference is that system events are clustered temporally and spatially, which complicates failure analysis and application design. Developing a clear understanding of failures for large-scale systems is a critical step in building more reliable systems and applications that can better tolerate and recover from failures. In this paper, we analyze the event logs of two large IBM Blue Gene systems, statistically characterize system failures, present a model for predicting the probability of node failure, and assess the effects of differing rates of failure on job failures for large-scale systems. The work presented in this paper will be useful for developers and designers seeking to deploy efficient and reliable petascale systems.
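
The abstract does not spell out the failure model; a common choice for clustered failures is a Weibull fit to inter-failure times, where a shape parameter below 1 indicates that failures bunch together. A hypothetical sketch along those lines:

```python
# Hypothetical sketch: fit a Weibull distribution to times between node
# failures. A shape parameter k < 1 means a decreasing hazard rate, i.e.
# failures cluster shortly after other failures. This is an illustration,
# not the paper's fitted model; the data here are synthetic.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Stand-in for inter-failure times (hours) parsed from system event logs.
inter_failure_hours = rng.weibull(0.7, size=500) * 24.0

shape, loc, scale = stats.weibull_min.fit(inter_failure_hours, floc=0)
print(f"shape k = {shape:.2f}, scale = {scale:.1f} h")

# P(node failure within the next t hours), from the fitted CDF.
t = 12.0
p_fail = stats.weibull_min.cdf(t, shape, loc=0, scale=scale)
print(f"P(failure within {t:.0f} h) = {p_fail:.3f}")
```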


Conference on High Performance Computing (Supercomputing) | 2002

The Effects of Systemic Packet Loss on Aggregate TCP Flows

Thomas J. Hacker; Brian D. Noble; Brian D. Athey

The use of parallel TCP connections to increase throughput for bulk transfers is common practice within the high performance computing community. However, the effectiveness, fairness, and efficiency of data transfers across parallel connections are unclear. This paper considers the impact of systemic, non-congestion-related packet loss on the effectiveness, fairness, and efficiency of parallel TCP transmissions. The results indicate that parallel connections are effective at increasing aggregate throughput, and increase the overall efficiency of the network bottleneck. In the presence of congestion-related losses, parallel flows steal bandwidth from other single-stream flows. A simple modification is presented that reduces the fairness problems when congestion is present, but retains effectiveness and efficiency.
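
One standard way to quantify the fairness question the paper raises is Jain's fairness index, which is 1.0 when all competitors receive equal bandwidth. A small sketch (the scenario numbers are illustrative, not from the paper):

```python
# Jain's fairness index over per-flow throughputs: a standard fairness metric,
# used here only to illustrate the issue the paper studies.

def jain_fairness(throughputs):
    """Jain's index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))

# A host opening 4 parallel flows against 4 single-flow competitors:
# per flow the link looks fair, per host the parallel sender wins.
per_flow = [100.0] * 8                         # 8 flows share the link equally
per_host = [sum(per_flow[:4])] + per_flow[4:]  # [400, 100, 100, 100, 100]
print(f"per-flow fairness: {jain_fairness(per_flow):.2f}")  # 1.00
print(f"per-host fairness: {jain_fairness(per_host):.2f}")  # 0.64
```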


Grid Computing | 2001

A Methodology for Account Management in Grid Computing Environments

Thomas J. Hacker; Brian D. Athey

A national infrastructure of Grid computing environments will provide access for a large pool of users to a large number of distributed computing resources. Providing access for the complete pool of potential users would put an unacceptably large administrative burden on sites that participate in the Grid. Current approaches to solving this problem either require an account for each user at a site or map all users into one account. This paper proposes an alternative approach to account allocation that provides the benefits of persistent accounts while minimizing the administrative burden on Grid resource providers. A technique, based on historical use, for calculating an upper bound on the number of jobs and users the Grid will offer to the system is presented. Finally, the application of this approach to the National Institutes of Health Visible Human Project is described.
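
The abstract does not give the bounding formula; one plausible shape for it is a high quantile of historical peak usage. A hypothetical sketch, with made-up workload numbers:

```python
# Hypothetical sketch of bounding expected Grid load from history: take a
# high quantile of observed concurrent users as the provisioning upper bound.
# The quantile choice and the data are assumptions, not the paper's technique.

import statistics

# Stand-in for daily peak concurrent Grid users observed at a site.
daily_peaks = [12, 15, 9, 22, 18, 14, 25, 17, 20, 16, 19, 23, 11, 13, 21]

# quantiles(n=100) yields 99 cut points; index 98 approximates the 99th percentile.
upper_bound = statistics.quantiles(daily_peaks, n=100)[98]
print(f"provision for about {upper_bound:.0f} concurrent users")
```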


Computing in Science and Engineering | 2011

The NEEShub Cyberinfrastructure for Earthquake Engineering

Thomas J. Hacker; Rudi Eigenmann; Saurabh Bagchi; Ayhan Irfanoglu; Santiago Pujol; Ann Christine Catlin; Ellen M. Rathje

The US Network for Earthquake Engineering Simulation (NEES) operates a shared network of civil engineering experimental facilities aimed at facilitating research on mitigating earthquake damage and loss of life. The NEEShub gateway was created in response to the NEES community's needs, combining data, simulation, and analysis functionality with collaboration tools.


High Performance Distributed Computing | 2005

Adaptive data block scheduling for parallel TCP streams

Thomas J. Hacker; Brian D. Noble; Brian D. Athey

Applications that use parallel TCP streams to increase throughput must multiplex and demultiplex data blocks over a set of TCP streams transmitting on one or more network paths. When applications use the obvious round robin scheduling algorithm for multiplexing data blocks, differences in transmission rate between individual TCP streams can lead to significant data block reordering. This forces the demultiplexing receiver to buffer out-of-order data blocks, consuming memory and potentially causing the receiving application to stall. This paper describes a new adaptive weighted scheduling approach for multiplexing data blocks over a set of parallel TCP streams. Our new scheduling approach, compared with the scheduling approach used by GridFTP, reduces reordering of data blocks between individual TCP streams, maintains the aggregate throughput gains of parallel TCP, consumes less receiver memory for buffering out-of-order packets, and delivers smoother application goodput. We demonstrate the improved characteristics of our new scheduling approach using data transmission experiments over real and emulated wide-area networks.
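
A simple way to realize rate-weighted scheduling is to hand each block to the stream predicted to drain its queued bytes soonest, based on recently observed throughput. The sketch below illustrates that idea; the weighting and rate-update details are assumptions, not the paper's exact algorithm:

```python
# Hypothetical sketch of rate-weighted block scheduling: assign the next data
# block to the stream with the earliest predicted finish time, so faster
# streams carry proportionally more blocks and arrivals stay closer to order.

import heapq

def schedule_blocks(block_sizes, stream_rates_bps):
    """Assign each block to the stream with the earliest predicted finish time."""
    # Heap of (predicted_finish_time_s, stream_id)
    heap = [(0.0, sid) for sid in range(len(stream_rates_bps))]
    heapq.heapify(heap)
    assignment = []
    for block, size in enumerate(block_sizes):
        finish, sid = heapq.heappop(heap)
        finish += size * 8 / stream_rates_bps[sid]  # time to send this block
        assignment.append((block, sid))
        heapq.heappush(heap, (finish, sid))
    return assignment

# Four streams, one 4x faster: it receives roughly 4x as many blocks.
rates = [400e6, 100e6, 100e6, 100e6]
print(schedule_blocks([1 << 20] * 12, rates))
```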


IEEE International Conference on High Performance Computing, Data and Analytics | 2011

Flexible resource allocation for reliable virtual cluster computing systems

Thomas J. Hacker; Kanak Mahadik

Virtualization and cloud computing technologies now make it possible to create scalable and reliable virtual high performance computing clusters. Integrating these technologies, however, is complicated by fundamental and inherent differences in the way in which these systems allocate resources to computational tasks. Cloud computing systems immediately allocate available resources or deny requests. In contrast, parallel computing systems route all requests through a queue for future resource allocation. This divergence of allocation policies hinders efforts to implement efficient, responsive, and reliable virtual clusters. In this paper, we present a continuum of four scheduling policies along with an analytical resource prediction model for each policy to estimate the level of resources needed to operate an efficient, responsive, and reliable virtual cluster system. We show that it is possible to estimate the size of the virtual cluster system needed to provide a predictable grade of service for a realistic high performance computing workload and estimate the queue wait time for a partial or full resource allocation. Moreover, we show that it is possible to provide a reliable virtual cluster system using a limited pool of spare resources. The models and results we present are useful for cloud computing providers seeking to operate efficient and cost-effective virtual cluster systems.
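
The paper derives its own analytical models for the four policies; as a flavor of the kind of prediction involved, here is the classic M/M/c (Erlang C) estimate of expected queue wait, with illustrative parameters:

```python
# Hypothetical sketch: an M/M/c queueing estimate of queue wait time for a
# virtual cluster with c node slots. This is the textbook Erlang C formula,
# not the paper's models; arrival and service rates are made up.

from math import factorial

def erlang_c_wait(c, arrival_rate, service_rate):
    """Expected wait in queue for M/M/c; requires per-server utilization < 1."""
    a = arrival_rate / service_rate  # offered load (Erlangs)
    rho = a / c                      # per-server utilization
    assert rho < 1, "system is overloaded"
    p_wait = (a**c / factorial(c)) / (
        (1 - rho) * sum(a**k / factorial(k) for k in range(c))
        + a**c / factorial(c)
    )
    return p_wait / (c * service_rate - arrival_rate)

# 64 node slots, jobs arrive at 0.5/min, each holds its nodes ~100 min.
print(f"expected queue wait: {erlang_c_wait(64, 0.5, 1 / 100.0):.1f} min")
```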


Journal of Structural Engineering-ASCE | 2013

Advancing Earthquake Engineering Research through Cyberinfrastructure

Thomas J. Hacker; Rudolf Eigenmann; Ellen M. Rathje

This paper describes the cyberinfrastructure (CI) of the George E. Brown, Jr. Network for Earthquake Engineering Simulation (NEES) and examines the evidence that this infrastructure is facilitating earthquake engineering research. Among the key features of the CI are the NEES Project Warehouse (PW), which is a data repository for earthquake engineering, an environment that supports the use of tools for web-based data analysis and simulation, and tools that support research collaboration. The value that such CI offers to the user community is discussed. The CI also gathers a myriad of usage statistics, some of which are presented in this paper. Among them are the number of users, pageviews, recorded NEES projects, and other stored resources. This information demonstrates that the CI is used significantly and increasingly so.


International Journal of Big Data Intelligence | 2014

A new approach for accurate distributed cluster analysis for Big Data: competitive K-Means

Rui Máximo Esteves; Thomas J. Hacker; Chunming Rong

The tremendous growth in data volumes has created a need for new tools and algorithms to quickly analyse large datasets. Cluster analysis techniques, such as K-Means, can be distributed across several machines. The accuracy of K-Means depends on the selection of seed centroids during initialisation. K-Means++ improves on the K-Means seeder, but suffers from problems when it is applied to large datasets. In this paper, we describe a new algorithm and a MapReduce implementation we developed that addresses these problems. We compared the performance with three existing algorithms and found that our algorithm improves cluster analysis accuracy and decreases variance. Our results show that our new algorithm produced a speedup of 76 ± 9 times compared with the serial K-Means++ and is as fast as the streaming K-Means. Our work provides a method to select a good initial seeding in less time, facilitating fast accurate cluster analysis over large datasets.
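
For context, the K-Means++ seeding step the paper improves on draws each new centroid with probability proportional to its squared distance from the nearest centroid already chosen. The sketch below shows that baseline only; the paper's competitive, MapReduce-distributed variant is not reproduced:

```python
# Baseline K-Means++ seeding (Arthur & Vassilvitskii): each new centroid is
# sampled with probability proportional to squared distance from the nearest
# centroid chosen so far. Illustrative data; not the paper's algorithm.

import random

def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centroids from points (lists of floats)."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(sum((p - c) ** 2 for p, c in zip(pt, ctr))
                  for ctr in centroids) for pt in points]
        # Sample the next centroid with probability proportional to d2.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

data = [[0, 0], [0.1, 0.2], [5, 5], [5.1, 4.9], [10, 0], [9.8, 0.3]]
print(kmeans_pp_seeds(data, 3))
```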

