Stephen C. Simms
Indiana University Bloomington
Publications
Featured research published by Stephen C. Simms.
SIGUCCS: User Services Conference | 2010
Craig A. Stewart; Stephen C. Simms; Beth Plale; Matthew R. Link; David Y. Hancock; Geoffrey C. Fox
Cyberinfrastructure is a word commonly used but lacking a single, precise definition. One recognizes intuitively the analogy with infrastructure, and the use of cyber to refer to thinking or computing -- but what exactly is cyberinfrastructure as opposed to information technology infrastructure? Indiana University has developed one of the more widely cited definitions of cyberinfrastructure: "Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible." A second definition, more inclusive of scholarship generally and educational activities, has also been published and is useful in describing cyberinfrastructure: "Cyberinfrastructure consists of computational systems, data and information management, advanced instruments, visualization environments, and people, all linked together by software and advanced networks to improve scholarly productivity and enable knowledge breakthroughs and discoveries not otherwise possible." In this paper, we describe the origin of the term cyberinfrastructure based on the history of the root word infrastructure, discuss several terms related to cyberinfrastructure, and provide several examples of cyberinfrastructure.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2012
Robert Henschel; Stephen C. Simms; David Y. Hancock; Scott Michael; Tom Johnson; Nathan Heald; Thomas William; Donald K. Berry; Matthew Allen; Richard Knepper; Matt Davy; Matthew R. Link; Craig A. Stewart
As part of the SCinet Research Sandbox at the Supercomputing 2011 conference, Indiana University (IU) demonstrated use of the Lustre high performance parallel file system over a dedicated 100 Gbps wide area network (WAN) spanning more than 3,500 km (2,175 mi). This demonstration functioned as a proof of concept and provided an opportunity to study Lustre's performance over a 100 Gbps WAN. To characterize the performance of the network and file system, low-level iperf network tests, file system tests with the IOR benchmark, and a suite of real-world applications reading and writing to the file system were run over a latency of 50.5 ms. In this article we describe the configuration and constraints of the demonstration and outline key findings.
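For context on why this configuration is demanding, the short sketch below works out the bandwidth-delay product implied by the figures quoted in the abstract (a 100 Gbps link and 50.5 ms of latency). The calculation is only an illustration of the scale involved, not part of the demonstration itself.

```python
# Rough bandwidth-delay product estimate for the link described above:
# a 100 Gbps WAN with 50.5 ms round-trip latency. Illustrative only; the
# actual demonstration used iperf and IOR rather than this calculation.

LINK_GBPS = 100          # nominal link speed in gigabits per second
RTT_S = 0.0505           # round-trip time in seconds (50.5 ms)

bdp_bits = LINK_GBPS * 1e9 * RTT_S
bdp_bytes = bdp_bits / 8

print(f"Bandwidth-delay product: {bdp_bytes / 2**20:.0f} MiB")
# ~602 MiB must be "in flight" at any moment to keep the pipe full, which is
# why TCP window sizes and file system concurrency need aggressive tuning.
```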
Proceedings of the 2007 Workshop on Service-Oriented Computing Performance: Aspects, Issues, and Approaches | 2007
Stephen C. Simms; Gregory G. Pike; Scott Teige; Bret Hammond; Yu Ma; Larry L. Simms; C. Westneat; Douglas A. Balog
The Indiana University Data Capacitor is a 535 TB distributed parallel filesystem constructed for short- to mid-term storage of large research data sets. Spanning multiple, geographically distributed compute, storage, and visualization resources and showing unprecedented performance across the wide area network, the Data Capacitor's Lustre filesystem can be used as a powerful tool to accommodate loosely coupled, service-oriented computing. In this paper we demonstrate single file/single client write performance from Oak Ridge National Laboratory to Indiana University in excess of 750 MB/s. We evaluate client parameters that will allow widely distributed services to achieve data transfer rates closely matching those of local services. Finally, we outline the tuning strategy used to maximize performance, and present the results of this tuning.
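A simple model often used when reasoning about single-client throughput over a WAN is that streaming bandwidth is bounded by roughly (RPCs in flight x RPC size) / round-trip time per server connection. The sketch below illustrates that relationship; the RTT and parameter values are hypothetical placeholders, not figures taken from the paper.

```python
# Back-of-the-envelope model of single-client parallel file system write
# throughput over a WAN. All numeric inputs below are assumptions for
# illustration; the real ORNL-to-IU latency and tuning values may differ.

def client_throughput_mb_s(rpcs_in_flight, rpc_size_mb, rtt_s, num_servers=1):
    """Upper-bound estimate of streaming throughput in MB/s."""
    return num_servers * rpcs_in_flight * rpc_size_mb / rtt_s

rtt_s = 0.025  # assumed ~25 ms round trip

# A stock client versus a tuned client striping its file over several servers.
print(client_throughput_mb_s(rpcs_in_flight=8, rpc_size_mb=1, rtt_s=rtt_s))
# ~320 MB/s
print(client_throughput_mb_s(rpcs_in_flight=32, rpc_size_mb=1, rtt_s=rtt_s,
                             num_servers=4))
# ~5120 MB/s: raising concurrency and striping hides the WAN latency
```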
TeraGrid Conference | 2010
Joshua Walgenbach; Stephen C. Simms; Kit Westneat; Justin P. Miller
The Indiana University Data Capacitor wide area Lustre file system provides over 350 TB of short- to mid-term storage of large research data sets. It spans multiple geographically distributed compute, storage, and visualization resources. In order to effectively harness the power of these resources from various institutions, it has been necessary to develop software to keep ownership and permission data consistent across many client mounts. This paper describes the Data Capacitor's Lustre WAN service and the history, development, and implementation of IU's UID mapping scheme that enables Lustre WAN on the TeraGrid.
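The core idea behind UID mapping is that each client institution's numeric UIDs are translated into a single canonical UID space on the file system servers, so the same person owns the same files no matter where they mount from. The sketch below shows that translation in miniature; the table contents, site names, and function are hypothetical illustrations, not IU's actual implementation.

```python
# Minimal sketch of UID mapping for a wide-area file system mount: files are
# owned under the server's canonical UID space, and each remote client's
# local UIDs are translated through a per-institution table. All names and
# numbers here are invented for illustration.

UID_MAP = {
    # (institution, remote_uid) -> canonical_uid on the central file system
    ("site_a", 1001): 520344,
    ("site_b", 1001): 520871,   # same numeric UID, different person
    ("site_b", 2044): 520344,   # same person, different numeric UID
}

def to_canonical_uid(institution: str, remote_uid: int) -> int:
    """Translate a client-side UID into the file system's canonical UID."""
    try:
        return UID_MAP[(institution, remote_uid)]
    except KeyError:
        raise PermissionError(f"no mapping for uid {remote_uid} at {institution}")

print(to_canonical_uid("site_b", 2044))   # 520344
```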
International Parallel and Distributed Processing Symposium | 2004
Peng Wang; George Turner; Daniel A. Lauer; Matthew Allen; Stephen C. Simms; David Hart; Mary Papakhian; Craig A. Stewart
As the first geographically distributed supercomputer on the TOP500 list, the AVIDD facility of Indiana University ranked 50th in June 2003. It achieved 1.169 teraflops running the LINPACK benchmark. Here, our work of improving LINPACK performance is reported, and the impact of the math kernel, LINPACK problem size, and network tuning is analyzed based on the performance model of LINPACK.
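To make the tuning levers concrete, the sketch below shows the standard sizing arithmetic for an HPL run: problem size is chosen to fill most of aggregate memory, and efficiency is the ratio of measured to peak rate. The memory figure and peak rate are hypothetical placeholders; only the 1.169 teraflop measurement comes from the abstract.

```python
# Illustrative HPL (LINPACK) sizing arithmetic. The memory total and peak
# rate are assumptions for illustration, not values from the paper.
import math

total_memory_bytes = 1.5e12     # assumed aggregate cluster memory
usable_fraction = 0.8           # leave headroom for OS and MPI buffers

# HPL factors a dense N x N matrix of 8-byte doubles, so memory ~ 8 * N^2.
n = int(math.sqrt(usable_fraction * total_memory_bytes / 8))
print(f"suggested problem size N ~ {n:,}")

r_max = 1.169e12                # measured LINPACK rate, from the abstract
r_peak = 2.0e12                 # hypothetical theoretical peak
print(f"efficiency ~ {r_max / r_peak:.1%}")
```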
Future Generation Computer Systems | 2013
Michael Kluge; Stephen C. Simms; Thomas William; Robert Henschel; Andy Georgi; Christian Meyer; Matthias S. Mueller; Craig A. Stewart; Wolfgang Wünsch; Wolfgang E. Nagel
Digital instruments and simulations are creating an ever-increasing amount of data. The need for institutions to acquire these data and transfer them for analysis, visualization, and archiving is growing as well. In parallel, networking technology is evolving, but at a much slower rate than our ability to create and store data. Single-fiber 100 Gbps networking solutions have recently been deployed as national infrastructure. This article describes our experiences with data movement and video conferencing across a networking testbed, using the first commercially available single-fiber 100 Gbps technology. The testbed is unique in its ability to be configured for a total length of 60, 200, or 400 km, allowing for tests with varying network latency. We performed low-level TCP tests and were able to use more than 99.9% of the theoretical available bandwidth with minimal tuning efforts. We used the Lustre file system to simulate how end users would interact with a remote file system over such a high performance link. We were able to use 94.4% of the theoretical available bandwidth with a standard file system benchmark, essentially saturating the wide area network. Finally, we performed tests with H.323 video conferencing hardware and quality of service (QoS) settings, showing that the link can reliably carry a full high-definition stream. Overall, we demonstrated the practicality of 100 Gbps networking and Lustre as excellent tools for data management.
Highlights: The need for institutions to acquire and transfer data is growing. We tested data transfer on the first commercial single-fiber 100 Gbps network. We used Lustre to simulate user interaction with a remote file system. We were able to use more than 94.4% of the theoretical available bandwidth. 100 Gbps networking and Lustre are excellent tools for data management.
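As a rough guide to the latencies the 60, 200, and 400 km configurations introduce, the sketch below estimates round-trip time from fiber length and the resulting amount of data that must be in flight at 100 Gbps. These are idealized propagation figures; the testbed's measured latencies may differ slightly.

```python
# Approximate propagation delay for the three testbed lengths. Light in
# optical fiber travels at roughly c / 1.47, about 5 microseconds per
# kilometre one way. Idealized values, not measurements from the article.

C_KM_PER_S = 299_792            # speed of light in vacuum, km/s
FIBER_INDEX = 1.47              # approximate refractive index of fiber

for length_km in (60, 200, 400):
    one_way_s = length_km * FIBER_INDEX / C_KM_PER_S
    rtt_ms = 2 * one_way_s * 1e3
    bdp_mib = 100e9 * (rtt_ms / 1e3) / 8 / 2**20   # at 100 Gbps
    print(f"{length_km:3d} km: RTT ~ {rtt_ms:5.2f} ms, "
          f"bandwidth-delay product ~ {bdp_mib:6.1f} MiB")
```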
High Performance Distributed Computing | 2010
Robert Henschel; Scott Michael; Stephen C. Simms
Astrophysical simulations of protoplanetary disks and gas giant planet formation are being performed with a variety of numerical methods. Some of the codes in use today have been producing scientifically significant results for several years, or even decades. Each must simulate millions of resolution elements for millions of time steps, capture and store output data, and rapidly and efficiently analyze this data. To do this effectively, a parallel code is needed that scales to tens or hundreds of processors. Furthermore, an efficient workflow for the transport, analysis, and interpretation of the output data is needed to achieve scientifically meaningful results. Since such simulations are usually performed on moderate to large parallel systems, the compute system is generally located at a remote institution. However, analysis of results is typically performed interactively, and because most supercomputing centers do not offer dedicated interactive nodes, the transfer of simulation output data to local resources becomes necessary. Even if interactive resources were available, typical network latencies make X-forwarded displays nearly impossible to work with. Since data sets can be quite large and traditional transfer mechanisms such as scp and sftp offer relatively low throughput, this transfer of data sets becomes a bottleneck in the research workflow. In this article we measure the scalability of the Computational HYdrodynamics with MultiplE Radiation Algorithms (CHYMERA) code on the SGI Altix architecture. We find that it scales well up to 64 threads for moderate and large sized problems. We also present a novel approach to enable rapid transfer and analysis of simulation data via the Data Capacitor (DC) and Lustre WAN (Wide Area Network) [17]. The usage of a WAN file system to tie batch-system-operated compute resources and interactive analysis and visualization resources together is of general interest and can be applied broadly.
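The scaling claim is the kind of statement usually backed by strong-scaling measurements: run the same problem on more threads and track speedup and parallel efficiency. The sketch below shows that bookkeeping; the wall-clock times are invented for illustration and are not measurements from the article.

```python
# Simple strong-scaling bookkeeping of the kind used to judge whether a code
# "scales well up to 64 threads". All timings below are hypothetical.

timings = {1: 6400.0, 8: 830.0, 16: 430.0, 32: 225.0, 64: 120.0}  # seconds

t1 = timings[1]                              # single-thread baseline
for threads, t in sorted(timings.items()):
    speedup = t1 / t
    efficiency = speedup / threads
    print(f"{threads:3d} threads: speedup {speedup:5.1f}x, "
          f"efficiency {efficiency:5.1%}")
```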
Conference on High Performance Computing (Supercomputing) | 2006
Stephen C. Simms; Matt Davy; Bret Hammond; Matthew R. Link; Craig A. Stewart; Randall Bramley; Beth Plale; Dennis Gannon; Mu-Hyun Baik; Scott Teige; John C. Huffman; Rick McMullen; Doug Balog; Greg Pike
Indiana University provides powerful compute, storage, and network resources to a diverse local and national research community every day. IU's facilities have been used to support data-intensive applications ranging from digital humanities to computational biology. For this year's bandwidth challenge, several IU researchers will conduct experiments from the exhibit floor utilizing the resources that University Information Technology Services currently provides. Using IU's newly constructed 535 TB Data Capacitor and an additional component installed on the exhibit floor, we will use Lustre across the wide area network to simultaneously facilitate dynamic weather modeling, protein analysis, instrument data capture, and the production, storage, and analysis of simulation data.
International Workshop on Data Intensive Distributed Computing | 2012
Scott Michael; Liang Zhen; Robert Henschel; Stephen C. Simms; Eric Barton; Matthew R. Link
As part of the SCinet Research Sandbox at the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), Indiana University utilized a dedicated 100 Gbps wide area network (WAN) link spanning more than 3,500 km (2,175 mi) to demonstrate the capabilities of the Lustre high performance parallel file system in a high bandwidth, high latency WAN environment. This demonstration functioned as a proof of concept and provided an opportunity to study Lustre's performance over a 100 Gbps WAN. To characterize the performance of the network and file system, a series of benchmarks and tests were undertaken. These included low-level iperf network tests, Lustre networking (LNET) tests, file system tests with the IOR benchmark, and a suite of real-world applications reading and writing to the file system. All of the benchmarks were run over the WAN link with a latency of 50.5 ms. In this article, we describe the configuration and constraints of the demonstration, and focus on the key findings made regarding the Lustre networking layer for this extremely high bandwidth, high latency connection. Of particular interest is the relationship between the peer_credits and max_rpcs_in_flight settings when considering LNET performance.
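The intuition behind why peer_credits and max_rpcs_in_flight matter on such a link is that enough RPCs must be outstanding to cover the bandwidth-delay product. The sketch below puts numbers on that; the 1 MiB bulk RPC size is a traditional Lustre default assumed here for illustration, not a figure quoted from the article.

```python
# Hedged sketch: how many RPCs must be in flight to saturate a 100 Gbps link
# with 50.5 ms of latency, assuming 1 MiB bulk RPCs (an assumption, not a
# value from the paper).

LINK_BPS = 100e9
RTT_S = 0.0505
RPC_BYTES = 1 * 2**20            # assumed 1 MiB bulk RPC

bdp_bytes = LINK_BPS * RTT_S / 8
rpcs_needed = bdp_bytes / RPC_BYTES
print(f"~{rpcs_needed:.0f} RPCs must be in flight to fill the pipe")
# With only a handful of RPCs allowed per client/server pair by default,
# either many pairs or raised credit settings are needed to reach line rate.
```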
TeraGrid Conference | 2010
Scott Michael; Stephen C. Simms; W. B. Breckenridge III; Roger Smith; Matthew R. Link
In this article we explore the utility of a centralized filesystem provided by the TeraGrid to both TeraGrid and non-TeraGrid sites. We highlight several common cases in which such a filesystem would be useful in obtaining scientific insight. We present results from a test case using Indiana University's Data Capacitor over the wide area network as a central filesystem for simulation data generated at multiple TeraGrid sites and analyzed at Mississippi State University. Statistical analysis of the I/O patterns and rates, via detailed trace records generated with VampirTrace, for both the Data Capacitor and a local Lustre filesystem is provided. The benefits of a centralized filesystem and potential hurdles in adopting such a system for both TeraGrid and non-TeraGrid sites are discussed.
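The sketch below shows, in miniature, the kind of post-processing applied when comparing I/O rates between a wide-area and a local file system from trace records. The record format and numbers are invented for illustration; VampirTrace's actual trace format (OTF) is far richer than this.

```python
# Aggregate per-file-system I/O rates from a list of (file_system, operation,
# bytes, duration) records. Hypothetical data, illustrative only.
from collections import defaultdict

records = [
    ("data_capacitor", "write", 256 * 2**20, 0.41),
    ("data_capacitor", "write", 256 * 2**20, 0.39),
    ("local_lustre",   "write", 256 * 2**20, 0.22),
    ("local_lustre",   "read",  128 * 2**20, 0.10),
]

totals = defaultdict(lambda: [0, 0.0])          # (fs, op) -> [bytes, seconds]
for fs, op, nbytes, seconds in records:
    totals[(fs, op)][0] += nbytes
    totals[(fs, op)][1] += seconds

for (fs, op), (nbytes, seconds) in sorted(totals.items()):
    print(f"{fs:15s} {op:5s} {nbytes / 2**20 / seconds:7.1f} MiB/s aggregate")
```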