Wu-chun Feng
Virginia Tech
Publications

Featured research published by Wu-chun Feng.
high performance interconnects | 2001
Fabrizio Petrini; Wu-chun Feng; Adolfy Hoisie; Salvador Coll; Eitan Frachtenberg
The Quadrics interconnection network (QsNet) contributes two novel innovations to the field of high-performance interconnects: (1) integration of the virtual-address spaces of individual nodes into a single, global, virtual-address space and (2) network fault tolerance via link-level and end-to-end protocols that can detect faults and automatically re-transmit packets. QsNet achieves these feats by extending the native operating system in the nodes with a network operating system and specialized hardware support in the network interface. As these and other important features of QsNet can be found in the InfiniBand specification, QsNet can be viewed as a precursor to InfiniBand. In this paper, we present an initial performance evaluation of QsNet. We first describe the main hardware and software features of QsNet, followed by the results of benchmarks that we ran on our experimental, Intel-based, Linux cluster built around QsNet. Our initial analysis indicates that QsNet performs remarkably well, e.g., user-level latency under 2 μs and bandwidth over 300 MB/s.
international symposium on microarchitecture | 2002
Fabrizio Petrini; Wu-chun Feng; Adolfy Hoisie; Salvador Coll; Eitan Frachtenberg
The Quadrics network extends the native operating system in processing nodes with a network operating system and specialized hardware support in the network interface. Doing so integrates the individual nodes' address spaces into a single, global, virtual address space and provides network fault tolerance.
IEEE Network | 2005
Cheng Jin; David X. Wei; Steven H. Low; J. Bunn; Hyojeong Choe; J.C. Doyle; Harvey B Newman; Sylvain Ravot; S. Singh; Fernando Paganini; G. Buhrmaster; L. Cottrell; Olivier Martin; Wu-chun Feng
We describe a variant of TCP, called FAST, that can sustain high throughput and utilization at multigigabits per second over large distances. We present the motivation, review the background theory, summarize key features of FAST TCP, and report our first experimental results.
international conference on parallel processing | 2007
Rong Ge; Xizhou Feng; Wu-chun Feng; Kirk W. Cameron
Performance and power are critical design constraints in today's high-end computing systems. Reducing power consumption without impacting system performance is a challenge for the HPC community. We present a runtime system (CPU MISER) and an integrated performance model for performance-directed, power-aware cluster computing. CPU MISER supports system-wide, application-independent, fine-grain, dynamic voltage and frequency scaling (DVFS) based power management for a generic power-aware cluster. Experimental results show that CPU MISER can achieve as much as 20% energy savings for the NAS parallel benchmarks. In addition to energy savings, CPU MISER is able to constrain performance loss for most applications within user-specified limits. These constraints are achieved through accurate performance modeling and prediction, coupled with advanced control techniques.
international parallel and distributed processing symposium | 2009
Song Huang; Shucai Xiao; Wu-chun Feng
The graphics processing unit (GPU) has emerged as a computational accelerator that dramatically reduces the time to discovery in high-end computing (HEC). However, while today's state-of-the-art GPU can easily reduce the execution time of a parallel code by many orders of magnitude, it arguably comes at the expense of significant power and energy consumption. For example, the NVIDIA GTX 280 video card is rated at 236 watts, which is as much as the rest of a compute node, thus requiring a 500-W power supply. As a consequence, the GPU has been viewed as a “non-green” computing solution. This paper seeks to characterize, and perhaps debunk, the notion of a “power-hungry GPU” via an empirical study of the performance, power, and energy characteristics of GPUs for scientific computing. Specifically, we take an important biological code that runs in a traditional CPU environment and transform and map it to a hybrid CPU+GPU environment. The end result is that our hybrid CPU+GPU environment, hereafter referred to simply as GPU environment, delivers an energy-delay product that is multiple orders of magnitude better than a traditional CPU environment, whether unicore or multicore.
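For readers unfamiliar with the metric named in this abstract, the energy-delay product (EDP) is conventionally defined as energy consumed multiplied by execution time, so a lower value is better. The sketch below illustrates that definition only. The function name and all power and runtime figures are hypothetical and are not measurements from the paper.

```python
def energy_delay_product(avg_power_watts: float, runtime_seconds: float) -> float:
    """Return the energy-delay product in joule-seconds.

    Energy is approximated as average power times runtime, so the
    EDP grows with the square of the runtime.
    """
    energy_joules = avg_power_watts * runtime_seconds
    return energy_joules * runtime_seconds


# Hypothetical figures chosen only to illustrate the arithmetic:
# a CPU run that draws less power but takes far longer, and a
# CPU+GPU run that draws more power but finishes quickly.
cpu_edp = energy_delay_product(avg_power_watts=200.0, runtime_seconds=1000.0)
gpu_edp = energy_delay_product(avg_power_watts=400.0, runtime_seconds=10.0)

# Ratio above 1 means the CPU+GPU run has the better (lower) EDP.
edp_ratio = cpu_edp / gpu_edp
print(f"CPU EDP: {cpu_edp:.3g} J*s, CPU+GPU EDP: {gpu_edp:.3g} J*s, ratio: {edp_ratio:.0f}x")
```

Because runtime enters the product twice, a configuration that draws more power can still come out far ahead on EDP if it shortens execution time enough.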
high performance distributed computing | 2010
Heshan Lin; Xiaosong Ma; Jeremy S. Archuleta; Wu-chun Feng; Mark K. Gardner; Zhe Zhang
MapReduce offers an ease-of-use programming paradigm for processing large data sets, making it an attractive model for distributed volunteer computing systems. However, unlike on dedicated resources, where MapReduce has mostly been deployed, such volunteer computing systems have significantly higher rates of node unavailability. Furthermore, nodes are not fully controlled by the MapReduce framework. Consequently, we found the data and task replication scheme adopted by existing MapReduce implementations woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. Our tests on an emulated volunteer computing system, which uses a 60-node cluster where each node possesses a similar hardware configuration to a typical computer in a student lab, demonstrate that MOON can deliver a three-fold performance improvement to Hadoop in volatile, volunteer computing environments.
conference on high performance computing (supercomputing) | 2000
Wu-chun Feng; Peerapol Tinnakornsrisuphap
Distributed computational grids depend on TCP to ensure reliable end-to-end communication between nodes across the wide-area network (WAN). Unfortunately, TCP performance can be abysmal even when buffers on the end hosts are manually optimized. Recent studies blame the self-similar nature of aggregate network traffic for TCP’s poor performance because such traffic is not readily amenable to statistical multiplexing in the Internet, and hence computational grids. In this paper, we identify a source of self-similarity previously ignored, a source that is readily controllable: TCP. Via an experimental study, we examine the effects of the TCP stack on network traffic using different implementations of TCP. We show that even when aggregate application traffic ought to smooth out as more applications’ traffic is multiplexed, TCP induces burstiness into the aggregate traffic load, thus adversely impacting network performance. Furthermore, our results indicate that TCP performance will worsen as WAN speeds continue to increase.
international parallel and distributed processing symposium | 2001
Fabrizio Petrini; Adolfy Hoisie; Wu-chun Feng; Richard L. Graham
In this paper we present an in-depth description of the Quadrics interconnection network (QsNET) and an experimental performance evaluation on a 64-node AlphaServer cluster. We explore several performance dimensions and scaling properties of the network by using a collection of benchmarks based on different traffic patterns. Experiments with permutation patterns and uniform traffic are conducted to illustrate the basic characteristics of the interconnect under conditions commonly created by parallel scientific applications. Moreover, the behavior of the QsNET under I/O traffic and the influence of the placement of the I/O servers are analyzed. The effects of using dedicated I/O nodes or shared I/O nodes are also exposed. In addition, we evaluate how background I/O traffic interferes with other parallel applications running concurrently. The experimental results indicate that the QsNET provides excellent performance in most cases, with excellent contention-resolution mechanisms. Some important guidelines for mapping applications and I/O servers onto large-scale clusters are also given.
cluster computing and the grid | 2009
Song Huang; Wu-chun Feng
This paper presents an eco-friendly daemon that reduces power and energy consumption while better maintaining high performance via an accurate workload characterization that infers “processor stall cycles due to off-chip activities.” The eco-friendly daemon is an interval-based, run-time algorithm that uses the workload characterization to dynamically adjust a processor’s frequency and voltage to reduce power and energy consumption with little impact on application performance. Using the NAS Parallel Benchmarks as our workload, we then evaluate our eco-friendly daemon on a cluster computer. The results indicate that our workload characterization allows the power-aware daemon to more tightly control performance (5% loss instead of 11%) while delivering substantial energy savings (11% instead of 8%).
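The idea of an interval-based daemon that picks a frequency from an estimate of off-chip stall time can be sketched as follows. This is a minimal illustration under a simple, commonly used assumption (time spent stalled on off-chip activity does not change with CPU frequency, while the remaining time scales inversely with it). It is not the paper's algorithm, and every name, frequency, and threshold here is hypothetical.

```python
def predicted_slowdown(stall_fraction: float, f_max: float, f: float) -> float:
    """Predicted runtime at frequency f relative to runtime at f_max.

    stall_fraction is the share of time at f_max spent waiting on
    off-chip activity. Assumption: that share is frequency-independent,
    and only the on-chip share stretches as the clock slows.
    """
    return (1.0 - stall_fraction) * (f_max / f) + stall_fraction


def pick_frequency(stall_fraction: float, frequencies: list[float], max_slowdown: float) -> float:
    """Return the lowest frequency whose predicted slowdown stays within the limit."""
    f_max = max(frequencies)
    for f in sorted(frequencies):
        if predicted_slowdown(stall_fraction, f_max, f) <= 1.0 + max_slowdown:
            return f
    return f_max  # fall back to full speed if nothing else qualifies


# Hypothetical frequency steps in GHz and a 5% performance-loss budget.
steps = [1.0, 1.5, 2.0]
memory_bound = pick_frequency(stall_fraction=0.9, frequencies=steps, max_slowdown=0.05)
compute_bound = pick_frequency(stall_fraction=0.0, frequencies=steps, max_slowdown=0.05)
print(memory_bound, compute_bound)
```

In a real daemon the stall fraction would be re-estimated each interval from hardware performance counters, and the chosen frequency applied through the platform's DVFS interface. Both steps are outside the scope of this sketch.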
IEEE Computer | 1989
Rangachar Kasturi; Rodney Fernandez; Mukesh L. Amlani; Wu-chun Feng
An overview of geographic information systems (GISs) is given, covering data collection, applications, organization, and data models. Recent trends in map data processing are examined, namely, automatic name placement, map generalization, an automatic digitizer and expert system for land-use analysis, a map-oriented system for urban planning, and a knowledge-based GIS. Techniques for extracting information from paper-based images are discussed, and some experimental results are given.
