Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Wenji Wu is active.

Publication


Featured research published by Wenji Wu.


IEEE Communications Letters | 2011

Why Can Some Advanced Ethernet NICs Cause Packet Reordering?

Wenji Wu; Phil DeMar; Matt Crawford

The Intel Ethernet Flow Director is an advanced network interface card (NIC) technology. It provides the benefits of parallel receive processing in multiprocessing environments and can automatically steer incoming network data to the same core on which its application process resides. However, our analysis and experiments show that Flow Director can cause packet reordering in multiprocessing environments. In this paper, we use a simplified model to analyze why Flow Director can cause packet reordering. Our experiments verify our analysis.
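
The mechanism can be seen in a toy model. Below is a minimal sketch, assuming a simplified two-queue NIC rather than Intel's actual hardware: when the application thread migrates to another core, Flow Director re-steers the flow to the new core's receive queue, so packets still waiting in the old queue are delivered after newer packets from the new queue.

    # Hypothetical simulation of flow re-steering after a thread migration.
    from collections import deque

    queues = {0: deque(), 1: deque()}   # one RX queue per core
    steer_to = 0                        # queue currently assigned to the flow

    for seq in (1, 2, 3):               # packets arrive while steered to core 0
        queues[steer_to].append(seq)

    steer_to = 1                        # app thread migrates; flow re-steered
    for seq in (4, 5):
        queues[steer_to].append(seq)

    # Core 1 is serviced before core 0 drains its backlog.
    delivered = list(queues[1]) + list(queues[0])
    print(delivered)                    # [4, 5, 1, 2, 3] -> reordered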


Computer Networks | 2009

Sorting Reordered Packets with Interrupt Coalescing

Wenji Wu; Phil DeMar; Matt Crawford

TCP performs poorly in networks with severe packet reordering. Processing reordered packets in the TCP layer is costly and inefficient, and involves interaction between the sender and receiver. Motivated by the interrupt coalescing mechanism, which delivers packets upward for protocol processing in blocks, we propose a new strategy, Sorting Reordered Packets with Interrupt Coalescing (SRPIC), to reduce packet reordering at the receiver. SRPIC works in the network device driver; it uses the interrupt coalescing mechanism to sort reordered packets belonging to the same TCP stream within a block of packets before delivering them upward, so that each sorted block is internally ordered. Experiments have proven the effectiveness of SRPIC against forward-path reordering.
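
A minimal sketch of the sorting step, assuming packets are represented as (flow_id, tcp_seq) pairs rather than the driver's actual buffer structures: within one interrupt-coalesced block, each flow's packets are sorted by sequence number while keeping the slots that flow occupied.

    def srpic_sort(block):
        """Sort each TCP flow's packets by sequence number within one
        interrupt-coalesced block, preserving each flow's slot positions."""
        by_flow = {}
        for i, (flow, seq) in enumerate(block):
            by_flow.setdefault(flow, []).append(i)
        out = list(block)
        for slots in by_flow.values():
            for slot, pkt in zip(slots, sorted(block[i] for i in slots)):
                out[slot] = pkt
        return out

    # Flow A arrived out of order inside the block:
    print(srpic_sort([("A", 3), ("B", 1), ("A", 1), ("A", 2)]))
    # -> [('A', 1), ('B', 1), ('A', 2), ('A', 3)]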


IEEE Transactions on Parallel and Distributed Systems | 2012

A Transport-Friendly NIC for Multicore/Multiprocessor Systems

Wenji Wu; Phil DeMar; Matt Crawford

Receive side scaling (RSS) is an NIC technology that provides the benefits of parallel receive processing in multiprocessing environments. However, RSS lacks a critical data steering mechanism that would automatically steer incoming network data to the same core on which its application thread resides. This absence causes inefficient cache usage if an application thread is not running on the core on which RSS has scheduled the received traffic to be processed, and results in degraded performance. To remedy this RSS limitation, Intel's Ethernet Flow Director technology has been introduced. However, our analysis shows that Flow Director can cause significant packet reordering, which has various negative impacts in high-speed networks. We propose an NIC data steering mechanism, targeted mainly at TCP, to remedy the limitations of both RSS and Flow Director. We term an NIC with such a data steering mechanism "A Transport-Friendly NIC" (A-TFN). Experimental results have proven the effectiveness of A-TFN in accelerating TCP/IP performance.
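
The steering idea can be sketched as a lookup table consulted ahead of the RSS hash. The sketch below illustrates the concept in software and is not A-TFN's hardware design; the crc32 call merely stands in for the NIC's Toeplitz-style hash.

    import zlib

    NUM_QUEUES = 4
    flow_table = {}   # flow 4-tuple -> core where the receiving thread runs

    def steer(src, sport, dst, dport):
        flow = (src, sport, dst, dport)
        if flow in flow_table:                  # transport-friendly path
            return flow_table[flow]
        return zlib.crc32(repr(flow).encode()) % NUM_QUEUES  # RSS-style fallback

    def on_consume(src, sport, dst, dport, core):
        # the receive path records which core actually consumed the flow
        flow_table[(src, sport, dst, dport)] = core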


Local Computer Networks | 2011

G-NetMon: A GPU-accelerated network performance monitoring system for large scale scientific collaborations

Wenji Wu; Phil DeMar; Donald J. Holmgren; Amitoj Singh; R. Pordes

We have prototyped a GPU-accelerated network performance monitoring system, called G-NetMon, to support large-scale scientific collaborations at Fermilab. Our system exploits the data parallelism that exists within network flow data to provide fast analysis of bulk data movement between Fermilab and collaboration sites. Experiments demonstrate that G-NetMon can rapidly detect sub-optimal bulk data movements.
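
The data parallelism is easy to see on the CPU side. The sketch below shows the kind of per-record aggregation a system like G-NetMon can offload to the GPU; the record fields and the 100 Mb/s floor are illustrative assumptions, not values from the paper.

    from collections import defaultdict

    def site_rates(flow_records):
        # each flow record is processed independently: the parallelizable part
        total_bytes, total_secs = defaultdict(int), defaultdict(float)
        for r in flow_records:
            total_bytes[r["site"]] += r["bytes"]
            total_secs[r["site"]] += r["duration"]
        return {s: 8 * total_bytes[s] / max(total_secs[s], 1e-9)
                for s in total_bytes}                # bits per second per site

    def flag_suboptimal(rates, floor_bps=100e6):
        return [s for s, bps in rates.items() if bps < floor_bps]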


Future Generation Computer Systems | 2018

mdtmFTP and its evaluation on ESNET SDN testbed

Liang Zhang; Wenji Wu; Phil DeMar; Eric Pouyoul

To address the high-performance challenges of data transfer in the big data era, we are developing and implementing mdtmFTP, a high-performance data transfer tool for big data. mdtmFTP has four salient features. First, it adopts an I/O-centric architecture to execute data transfer tasks. Second, it utilizes the underlying multicore platform more efficiently through optimized thread scheduling. Third, it implements a large virtual file mechanism to address the lots-of-small-files (LOSF) problem. Finally, it integrates multiple optimization mechanisms, including zero copy, asynchronous I/O, pipelining, batch processing, and pre-allocated buffer pools, to enhance performance. mdtmFTP has been extensively tested and evaluated within the ESNET 100G testbed. Evaluations show that mdtmFTP achieves higher performance than existing data transfer tools such as GridFTP, FDT, and BBCP.
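
The large virtual file mechanism can be sketched as an index that exposes many small files as one contiguous byte stream, so the transfer engine moves a single large object instead of paying per-file setup costs. This is a conceptual sketch, not mdtmFTP's implementation.

    import os

    def build_index(paths):
        """Map many small files into one contiguous logical byte stream."""
        index, offset = [], 0
        for p in paths:
            size = os.path.getsize(p)
            index.append((p, offset, size))   # (name, stream offset, length)
            offset += size
        return index, offset                  # total virtual file length

    def read_virtual(index, start, length):
        """Read [start, start + length) of the virtual file."""
        chunks = []
        for path, off, size in index:
            lo, hi = max(start, off), min(start + length, off + size)
            if lo < hi:                       # this file overlaps the range
                with open(path, "rb") as f:
                    f.seek(lo - off)
                    chunks.append(f.read(hi - lo))
        return b"".join(chunks)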


Internet Measurement Conference | 2014

WireCAP: a novel packet capture engine for commodity NICs in high-speed networks

Wenji Wu; Phil DeMar

Packet capture is an essential function for many network applications, but packet drop is a major problem with packet capture in high-speed networks. This paper presents WireCAP, a novel packet capture engine for commodity network interface cards (NICs) in high-speed networks. WireCAP provides lossless zero-copy packet capture and delivery services by exploiting multi-queue NICs and multicore architectures. WireCAP introduces two new mechanisms, the ring-buffer-pool mechanism and the buddy-group-based offloading mechanism, to address the packet drop problem of packet capture in high-speed networks. WireCAP is efficient, and it also facilitates the design and operation of user-space packet-processing applications. Experiments have demonstrated that WireCAP achieves better packet capture performance than existing packet capture engines. In addition, WireCAP implements a packet transmit function that allows captured packets to be forwarded, potentially after they are modified or inspected in flight, so WireCAP can be used to support middlebox-type applications. At a high level, WireCAP thus provides a new packet I/O framework for commodity NICs in high-speed networks.
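
A toy model of the ring-buffer-pool idea, with buffers identified by index and DMA details elided: a large pool backs the NIC ring so bursts are absorbed into spare buffers instead of being dropped, and buffers recycle once user space finishes with them.

    from collections import deque

    class RingBufferPool:
        def __init__(self, pool_size):
            self.free = deque(range(pool_size))   # indices of free buffers
            self.full = deque()                   # filled, awaiting the app

        def rx(self, pkt):
            if not self.free:
                return False                      # pool exhausted -> drop
            self.full.append((self.free.popleft(), pkt))  # NIC fills a buffer
            return True

        def consume(self):
            if not self.full:
                return None
            buf, pkt = self.full.popleft()        # zero-copy hand-off to app
            self.free.append(buf)                 # recycle the buffer
            return pkt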


International Conference on Computer Communications | 2013

A GPU-accelerated network traffic monitoring and analysis system

Wenji Wu; Phil DeMar

Data center networks are evolving toward the use of 40GE between access and aggregation layers, and 100GE at the core layer. At such high data rates, network traffic monitoring and analysis applications, particularly those that scrutinize traffic on a per-packet basis, require both enormous raw compute power and high I/O throughput. Many monitoring and analysis tools face extreme performance and scalability challenges as 40GE/100GE network environments emerge. Recently, GPU technology has been applied to accelerate general-purpose scientific and engineering computing, and the GPU architecture fits well with the characteristics of packet-based network monitoring and analysis applications. At Fermilab, we have prototyped a GPU-accelerated architecture for network traffic capture, monitoring, and analysis. With a single Nvidia M2070 GPU, our system can handle more than 11 million packets per second without packet drops. In this paper, we describe our architectural approach to developing a generic GPU-assisted packet capture and analysis capability.
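
Per-packet analysis maps well onto GPUs because each packet is examined independently: the workload is a wide data-parallel map followed by a reduction. In this sketch NumPy stands in for a GPU kernel, and the histogram parameters are illustrative.

    import numpy as np

    def size_histogram(pkt_lengths, bin_width=64, num_bins=24):
        # each packet's bin is computed independently: a wide parallel map
        bins = np.minimum(pkt_lengths // bin_width, num_bins - 1)
        return np.bincount(bins, minlength=num_bins)  # the reduction step

    lengths = np.random.randint(64, 1518, size=1_000_000)
    print(size_histogram(lengths))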


Journal of Physics: Conference Series | 2011

An analysis of bulk data movement patterns in large-scale scientific collaborations

Wenji Wu; P DeMar; A Bobyshev

Large-scale research efforts such as the LHC experiments, ITER, and climate modelling are built upon large, globally distributed collaborations. For reasons of scalability and agility, and to make effective use of existing computing resources, data processing and analysis for these projects is based on distributed computing models. Such projects thus depend on predictable and efficient bulk data movement between collaboration sites. However, the computing and networking resources available to different collaboration sites vary greatly. Large collaboration sites (such as Fermilab and CERN) have created data centres comprising hundreds or even thousands of computation nodes to develop massively scaled, highly distributed cluster-computing platforms. These sites are usually well connected to the outside world through high-speed networks with bandwidth greater than 10 Gbps. On the other hand, some small collaboration sites have limited computing resources or poor network connectivity. Bulk data movements across collaboration sites therefore vary greatly. Fermilab is the US-CMS Tier-1 Centre and the main data centre for several other large-scale research collaborations, and scientific traffic (e.g., CMS) dominates the traffic volume in both the inbound and outbound directions of Fermilab off-site traffic. Fermilab has deployed a flow-based network traffic collection and analysis system to monitor and analyze the status and patterns of bulk data movement between the Laboratory and its collaboration sites. In this paper, we discuss the current status and patterns of bulk data movement between Fermilab and its collaboration sites.


Journal of Network and Computer Applications | 2018

AmoebaNet: An SDN-enabled network service for big data science

Syed Asif Raza Shah; Wenji Wu; Qiming Lu; Liang Zhang; Sajith Sasidharan; Phil DeMar; Chin Guok; John MacAuley; Eric Pouyoul; Jin Kim; Seo-Young Noh



Local Computer Networks | 2017

MDTM: Optimizing Data Transfer Using Multicore-Aware I/O Scheduling

Liang Zhang; Phil DeMar; Bockjoo Kim; Wenji Wu

Bulk data transfer faces significant challenges in the coming era of big data. There are multiple performance bottlenecks along the end-to-end path from the source to the destination storage system, and the limitations of current-generation data transfer tools themselves can have a significant impact on end-to-end data transfer rates. In this paper, we identify the issues that lead to underperformance of these tools and present a new data transfer tool with an innovative I/O scheduler called MDTM. The MDTM scheduler exploits the underlying multicore layout to optimize throughput by reducing delay and contention for I/O read and write operations. Our evaluations show how MDTM avoids NUMA-based congestion and significantly improves end-to-end data transfer rates across high-speed wide area networks.
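
A minimal sketch of multicore-aware I/O scheduling, with an assumed NUMA layout and device-to-node mapping (both are illustrative; a real tool would query the system topology): pin each I/O thread to a core local to the device it serves, so reads, writes, and NIC traffic avoid cross-node memory hops.

    import os

    NODE_CORES = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}   # cores per NUMA node
    DEVICE_NODE = {"nvme0": 0, "eth0": 1}             # device -> local node

    def pin_io_thread(device, worker_idx):
        """Pin the calling thread to a core on the device's NUMA node
        (Linux-only: os.sched_setaffinity)."""
        cores = NODE_CORES[DEVICE_NODE[device]]
        core = cores[worker_idx % len(cores)]
        os.sched_setaffinity(0, {core})
        return core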

Collaboration


Dive into Wenji Wu's collaborations.

Top Co-Authors

Eric Pouyoul

Lawrence Berkeley National Laboratory
