Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Lipeng Wan is active.

Publication


Featured research published by Lipeng Wan.


IEEE Conference on Mass Storage Systems and Technologies | 2014

SSD-Optimized Workload Placement with Adaptive Learning and Classification in HPC Environments

Lipeng Wan; Zheng Lu; Qing Cao; Feiyi Wang; H Sarp Oral; Bradley W. Settlemyer

In recent years, non-volatile memory devices such as SSDs have emerged as a viable storage solution due to their increasing capacity and decreasing cost. Given the unique capability and capacity requirements of large-scale HPC (High Performance Computing) storage environments, a hybrid configuration (SSD and HDD) may represent one of the most practical and balanced solutions with respect to cost and performance. Under this setting, effective data placement and movement with controlled overhead become a pressing challenge. In this paper, we propose an integrated object placement and movement framework with adaptive learning algorithms to address these issues. Specifically, we present a method that shuffles data objects across storage tiers to optimize data access performance. The method also integrates an adaptive learning algorithm in which real-time classification is employed to predict the popularity of data object accesses, so that objects can be placed on, or migrated between, SSDs and HDDs in the most efficient manner. We discuss preliminary results based on this approach, using a simulator we developed, showing that the proposed methods can dynamically adapt storage placement to access patterns as workloads evolve, achieving the best system-level performance, such as throughput.
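The popularity-driven tiering idea can be illustrated with a minimal sketch. This is not the paper's classification algorithm; the smoothing factor, capacity limit, and function names below are assumptions chosen for illustration: objects accumulate an exponentially weighted access score, and the hottest objects are assigned to the SSD tier.

```python
# Illustrative sketch (not the paper's exact algorithm): rank objects by an
# exponentially weighted access-frequency score and place the hottest on SSD.
from collections import defaultdict

ALPHA = 0.5          # smoothing factor for the popularity estimate (assumed)
SSD_CAPACITY = 2     # number of objects the SSD tier can hold (assumed)

scores = defaultdict(float)

def record_access(obj):
    """Raise an object's popularity score after each observed access."""
    scores[obj] = ALPHA * 1.0 + (1 - ALPHA) * scores[obj]

def decay_idle(known_objects):
    """Objects not accessed in an epoch decay toward zero popularity."""
    for obj in known_objects:
        scores[obj] *= (1 - ALPHA)

def placement():
    """Return the set of objects that should currently live on the SSD tier."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:SSD_CAPACITY])
```

In a real system the score update would feed a classifier and migrations would be rate-limited to control overhead, as the abstract notes.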


Archive | 2014

A Report on Simulation-Driven Reliability and Failure Analysis of Large-Scale Storage Systems

Lipeng Wan; Feiyi Wang; H Sarp Oral; Sudharshan S. Vazhkudai; Qing Cao

High-performance computing (HPC) storage systems provide data availability and reliability using various hardware and software fault-tolerance techniques. Usually, reliability and availability are calculated at the subsystem or component level using limited metrics such as mean time to failure (MTTF) or mean time to data loss (MTTDL). This often means settling on simple and disconnected failure models (such as an exponential failure rate) to achieve tractable, closed-form solutions. However, such models have been shown to be insufficient for assessing end-to-end storage system reliability and availability. We propose a generic simulation framework aimed at analyzing the reliability and availability of storage systems at scale and investigating what-if scenarios. The framework is designed for an end-to-end storage system, accommodating the various components and subsystems, their interconnections, and failure patterns and propagation, and performs dependency analysis to capture a wide range of failure cases. We evaluate the framework against a large-scale storage system in production and analyze its failure projections toward and beyond the end of its lifecycle. We also examine the potential operational impact by studying how different types of components affect overall system reliability and availability, and present preliminary results.
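The core contrast the abstract draws — simulation versus closed-form exponential models — can be sketched in a few lines. This is a generic Monte Carlo illustration, not the report's framework; the component names and Weibull parameters are invented for the example:

```python
# Minimal Monte Carlo sketch: instead of a closed-form MTTF from exponential
# rates, draw component lifetimes from arbitrary distributions and measure the
# time-to-first-failure of the assembled (series) system.
import random

random.seed(42)

def weibull_lifetime(scale, shape):
    """One sampled lifetime from a Weibull distribution (non-exponential)."""
    return random.weibullvariate(scale, shape)

def simulate_system_mttf(trials=10000):
    """Estimate system MTTF as the average time until any component fails."""
    total = 0.0
    for _ in range(trials):
        lifetimes = [
            weibull_lifetime(scale=8.0, shape=1.2),   # e.g. controller (assumed)
            weibull_lifetime(scale=10.0, shape=0.9),  # e.g. disk enclosure (assumed)
            weibull_lifetime(scale=12.0, shape=1.5),  # e.g. network link (assumed)
        ]
        total += min(lifetimes)  # series system: first failure takes it down
    return total / trials
```

A full framework would add repair times, failure propagation, and dependency analysis on top of this basic sampling loop.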


Modeling, Analysis, and Simulation of Computer and Telecommunication Systems | 2013

Towards Instruction Level Record and Replay of Sensor Network Applications

Lipeng Wan; Qing Cao

Debugging wireless sensor network (WSN) applications is complicated for multiple reasons, among which the lack of visibility is one of the most challenging. To address this issue, in this paper we present a systematic approach to record and replay WSN applications at the granularity of instructions. This approach differs from previous ones in that it is purely software-based; therefore, no additional hardware component is needed. Our key idea is to combine the static, structural information of the assembly-level code with its dynamic, run-time traces, as measured by timestamps and basic block counters, so that we can faithfully infer and replay the actual execution paths of applications at the instruction level in a post-mortem manner. The evaluation results show that this approach is feasible despite the resource constraints of sensor nodes. We also provide two case studies to demonstrate that our instruction-level record-and-replay approach can be used to: (1) discover randomness in EEPROM writing time, and (2) localize stack-smashing bugs in sensor network applications.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2015

A practical approach to reconciling availability, performance, and capacity in provisioning extreme-scale storage systems

Lipeng Wan; Feiyi Wang; H Sarp Oral; Devesh Tiwari; Sudharshan S. Vazhkudai; Qing Cao

The increasing data demands of high-performance computing applications significantly raise the capacity, capability, and reliability requirements of storage systems. As systems scale, component failures and repair times increase, significantly impacting data availability. A wide array of decision points must be balanced in designing such systems. We propose a systematic approach that balances and optimizes both initial and continuous spare provisioning, based on a detailed investigation of the anatomy and field failure data of extreme-scale storage systems. We consider component failure characteristics and their cost and impact at the system level simultaneously. We build a tool to evaluate different provisioning schemes, and the results demonstrate that our optimized provisioning can reduce the duration of data unavailability by as much as 52% under a fixed budget. We also observe that non-disk components have much higher failure rates than disks and warrant careful consideration in the overall provisioning process.
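One simple way to frame budget-constrained spare provisioning is as a greedy allocation: repeatedly buy the spare that removes the most expected downtime per dollar until the budget runs out. This is a hedged sketch of the framing only, not the paper's optimization model, and the component costs and downtime figures are made up:

```python
# Greedy spare-provisioning sketch under a fixed budget (illustrative only).
components = {
    # name: (cost_per_spare, expected_downtime_avoided_per_spare) -- assumed
    "disk":       (100.0, 2.0),
    "controller": (400.0, 15.0),
    "power":      (250.0, 8.0),
}

def provision(budget):
    """Allocate spares greedily by downtime-avoided-per-dollar ratio."""
    plan = {name: 0 for name in components}
    remaining = budget
    while True:
        # pick the best-ratio component we can still afford
        best = max(
            (n for n, (cost, _) in components.items() if cost <= remaining),
            key=lambda n: components[n][1] / components[n][0],
            default=None,
        )
        if best is None:
            break
        plan[best] += 1
        remaining -= components[best][0]
    return plan
```

Note how, consistent with the abstract's observation, an expensive non-disk component with a high failure impact (here the hypothetical "controller") can dominate the plan over cheap disks.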


Archive | 2014

System Architecture and Operating Systems

Yanjun Yao; Lipeng Wan; Qing Cao

The emergence of resource-constrained embedded systems such as sensor networks has introduced unique challenges for the design and implementation of operating systems. In OS designs for these systems, only partial functionality is required compared to conventional ones, as their code runs on a much more restricted and homogeneous platform. In fact, as illustrated by microcontrollers, most hardware platforms in wireless sensor networks (WSNs) simply do not have the resources required to support a full-fledged operating system. Instead, operating systems for WSNs should adapt to their unique properties, which has motivated the design and development of a range of unique operating systems for WSNs in recent years. In this chapter, we systematically survey these operating systems, compare their unique designs, and provide our insights on their strengths and weaknesses. We hope that such an approach helps the reader get a clear view of recent developments in wireless sensor network operating systems.


Journal of Parallel and Distributed Computing | 2017

Optimizing checkpoint data placement with guaranteed burst buffer endurance in large-scale hierarchical storage systems

Lipeng Wan; Qing Cao; Feiyi Wang; Sarp Oral

Non-volatile devices, such as SSDs, will be an integral part of the deepening storage hierarchy on large-scale HPC systems. These devices can be on the compute nodes as part of a distributed burst buffer service, or they can be external. Wherever they are located in the hierarchy, one critical design issue is SSD endurance under write-heavy workloads, such as checkpoint I/O for scientific applications. For these environments, it is widely assumed that checkpoint operations can occur once every 60 minutes, and that for each checkpoint step as much as half of the system memory can be written out. Unfortunately, for large-scale HPC applications, the burst buffer SSDs can wear out much more quickly given the extensive amount of data written at every checkpoint step. One possible solution is to control the amount of data written by reducing the checkpoint frequency. However, a direct effect of reduced checkpoint frequency is an increased vulnerability window for system failures, and therefore potentially wasted computation time, especially for large-scale compute jobs. In this paper, we propose a new checkpoint placement optimization model that collaboratively utilizes both the burst buffer and the parallel file system to store checkpoints, with the design goals of maximizing computation efficiency while guaranteeing SSD endurance requirements. Moreover, we present an adaptive algorithm that can dynamically adjust checkpoint placement based on the system's dynamic runtime characteristics and continuously optimize burst buffer utilization. The evaluation results show that our adaptive checkpoint placement algorithm can guarantee burst buffer endurance with at most 5% performance degradation per application and less than 3% for the entire system.
Highlights:
- A thorough analysis of both failure patterns and runtime characteristics of HPC systems.
- A new checkpoint placement model for optimizing large-scale hierarchical storage system usage.
- A novel adaptive algorithm that can dynamically optimize checkpoint placement.
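The frequency-versus-waste trade-off the abstract describes is classically captured by Young's first-order approximation for the optimal checkpoint interval, T ≈ sqrt(2 · C · MTBF), where C is the time to write one checkpoint. The sketch below shows that background formula and the write-volume quantity an endurance constraint would cap; it is standard background, not the paper's model, and the function names are our own:

```python
# Young's approximation for checkpoint interval, plus the burst-buffer write
# volume that an SSD endurance budget would limit (background sketch only).
import math

def young_interval(checkpoint_cost_s, mtbf_s):
    """Compute time between checkpoints minimizing expected waste
    (first-order Young approximation)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

def daily_write_volume(ckpt_bytes, interval_s):
    """Bytes written to the burst buffer per day at a given interval --
    the quantity an endurance constraint would cap."""
    return ckpt_bytes * (86400.0 / interval_s)
```

For example, a 10-minute checkpoint cost and a one-day MTBF give an interval of roughly 2.8 hours; halving the interval doubles the daily write volume, which is why frequency and endurance pull in opposite directions.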


Advances in Geographic Information Systems | 2014

Towards approximate spatial queries for large-scale vehicle networks

Lipeng Wan; Zhibo Wang; Zheng Lu; Hairong Qi; Wenjun Zhou; Qing Cao

With advances in vehicle-to-vehicle communication, future vehicles will have access to a communication channel through which messages can be sent and received when two vehicles come close to each other. This enabling technology makes it possible for authenticated users to send queries, over multiple hops, to vehicles of interest, such as those located within a geographic region, for various application goals. However, a naive method that floods the queries to every active vehicle in a region incurs a total communication overhead proportional to the size of the area and the density of vehicles. In this paper, we study the problem of spatial queries for vehicle networks by investigating probabilistic methods, where we obtain approximate estimates within desired confidence intervals using only sublinear overhead. We consider this particularly useful when spatial query results can be approximate rather than precise, as is the case with many potential applications. The proposed method has been tested on snapshots from real-world vehicle network traces.
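The general idea — sample instead of flood, and report an estimate with a confidence interval — can be sketched with standard sampling theory. This is not the paper's protocol; the function name and the normal-approximation interval are assumptions for illustration:

```python
# Sampling sketch: estimate the fraction of vehicles satisfying a predicate
# from a uniform random sample, with a normal-approximation confidence bound.
import math
import random

def approximate_fraction(vehicles, predicate, sample_size, z=1.96):
    """Estimate the fraction of `vehicles` satisfying `predicate` by querying
    only `sample_size` of them. Returns (estimate, half_width), where
    half_width is the ~95% confidence half-interval for z = 1.96."""
    sample = random.sample(vehicles, sample_size)
    p = sum(1 for v in sample if predicate(v)) / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return p, half_width
```

The key property is that the sample size needed for a fixed confidence width does not grow with the number of vehicles, which is what makes the overhead sublinear.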


Pervasive and Mobile Computing | 2017

Optimizing the performance of sensor network programs through estimation-based code profiling

Lipeng Wan; Qing Cao; Wenjun Zhou

With the development of sensor technology and embedded systems, building large-scale, low-cost sensor networks, a critical step toward pervasive sensing, becomes possible. One of the major challenges in developing sensor network applications is improving the execution efficiency of programs running on power-constrained embedded devices. While profiling-guided code optimization has been widely used as a compiler-level technique for improving the performance of programs running on general-purpose computers, it has not been applied to sensor network programs due to several limitations. In this paper, we overcome these limitations and design a more effective profiling-guided code placement approach for sensor network programs. Specifically, we model the execution of sensor network programs taking nondeterministic inputs as discrete-time Markov processes, and propose a novel approach named Code Tomography to estimate the parameters of the Markov models that reflect sensor network programs' dynamic execution behaviors, using only end-to-end timing information measured at the start and end points of each procedure in the source code. The parameters estimated by Code Tomography are fed back to compilers to optimize code placement. The evaluation results demonstrate that Code Tomography achieves satisfactory estimation accuracy with low profiling overhead, and that the branch misprediction rate can be reduced after reorganizing code placement based on the profiling results. In addition, Code Tomography can also be useful for purposes such as post-mortem analysis, debugging, and energy profiling of sensor network programs.
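A toy version of the estimation idea: if a procedure contains a single branch whose two sides have different known costs, the branch probability can be recovered from mean end-to-end timing alone, with no per-branch instrumentation. This is far simpler than Code Tomography's Markov-model estimation, and the block costs below are assumed numbers for the illustration:

```python
# Toy illustration: recover a branch probability from end-to-end runtimes.
T_COMMON = 10.0    # cost of the straight-line part of the procedure (assumed)
T_TAKEN = 6.0      # extra cost when the branch is taken (assumed)
T_NOT_TAKEN = 2.0  # extra cost when the branch is not taken (assumed)

def estimate_branch_probability(end_to_end_times):
    """Infer P(branch taken) from end-to-end timings of many invocations.
    Uses: mean_t = T_COMMON + p*T_TAKEN + (1-p)*T_NOT_TAKEN, solved for p."""
    mean_t = sum(end_to_end_times) / len(end_to_end_times)
    p = (mean_t - T_COMMON - T_NOT_TAKEN) / (T_TAKEN - T_NOT_TAKEN)
    return min(1.0, max(0.0, p))  # clamp against timing noise
```

A compiler could then lay out the likelier side of the branch on the fall-through path, which is the code-placement feedback step the abstract describes.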


IEEE Sensors Journal | 2014

Smart Diary: A Smartphone-Based Framework for Sensing, Inferring, and Logging Users’ Daily Life

Jilong Liao; Zhibo Wang; Lipeng Wan; Qing Charles Cao; Hairong Qi


International Symposium on Performance Analysis of Systems and Software | 2015

Estimation-based profiling for code placement optimization in sensor network programs

Lipeng Wan; Qing Cao; Wenjun Zhou

Collaboration


Dive into Lipeng Wan's collaboration.

Top Co-Authors

Qing Cao (University of Tennessee)
Feiyi Wang (Oak Ridge National Laboratory)
H Sarp Oral (Oak Ridge National Laboratory)
Wenjun Zhou (University of Tennessee)
Hairong Qi (University of Tennessee)
Yanjun Yao (University of Tennessee)
Zheng Lu (University of Tennessee)
Bradley W. Settlemyer (Oak Ridge National Laboratory)