Yuxuan Xing
National University of Defense Technology
Publication
Featured research published by Yuxuan Xing.
International Performance Computing and Communications Conference | 2015
Songping Yu; Nong Xiao; Mingzhu Deng; Yuxuan Xing; Fang Liu; Zhiping Cai; Wei Chen
Non-volatile memory (NVM) has the illustrious merits of byte-addressability, fast speed, persistency and low power consumption, which make it attractive as main memory. Commonly, a user process dynamically acquires memory through memory allocators. However, traditional memory allocators designed around in-place data writes are not appropriate for non-volatile main memory (NVRAM) due to its limited endurance: a PCM cell, for instance, sustains only about 10^8 write operations. In this paper, we quantitatively analyze the wear-obliviousness of the DRAM-oriented allocator glibc malloc and the inefficiency of the wear-conscious allocator NVMalloc; for example, their average imbalance factors (maximum/average) of memory allocation are about 7.5 and 3, respectively. Based on our observations, we propose WAlloc, an efficient wear-aware manual memory allocator designed for NVRAM, which decouples metadata and data, uses a Less Allocated First Out allocation policy, and redirects data writes. Experimental results show that the wear-leveling of WAlloc outperforms that of NVMalloc by about 30% and 60% under random and well-distributed workloads, respectively. In addition, considering the trade-off between space and wear-leveling, WAlloc reduces data memory writes to 64-byte blocks by 1.5X on average compared with malloc, at about 8% extra space overhead.
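The core idea of a Less Allocated First Out policy can be sketched in a few lines: among free blocks, always hand out the one that has been allocated the fewest times, so wear spreads evenly. This is a minimal illustration of the policy named in the abstract, not WAlloc's actual implementation; the class and its structure are assumptions.

```python
import heapq

class LAFOAllocator:
    """Toy wear-aware allocator: among free blocks, hand out the one
    that has been allocated the fewest times (Less Allocated First Out)."""

    def __init__(self, num_blocks):
        self.alloc_counts = [0] * num_blocks
        # min-heap keyed by (allocation count, block id)
        self.free_heap = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free_heap)

    def alloc(self):
        count, block = heapq.heappop(self.free_heap)
        self.alloc_counts[block] = count + 1
        return block

    def free(self, block):
        # a freed block re-enters the pool ranked by its wear so far
        heapq.heappush(self.free_heap, (self.alloc_counts[block], block))
```

After a block is allocated and freed, the next allocation prefers a never-used block over the just-released one, which is exactly the wear-leveling bias the policy is after.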
International Conference on Big Data and Cloud Computing | 2014
Xiaoquan Wu; Nong Xiao; Fang Liu; Zhiguang Chen; Yimo Du; Yuxuan Xing
Flash-memory-based SSD RAID offers excellent I/O performance with high stability, which has earned it growing attention from companies and manufacturers, especially in I/O-intensive environments. However, frequent parity updates also give the SSD higher overhead during garbage collection. To this end, we propose RAID-Aware SSD (RA-SSD), which distinguishes user data from parity by detecting the different access patterns coming from the upper RAID layer, and stores them separately in different flash blocks. RA-SSD can effectively reduce the overhead of garbage collection. Simulation results show that, deployed in a RAID-5 system, RA-SSD can reduce the number of pages copied during garbage collection by up to 10%. As garbage-collection overhead decreases, write performance and lifespan improve. The extra space consumed by RA-SSD is very small, only about 1/10000 of the capacity of the device. Moreover, the processing logic of RA-SSD is so simple that it has very little impact on read performance.
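The access-pattern intuition is that in RAID-5 the parity page of a stripe is rewritten on every update to any data page in that stripe, so parity pages accumulate far more writes than data pages. A minimal classifier built on that observation might look like the following; the class, threshold, and heuristic are illustrative assumptions, not RA-SSD's detection logic.

```python
from collections import defaultdict

class RASSDClassifier:
    """Toy access-pattern classifier: logical pages written far more
    often than average are treated as parity and can be steered into
    separate flash blocks."""

    def __init__(self, threshold=2):
        self.write_counts = defaultdict(int)
        self.threshold = threshold  # hypothetical tuning knob

    def record_write(self, lpn):
        self.write_counts[lpn] += 1

    def is_parity(self, lpn):
        if not self.write_counts:
            return False
        avg = sum(self.write_counts.values()) / len(self.write_counts)
        return self.write_counts[lpn] > self.threshold * avg
```

With four data pages each written once and one page rewritten on every update, only the hot page crosses the threshold, matching the parity write pattern described above.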
International Conference on Networking, Architecture and Storage | 2017
Songping Yu; Nong Xiao; Mingzhu Deng; Yuxuan Xing; Fang Liu; Wei Chen
As the anticipated emerging Non-Volatile Memory (NVM) technologies, such as 3D XPoint, enter production, there has been a recent push in the big data processing community from storage-centric toward memory-centric designs. Generally, in large-scale systems, distributed memory management over traditional networking with the TCP/IP protocol exposes a performance bottleneck: the CPU-centric network stack involves context switching, memory copies, and so on. Remote Direct Memory Access (RDMA) technology offers a tremendous performance advantage over TCP/IP by allowing direct access to remote memory, bypassing the OS kernel. In this paper, we propose Megalloc, a distributed NVM allocator that exposes NVMs as a shared address space across a cluster of machines based on RDMA. First, it makes memory-allocation metadata directly accessible to each machine and allocates NVM in a coarse-grained way; second, it adopts fine-grained memory chunks for applications to read and store data; finally, it guarantees high distributed memory-allocation performance.
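The coarse/fine split described above can be sketched as a two-level bump allocator: a large region is claimed from the pool in one (expensive, shared-metadata) step, and small chunks are then carved out of it locally. The class, region size, and bump-pointer scheme are assumptions for illustration, not Megalloc's actual design.

```python
class Megalloc:
    """Toy two-level allocator sketch: coarse-grained regions are claimed
    from a (notionally shared) NVM pool; fine-grained chunks are carved
    out of the current region for application data."""

    REGION = 1 << 20   # 1 MiB coarse-grained region (illustrative size)

    def __init__(self, pool_size):
        self.pool_next = 0          # next free offset in the NVM pool
        self.pool_size = pool_size
        self.region_base = None
        self.region_used = 0

    def _claim_region(self):
        # Coarse-grained step: in the real system this metadata update
        # would be performed over RDMA against shared allocator state.
        assert self.pool_next + self.REGION <= self.pool_size, "pool exhausted"
        self.region_base = self.pool_next
        self.pool_next += self.REGION
        self.region_used = 0

    def alloc(self, size):
        # Fine-grained step: bump-allocate a chunk inside the region.
        if self.region_base is None or self.region_used + size > self.REGION:
            self._claim_region()
        addr = self.region_base + self.region_used
        self.region_used += size
        return addr
```

The point of the split is that the expensive shared step happens once per region, while the common-case allocation touches only local state.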
International Symposium on Parallel Architecture, Algorithm and Programming | 2017
Hongbo Li; Yuxuan Xing; Nong Xiao; Zhiguang Chen; Yutong Lu
Large-scale data is exploding so quickly that the traditional big data processing framework Hadoop has hit a bottleneck at its data storage layer. Running Hadoop on modern HPC clusters has attracted much attention due to its unique data processing and analyzing capabilities. The Lustre file system is a promising parallel file system that has occupied the HPC file-system market for many years; a Lustre-based Hadoop platform therefore poses many new opportunities and challenges in today's data era. In this paper, we customized a LustreFileSystem class, which inherits from the FileSystem class (in the Hadoop source code), to build our Lustre-based Hadoop. To make full use of the high performance of the Lustre file system, we propose a novel dynamic stripe strategy that optimizes the stripe size while writing data to Lustre. Our results indicate that we improve throughput (MB/sec) markedly, by about 3x for writes and 11x for reads, and the average I/O rate (MB/sec) by at least 3x, compared with stock Hadoop. Besides, our dynamic stripe strategy smooths read operations and gives a slight improvement to the write path compared with the existing Lustre-based Hadoop.
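A dynamic stripe strategy of this flavor can be sketched as a heuristic that grows the stripe size with the file, so large files spread across all OSTs without generating a flood of tiny stripes. The function, thresholds, and doubling rule below are illustrative assumptions, not the paper's actual policy.

```python
def choose_stripe(file_size, num_osts=8,
                  min_stripe=1 << 20, max_stripe=64 << 20):
    """Toy dynamic-stripe heuristic for a Lustre-style file system:
    double the stripe size until the file fits in one stripe per OST,
    then pick a stripe count that covers the file."""
    stripe = min_stripe
    while stripe < max_stripe and file_size > stripe * num_osts:
        stripe *= 2
    count = min(num_osts, max(1, (file_size + stripe - 1) // stripe))
    return stripe, count
```

A 512 KB file gets a single 1 MiB stripe, while a 1 GiB file saturates all eight OSTs at the maximum stripe size.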
International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage | 2017
Yuxuan Xing; Siqi Gao; Nong Xiao; Fang Liu; Wei Chen
With the rapid development of technologies such as cloud computing and the increasing popularity of social networks and other Internet applications, the scale of data that humans can access is growing at an unprecedented rate. The technological changes associated with big data are hot topics in academia and industry, and it is valuable to dig out the potential information in massive data. Many real-world problems can be represented as graphs, such as supply-chain analysis, genealogy, and web graphs. Large graphs demand efficient processing technologies to derive valuable knowledge, and many graph processing engines have been developed. This paper first introduces basic graph concepts and the categories of single-machine graph processing engines. It then analyzes and summarizes current research on the key techniques of graph processing, including data structures, parallel programming, and partitioning strategies. Finally, current research on single-machine graph processing engines is summarized and further research directions are pointed out.
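Among the data structures surveyed, Compressed Sparse Row (CSR) is the one most single-machine engines build on: all neighbor lists live in one flat array, with an offsets array marking where each vertex's list starts. A minimal construction, as a generic illustration rather than any particular engine's code:

```python
def build_csr(num_vertices, edges):
    """Build a Compressed Sparse Row adjacency structure:
    offsets[v]..offsets[v+1] delimits v's neighbors inside `targets`."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    targets = [0] * len(edges)
    fill = offsets[:-1].copy()      # next write position per vertex
    for src, dst in edges:
        targets[fill[src]] = dst
        fill[src] += 1
    return offsets, targets

def neighbors(offsets, targets, v):
    return targets[offsets[v]:offsets[v + 1]]
```

CSR's appeal for large graphs is exactly this layout: neighbor scans are sequential, cache-friendly reads over two dense arrays.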
International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage | 2017
Yuxuan Xing; Nong Xiao; Yutong Lu; Ronghua Li; Songping Yu; Siqi Gao
The k-truss is a type of cohesive subgraph proposed for the analysis of massive networks. Existing in-memory algorithms for computing k-trusses search inefficiently and parallelize poorly. We propose a novel traversal algorithm for truss decomposition: it effectively reduces computational complexity, fully exploits parallelism thanks to this optimization, and overlaps I/O with computation for better performance. Our experiments on real datasets verify that it is 2x–5x faster than the fastest existing in-memory algorithm.
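For context, the standard peeling baseline that such work improves on repeatedly removes the edge with the least triangle support: an edge belongs to the k-truss iff it participates in at least k-2 triangles within it. A compact (unoptimized) version of that baseline, not the paper's traversal algorithm:

```python
def truss_decomposition(edges):
    """Peeling baseline for truss decomposition: at level k, strip every
    edge supported by fewer than k-2 triangles, recording its trussness."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    support = {tuple(sorted(e)): 0 for e in edges}
    for u, v in support:
        support[(u, v)] = len(adj[u] & adj[v])   # triangles through edge
    trussness = {}
    k = 2
    while support:
        while True:
            weak = [e for e, s in support.items() if s <= k - 2]
            if not weak:
                break
            for u, v in weak:
                trussness[(u, v)] = k
                # removing (u,v) breaks one triangle per common neighbor
                for w in adj[u] & adj[v]:
                    for f in (tuple(sorted((u, w))), tuple(sorted((v, w)))):
                        if f in support:
                            support[f] -= 1
                adj[u].discard(v)
                adj[v].discard(u)
                del support[(u, v)]
        k += 1
    return trussness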
International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage | 2017
Songping Yu; Mingzhu Deng; Yuxuan Xing; Nong Xiao; Fang Liu; Wei Chen
Remote Direct Memory Access (RDMA) provides the ability to access remote user-space memory directly, without the remote CPU's involvement, shortening network latency tremendously. In addition, a new generation of fast Non-Volatile Memory (NVM) technologies, such as 3D XPoint, is in production, promising memory-like access speed with storage-like durability, so remote access to non-volatile main memory is reasonable. Traditional local memory extension is bounded by slow storage media (HDD/SSD). In this paper, we first revisit local memory extension and propose a new memory-extension model, Pyramid, which extends memory with remote NVM. We then discuss the mechanism of remote data consistency, which in Pyramid can be delivered with the RDMA write-with-immediate operation. Besides, we evaluate the performance of random access to remote NVM and demonstrate the performance opportunity brought by remotely accessible NVM by comparing it with new storage technologies, NVMe SSDs and PCM-based SSDs. Finally, we argue that Pyramid promises memory scalability with a good performance guarantee.
International Conference on High Performance Computing and Communications | 2016
Zhengguo Chen; Nong Xiao; Fang Liu; Zhiguang Chen; Wei Chen; Yuxuan Xing
The Ultra-DIMM, composed of DRAM and Flash memory, is a promising solution for tackling the energy-consumption and scalability challenges of traditional DRAM. In this hybrid memory system, DRAM is used as the data buffer of Flash memory due to the performance and endurance gaps between main memory and Flash. However, the mismatch in access granularity between main memory and Flash makes the DRAM-based buffer complex. On the one hand, the basic access unit of Flash is the page, yet buffering Flash pages in DRAM without distinguishing hot cache lines from cold ones within each page wastes cache capacity. On the other hand, general-purpose replacement schemes focus on a high hit rate and do not consider the peculiarities of Flash, which leads to performance and lifespan overhead. In this paper, we propose TBuffer, an additional buffer in DRAM enhanced by history-aware identification and LazyFlush. History-aware identification increases the hit rate by evicting cold cache lines and keeping more hot cache lines in DRAM, while LazyFlush further improves performance and lifespan by delaying the flushing of dirty objects and reducing writes to Flash. We evaluate TBuffer via trace-driven simulations. Experimental results show that it outperforms other existing schemes: it increases the hit rate by up to 12%, reduces access latency by up to 50.8% (19.7% on average), and achieves a 16.6% lifespan improvement on average.
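The LazyFlush idea of delaying and coalescing dirty write-backs can be sketched with a cache-line buffer whose evicted dirty lines are parked in a queue and written to Flash only in batches. The class below is a toy stand-in: it uses plain LRU instead of the paper's history-aware identification, and all sizes are illustrative.

```python
from collections import OrderedDict

class TBuffer:
    """Toy cache-line buffer with a LazyFlush-style write-back: dirty
    lines evicted from DRAM are parked in a flush queue and written to
    Flash only when the queue fills, coalescing Flash writes."""

    def __init__(self, capacity, flush_batch=4):
        self.lines = OrderedDict()      # line -> dirty flag, LRU order
        self.capacity = capacity
        self.flush_queue = []
        self.flush_batch = flush_batch
        self.flash_writes = 0

    def access(self, line, write=False):
        dirty = self.lines.pop(line, False) or write
        self.lines[line] = dirty        # move to MRU position
        if len(self.lines) > self.capacity:
            victim, was_dirty = self.lines.popitem(last=False)
            if was_dirty:
                self.flush_queue.append(victim)
                if len(self.flush_queue) >= self.flush_batch:
                    # one coalesced Flash write for the whole batch
                    self.flash_writes += 1
                    self.flush_queue.clear()
```

Batching turns several per-eviction Flash writes into one, which is the lifespan lever the abstract describes.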
International Conference on High Performance Computing and Communications | 2016
Yuxuan Xing; Ya Feng; Songping Yu; Zhengguo Chen; Fang Liu; Nong Xiao
Solid-state drives (SSDs) are becoming more and more popular in personal devices and data centers. Flash chips can be packaged in hard disk drive (HDD) form factors and provide the same interface as HDDs; this characteristic lets SSDs easily replace HDDs in existing storage systems. PCIe-based SSDs can provide even higher I/O performance, but they remain somewhat expensive. This paper studies the feasibility of Redundant Arrays of Independent SSDs (RAIS) with different file systems. We comprehensively analyze the performance of RAIS built from SATA SSDs and from PCIe SSDs, and investigate different RAIS configurations (RAIS0, 5, 6) and file systems under various I/O access patterns. Finally, we present several key findings and recommendations for building RAIS.
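The RAIS5 configuration studied here relies on the standard RAID-5 invariant: the parity block of a stripe is the byte-wise XOR of its data blocks, so any single lost block can be rebuilt from the survivors. A minimal demonstration of that invariant (generic RAID math, not the paper's code):

```python
def raid5_stripe(data_blocks):
    """Compute the RAID-5 parity block for one stripe: the byte-wise
    XOR of all data blocks."""
    parity = bytearray(len(data_blocks[0]))
    for block in data_blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving_blocks, parity):
    """Recover a missing data block: XOR the parity with the survivors."""
    return raid5_stripe(list(surviving_blocks) + [parity])
```

This XOR symmetry is also why every small write in RAIS5/RAIS6 costs extra reads and writes to keep parity current, the main performance trade-off against RAIS0.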
International Conference on High Performance Computing and Communications | 2016
Yuxuan Xing; Ya Feng; Songping Yu; Zhengguo Chen; Fang Liu; Nong Xiao
Large-graph analytics has become an important aspect of many big data applications, such as web search, social networks and recommendation systems. Over the past few years much research has focused on processing large-scale graphs with distributed systems, and a number of studies have turned to building graph processing systems on a single server-class machine for reasons of cost, usability and maintainability. HPGraph is a highly parallel graph processing system that adopts the edge-centric model. Our contributions are as follows: (1) an efficient data allocation and access strategy for NUMA machines, with task scheduling to keep the load balanced; (2) a fine-grained edge-block filtering mechanism that avoids accessing unnecessary edge data; and (3) a high-speed flash array as secondary storage. We carried out a detailed evaluation on a 16-core machine using a set of popular real-world and synthetic data sets, and the results show that HPGraph consistently outperforms the state-of-the-art single-machine graph processing system GridGraph, running up to 1.27X faster for specific applications. Our source code is available at https://github.com/xinghuan1990/HPGraph.
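The edge-block filtering idea can be sketched in a few lines: an edge-centric pass streams a block of edges only if at least one of its source vertices is active, skipping the rest of the I/O entirely. This toy function illustrates the filtering principle only; the block layout and interfaces are assumptions, not HPGraph's implementation.

```python
def process_edge_blocks(blocks, active, update):
    """Toy edge-centric pass with block filtering: stream an edge block
    only when it contains at least one active source vertex."""
    touched = 0
    for block in blocks:
        if not any(src in active for src, _ in block):
            continue                  # filtered: no active sources here
        touched += 1
        for src, dst in block:
            if src in active:
                update(src, dst)
    return touched  # number of blocks actually streamed
```

When only a small frontier is active (as in late BFS or convergent PageRank iterations), most blocks are skipped, which is where the I/O savings come from.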