Runqun Xiong
Southeast University
Publication
Featured research published by Runqun Xiong.
ieee/acm international symposium on cluster, cloud and grid computing | 2011
Jiahui Jin; Junzhou Luo; Aibo Song; Fang Dong; Runqun Xiong
Large-scale data processing has become increasingly common in recent years in cloud computing systems such as MapReduce, Hadoop, and Dryad. In these systems, files are split into many small blocks and all blocks are replicated over several servers. To process files efficiently, each job is divided into many tasks and each task is allocated to a server to deal with a file block. Because network bandwidth is a scarce resource in these systems, enhancing task data locality (placing tasks on servers that contain their input blocks) is crucial for the job completion time. Although there have been many approaches to improving data locality, most of them either are greedy and ignore global optimization, or suffer from high computational complexity. To address these problems, we propose a heuristic task scheduling algorithm called Balance-Reduce (BAR), in which an initial task allocation is produced first, and the job completion time is then reduced gradually by tuning that initial allocation. By taking a global view, BAR can adjust data locality dynamically according to network state and cluster workload. Simulation results show that BAR is able to handle large problem instances in a few seconds and outperforms previous related algorithms in terms of job completion time.
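A minimal sketch of the two-phase idea under assumptions of our own rather than the paper's cost model (unit-length tasks, a fixed penalty for running a task without its local block): a greedy, locality-first initial allocation followed by tuning that moves tasks off the busiest server whenever that lowers the estimated job completion time.

```python
# Sketch of a BAR-style two-phase heuristic. The cost model (unit-length tasks,
# fixed remote_penalty for a non-local task) is an illustrative assumption.

def initial_allocation(tasks, block_locations, servers):
    """Place each task on the least-loaded server that stores its input block."""
    load = {s: 0 for s in servers}
    placement = {}
    for task, block in tasks.items():
        local = block_locations[block]                  # servers holding the block
        target = min(local, key=lambda s: load[s])      # data-local, least loaded
        placement[task] = target
        load[target] += 1
    return placement, load

def tune(placement, load, remote_penalty=2):
    """Repeatedly move a task off the busiest server while the makespan shrinks."""
    while True:
        busiest = max(load, key=load.get)
        best_move, best_makespan = None, max(load.values())
        for task, server in placement.items():
            if server != busiest:
                continue
            for other in load:
                if other == busiest:
                    continue
                trial = dict(load)
                trial[busiest] -= 1
                trial[other] += remote_penalty          # assume the move is non-local
                if max(trial.values()) < best_makespan:
                    best_move, best_makespan = (task, other), max(trial.values())
        if best_move is None:
            return placement, load
        task, other = best_move
        load[placement[task]] -= 1
        load[other] += remote_penalty
        placement[task] = other

tasks = {"t1": "b1", "t2": "b1", "t3": "b1", "t4": "b2"}
block_locations = {"b1": ["s1"], "b2": ["s1", "s2"]}
placement, load = initial_allocation(tasks, block_locations, ["s1", "s2", "s3"])
print(tune(placement, load))
```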
Cluster Computing | 2015
Runqun Xiong; Junzhou Luo; Fang Dong
The data placement decision of the Hadoop distributed file system (HDFS) is critical for data locality, which is a primary criterion in task scheduling of the MapReduce model and ultimately affects application performance. HDFS's existing rack-aware data placement strategy and replication scheme work well with the MapReduce framework in homogeneous Hadoop clusters, but in practice such a placement policy can noticeably reduce MapReduce performance and may increase energy dissipation in heterogeneous environments. In addition, HDFS employs a fixed replication factor for every data block by default, which wastes storage space when a Hadoop system holds a large amount of inactive data. In this paper, we propose a novel data placement strategy (SLDP) for heterogeneous Hadoop clusters. SLDP first adopts a heterogeneity-aware algorithm to divide the nodes into several virtual storage tiers (VSTs), and then places data blocks across the nodes of each VST circularly according to the hotness of the data. Furthermore, SLDP uses hotness-proportional replication to save disk space and also provides an effective power control function. Experimental results on two real data-intensive applications show that SLDP is energy-efficient and space-saving, and significantly improves MapReduce performance in a heterogeneous Hadoop cluster.
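A hypothetical sketch of the tier-division step described above: DataNodes are ranked by a composite performance score and split into a fixed number of virtual storage tiers. The scoring formula and node figures are illustrative assumptions, not the paper's heterogeneity-aware algorithm.

```python
# Group DataNodes into virtual storage tiers (VSTs) by a performance score.
# The score (cpu_score * disk throughput) is an illustrative assumption.

def divide_into_vsts(nodes, num_tiers=3):
    """nodes: dict of name -> (cpu_score, disk_mb_per_s); returns tier -> [names]."""
    ranked = sorted(nodes, key=lambda n: nodes[n][0] * nodes[n][1], reverse=True)
    tiers = {t: [] for t in range(num_tiers)}
    size = -(-len(ranked) // num_tiers)          # ceiling division: tier capacity
    for i, name in enumerate(ranked):
        tiers[i // size].append(name)
    return tiers

nodes = {"n1": (8, 120), "n2": (4, 90), "n3": (2, 60),
         "n4": (8, 200), "n5": (1, 40), "n6": (4, 110)}
print(divide_into_vsts(nodes))   # fastest nodes land in tier 0, slowest in tier 2
```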
international conference on advanced cloud and big data | 2014
Runqun Xiong; Junzhou Luo; Fang Dong
Hadoop, a popular open-source implementation of MapReduce, is widely used for large-scale data-intensive applications such as data mining, web indexing, and scientific computing. The current Hadoop implementation assumes that the nodes in a cluster are homogeneous, and the Hadoop distributed file system (HDFS) distributes data to multiple nodes based on disk space availability. Such a data placement strategy is very efficient in homogeneous environments, where nodes are identical in terms of both computing power and disk capacity. Unfortunately, in practice, the homogeneity assumption does not always hold. With the default data placement strategy of HDFS, Hadoop's scheduler suffers severe performance degradation and energy dissipation in heterogeneous environments. In this paper, we propose a novel snakelike data placement mechanism (SLDP) for large-scale heterogeneous Hadoop clusters. SLDP first adopts a heterogeneity-aware algorithm to divide the nodes into several virtual storage tiers (VSTs), and then places data blocks across the nodes of each VST in a snakelike fashion according to the hotness of the data. Furthermore, SLDP uses hotness-proportional replication to reduce disk space consumption and also provides an effective power control function. Experimental results on two real data-intensive applications show that SLDP is energy-efficient and space-saving, and significantly improves MapReduce performance in a heterogeneous Hadoop cluster.
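The hotness-proportional replication mentioned in both SLDP abstracts can be illustrated with a simple rule: hot blocks keep more replicas, cold blocks fall back to a minimum. The linear mapping below is an assumption for illustration only, not the paper's formula.

```python
# Illustrative hotness-proportional replication: replica count scales with a
# block's relative access frequency. The mapping is an assumption, not SLDP's.

def replica_factor(access_count, peak_count, r_min=1, r_max=3):
    """Scale the replication factor linearly with a block's relative hotness."""
    if peak_count == 0:
        return r_min
    hotness = access_count / peak_count
    return max(r_min, round(r_min + hotness * (r_max - r_min)))

counts = {"blk_1": 950, "blk_2": 40, "blk_3": 0}
peak = max(counts.values())
print({blk: replica_factor(c, peak) for blk, c in counts.items()})
# hot blk_1 keeps 3 replicas, cold blk_2 and blk_3 drop to 1
```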
international conference on advanced cloud and big data | 2017
Yunhao Li; Jiahui Jin; Runqun Xiong; Junzhou Luo
The generalized suffix tree (GST) is a tree structure widely used by string-based applications such as DNA sequence pattern search, data compression, and time series analysis. It can efficiently accelerate string operations such as approximate string matching and finding the longest common substring. In the big data era, applications that process large-scale strings (e.g., genomic sequences) are common, so it is important to design scalable approaches for constructing massive GSTs. In this paper, we introduce a distributed approach for constructing GSTs on top of Apache Spark, a general-purpose big data processing system. The framework of our approach is based on the Elastic Range Algorithm (ERA), a state-of-the-art GST construction algorithm. Unlike the original ERA, our approach optimizes the GST structure and the subtree construction, which greatly reduces the memory required to build and store the GST. In addition, we propose several optimization techniques to speed up our approach. Our experimental results show that our approach can index billion-symbol strings within 5 minutes on an 8-worker Spark cluster, and that the optimization techniques yield about a 5x speedup in overall indexing time.
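A minimal sketch of the partition-then-build idea behind ERA-style construction: suffixes are grouped by short prefixes so that each group's subtree can be built independently (and, in the paper's setting, on different Spark workers). This is plain Python with a naive trie as a stand-in; the prefix length, data, and structures are illustrative assumptions.

```python
# Sketch of prefix-based partitioning for GST construction. Each prefix group
# can be processed by a separate worker; the trie here is a naive stand-in.
from collections import defaultdict

def partition_suffixes(strings, prefix_len=2):
    """Group (string_id, offset) suffix pointers by their first prefix_len symbols."""
    groups = defaultdict(list)
    for sid, s in enumerate(strings):
        s = s + "$"                                   # unique terminator per string
        for i in range(len(s)):
            groups[s[i:i + prefix_len]].append((sid, i))
    return groups

def build_subtree(strings, suffixes):
    """Build one group's subtree naively as a nested-dict trie."""
    root = {}
    for sid, off in suffixes:
        node, s = root, strings[sid] + "$"
        for ch in s[off:]:
            node = node.setdefault(ch, {})
        node.setdefault("leaves", []).append((sid, off))
    return root

strings = ["banana", "bandana"]
groups = partition_suffixes(strings)
# Each group is independent, so the subtrees could be built in parallel.
forest = {prefix: build_subtree(strings, sufs) for prefix, sufs in groups.items()}
print(sorted(forest.keys()))
```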
international conference on advanced cloud and big data | 2017
Yao Du; Runqun Xiong; Jiahui Jin; Junzhou Luo
In the big data era, more and more enterprises use the Hadoop distributed file system (HDFS) to manage and store big data for upper-layer applications. However, the default three-replica strategy of HDFS imposes a tremendous storage cost on a data center as the volume of big data keeps growing, especially when cold data also accumulates. Moreover, in heterogeneous Hadoop clusters the rack-aware data placement of HDFS ignores the differences between nodes: data blocks with high reliability requirements may be placed on nodes with poor reliability, so the reliability of the data cannot be guaranteed effectively. To solve these problems, this paper presents a theoretical model of data placement and designs a double sort exchange algorithm (DSEC) that guarantees the reliability of cold data while lowering storage cost. Specifically, for cold data protected by erasure codes, the algorithm uses node information to select an initial result set. Then, by double-sorting the result set and the remaining set, elements of the two sets are exchanged until the placement reaches the lowest cost while still satisfying the reliability requirement. Finally, experiments show that DSEC not only guarantees reliability but also achieves the lowest storage cost compared with other data placement strategies.
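A hedged sketch of the sort-and-exchange loop as we read it from the abstract: start from the cheapest candidate nodes, then swap in more reliable nodes from the remaining set until an assumed reliability target is met. The cost/reliability model and the target value are illustrative assumptions, not the paper's.

```python
# DSEC-like double sort and exchange (illustrative model: independent node
# failures, reliability = probability that at least one chosen node survives).

def reliability(nodes):
    p_all_fail = 1.0
    for n in nodes:
        p_all_fail *= n["fail_prob"]
    return 1.0 - p_all_fail

def dsec_place(candidates, k, target=0.999):
    by_cost = sorted(candidates, key=lambda n: n["cost"])
    chosen, rest = by_cost[:k], by_cost[k:]
    # double sort: least reliable chosen node first, most reliable remaining first
    chosen.sort(key=lambda n: n["fail_prob"], reverse=True)
    rest.sort(key=lambda n: n["fail_prob"])
    i = 0
    while reliability(chosen) < target and i < len(rest):
        chosen[0] = rest[i]                      # exchange the weakest member
        chosen.sort(key=lambda n: n["fail_prob"], reverse=True)
        i += 1
    return chosen

nodes = [{"cost": 1, "fail_prob": 0.20}, {"cost": 2, "fail_prob": 0.05},
         {"cost": 3, "fail_prob": 0.01}, {"cost": 1, "fail_prob": 0.30}]
print(dsec_place(nodes, k=2))   # cheap set upgraded until the target is reached
```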
computer supported cooperative work in design | 2017
Huanhuan Zhang; Fang Dong; Dian Shen; Runqun Xiong; Jiahui Jin
Diagnosing faults in virtual networks is an active research area. Existing research primarily focuses on diagnosing faults in physical networks and cannot identify the faults introduced by virtual networks. Besides, the high complexity of the algorithms and the requirement to modify hardware may limit their applicability. To address these drawbacks, in this paper we propose a novel approach to diagnosing faults in virtual networks. The rationale of our approach is that a fault can be identified once it is located in the packet traces, given knowledge of the known faults that can occur at that location. To achieve this goal, we combine packet marking, fault injection, and machine learning techniques to provide precise fault diagnosis. Experimental results show that our approach can efficiently identify 73% of faults overall, and 86% of the faults specific to virtual networks. Our system can also support real-time or near real-time fault analysis.
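As a rough illustration of the learning step only: features derived from marked packet traces, labelled via fault injection, can train an off-the-shelf classifier. The feature names, data, and choice of a random forest are our assumptions; the paper's pipeline and model may differ.

```python
# Sketch of the classification step (hypothetical features and labels; the
# paper's packet-marking and fault-injection pipeline would produce real ones).
from sklearn.ensemble import RandomForestClassifier

# Each row: [drop_rate, retransmissions, avg_latency_ms, hop_where_lost]
X_train = [
    [0.00,  0,   1.2, 0],   # healthy path
    [0.30,  5,  40.0, 2],   # fault injected at the virtual switch
    [0.05, 12, 150.0, 3],   # fault injected at the VM's virtual NIC
]
y_train = ["healthy", "vswitch_drop", "vnic_overload"]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# Classify a trace observed in production.
print(clf.predict([[0.28, 4, 38.5, 2]]))
```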
Journal of Physics: Conference Series | 2017
Junzhou Luo; Jinghui Zhang; Fang Dong; Aibo Song; Runqun Xiong; J. Y. Shi; Feiqiao Huang; Renli Shi; Zijian Liu; V. Choutko; Alexander Egorov; Alexandre Eline
Southeast University (SEU) Science Operation Centre (SOC) is one of the computing centres of the Alpha Magnetic Spectrometer (AMS-02) experiment. It provides 2016 CPU cores for AMS Monte Carlo production and a dedicated ~1 Gbps Long Fat Network (LFN) for AMS data transmission between SEU and CERN. In this paper, the development and deployment of SEU SOC's automated Monte Carlo production management system is discussed in detail. Data transmission optimizations are further introduced to speed up data transfer over the LFN between SEU SOC and CERN. In addition, a monitoring tool for SEU SOC's Monte Carlo production is presented.
Journal of Physics: Conference Series | 2017
Runqun Xiong; R.L. Shi; F.Q. Huang; B.S. Shan; V. Choutko; A. Egorov; A. Eline; O. Demakov; Jinghui Zhang; Fang Dong; Junzhou Luo
Monte Carlo (MC) simulation production plays an important role in the physics analysis of the Alpha Magnetic Spectrometer (AMS-02) experiment. To facilitate metadata retrieval for data analysis among millions of database records, we developed a monitoring tool to analyse and visualize the production status and progress. In this paper, we discuss the workflow of the monitoring tool and present its features and technical details.
international conference on advanced cloud and big data | 2016
Pengcheng Zhou; Fang Dong; Zhuqing Xu; Junxue Zhang; Runqun Xiong; Junzhou Luo
With the rapid development of information technology, enormous volumes of data are being generated by many enterprises at all times. Storing this large-scale data in a way that reduces cost and enables internal data sharing and collaboration has always been a challenge for enterprises. Cloud storage technology, an important branch of cloud computing, is becoming a common way to solve this problem. There has been some work on cloud storage, but most of it focuses on individual users, and even the few enterprise-level products still cannot meet the actual needs of enterprises. To address these problems, in this paper we propose a flexible enterprise-oriented cloud storage system (ECStor) based on GlusterFS. ECStor provides a directory-level fine-grained access control solution, so that storage space can be allocated flexibly according to user roles or IP addresses. Meanwhile, we implemented a load balancer to distribute user load across NFS servers on different storage nodes, a user-friendly management interface based on a B/S architecture, and an easy-to-use client for the Windows platform. We deployed ECStor in our data center and evaluated LIST performance and load balancing: the latency of LIST showed no significant loss compared with the original GlusterFS, and the system achieved good load balance.
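A minimal sketch of the load-balancing idea: each new client session is routed to the NFS server with the fewest active sessions. The server names, the session-count metric, and the class interface are assumptions for illustration; ECStor's actual balancer may use different signals.

```python
# Least-loaded routing of clients to NFS servers (hypothetical names/metric).

class NfsLoadBalancer:
    def __init__(self, servers):
        self.active_sessions = {s: 0 for s in servers}

    def assign(self, client_ip):
        """Route a new client to the server with the fewest active sessions."""
        target = min(self.active_sessions, key=self.active_sessions.get)
        self.active_sessions[target] += 1
        print(f"client {client_ip} -> {target}")
        return target

    def release(self, server):
        """Decrement the session count when a client disconnects."""
        self.active_sessions[server] -= 1

lb = NfsLoadBalancer(["nfs-node-1", "nfs-node-2", "nfs-node-3"])
for ip in ["10.0.0.5", "10.0.0.6", "10.0.0.7", "10.0.0.8"]:
    lb.assign(ip)
```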
international conference on parallel processing | 2011
Runqun Xiong; Junzhou Luo; Aibo Song; Bo Liu; Fang Dong