Qiuming Luo | Researchain

Publication

Featured researches published by Qiuming Luo.

parallel and distributed computing: applications and technologies | 2010

A Novel Model and a Simulation Tool for Churn of P2P Network

Qiuming Luo; Yun Li; Wentao Dong; Gang Liu; Rui Mao

Prior studies build churn models by measuring the historical logs or records of a P2P network, and treat the network as a single black box without examining the makeup of the peer population. The metrics used to characterize churn are the distributions of node session lengths and arrival intervals. We investigate churn from a higher-level point of view and find that modeling it on the global geographical distribution of peer nodes yields a system that explains the fluctuation and cyclic behavior of the network size. The model takes user behavior patterns into account. We then provide a Matlab tool that generates churn events according to this model. The simulated events reveal more about future behavior than other models do: we can predict, with high probability, when and which nodes will return, as well as when and which nodes will disappear. The model is therefore useful when designing a system optimized for both the past and the future, which could reduce the maintenance overhead of the underlying DHT overlay network and lower the level of redundant replication in P2P storage systems.
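
The idea in this abstract can be illustrated with a minimal sketch. It is written in Python rather than Matlab and is not the authors' tool. Everything specific in it is an assumption made for illustration: peers are grouped by UTC offset, every peer is online during a fixed local-time window (18:00 to 24:00), and the population figures are invented.

```python
def is_online(local_hour, start=18, end=24):
    """Assumed behavior pattern: a peer is online from start to end, local time."""
    return start <= local_hour < end


def network_size(utc_hour, population_by_offset):
    """Count the peers online at a given UTC hour.

    population_by_offset maps a UTC offset (hours) to the number of peers
    living in that time zone (hypothetical geographical distribution).
    """
    total = 0
    for offset, peers in population_by_offset.items():
        local_hour = (utc_hour + offset) % 24
        if is_online(local_hour):
            total += peers
    return total


def next_arrival_utc(offset, start=18):
    """UTC hour at which peers in a given time zone are expected to return."""
    return (start - offset) % 24


# Hypothetical distribution: three regions with different population sizes.
population = {8: 500, 1: 300, -5: 200}
sizes = [network_size(hour, population) for hour in range(24)]
```

Under these assumptions the network size follows a daily cycle driven only by where the peers live, and `next_arrival_utc` shows how such a model could predict when a group of nodes is likely to come back.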

international symposium on parallel architectures algorithms and programming | 2014

Characteristic Analysis of Operating Systems for Large Scale Hierarchical NUMA System

Qiuming Luo; Yuanyuan Zhou; Mei Wang; Ye Cai

LH-NUMAs are essentially clusters of NUMA nodes with a globally shared memory abstraction, implemented either in hardware or in software. With this abstraction, the architecture resembles an expensive mainframe while keeping the price as low as that of commercial server clusters. It is a promising candidate for a major cloud computing architecture in today's era of big data. This article studies the special demands that LH-NUMA places on the OS, and pushes the OS design principles of Hive, Fos, Multikernel, etc., a little further by embracing the characteristics of LH-NUMA. The contributions include: 1) analyzing the architecture of LH-NUMA, a distinctive architecture located between cluster and mainframe; 2) analyzing the challenges that the characteristics of LH-NUMA pose to the OS; 3) analyzing the inefficiency of running current many-core OSes on LH-NUMA; 4) applying the design principles of Hive, Fos, Multikernel and other many-core OSes to LH-NUMA and analyzing their advantages and shortcomings there; 5) listing the requirements and outlining the framework principles for an OS on LH-NUMA.

parallel and distributed computing: applications and technologies | 2012

Quantitatively Measuring the Memory Locality Leakage on NUMA Systems Based on Instruction-Based-Sampling

Qiuming Luo; Chengjian Liu; Chang Kong; Ye Cai

Sustaining memory locality is critical for obtaining high performance on NUMA systems, but how to identify a locality leakage problem and how to measure the leakage are still open issues. This paper provides an algorithm to quantitatively measure locality leakage based on the memory trace produced by IBS (Instruction-Based Sampling). A “perfect matrix” PM, which represents the highest-locality pattern, is generated from the virtual memory address trace. A “communication matrix” CM, which describes the actual memory access pattern, is obtained from the physical memory address trace. Penalty factors are calculated from PM and from CM, taking the hardware NUMA factor into account. The leakage is measured as the difference between the penalty factors of PM and those of CM, and can be used to estimate the performance loss and guide optimization. Experimental results are shown to demonstrate the effectiveness and accuracy of the quantitative measurement.
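
A minimal sketch of the kind of calculation this abstract describes follows. The abstract does not give the formulas, so the definitions here are assumptions for illustration only: PM and CM are access-count matrices whose rows are the node running the accessing thread and whose columns are the node holding the memory, the penalty factor is the access count weighted by a relative NUMA distance, and the leakage is the CM penalty minus the PM penalty. All numbers are invented.

```python
def penalty(matrix, distance):
    """Assumed penalty factor: access counts weighted by NUMA distance.

    matrix[i][j]   = number of accesses from node i to memory on node j
    distance[i][j] = relative cost of node i accessing memory on node j
    """
    return sum(
        count * distance[i][j]
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )


def leakage(pm, cm, distance):
    """Assumed leakage: extra penalty of the actual pattern over the ideal one."""
    return penalty(cm, distance) - penalty(pm, distance)


# Hypothetical 2-node system: local access costs 1.0, remote access costs 1.5.
distance = [[1.0, 1.5], [1.5, 1.0]]
pm = [[100, 0], [0, 100]]   # ideal pattern: every access is local
cm = [[70, 30], [20, 80]]   # observed pattern: some accesses go remote
```

With these made-up inputs, `leakage(pm, cm, distance)` is positive because 50 of the 200 accesses in `cm` pay the remote cost.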

network and parallel computing | 2012

MAP-numa: Access Patterns Used to Characterize the NUMA Memory Access Optimization Techniques and Algorithms

Qiuming Luo; Chengjian Liu; Chang Kong; Ye Cai

Some typical memory access patterns are provided and programmed in C, which can be used as a benchmark to characterize the various techniques and algorithms that aim to improve the performance of NUMA memory access. These access patterns, called MAP-numa (Memory Access Patterns for NUMA), currently include three classes, whose working data sets correspond to a 1-dimensional array, a 2-dimensional matrix and a 3-dimensional cube. MAP-numa is dedicated to NUMA memory access optimization rather than to measuring memory bandwidth and latency, and is an alternative to existing benchmarks such as STREAM, pChase, etc. It is used to verify the capability of optimizations (made automatically or manually, to source code or to executable binaries) by investigating which locality leakage can be remedied. Some experimental results are shown, which give an example of using MAP-numa to evaluate optimizations based on Oprofile sampling.

parallel and distributed computing: applications and technologies | 2011

Performance Evaluation of OpenMP Constructs and Kernel Benchmarks on a Loongson-3A Quad-Core SMP System

Qiuming Luo; Chang Kong; Ye Cai; Gang Liu

As a competitor and alternative to mainstream general-purpose CPUs (Intel, AMD, etc.), Loongson is a family of general-purpose MIPS-compatible CPUs developed at the ICT of CAS in China. The quad-core Loongson 3A is evaluated in this paper. The performance of the basic OpenMP constructs on a Loongson-3A quad-core SMP is obtained by applying the EPCC microbenchmarks, and the performance of the NAS kernel codes is obtained by applying the NAS Parallel Benchmarks (NPB). The benchmarks are run with three different OpenMP compilers (and their runtime systems): GCC, OMPi-pth (OMPi with the pthread library) and OMPi-psth (OMPi with the psthread library). The results show that OMPi-pth's performance is the best and OMPi-psth's performance is the worst. These results may help in writing OpenMP code as well as in selecting the appropriate compiler and runtime system. An Intel Core i5 quad-core platform running NPB is used for comparison, which indicates that the Loongson 3A's performance is nearly one tenth of the i5's. The NPB results can help to determine the scale of a Loongson system needed to replace an Intel i5 system for a given problem size.

international symposium on distributed computing | 2010

A Churn Model Based on the Global Geographical Distribution of Nodes

Qiuming Luo; Yun Li; Wentao Dong; Xiaohui Lin

The dynamics of peer participation, or churn, are critical for the design, implementation and evaluation of peer-to-peer (P2P) systems. The metrics used to characterize churn are the distributions of node session lengths and arrival intervals. Prior studies build the model by measuring historical logs or records, and treat churn as a single black box without examining the makeup of the peer population. We investigate churn from another point of view and find that modeling it on the global geographical distribution of peer nodes yields a system that behaves in a way consistent with the measurements taken in previous work. This model reveals more about future behavior than other models do: we can predict, with high probability, when and which nodes will return, as well as when and which nodes will disappear. The model is therefore useful when designing a system optimized for both the past and the future, which could reduce the maintenance overhead of the underlying DHT overlay network and lower the level of redundant replication in P2P storage systems.

Computers in Industry | 2018

WebGlusterFS: A web-based administration tool for GlusterFS with resource assignment for various storage demands

Qiuming Luo; Cuiping Zhu; Gang Liu; Rui Mao

In cloud and big-data environments, administrators face the complex task of assigning workloads to storage backends and adjusting those assignments dynamically and promptly as storage demands change. This article presents WebGlusterFS, an administration tool for GlusterFS that eases management and helps to assign storage resources. WebGlusterFS is a web-based tool designed to replace the command-line console manager of GlusterFS, and it provides an interface for an auto-assignment module that builds volumes from heterogeneous backend devices. A simple demo module is also implemented to show how various storage demands are fulfilled by building volumes from properly matched storage resources at minimum cost. The characteristics of the underlying storage resources are obtained by benchmarking and used to make the assignment decision. WebGlusterFS sets up a base framework for a workload-aware storage platform for large-scale computing environments.

International Symposium on Parallel Architecture, Algorithm and Programming | 2017

Porting Referential Genome Compression Tool on Loongson Platform

Zheng Du; Chao Guo; Yijun Zhang; Qiuming Luo

With the fast development of genome sequencing technology, genome sequencing has become faster and more affordable. Consequently, genomic scientists now face an explosive increase in genomic data. Managing, storing and analyzing this quickly growing amount of data is challenging, so it is desirable to apply compression techniques to reduce storage and transfer costs. Referential genome compression is one of these techniques: it exploits the high similarity between genomes of the same or an evolutionarily close species (e.g., two randomly selected humans have at least 99% genetic similarity) and stores only the differences between the genome being compressed and a well-known reference genome sequence. In this paper, we port two referential compression algorithms to the Loongson platform and profile their performance. We also use multi-process technology to improve the speed of compression.
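
The core idea of referential compression, storing only the differences from a reference, can be sketched as follows. This is a deliberately simplified illustration and not either of the two algorithms ported in the paper: it assumes the two sequences have equal length and differ only by substitutions, whereas real tools also handle insertions and deletions. The function names and sequences are hypothetical.

```python
def ref_compress(reference, target):
    """Return the (position, base) pairs where target differs from reference.

    Simplifying assumption: equal-length sequences, substitutions only.
    """
    if len(reference) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")
    return [
        (i, t)
        for i, (r, t) in enumerate(zip(reference, target))
        if r != t
    ]


def ref_decompress(reference, diffs):
    """Rebuild the target by applying the stored differences to the reference."""
    seq = list(reference)
    for i, base in diffs:
        seq[i] = base
    return "".join(seq)


# Made-up example sequences that differ at two positions.
reference = "ACGTACGTAC"
target = "ACGTTCGTAA"
diffs = ref_compress(reference, target)
```

The more similar the two sequences are, the shorter `diffs` becomes relative to the full target, which is where the storage saving comes from.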

International Symposium on Parallel Architecture, Algorithm and Programming | 2017

A Cost-Effective Wide-Sense Nonblocking k-Fold Multicast Network

Gang Liu; Qiuming Luo; Cunhuang Ye; Rui Mao

Multicast is one of the densest communication patterns. Any destination node of a k-fold multicast network can be involved in up to k simultaneous multicast connections. The hardware cost of a traditional k-fold switching network for wide-sense nonblocking multicast is typically very high. In this paper, we propose a new wide-sense nonblocking k-fold multicast network and a multicast routing algorithm. The k-fold design has significantly lower network cost than k copies of a 1-fold multicast network. The time complexity of the corresponding routing algorithm is no higher than that of previous work.

high performance computing and communications | 2016

Compression and De-calcification for Memcached

Qiuming Luo; Yijun Zhang; Chao Guo; Jie Liu

Memcached is an in-memory key-value caching system that addresses the mismatch between CPU speed and input/output speed in disk-based databases. It has been widely used as an effective way to bridge that gap and improve the capacity of the source server. We optimize the memory access performance of Memcached in two ways: improving the density of memory storage and improving the usage of memory. In this paper, we first enhance Memcached with data compression. Then, we analyze the Memcached calcification problem and discuss the influence of data compression on this problem. Finally, we solve the slab selection problem of Memcached memory reclamation with data compression. Experimental results show that, to achieve the same hit rate, only 70% of the memory is needed with data compression, and that our de-calcification method for Memcached with data compression increases the hit rate by 10%.
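
The first step, raising storage density by compressing values, can be illustrated with a small sketch. It does not use Memcached or its API, and it does not model slabs or calcification. `CompressedCache` is a hypothetical in-process dictionary cache, and zlib from the Python standard library stands in for whatever compressor the authors used.

```python
import zlib


class CompressedCache:
    """Hypothetical key-value cache that stores zlib-compressed values."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        """Compress the bytes value before storing it."""
        self._store[key] = zlib.compress(value)

    def get(self, key):
        """Return the decompressed value, or None on a cache miss."""
        blob = self._store.get(key)
        return None if blob is None else zlib.decompress(blob)

    def stored_bytes(self):
        """Total size of the compressed values currently held."""
        return sum(len(blob) for blob in self._store.values())


cache = CompressedCache()
payload = b"a" * 1000          # made-up, highly compressible value
cache.set("example-key", payload)
```

Because compressible values occupy fewer bytes, the same memory budget can hold more items, which is the mechanism behind reaching the same hit rate with less memory. The actual saving depends on how compressible the workload's values are.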
