Meirui Ren | Researchain

Publication

Featured researches published by Meirui Ren.

International Parallel and Distributed Processing Symposium | 2012

Parallel Algorithms for Approximate String Matching with k Mismatches on CUDA

Yu Liu; Longjiang Guo; Jinbao Li; Meirui Ren; Keqin Li

Approximate string matching using the k-mismatch technique has been widely applied in fields such as virus detection and computational biology. Traditional parallel algorithms are all based on multiple processors, which incur high computing and communication costs. GPUs offer high parallel processing capability, low computing cost, and short communication time. To the best of our knowledge, there is no existing parallel algorithm for approximate string matching with k mismatches on GPU. Using a new parallel programming model based on CUDA, we present three parallel algorithms and their GPU implementations: the thread parallel algorithm, the block-thread parallel algorithm, and the OPT-block-thread parallel algorithm. The OPT-block-thread parallel algorithm takes full advantage of the parallel capability of the GPU. Furthermore, it balances the load among threads and optimizes execution time by exploiting the GPU memory model. Experimental results show that, compared with the traditional sequential algorithm on CPU, our best parallel algorithm on GPU achieves a speedup of 40-80.
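
As an illustration of the k-mismatch problem itself, here is a minimal sequential sketch in Python. It is not the paper's CUDA code; the function name `k_mismatch_positions` is hypothetical. The point it shows is that each start position is checked independently of the others, which is the kind of independence a thread-per-position design could exploit.

```python
def k_mismatch_positions(text: str, pattern: str, k: int) -> list[int]:
    """Return every start index i where text[i:i+m] differs from
    pattern in at most k character positions (Hamming distance <= k)."""
    n, m = len(text), len(pattern)
    hits = []
    # Each iteration of this outer loop is independent of the others.
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # early exit: this window cannot match
        if mismatches <= k:
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(k_mismatch_positions("abcabd", "abd", 1))
```

The early exit makes the work per position uneven, which hints at why load balancing among threads matters in a parallel version.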

Tsinghua Science & Technology | 2012

Distributed aggregation algorithms for mobile sensor networks with group mobility model

Qianqian Ren; Longjiang Guo; Jinghua Zhu; Meirui Ren; Junqing Zhu

In many applications of mobile sensor networks, such as water flow monitoring and disaster rescue, the nodes in the network can move together or separate temporarily. The dynamic network topology makes traditional spanning-tree-based aggregation algorithms invalid in mobile sensor networks. In this paper, we first present a distributed clustering algorithm which divides mobile sensor nodes into several groups, and then propose two distributed aggregation algorithms, Distance-AGG (Aggregation based on Distance) and Probability-AGG (Aggregation based on Probability). Both algorithms conduct an aggregation query in three phases: query dissemination, intra-group aggregation, and inter-group aggregation. They are especially efficient in mobile networks. We evaluate the performance of the proposed algorithms in terms of aggregation accuracy, energy efficiency, and query delay through ns-2 simulations. The results show that Distance-AGG and Probability-AGG can obtain higher accuracy with lower transmission and query delay than the existing aggregation algorithms.

Int'l J. of Communications, Network and System Sciences | 2010

A New Scheduling Algorithm for Reducing Data Aggregation Latency in Wireless Sensor Networks

Meirui Ren; Longjiang Guo; Jinbao Li

Existing works on data aggregation in wireless sensor networks (WSNs) usually use a single channel which results in a long latency due to high interference, especially in high-density networks. Therefore, data aggregation is a fundamental yet time-consuming task in WSNs. We present an improved algorithm to reduce data aggregation latency. Our algorithm has a latency bound of 16R + Δ – 11, where Δ is the maximum degree and R is the network radius. We prove that our algorithm has smaller latency than the algorithm in [1]. The simulation results show that our algorithm has much better performance in practice than previous works.

Networking Architecture and Storages | 2012

Implementing the Jacobi Algorithm for Solving Eigenvalues of Symmetric Matrices with CUDA

Tao Wang; Longjiang Guo; Guilin Li; Jinbao Li; Renda Wang; Meirui Ren; Jing He

Solving the eigenvalues of matrices is an open problem which is often related to scientific computation. As the order of matrices increases, traditional sequential algorithms are unable to meet the required calculation time. Although cluster systems can solve the eigenvalues of large-scale matrices in a short time, they increase equipment costs and power consumption. This paper proposes a parallel algorithm named Jacobi on gpu, implemented with CUDA (Compute Unified Device Architecture) on a GPU (Graphics Processing Unit), to solve the eigenvalues of symmetric matrices. Our experimental environment consists of an Intel Core i5-760 quad-core CPU, an NVIDIA GeForce GTX460 card, and a Win7 64-bit operating system. When the size of the matrix is 10240×10240 and the number of iterations is 10000, the speedup ratio is 13.71. As the size of the matrices increases, the speedup ratio increases correspondingly. Moreover, as the number of iterations increases, the speedup ratio is very stable. When the size of the matrix is 8192×8192 and the number of iterations is 1000, 2000, 4000, 8000 and 16000 respectively, the standard deviation of the speedup ratio is 0.1161. The experimental results show that the Jacobi on gpu algorithm saves more running time than traditional sequential algorithms, with a speedup ratio of 3.02~13.71. Therefore, the computing time of traditional sequential algorithms to solve the eigenvalues of matrices is reduced significantly.
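
For readers unfamiliar with the Jacobi eigenvalue method, the following is a minimal sequential sketch in pure Python. It is a textbook cyclic Jacobi iteration, not the paper's CUDA implementation; the function name `jacobi_eigenvalues` and the parameters `sweeps` and `tol` are assumptions made for illustration.

```python
import math


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations.
    Returns the eigenvalues sorted in ascending order."""
    n = len(matrix)
    a = [list(row) for row in matrix]  # work on a copy
    for _ in range(sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                apq = a[p][q]
                if apq == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle in [-pi/4, pi/4] that zeroes a[p][q].
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, apq)
                else:
                    theta = 0.5 * math.atan(2.0 * apq / diff)
                c, s = math.cos(theta), math.sin(theta)
                # A <- A J (update columns p and q).
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- J^T A (update rows p and q).
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


if __name__ == "__main__":
    print(jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]]))
```

Rotations on disjoint index pairs (p, q) touch different rows and columns, which is one reason the method is often considered a good fit for parallel hardware.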

Networking Architecture and Storages | 2013

Parallel Algorithm for Approximate String Matching with K Differences

Longjiang Guo; Shufang Du; Meirui Ren; Yu Liu; Jinbao Li; Jing He; Ning Tian; Keqin Li

Approximate string matching using the k-difference technique has been widely applied to many fields such as pattern recognition and computational biology. Data dependency exists in the traditional sequential algorithm. Therefore, it is hard to design a parallel algorithm for approximate string matching with k differences. This paper presents a technique to eliminate data dependency. Based on this technique, this paper also presents a parallel algorithm which can calculate the elements in the same row of the edit distance matrix in parallel by eliminating data dependency. The algorithm has high parallelism, but requires synchronization. To validate the proposed algorithm, it is implemented on GPU and multiple-core CPUs. Moreover, the CUDA optimization techniques are also presented in the paper. Finally, experimental results show that, compared with the traditional sequential algorithm on CPU with twenty-four cores, the proposed parallel algorithm achieves speedup of 7-42 on GPU.
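
To make the data dependency concrete, here is a small Python sketch. In the standard recurrence, cell D[i][j] depends on D[i][j-1] in the same row. One well-known way to remove that dependency, assumed here and not necessarily the technique used in the paper, is to first compute a[j] from the previous row only, and then obtain D[i][j] = j + min over c <= j of (a[c] - c), which is a prefix-minimum (scan) that parallel hardware handles well. The function name `k_difference_ends` is hypothetical.

```python
from itertools import accumulate


def k_difference_ends(text: str, pattern: str, k: int) -> list[int]:
    """Return 1-based end positions j in text such that some substring
    ending at j is within edit distance k of pattern."""
    n, m = len(text), len(pattern)
    prev = [0] * (n + 1)  # row 0: a match may start anywhere in the text
    for i in range(1, m + 1):
        # Step 1: uses only the previous row, so every j is independent.
        a = [i] + [
            min(prev[j] + 1, prev[j - 1] + (pattern[i - 1] != text[j - 1]))
            for j in range(1, n + 1)
        ]
        # Step 2: prefix minimum of a[c] - c replaces the in-row dependency.
        prefix_min = accumulate((a[j] - j for j in range(n + 1)), min)
        prev = [j + v for j, v in enumerate(prefix_min)]
    return [j for j in range(1, n + 1) if prev[j] <= k]


if __name__ == "__main__":
    print(k_difference_ends("abcdefg", "cxe", 1))
```

The scan in step 2 is sequential here; on a GPU it would be the step that needs synchronization between threads working on the same row.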

Wireless Algorithms, Systems, and Applications | 2015

Rogue Access Point Detection in Vehicular Environments

Hao Qu; Longjiang Guo; Weiping Zhang; Jinbao Li; Meirui Ren

The threat of rogue access points (APs) has attracted significant attention from both industrial and academic researchers. This paper considers a category of rogue APs that are set up in moving vehicles to lure users. Because such rogue APs are on moving vehicles, they can stay close to users at all times, so the adversary has more time to expose users' private information. This paper proposes a practical rogue AP detection algorithm based on received signal strength (RSS), which calculates the distance between the user and the APs to defend against rogue APs. The paper also develops a relative position algorithm to find the position of an AP. If the AP is located on the road rather than at the roadside, the algorithm suspects that it is a rogue AP. We have also implemented the detection algorithm in real vehicular environments and evaluated its performance.

High Performance Computing and Communications | 2013

An Efficient Graph Isomorphism Algorithm Based on Canonical Labeling and Its Parallel Implementation on GPU

Renda Wang; Longjiang Guo; Chunyu Ai; Jinbao Li; Meirui Ren; Keqin Li

The Graph Isomorphism (GI) problem has been extensively studied due to its significant applications. The most effective class of GI algorithms, i.e., canonical labeling algorithms, are suitable for either graphs with high randomness or symmetry, or graphs for which neither property holds strongly. Also, the core operations of canonical labeling algorithms, i.e., individuation-refinement (IR) and certificate comparison, usually occupy more than 70% of the total running time. Weakening these structural limitations and improving the efficiency of these operations are the main challenges. In this paper, we present an efficient GI algorithm called PEACE, which is particularly suitable for graphs with high randomness or symmetry. We present a parallel implementation of our algorithm on GPUs. We design some new techniques and also use some existing methods to speed up calculations under CUDA. More importantly, these techniques can be applied to all IR-based GI algorithms. We evaluate the proposed algorithm on various graphs to make a comprehensive comparison with the currently most efficient canonical labeling algorithms on CPUs. Experimental results show that PEACE is superior to other algorithms on graphs with high symmetry or many automorphisms, and that a performance increase of up to 50% can be achieved in the best case. We also apply our parallel techniques to these algorithms, and compare the performance on CPU and multiple GPU devices. The results show that the techniques give all algorithms a speedup of 15~55.

International Conference on Parallel and Distributed Systems | 2016

Data Dissemination Protocols Based on Opportunistic Sharing for Data Offloading in Mobile Social Networks

Na Jiang; Longjiang Guo; Jinbao Li; Meirui Ren; Sisi Cheng; Xiaodan Guo

Due to the increasing popularity of smart mobile devices, mobile data communication has led to explosive growth of data traffic in cellular networks. Cellular networks have to face the challenge of huge communication traffic. Offloading data traffic through opportunistic communication among smart mobile devices is a promising way to partially solve this problem, since it carries almost no monetary cost. Large numbers of smart mobile devices can communicate with each other over Bluetooth or Wi-Fi Direct within a short communication range, and they can form an opportunistic mobile social network. Opportunistic communication among smart mobile devices can effectively reduce the amount of cellular data traffic. However, mobile users take a long time to obtain useful data. In order to reduce data communication latency, this paper proposes three data dissemination protocols named RRDP (Request-Reply Dissemination Protocol), RDP (Random Dissemination Protocol) and LDP (LRU Dissemination Protocol). The three proposed protocols are based on an opportunistic sharing policy. Extensive NS-2 simulation results show that (1) in the campus scenario, the user access delay of RDP is 56.4% less than that of RRDP, and that of LDP is 44.8% less than that of RRDP; (2) in the vehicular environment, the user access delay of RDP is 32.5% less than that of RRDP, and that of LDP is 28.1% less than that of RRDP. RDP is the best protocol.

Wireless Algorithms, Systems, and Applications | 2014

Implementing the Matrix Inversion by Gauss-Jordan Method with CUDA

Ning Tian; Longjiang Guo; Meirui Ren; Chunyu Ai

Solving the matrix inversion is an open problem which is often related to scientific computation. Moreover, matrix inversion also has wide applications in social networks. Individuals in social networks are described as nodes, and the similarity among nodes is significant for link prediction. Usually, the problem of calculating similarities among nodes is converted to the problem of matrix inversion. As the order of matrices increases, traditional sequential algorithms are unable to meet the need for short calculation times. Although cluster systems can solve the inversion of large-scale matrices efficiently, the equipment cost and power consumption are very high. This paper proposes a parallel algorithm, PA-Gauss, which is based on the Gauss-Jordan method with pivot (main element) selection. CUDA (Compute Unified Device Architecture) on a GPU (Graphics Processing Unit) is used to implement the proposed algorithm to solve inversions of real and complex matrices. The experimental results show that the Gauss-Jordan algorithm saves more running time than traditional sequential algorithms; the speedup ratio of PA-Gauss is 633~100435 for real matrices and 224~36508 for complex matrices. Therefore, the computing time of solving matrix inversions is reduced significantly.
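
As background for the Gauss-Jordan method with pivot selection, here is a minimal sequential Python sketch. It is a textbook version with partial (column) pivoting, not the PA-Gauss CUDA code; the function name `gauss_jordan_inverse` and the `eps` threshold are assumptions for illustration. It accepts real or complex entries.

```python
def gauss_jordan_inverse(matrix, eps=1e-12):
    """Invert a square matrix with Gauss-Jordan elimination and
    partial pivoting. Raises ValueError for (near-)singular input."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick the row with the largest magnitude entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < eps:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalize the pivot row so the pivot entry becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row; the rows are
        # updated independently of one another.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                if f != 0:
                    aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


if __name__ == "__main__":
    print(gauss_jordan_inverse([[4.0, 7.0], [2.0, 6.0]]))
```

Because each row update in the elimination step is independent, that inner loop is the natural candidate for parallel execution.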

International Performance Computing and Communications Conference | 2014

GPU acceleration of finding LPRs in DNA sequence based on SUA index

Shufang Du; Longjiang Guo; Chunyu Ai; Meirui Ren; Hao Qu; Jinbao Li

Repetitions in biological sequences are of great biological significance, and finding them has naturally been a hot topic in gene projects. In recent years, the graphics processing unit (GPU) has far exceeded the CPU in terms of computing capability and memory bandwidth, and CUDA in particular dramatically increases computing performance by harnessing the power of GPUs. This paper proposes efficient parallel algorithms on CUDA to accelerate finding PTRs, which are redefined as LPRs based on the SUA Index. The proposed parallel algorithms use the parallel primitives offered by the Thrust library and an effective parallel bit compression technique based on division to achieve better acceleration. Optimization techniques, including CUDA streams, are also applied to reduce transmission latency. Experimental results show that the proposed parallel algorithms are faster than the benchmark, with a speedup of 1.6~5.4.
