Guoren Wang | Researchain

Publication

Featured research published by Guoren Wang.

ACM Transactions on Database Systems | 2011

Efficient similarity joins for near-duplicate detection

Chuan Xiao; Wei Wang; Xuemin Lin; Jeffrey Xu Yu; Guoren Wang

With the increasing amount of data and the need to integrate data from multiple data sources, one of the challenging issues is to identify near-duplicate records efficiently. In this article, we focus on efficient algorithms to find all pairs of records whose similarity is no less than a given threshold. Several existing algorithms rely on the prefix filtering principle to avoid computing similarity values for all possible pairs of records. We propose new filtering techniques that exploit the token ordering information; they are integrated into the existing methods and drastically reduce the candidate sizes, and hence improve the efficiency. We have also studied the implementation of our proposed algorithm in stand-alone and RDBMS-based settings. Experimental results show that our proposed algorithms can outperform previous algorithms on several real datasets.
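
The prefix filtering principle mentioned in this abstract can be illustrated with a minimal sketch. It assumes Jaccard similarity, non-empty token sets, and a global token order that puts rare tokens first; the function names and the sample records are hypothetical. It shows only the baseline prefix filter, not the additional ordering-based filters the article proposes.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def prefix_filter_join(records, threshold):
    """Return sorted (i, j) pairs, i < j, whose Jaccard similarity >= threshold."""
    sets = [set(r) for r in records]
    # Assumed global order: ascending document frequency, ties broken by token.
    freq = Counter(tok for s in sets for tok in s)
    ordered = [sorted(s, key=lambda tok: (freq[tok], tok)) for s in sets]

    index = defaultdict(list)  # token -> ids of records with that token in their prefix
    results = []
    for rid, toks in enumerate(ordered):
        # Two records that reach the threshold must share a token within their
        # first |x| - ceil(t * |x|) + 1 tokens. The small epsilon guards against
        # floating-point round-up, which could only make the prefix too short.
        plen = len(toks) - math.ceil(threshold * len(toks) - 1e-9) + 1
        prefix = toks[:plen]
        candidates = set()
        for tok in prefix:
            candidates.update(index[tok])
        # Verification step: compute the exact similarity for candidates only.
        for cid in sorted(candidates):
            if jaccard(sets[cid], sets[rid]) >= threshold:
                results.append((cid, rid))
        for tok in prefix:
            index[tok].append(rid)
    return sorted(results)


records = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "e"],
    ["a", "b", "c", "d", "f"],
    ["x", "y", "z"],
    ["x", "y", "z", "w"],
]
pairs = prefix_filter_join(records, 0.6)
```

Only records that share a prefix token are ever compared, so the two groups in the sample data are never checked against each other.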

Neurocomputing | 2008

A protein secondary structure prediction framework based on the Extreme Learning Machine

Guoren Wang; Yi Zhao; Di Wang

In this paper, we propose an Extreme Learning Machine (ELM) based protein secondary structure prediction framework that provides good performance at extremely fast speed. To achieve better performance, in this framework: (i) the three secondary structures are first predicted independently by binary ELM classifiers; (ii) a probability-based combination (PBC) method is then proposed to combine these binary prediction results into the expected three-class results; and (iii) a helix postprocessing (HPP) method is finally proposed to further improve the overall performance of the framework based on biological features. Experiments conducted on the real data sets CB513 and RS126 demonstrate that our algorithm achieves prediction accuracy comparable to other popular methods, but at a much faster learning speed.

Neurocomputing | 2011

An OS-ELM based distributed ensemble classification framework in P2P networks

Yongjiao Sun; Ye Yuan; Guoren Wang

Although classification in centralized environments has been widely studied in recent years, classification in P2P networks remains an important research problem due to the popularity of P2P computing environments. The main goal of classification in P2P networks is to efficiently decrease prediction error with small network overhead. In this paper, we propose an OS-ELM based ensemble classification framework for distributed classification in a hierarchical P2P network. In the framework, we apply the incremental learning principle of OS-ELM to the hierarchical P2P network to generate an ensemble classifier. There are two ways to implement the ensemble classifier in the P2P network: one-by-one ensemble classification and parallel ensemble classification. Furthermore, we propose a data space coverage based peer selection approach to reduce the high communication cost and large delay. We also design a two-layer index structure to efficiently support peer selection. A peer creates a local Quad-tree to index its local data, and a super-peer creates a global Quad-tree to summarize its local indexes. Extensive experimental studies verify the efficiency and effectiveness of the proposed algorithms.

IEEE Transactions on Knowledge and Data Engineering | 2012

Efficiently Indexing Large Sparse Graphs for Similarity Search

Guoren Wang; Bin Wang; Xiaochun Yang; Ge Yu

The graph structure is a very important means to model schemaless data with complicated structures, such as protein-protein interaction networks, chemical compounds, knowledge query inferring systems, and road networks. This paper focuses on the index structure for similarity search on a set of large sparse graphs and proposes an efficient indexing mechanism by introducing the Q-Gram idea. By decomposing graphs into small grams (organized by κ-Adjacent Tree patterns) and pairing up on those κ-Adjacent Tree patterns, a lower bound on their edit distance can be calculated for candidate filtering. Furthermore, we have developed a series of techniques for inverted index construction and online query processing. By building the candidate set for the query graph before the exact edit distance calculation, the number of graphs that need to proceed to exact matching can be greatly reduced. Extensive experiments on real and synthetic data sets have been conducted to show the effectiveness and efficiency of the proposed indexing mechanism.

IEEE Transactions on Parallel and Distributed Systems | 2010

Optimal Resource Placement in Structured Peer-to-Peer Networks

Weixiong Rao; Lei Chen; Ada Wai-Chee Fu; Guoren Wang

Utilizing the skewed popularity distribution in P2P systems, common in Gnutella- and KaZaA-like P2P applications, we propose an optimal resource (replica or link) placement strategy, which can optimally trade off the performance gain against the cost paid. The proposed resource placement strategy, with better results than existing works, can be generally applied in randomized P2P systems (Symphony) and deterministic P2P systems (e.g., Chord, Pastry, Tapestry, etc.). We apply the proposed resource placement strategy to two novel applications: PCache (a P2P-based caching scheme) and PRing (a P2P ring structure). The simulation results, as well as a real deployment on PlanetLab, demonstrate the effectiveness of the proposed resource placement strategy in reducing the average search cost of the whole system.

Database Systems for Advanced Applications | 2007

Continuously maintaining sliding window skylines in a sensor network

Junchang Xin; Guoren Wang; Lei Chen; Xiaoyi Zhang; Zhenhua Wang

Wireless sensor networks are now widely used in environment monitoring. The skyline query, as an important operator for multiple criteria decision making and data mining, plays an important role in many sensing applications. Though skyline queries have been well studied in traditional database systems, the existing solutions designed for data stored at a centralized site are not directly applicable to sensor environments due to the unique characteristics of wireless sensor networks. In this paper, we propose an energy-efficient algorithm, called Sliding Window Skyline Monitoring Algorithm (SWSMA), to continuously maintain sliding window skylines over a wireless sensor network. Specifically, SWSMA employs two types of filters within each sensor to reduce the amount of data transferred and thereby save energy. In addition to SWSMA, a set of optimization mechanisms is also discussed to improve the performance of SWSMA. Our extensive simulation studies show that SWSMA, together with the optimization techniques, is effective in reducing communication cost and saving energy when monitoring sliding window skylines.
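
To make the notion of a sliding window skyline concrete, here is a small centralized sketch. It assumes that smaller values are better in every dimension and that the window holds the most recent `window_size` readings; the class and function names are hypothetical. It is not SWSMA: it recomputes the skyline by brute force and has none of the in-sensor filters the abstract describes.

```python
from collections import deque


def dominates(p, q):
    """True if p is no worse than q in every dimension and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


class SlidingWindowSkyline:
    """Keeps the last `window_size` points and reports their skyline."""

    def __init__(self, window_size):
        # A deque with maxlen drops the oldest point automatically on append.
        self.window = deque(maxlen=window_size)

    def add(self, point):
        """Insert a new reading and return the skyline of the current window."""
        self.window.append(tuple(point))
        return self.skyline()

    def skyline(self):
        """Points in the window that no other window point dominates."""
        pts = list(self.window)
        return [p for p in pts if not any(dominates(q, p) for q in pts)]


monitor = SlidingWindowSkyline(window_size=3)
history = [monitor.add(p) for p in [(5, 5), (3, 6), (4, 4), (6, 1), (7, 7), (8, 8), (9, 9)]]
```

The last step shows why maintenance is non-trivial: once (4, 4) and (6, 1) expire, the previously dominated point (7, 7) re-enters the skyline.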

Neurocomputing | 2011

XML document classification based on ELM

Xiangguo Zhao; Guoren Wang; Xin Bi; Peizhen Gong; Yuhai Zhao

In this paper, we describe an XML document classification framework based on the extreme learning machine (ELM). On the basis of the Structured Link Vector Model (SLVM), an optimized Reduced Structured Vector Space Model (RS-VSM) is proposed to incorporate structural information into feature vectors more efficiently and to optimize the computation of document similarity. We apply ELM to XML document classification to achieve good performance at extremely high speed compared with conventional learning machines (e.g., support vector machines). A voting-ELM (v-ELM) algorithm is then proposed to improve the accuracy of the ELM classifier. The Revoting of Equal Votes (REV) method and the Revoting of Confusing Classes (RCC) method are also proposed to postprocess the voting result of v-ELM and further improve the performance. Experiments conducted on real-world classification problems demonstrate that the voting-ELM classifiers presented in this paper can achieve better performance than ELM algorithms with respect to precision, recall, and F-measure.

Database Systems for Advanced Applications | 2010

Efficiently answering probability threshold-based shortest path queries over uncertain graphs

Ye Yuan; Lei Chen; Guoren Wang

Efficiently processing shortest path (SP) queries over stochastic networks has attracted a lot of research attention, as such queries are very popular in emerging real-world applications such as Intelligent Transportation Systems and communication networks, whose edge weights can be modeled as random variables. Some previous works aim at finding the most likely SP (the path with the largest probability of being the SP), and others search for the least-expected-weight path. In all these works, the definitions of the shortest path query are based on simple probabilistic models that can be converted into multi-objective optimization problems on a weighted graph. However, these simple definitions miss important information about the internal structure of the probabilistic paths and the interplay among all the uncertain paths. Thus, in this paper, we propose a new SP definition based on the possible world semantics that has been widely adopted for probabilistic data management, and develop efficient methods to answer threshold-based SP queries over an uncertain graph. Extensive experiments based on real data sets verified the effectiveness of the proposed methods.

IEEE Transactions on Knowledge and Data Engineering | 2012

Energy-Efficient Reverse Skyline Query Processing over Wireless Sensor Networks

Guoren Wang; Junchang Xin; Lei Chen; Yunhao Liu

The reverse skyline query plays an important role in many sensing applications, such as environmental monitoring, habitat monitoring, and battlefield monitoring. Due to the limited power supplies of wireless sensor nodes, the existing centralized approaches, which do not consider energy efficiency, cannot be directly applied to the distributed sensor environment. In this paper, we investigate how to process reverse skyline queries energy-efficiently in wireless sensor networks. First, we theoretically analyze the properties of the reverse skyline query and propose a skyband-based approach to tackle the problem of reverse skyline query answering over wireless sensor networks. Then, an energy-efficient approach is proposed to minimize the communication cost among sensor nodes when evaluating range reverse skyline queries. Moreover, optimization mechanisms to improve the performance of multiple reverse skylines are also discussed. Extensive experiments on both real-world data and synthetic data have demonstrated the efficiency and effectiveness of our proposed approaches under various experimental settings.

Neurocomputing | 2015

Elastic extreme learning machine for big data classification

Junchang Xin; Zhiqiong Wang; Luxuan Qu; Guoren Wang

Extreme Learning Machine (ELM) and its variants have been widely used in many applications due to their fast convergence and good generalization performance. Though the distributed ELM* based on the MapReduce framework can handle very large-scale training datasets in big data applications, coping with rapid updates to such datasets is still a challenging task. Therefore, in this paper, a novel Elastic Extreme Learning Machine based on the MapReduce framework, named Elastic ELM (E2LM), is proposed to address the shortcoming of ELM*, whose learning ability is weak on updated large-scale training datasets. Firstly, after adequately analyzing the properties of ELM*, it can be found that its most computation-expensive part, matrix multiplication, can be incrementally, decrementally and correctionally calculated. Next, the Elastic ELM based on the MapReduce framework is developed, which first calculates the intermediate matrix multiplications of the updated training data subset, and then updates the matrix multiplications by modifying the old matrix multiplications with the intermediate ones. Then, the corresponding new output weight vector can be obtained with centralized computing using the updated matrix multiplications. Therefore, efficient learning of a rapidly updated massive training dataset can be realized. Finally, we conduct extensive experiments on synthetic data to verify the effectiveness and efficiency of our proposed E2LM in learning massive, rapidly updated training datasets with various experimental settings.
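
The incremental and decremental update of the matrix multiplications can be sketched as follows. This assumes the products in question are U = HᵀH and V = HᵀT, where H is the hidden-layer output matrix and T the target matrix; that reading, the helper names, and the toy integer data are all assumptions for illustration. The sketch runs on a single machine and omits MapReduce and the final solve for the output weights.

```python
def gram(H, T):
    """Return (H^T H, H^T T) for row lists H (n x L) and T (n x m)."""
    L, m = len(H[0]), len(T[0])
    U = [[sum(h[i] * h[j] for h in H) for j in range(L)] for i in range(L)]
    V = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(m)] for i in range(L)]
    return U, V


def combine(A, B, sign=1):
    """Element-wise A + sign * B for equally shaped matrices."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# Toy hidden-layer outputs and targets (integers keep the comparison exact).
H_old, T_old = [[1, 2], [3, 4], [5, 6]], [[1], [0], [1]]
H_new, T_new = [[7, 8], [9, 10]], [[0], [1]]

U_old, V_old = gram(H_old, T_old)
U_delta, V_delta = gram(H_new, T_new)  # intermediate products of the changed subset

# Incremental update: add the products of the newly arrived rows.
U_inc, V_inc = combine(U_old, U_delta), combine(V_old, V_delta)

# Decremental update: subtract the products of rows that are removed again.
U_dec, V_dec = combine(U_inc, U_delta, -1), combine(V_inc, V_delta, -1)
```

Because both products are sums over training rows, only the changed rows need to be processed; the old products are reused as they are.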
