Yongjiao Sun | Researchain

Publication

Featured researches published by Yongjiao Sun.

Neurocomputing | 2011

An OS-ELM based distributed ensemble classification framework in P2P networks

Yongjiao Sun; Ye Yuan; Guoren Wang

Although classification in centralized environments has been widely studied in recent years, classification in P2P networks remains an important research problem due to the popularity of P2P computing environments. The main goal of classification in P2P networks is to efficiently decrease prediction error with small network overhead. In this paper, we propose an OS-ELM based ensemble classification framework for distributed classification in a hierarchical P2P network. In the framework, we apply the incremental learning principle of OS-ELM to the hierarchical P2P network to generate an ensemble classifier. There are two ways to implement the ensemble classifier in the P2P network: one-by-one ensemble classification and parallel ensemble classification. Furthermore, we propose a data space coverage based peer selection approach to reduce the high communication cost and large delay. We also design a two-layer index structure to efficiently support peer selection. A peer creates a local Quad-tree to index its local data and a super-peer creates a global Quad-tree to summarize its local indexes. Extensive experimental studies verify the efficiency and effectiveness of the proposed algorithms.

Neurocomputing | 2014

Extreme learning machine for classification over uncertain data

Yongjiao Sun; Ye Yuan; Guoren Wang

Conventional classification algorithms assume that the input data is exact or precise. Due to various reasons, including imprecise measurement, network delay, outdated sources and sampling errors, data uncertainty is common and widespread in real-world applications, such as sensor databases, location databases, and biometric information systems. Though there exist many approaches for classification, few of them address the problem of classification over uncertain data in databases. Therefore, in this paper, we propose classification algorithms based on conventional and optimized ELM to conduct classification over uncertain data. First, we view the instances of each uncertain data object as the training data for learning. Then, the probabilities of the uncertain data belonging to each class are computed according to the learning results of each instance. Finally, using a bound-based approach, we implement the final classification. We also extend the proposed algorithms to classification over uncertain data in a distributed environment based on OS-ELM and Monte Carlo theory. The experiments verify the performance of our proposed algorithms.
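The instance-based step described in this abstract can be illustrated with a minimal sketch. It assumes an uncertain object is given as a list of (instance, probability) pairs and that some already trained classifier is available as a plain callable. The names `class_probabilities`, `most_probable_class`, and `toy_classifier` are hypothetical, the toy classifier stands in for an ELM, and the final decision here is a simple argmax rather than the bound-based approach the paper mentions.

```python
from collections import defaultdict


def class_probabilities(instances, classify):
    """Sum the probabilities of the instances that land in each class.

    instances: list of (features, probability) pairs for ONE uncertain object.
    classify:  any callable mapping features to a class label (assumed trained).
    """
    totals = defaultdict(float)
    for features, prob in instances:
        totals[classify(features)] += prob
    return dict(totals)


def most_probable_class(instances, classify):
    """Pick the class with the largest aggregated probability (plain argmax)."""
    probs = class_probabilities(instances, classify)
    return max(probs, key=probs.get)


def toy_classifier(features):
    """Hypothetical stand-in for a trained ELM: threshold on the first feature."""
    return "high" if features[0] > 0.5 else "low"


# One uncertain object with three possible instances whose probabilities sum to 1.
uncertain_object = [((0.2,), 0.4), ((0.8,), 0.3), ((0.9,), 0.3)]
label = most_probable_class(uncertain_object, toy_classifier)
```

Summing instance probabilities per predicted label is only one plausible reading of "computed according to learning results of each instance"; the abstract does not specify the exact aggregation.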

IEEE Transactions on Parallel and Distributed Systems | 2016

DistR: A Distributed Method for the Reachability Query over Large Uncertain Graphs

Yurong Cheng; Ye Yuan; Lei Chen; Guoren Wang; Christophe G. Giraud-Carrier; Yongjiao Sun

Among uncertain graph queries, reachability, i.e., the probability that one vertex is reachable from another, is likely the most fundamental one. Although this problem has been studied within the field of network reliability, solutions are implemented on a single computer and can only handle small graphs. However, as the size of graph applications continually increases, the corresponding graph data can no longer fit within a single computer's memory and must therefore be distributed across several machines. Furthermore, the computation of probabilistic reachability queries is #P-complete, making it very expensive even on small graphs. In this paper, we develop an efficient distributed strategy, called DistR, to solve the problem of reachability query over large uncertain graphs. Specifically, we perform the task in two steps: distributed graph reduction and distributed consolidation. In the distributed graph reduction step, we find all of the maximal subgraphs of the original graph whose reachability probabilities can be calculated in polynomial time, compute them and reduce the graph accordingly. After this step, only a small graph remains. In the distributed consolidation step, we transform the problem into a relational join process and provide an approximate answer to the #P-complete reachability query. Extensive experimental studies show that our distributed approach is efficient in terms of both computational and communication costs, and has high accuracy.
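To make the query itself concrete, the sketch below estimates the reachability probability on a small uncertain graph by plain Monte Carlo sampling of possible worlds on a single machine. This is not the DistR method from the abstract. It assumes directed edges that exist independently with a given probability, and the function names are hypothetical.

```python
import random
from collections import deque


def sample_reachable(edges, source, target, rng):
    """Draw one possible world and test reachability with breadth-first search.

    edges: dict mapping a directed edge (u, v) to its existence probability.
    """
    # Keep each edge independently with its existence probability.
    adjacency = {}
    for (u, v), p in edges.items():
        if rng.random() < p:
            adjacency.setdefault(u, []).append(v)

    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def estimate_reachability(edges, source, target, samples=10000, seed=0):
    """Fraction of sampled worlds in which target is reachable from source."""
    rng = random.Random(seed)
    hits = sum(sample_reachable(edges, source, target, rng) for _ in range(samples))
    return hits / samples
```

For a two-edge path whose edges each exist with probability 0.5, the exact answer is 0.25, and the estimate should approach that value as the number of samples grows.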

International World Wide Web Conferences | 2010

PeerLearning: A Content-Based e-Learning Material Sharing System Based on P2P Network

Guoren Wang; Ye Yuan; Yongjiao Sun; Junchang Xin; Ying Zhang

Managing and retrieving reusable learning materials in a content-based way is a big challenge in e-Learning material sharing systems. E-Learning materials are highly heterogeneous; they may exist in the form of video, audio, image, slide or plain text. Furthermore, the learning systems are highly dynamic in the presence of massively increasing multimedia materials. A P2P network seems to be one of the most promising infrastructures for dealing with the challenge in such highly dynamic environments. In this paper, we propose a Peer-to-Peer (P2P) infrastructure based on the trie tree and the deBruijn structure. It can support efficient query processing in highly dynamic scenarios. Furthermore, we develop a P2P e-Learning system, PeerLearning, to provide two content-based learning material sharing services: a keyword search component for supporting content-based document sharing and a content-based retrieval method for multimedia materials. Extensive experiments are conducted in this study to verify the superiority of our methods over the existing works.

World Wide Web | 2015

ELM-based name disambiguation in bibliography

Donghong Han; Siqi Liu; Yachao Hu; Bin Wang; Yongjiao Sun

It is common for different people to share the same name. When this occurs in bibliography databases, it worsens the performance of information retrieval and data management. In this paper, we address the problem of name disambiguation and propose two different strategies, one classifier for each name (OCEN) and one classifier for all names (OCAN). Both strategies are based on the extreme learning machine (ELM), which shows similar or better generalization performance and faster learning speed than support vector machines (SVM) and least squares support vector machines (LS-SVM). We conduct experiments to compare the performance of ELM, SVM and LS-SVM in the two strategies.

Neurocomputing | 2015

An on-line sequential learning method in social networks for node classification

Yongjiao Sun; Ye Yuan; Guoren Wang

Social networks have become a common platform for human interactions due to the rapid development of the Internet. Along with the rising demand for network analysis, the issue of node classification has become an important research field. This article aims to address the task of node classification using an on-line sequential learning method that takes into account both links and node features. Compared to conventional classification methods, we should not only use the node features for node classification, but also consider the interaction among the linked nodes. In this paper, we assume that the nodes have been partially labeled in a social network, and we use these labeled nodes to predict the categories of unlabeled nodes. Based on the OS-ELM learning method, three node classification approaches are proposed. First, considering the influence of other nodes, we combine the linkage information with the features of the trained nodes to learn the classifier. Then, to reduce the learning time, we present a method to refine the node features. Finally, according to graph structures, we present an optimized node classification method. Extensive experiments were conducted to verify the performance of our proposed methods.

Asia-Pacific Web Conference | 2010

Efficient Peer-to-Peer Similarity Query Processing for High-dimensional Data

Ye Yuan; Guoren Wang; Yongjiao Sun

Objects such as digital images, text documents, or DNA sequences are usually represented in a high-dimensional feature space. A fundamental issue in peer-to-peer (P2P) systems is to support an efficient similarity search for high-dimensional data in metric spaces. Prior works suffer from some fundamental limitations, such as not adapting to a highly dynamic network, poor search efficiency under skewed data scenarios, and large maintenance overhead. In this study, we propose an efficient scheme, Dragon, to support P2P similarity search in metric spaces. Dragon achieves its efficiency through the following designs: 1) Dragon is based on our previously designed P2P network, Phoenix, which has the optimal routing efficiency in dynamic scenarios. 2) We design a locality-preserving naming algorithm and a routing tree for each peer in Phoenix to support range queries. A radius-estimation method is proposed to transform a kNN query into a range query. 3) A load-balancing algorithm is given to support robust query processing under skewed data distributions. Extensive experiments verify the superiority of Dragon over existing works.

IEEE Transactions on Knowledge and Data Engineering | 2017

Keyword Search over Distributed Graphs with Compressed Signature

Ye Yuan; Xiang Lian; Lei Chen; Jeffery Xu Yu; Guoren Wang; Yongjiao Sun

Graph keyword search has attracted considerable research interest, since graph models can generally represent both structured and unstructured databases and keyword searches can extract valuable information for users without knowledge of the underlying schema and query language. In practice, data graphs can be extremely large, e.g., a Web-scale graph containing billions of vertices. The state-of-the-art approaches employ centralized algorithms to process graph keyword searches, and thus they are infeasible for such large graphs, due to the limited computational power and storage space of a centralized server. To address this problem, we investigate keyword search for Web-scale graphs deployed in a distributed environment. We first give a naive search algorithm to answer the query. However, the naive search algorithm uses a flooding search strategy that incurs large time and network overhead. To remedy this shortcoming, we then propose a signature-based search algorithm. Specifically, we design a vertex signature that encodes the shortest-path distance from a vertex to any given keyword in the graph. As a result, we can find query answers by exploring fewer paths, so that the time and communication costs are low. Moreover, we reorganize the graph data in the cluster after its initial random partitioning so that the signature-based techniques are more effective. Finally, our experimental results demonstrate the feasibility of our proposed approach in performing keyword searches over Web-scale graph data.
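The idea of a vertex signature that records the shortest-path distance to each keyword can be sketched on a single machine as follows. This is an uncompressed, centralized illustration under simplifying assumptions (an undirected, unweighted graph given as a symmetric adjacency dict in which every vertex appears as a key), not the compressed distributed signature from the paper. The names `keyword_distances` and `build_signatures` are hypothetical.

```python
from collections import deque


def keyword_distances(adjacency, keywords_of, keyword):
    """Hop distance from each vertex to the nearest vertex containing `keyword`.

    Runs one multi-source BFS that starts from all vertices holding the keyword.
    Vertices that cannot reach the keyword are absent from the result.
    """
    dist = {v: 0 for v in adjacency if keyword in keywords_of.get(v, ())}
    queue = deque(dist)
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def build_signatures(adjacency, keywords_of):
    """Map every vertex to a dict {keyword: shortest hop distance}."""
    vocabulary = {k for ks in keywords_of.values() for k in ks}
    signatures = {v: {} for v in adjacency}
    for keyword in vocabulary:
        for v, d in keyword_distances(adjacency, keywords_of, keyword).items():
            signatures[v][keyword] = d
    return signatures
```

With such a table, a search can skip any neighbor whose recorded distance to a query keyword exceeds the remaining hop budget, which is one way to read "exploring fewer paths" in the abstract.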

World Wide Web | 2012

Top-

Yongjiao Sun; Ye Yuan; Guoren Wang

Although top-k queries over uncertain data in centralized databases have been studied widely in recent years, it is still a challenging issue in distributed environments. In distributed environments, such as Peer-to-Peer (P2P) systems and sensor networks, there exists an inherent uncertainty on the data objects due to imprecise measurements and network delays. Therefore, it is necessary to study the problem of how to efficiently retrieve top-k uncertain data objects over distributed environments with minimum network overhead. In this paper, we propose a novel approach of processing uncertain top-k queries in large-scale P2P networks, where datasets are horizontally partitioned over peers. In our approach, each peer constructs an Uncertain Quad-Tree (UQ-Tree) index for its local uncertain data, while the P2P network constructs a global index by summarizing the local indexes. Based on the global index, we propose a spatial-pruning algorithm to reduce communication costs and a distributed-pruning algorithm to reduce computation costs. Extensive experiments are conducted to verify the effectiveness and efficiency of the proposed methods in terms of communication costs and response time.

Knowledge and Information Systems | 2016

\boldsymbol{k}

Yongjiao Sun; Ye Yuan; Guoren Wang; Yurong Cheng

A large amount of personal social information is collected and published owing to the rapid development of social network technologies and applications, so it is essential to preserve privacy and prevent the leakage of sensitive information. Most current anonymization techniques focus on preserving privacy but cannot provide accurate answers to utility queries, even at a high cost. To solve this problem, this paper introduces a novel anonymization approach, called splitting anonymization, to address the conflict between privacy and utility. The approach provides strong protection for the private social network data that is unknown to attackers, while avoiding the low utility caused by adding noise to knowledge that the attackers already have. A social network processed by splitting anonymization can resist any direct attack, and these strategies are also safe against indirect attacks, which are usually more dangerous than direct attacks. Finally, rigorous theoretical analysis and extensive evaluation results based on real data sets validate the design proposed in this paper.
