Donghua Yang | Researchain

Publication

Featured research published by Donghua Yang.

IEEE Transactions on Knowledge and Data Engineering | 2013

Efficient Skyline Computation on Big Data

Xixian Han; Donghua Yang; Jinbao Wang

Skyline is an important operation in many applications to return a set of interesting points from a potentially huge data space. Given a table, the operation finds all tuples that are not dominated by any other tuple. Existing algorithms cannot process skyline queries on big data efficiently. This paper presents SSPL, a novel skyline algorithm for big data. SSPL utilizes sorted positional index lists, which require low space overhead, to reduce I/O cost significantly. The sorted positional index list Lj is constructed for each attribute Aj and is arranged in ascending order of Aj. SSPL consists of two phases. In phase 1, SSPL computes the scan depth of the involved sorted positional index lists. While retrieving the lists in a round-robin fashion, SSPL prunes any candidate positional index whose corresponding tuple is not a skyline result. Phase 1 ends when a candidate positional index has been seen in all of the involved lists. In phase 2, SSPL uses the obtained candidate positional indexes to get the skyline results by a selective and sequential scan on the table. Experimental results on synthetic and real data sets show that SSPL has a significant advantage over the existing skyline algorithms.
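
The dominance relation behind the skyline operation can be illustrated with a brute-force baseline. This is only a sketch of the definition, not SSPL: it assumes smaller attribute values are preferred, and the names `dominates`, `naive_skyline`, and the `hotels` data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse on every attribute and strictly
    better on at least one (assumption: smaller values are better)."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def naive_skyline(table: List[Point]) -> List[Point]:
    """Quadratic baseline: keep every tuple that no other tuple dominates."""
    return [p for p in table if not any(dominates(q, p) for q in table)]


# Hypothetical (price, distance) tuples.
hotels = [(50, 8), (60, 3), (70, 2), (65, 9), (80, 1)]
skyline = naive_skyline(hotels)
print(skyline)
```

Here `(65, 9)` is dropped because `(50, 8)` is better on both attributes. The quadratic number of dominance tests is the cost that dedicated skyline algorithms aim to avoid.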

Future Generation Computer Systems | 2013

Ad-hoc aggregate query processing algorithms based on bit-store for query intensive applications in cloud computing

Donghua Yang; Yuqiang Feng; Ye Yuan; Xixian Han; Jinbao Wang

Ad-hoc aggregate queries are extremely important for query-intensive applications in cloud computing: they extract valuable summary information from massive datasets to help decision-makers make the right decisions. Current data storage schemes (row-store and column-store) cannot efficiently answer ad-hoc aggregate queries on massive data sets in cloud computing. A new data storage structure (bit vector storage structure, bit-store for short) is proposed in this paper. The paper focuses on ad-hoc aggregate query algorithms based on bit-store. Firstly, the storage model of bit-store, including its attribute encoding schemes and bit file organization, is introduced. Secondly, different aggregate operations for query processing are presented based on different encoding schemes. Thirdly, cost analysis for the different aggregate operations is presented. Finally, the effectiveness and efficiency of the proposed algorithms are shown by analytical and experimental results.

Knowledge and Information Systems | 2012

PI-Join: Efficiently processing join queries on massive data

Xixian Han; Donghua Yang

The ratio of disk capacity to disk transfer rate typically increases by 10× per decade. As a result, disk is becoming slower from the viewpoint of applications because of the much larger data volume that they need to store and process. In database systems, the smaller the data volume involved in query processing, the better the performance. Disk-based join is a common but time-consuming database operation, especially in an environment of massive data in which I/O cost dominates the execution time. However, current join algorithms are only suitable for moderate or small data volumes. They incur high I/O cost on massive data because of multi-pass I/O operations on the joined tables and their insensitivity to join selectivity. This paper proposes PI-Join, a novel disk-based join algorithm that can efficiently process join queries involving massive data. PI-Join consists of two stages: the JPIPT construction stage (JCS) and the result output stage (ROS). JCS runs a cache-conscious construction algorithm on the join attributes, which are kept in a column-oriented model, to obtain the join positional index pair table (JPIPT) of the join results faster. The obtained JPIPT is used in ROS to retrieve results in a one-pass sequential selective scan on each table. We provide the correctness proof and cost analysis of PI-Join. Our experimental results indicate that PI-Join has a significant advantage over the existing join algorithms.
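
The two-stage idea (first find pairs of row positions from the join columns alone, then fetch only the needed rows in position order) can be sketched in memory as follows. This is an illustration under simplifying assumptions, not the paper's algorithm: a plain hash join stands in for the cache-conscious construction, Python lists stand in for disk files, and all names (`join_position_pairs`, `selective_scan`, the sample rows) are hypothetical.

```python
from collections import defaultdict


def join_position_pairs(r_keys, s_keys):
    """Stage 1 (sketch): build (r_position, s_position) pairs using only
    the join columns, here with an ordinary in-memory hash join."""
    index = defaultdict(list)
    for pos, key in enumerate(r_keys):
        index[key].append(pos)
    return [(rp, sp) for sp, key in enumerate(s_keys) for rp in index.get(key, [])]


def selective_scan(table, positions):
    """Stage 2 (sketch): visit the needed positions once, in ascending
    order, mimicking a one-pass sequential selective scan."""
    return {pos: table[pos] for pos in sorted(set(positions))}


# Hypothetical tables; the first field is the join attribute.
r_rows = [(1, "a"), (2, "b"), (3, "c"), (2, "d")]
s_rows = [(2, "x"), (4, "y"), (1, "z")]

pairs = join_position_pairs([r[0] for r in r_rows], [s[0] for s in s_rows])
r_fetched = selective_scan(r_rows, [rp for rp, _ in pairs])
s_fetched = selective_scan(s_rows, [sp for _, sp in pairs])
result = [r_fetched[rp] + s_fetched[sp] for rp, sp in pairs]
print(result)
```

Rows that take part in no join result (`(3, "c")` and `(4, "y")` here) are never fetched, which is why the amount of data read depends on join selectivity.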

Information Sciences | 2013

TJJE: An efficient algorithm for top-k join on massive data

Xixian Han; Jinbao Wang; Donghua Yang

In many applications, top-k join is an important operation to return the k most important join tuples among the potentially huge answer space according to a given ranking function. PBRJ is an algorithm template that generalizes previous top-k join algorithms. In this paper, our analysis shows that PBRJ needs to maintain a large quantity of candidate tuples on massive data. Based on the analysis, this paper proposes TJJE, a novel top-k join algorithm suitable for handling massive data. Using some pre-computed information, TJJE first estimates an upper bound on the scan depth of each joined table. Then it determines the file that contains the join positional index pairs of the top-k join results. A novel algorithm is proposed to retrieve the required join tuples by a single sequential and selective scan on the joined tables. Finally, the top-k join results are obtained by a single scan on the retrieved join tuples. The correctness proof and cost analysis of TJJE are presented in this paper. Extensive experiments show that TJJE maintains up to three orders of magnitude fewer candidate tuples and obtains up to one order of magnitude speedup compared to PBRJ.

Database Systems for Advanced Applications | 2016

VMPSP: Efficient Skyline Computation Using VMP-Based Space Partitioning

Kaiqi Zhang; Donghua Yang; Hong Gao; Hongzhi Wang; Zhipeng Cai

The skyline query returns a set of interesting points that are not dominated by any other points in a multi-dimensional data set. This query has been studied considerably over the last several years in preference analysis and multi-criteria decision-making applications. Space partitioning, the best non-index framework, has been proposed, but existing methods based on it do not consider the balance of the partitioned subspaces. To overcome this limitation, we first develop a cost evaluation model of space partitioning in skyline computation, and then propose an efficient approach to compute the skyline set using balanced partitioning. We illustrate the importance of balance in partitioning. Based on this, we propose a method to construct a balanced partitioning point VMP whose ith attribute value is the median value of all points in the ith dimension. We also design a structure RST to reduce dominance tests among those subspaces which are comparable. The experimental evaluation indicates that our algorithm is at least several times faster than existing state-of-the-art algorithms.
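
The construction of a median-based partitioning point can be sketched as follows. This is a minimal illustration, not the VMPSP algorithm or its RST structure: the bitmask labelling of subspaces and the names `median_pivot` and `partition_by_pivot` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def median_pivot(points):
    """Pivot whose i-th coordinate is the median of all points in dimension i."""
    dims = len(points[0])
    return tuple(median(p[i] for p in points) for i in range(dims))


def partition_by_pivot(points, pivot):
    """Assign each point to a subspace labelled by a bitmask:
    bit i is set when the point is >= the pivot in dimension i."""
    cells = defaultdict(list)
    for p in points:
        mask = sum(1 << i for i, (v, m) in enumerate(zip(p, pivot)) if v >= m)
        cells[mask].append(p)
    return dict(cells)


# Hypothetical two-dimensional data.
points = [(1, 9), (2, 7), (4, 4), (6, 3), (8, 1)]
pivot = median_pivot(points)
cells = partition_by_pivot(points, pivot)
print(pivot, cells)
```

With smaller values preferred, any point in the subspace with mask 0 is strictly smaller than the pivot in every dimension and therefore dominates every point in the subspace with all bits set, so whole subspaces can be compared before individual points are tested.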

Personal and Ubiquitous Computing | 2018

Protecting query privacy with differentially private k-anonymity in location-based services

Jinbao Wang; Zhipeng Cai; Yingshu Li; Donghua Yang; Ji Li; Hong Gao

Nowadays, location-based services (LBS) are facilitating people in daily life through answering LBS queries. However, privacy issues including location privacy and query privacy arise at the same time. Existing works for protecting query privacy either work on trusted servers or fail to provide a sufficient privacy guarantee. This paper combines the concepts of differential privacy and k-anonymity to propose the notion of differentially private k-anonymity (DPkA) for query privacy in LBS. We recognize the sufficient and necessary condition for the availability of 0-DPkA and present how to achieve it. For cases where 0-DPkA is not achievable, we propose an algorithm to achieve 𝜖-DPkA with minimized 𝜖. Extensive simulations are conducted to validate the proposed mechanisms based on real-life datasets and synthetic data distributions.

Cyber-Enabled Distributed Computing and Knowledge Discovery | 2011

Ad Hoc Aggregation Query Processing Algorithms Based on Bit-Store in Data Intensive Cloud

Donghua Yang; Xixian Han; Jinbao Wang

Ad-hoc aggregation queries are extremely important for data-intensive applications in the cloud: they extract valuable summary information from massive datasets to help decision-makers make the right decisions. Current data storage schemes (row-store and column-store) cannot efficiently answer ad-hoc aggregation queries on massive data sets in the cloud. A new data storage structure (bit vector storage structure, bit-store for short) is proposed in the paper, which partitions tables vertically by bit position and stores all bit values at the same position in a separate bit file. This paper focuses on ad-hoc aggregation query algorithms based on bit-store. Firstly, the storage model of bit-store, including its attribute encoding and bit file organization, is introduced. Then, the implementation of different aggregation operations using different encoding schemes is presented. Finally, analytical and experimental results show the effectiveness and efficiency of the proposed approach.
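
Vertical partitioning by bit position can be illustrated with a small in-memory sketch. It assumes a plain unsigned binary encoding (the paper's own encoding schemes are not given here) and models each bit file as a Python integer used as a bitmap over rows; `to_bit_files`, `bit_sum`, and the `sales` column are hypothetical names.

```python
def to_bit_files(values, width):
    """Split a column of unsigned integers into `width` bitmaps.
    Bit `row` of files[b] holds bit b of values[row]."""
    files = [0] * width
    for row, value in enumerate(values):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        for bit in range(width):
            if (value >> bit) & 1:
                files[bit] |= 1 << row
    return files


def bit_sum(files, selection=None):
    """SUM over the column: each set bit in file b contributes 2**b.
    `selection` is an optional row bitmap acting as a filter."""
    total = 0
    for bit, bitmap in enumerate(files):
        if selection is not None:
            bitmap &= selection
        total += bin(bitmap).count("1") << bit
    return total


sales = [5, 3, 12, 7]  # hypothetical column values
files = to_bit_files(sales, width=4)
print(bit_sum(files))                    # sum over all rows
print(bit_sum(files, selection=0b0101))  # sum over rows 0 and 2 only
```

The aggregate touches one bitmap per bit position rather than one value per row, and a filter becomes a bitwise AND with a row bitmap.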

Security and Communication Networks | 2018

Achieving the Optimal k-Anonymity for Content Privacy in Interactive Cyberphysical Systems

Jinbao Wang; Ling Tian; Yan Huang; Donghua Yang; Hong Gao

Modern applications and services leveraged by interactive cyberphysical systems (CPS) provide significant convenience in many aspects of daily life. Clients submit their requests, including query contents, to CPS servers to enjoy diverse services such as health care, automatic driving, and location-based services. However, privacy concerns arise at the same time. Content privacy has been recognized as an issue, and considerable effort has been made in the literature on privacy preservation in interactive cyberphysical systems such as location-based services. Nevertheless, neither the cloaking-based solutions nor the existing client-based solutions have achieved effective content privacy by optimizing proper content privacy metrics. In this paper we formulate the problem of achieving the optimal content privacy in interactive cyberphysical systems using k-anonymity solutions based on two content privacy metrics, which are defined using the concepts of entropy and differential privacy. Then we propose an algorithm, Multilayer Alignment (MLA), to establish k-anonymity mechanisms for preserving content privacy in interactive cyberphysical systems. Our proposed MLA is theoretically proved to achieve the optimal content privacy in terms of both the entropy-based and the differential-privacy-based content privacy metrics. Evaluation based on real-life datasets is conducted, and the evaluation results validate the effectiveness of our proposed algorithm.

Database Systems for Advanced Applications | 2017

RSkycube: Efficient Skycube Computation by Reusing Principle

Kaiqi Zhang; Hong Gao; Xixian Han; Donghua Yang; Zhipeng Cai

Over the past years, the skyline query has attracted wide attention in the database community. In order to meet the different preferences of users, skycube computation is proposed to compute skylines, or cuboids, on all possible non-empty dimension subsets. The key issue in computing a skycube is how to share computation among multiple related cuboids, which is classified into sharing strict space dominance and sharing space incomparability. However, the state-of-the-art algorithm only leverages sharing strict space dominance to compute the skycube. This paper aims to design a more efficient skycube algorithm that shares computation among multiple related cuboids. We first propose a set of rules named identical partitioning (IP) for constructing a novel structure, VSkyTree. Moreover, we present the reusing principle, which utilizes both sharing strict space dominance and sharing space incomparability by reusing the VSkyTree on parent cuboids to compute child cuboids. Then, in top-down fashion, we design an efficient skycube computation algorithm, RSkycube, based on the reusing principle. Our experimental results indicate that RSkycube significantly outperforms the state-of-the-art skycube computation algorithm on both synthetic and real datasets.
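
What a skycube contains can be shown with a brute-force baseline that computes each cuboid independently. This is only a sketch of the definition, with no sharing between cuboids, so it is not RSkycube or VSkyTree; it assumes smaller values are preferred, and `dominates_on` and `naive_skycube` are hypothetical names.

```python
from itertools import combinations


def dominates_on(p, q, dims):
    """p dominates q on the dimension subset `dims`
    (assumption: smaller values are better)."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def naive_skycube(points):
    """Map every non-empty dimension subset to its skyline (cuboid),
    computing each cuboid from scratch."""
    n = len(points[0])
    cube = {}
    for size in range(1, n + 1):
        for dims in combinations(range(n), size):
            cube[dims] = [
                p for p in points
                if not any(dominates_on(q, p, dims) for q in points)
            ]
    return cube


# Hypothetical three-dimensional data.
points = [(1, 5, 3), (2, 2, 4), (3, 1, 1)]
cube = naive_skycube(points)
for dims, cuboid in cube.items():
    print(dims, cuboid)
```

For d dimensions there are 2^d - 1 cuboids, so recomputing each one from scratch repeats many dominance tests; that redundancy is what computation sharing targets.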

Asia-Pacific Web Conference | 2015

Answering Spatial Approximate Keyword Queries in Disks

Jinbao Wang; Donghua Yang; Yuhong Wei; Hong Gao; Ye Yuan

Spatial approximate keyword queries consist of a spatial condition and a set of keywords as fuzzy textual conditions, and they return objects labeled with a set of keywords similar to the queried keywords while satisfying the spatial condition. Such queries enable users to find objects of interest in a spatial database and tolerate mismatches between user query keywords and object keywords. With the rapid growth of data, spatial databases storing objects from diverse geographical regions can no longer be held in main memory. Thus, it is essential to answer spatial approximate keyword queries over disk-resident datasets. Existing methods either return incomplete answers or keep their indexes in main memory, so effective disk-based solutions are in demand. This paper presents a novel disk-resident index, the RMB-tree, to support spatial approximate keyword queries. We study the principle of augmenting the R-tree with the capacity for approximate keyword searching based on existing solutions, and store multiple bitmaps in R-tree nodes to build an RMB-tree. The RMB-tree supports spatial conditions such as range constraints, combined with keyword similarity metrics such as edit distance and Dice similarity. Experimental results against the R-tree on two real-world datasets demonstrate the efficiency of our solution.
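
The semantics of such a query (a range constraint combined with an edit-distance threshold on keywords) can be sketched with a linear scan. This is not the RMB-tree, which is a disk-resident index; the object layout, the `search` function, and the sample data are assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def in_range(loc, rect):
    """True if point loc = (x, y) lies inside rect = (x1, y1, x2, y2)."""
    (x, y), (x1, y1, x2, y2) = loc, rect
    return x1 <= x <= x2 and y1 <= y <= y2


def search(objects, rect, keywords, max_edits):
    """Return names of objects inside rect such that every query keyword
    is within max_edits of at least one of the object's keywords."""
    return [
        name for name, loc, labels in objects
        if in_range(loc, rect)
        and all(any(edit_distance(q, k) <= max_edits for k in labels)
                for q in keywords)
    ]


# Hypothetical objects: (name, location, keyword set).
objects = [
    ("cafe_a", (1, 1), {"coffee", "bakery"}),
    ("cafe_b", (9, 9), {"coffee"}),
    ("shop_c", (2, 2), {"books"}),
]
hits = search(objects, rect=(0, 0, 5, 5), keywords=["cofee"], max_edits=1)
print(hits)
```

The misspelled query keyword `cofee` still matches `coffee` within one edit, while `cafe_b` is excluded by the range constraint and `shop_c` by the keyword condition.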
