Chi Yang

University of Technology, Sydney

Publication

Featured research published by Chi Yang.

Future Generation Computer Systems | 2015

External integrity verification for outsourced big data in cloud and IoT

Chang Liu; Chi Yang; Xuyun Zhang; Jinjun Chen

As cloud computing is widely adopted for big data processing, data security is becoming one of the major concerns of data owners. Data integrity is an important factor in almost any data- and computation-related context. It is not only one of the qualities of service but also an important part of data security and privacy. With the proliferation of cloud computing and the growing need to analyze big data, such as data generated by the Internet of Things, verifying data integrity becomes increasingly important, especially for outsourced data. Research on external data integrity verification has therefore attracted tremendous interest in recent years. Among all the metrics, efficiency and security are the two that receive the most attention. In this paper, we present a big picture by analyzing authenticator-based data integrity verification techniques for cloud and Internet of Things data. We analyze multiple aspects of the research problem. First, we illustrate the research problem by summarizing research motivations and methodologies. Second, we summarize and compare the current achievements of several representative approaches. Finally, we present our view of possible future developments.

Security of big data in cloud and IoT is becoming a major problem. Efficient external integrity verification is an important part of data security. We provide a big picture by summarizing and analyzing the main results of external integrity verification schemes for big data in cloud.
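To make the authenticator idea concrete, the Python sketch below shows the basic tag-then-spot-check pattern. It is a simplified, private-verification stand-in, not one of the surveyed schemes: it uses plain HMAC-SHA256 tags, so the verifier needs the secret key and the challenged blocks are sent back in full, whereas public auditing schemes in this area typically rely on homomorphic authenticators so that a third-party auditor can check an aggregated proof without retrieving the data. All function names (`tag_blocks`, `make_challenge`, `respond`, `verify`) are hypothetical.

```python
import hashlib
import hmac
import secrets


def _authenticator(key: bytes, index: int, block: bytes) -> bytes:
    # Bind the tag to the block's position so blocks cannot be swapped undetected.
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


def tag_blocks(key: bytes, blocks: list) -> list:
    """Data owner: compute one authenticator per block before outsourcing."""
    return [_authenticator(key, i, b) for i, b in enumerate(blocks)]


def make_challenge(n_blocks: int, sample_size: int) -> list:
    """Verifier: pick a random subset of block indices to spot-check."""
    return sorted(secrets.SystemRandom().sample(range(n_blocks), sample_size))


def respond(blocks: list, tags: list, challenge: list) -> list:
    """Untrusted storage server: return each challenged block with its stored tag."""
    return [(i, blocks[i], tags[i]) for i in challenge]


def verify(key: bytes, response: list) -> bool:
    """Verifier: recompute every authenticator and compare in constant time."""
    return all(
        hmac.compare_digest(_authenticator(key, i, block), tag)
        for i, block, tag in response
    )


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    blocks = [f"record-{i}".encode() for i in range(100)]
    tags = tag_blocks(key, blocks)
    challenge = make_challenge(len(blocks), 10)
    print(verify(key, respond(blocks, tags, challenge)))  # True
    blocks[challenge[0]] = b"tampered"
    print(verify(key, respond(blocks, tags, challenge)))  # False
```

Sampling a subset of blocks, rather than checking all of them, is what keeps verification cheap for large outsourced data sets; a server that has corrupted a fraction of the blocks is caught with a probability that grows with the sample size.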

Journal of Computer and System Sciences | 2014

A hybrid approach for scalable sub-tree anonymization over big data using MapReduce on cloud

Xuyun Zhang; Chang Liu; Surya Nepal; Chi Yang; Wanchun Dou; Jinjun Chen

In big data applications, data privacy is one of the foremost concerns, because processing large-scale privacy-sensitive data sets often requires computation resources provisioned by public cloud services. Sub-tree data anonymization is a widely adopted scheme for anonymizing data sets to preserve privacy. Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways to perform sub-tree anonymization. However, existing approaches to sub-tree anonymization lack parallelization capability and therefore lack scalability when handling big data in cloud. Moreover, TDS and BUG each perform poorly for certain values of the k-anonymity parameter when used individually. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce algorithms for the two components (TDS and BUG) to achieve high scalability. Experimental evaluation demonstrates that the hybrid approach significantly improves the scalability and efficiency of the sub-tree anonymization scheme over existing approaches.
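As a rough illustration of the two search directions, the following Python sketch checks k-anonymity in a MapReduce style (map emits a generalized quasi-identifier, reduce counts each group) and searches a single-attribute taxonomy either bottom-up (BUG-like) or top-down (TDS-like). It is a toy under stated assumptions, not the paper's hybrid algorithm: the taxonomy, records, and function names are hypothetical, generalization is applied uniformly per level rather than per sub-tree, and the map and reduce phases are plain Python generators instead of distributed jobs.

```python
from collections import Counter

# Hypothetical taxonomy tree for one quasi-identifier: child -> parent, root is "ANY".
TAXONOMY = {
    "Engineer": "Technical", "Scientist": "Technical",
    "Lawyer": "Professional", "Doctor": "Professional",
    "Technical": "ANY", "Professional": "ANY",
}
MAX_LEVEL = 2  # leaf -> category -> ANY

# Hypothetical records: (occupation, postcode); both are quasi-identifiers.
SAMPLE_RECORDS = [
    ("Engineer", "2007"), ("Scientist", "2007"), ("Lawyer", "2007"),
    ("Doctor", "2007"), ("Engineer", "2007"), ("Doctor", "2007"),
]


def generalize(value: str, level: int) -> str:
    """Replace a value by its ancestor `level` steps up the taxonomy tree."""
    for _ in range(level):
        value = TAXONOMY.get(value, value)  # the root maps to itself
    return value


def map_phase(records, level):
    """Map: emit (generalized quasi-identifier, 1) for every record."""
    for occupation, postcode in records:
        yield (generalize(occupation, level), postcode), 1


def reduce_phase(pairs):
    """Reduce: sum the counts per quasi-identifier group (equivalence class)."""
    groups = Counter()
    for key, count in pairs:
        groups[key] += count
    return groups


def is_k_anonymous(records, level, k):
    groups = reduce_phase(map_phase(records, level))
    return min(groups.values()) >= k


def bottom_up(records, k):
    """BUG-style: start from raw values and generalize until k-anonymity holds."""
    for level in range(MAX_LEVEL + 1):
        if is_k_anonymous(records, level, k):
            return level
    return None


def top_down(records, k):
    """TDS-style: start fully generalized and specialize while k-anonymity still holds."""
    if not is_k_anonymous(records, MAX_LEVEL, k):
        return None
    level = MAX_LEVEL
    while level > 0 and is_k_anonymous(records, level - 1, k):
        level -= 1
    return level


if __name__ == "__main__":
    for k in (2, 4, 7):
        print(k, bottom_up(SAMPLE_RECORDS, k), top_down(SAMPLE_RECORDS, k))
```

Both searches return the same level here (1 for k = 2, 2 for k = 4, and `None` for k = 7), but they pay different costs: bottom-up needs fewer k-anonymity checks for k = 2 and top-down needs fewer for k = 4, a small-scale echo of the point that each direction alone performs poorly for some values of k.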

Journal of Computer and System Sciences | 2014

A spatiotemporal compression based approach for efficient big data processing on Cloud

Chi Yang; Xuyun Zhang; Changmin Zhong; Chang Liu; Jian Pei; Kotagiri Ramamohanarao; Jinjun Chen

It is well known that processing big graph data on Cloud can be costly. Processing big graph data involves multiple complex iterations that raise challenges such as parallel memory bottlenecks, deadlocks, and inefficiency. To tackle these challenges, we propose a novel technique for effectively processing big graph data on Cloud. Specifically, the big data is compressed on Cloud according to its spatiotemporal features. By exploiting spatial data correlation, we partition a graph data set into clusters. Within a cluster, the workload can be shared through inference based on time series similarity. By exploiting temporal correlation, temporal data compression is conducted within each time series or single graph edge. A novel data-driven scheduling approach is also developed to optimise data processing. The experimental results demonstrate that the spatiotemporal compression and scheduling achieve significant performance gains in terms of data size and data fidelity loss.
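The temporal-compression step can be pictured with a generic error-bounded scheme. The Python sketch below uses dead-band compression, a simple technique swapped in for illustration and not the paper's method: along one time series (for example, the values recorded on a single graph edge), a sample is kept only when it drifts more than a tolerance from the last kept value, so the reconstruction error never exceeds that tolerance. The tolerance value and sample data are hypothetical.

```python
def deadband_compress(series, tolerance):
    """Keep (index, value) only when the value drifts more than `tolerance`
    from the last kept value; every dropped point stays within `tolerance`."""
    if not series:
        return []
    kept = [(0, series[0])]
    for i, x in enumerate(series[1:], start=1):
        if abs(x - kept[-1][1]) > tolerance:
            kept.append((i, x))
    return kept


def decompress(kept, length):
    """Rebuild the series by holding each kept value until the next kept index."""
    out = []
    for j, (start, value) in enumerate(kept):
        end = kept[j + 1][0] if j + 1 < len(kept) else length
        out.extend([value] * (end - start))
    return out


if __name__ == "__main__":
    # Hypothetical readings on one graph edge over time.
    series = [10.0, 10.1, 10.2, 10.1, 12.0, 12.05, 11.9, 15.0]
    kept = deadband_compress(series, tolerance=0.3)
    print(kept)  # [(0, 10.0), (4, 12.0), (7, 15.0)]
    restored = decompress(kept, len(series))
    print(max(abs(a - b) for a, b in zip(series, restored)) <= 0.3)  # True
```

The tolerance is the knob that trades data size against fidelity loss, the two measures the abstract reports.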

IEEE Transactions on Parallel and Distributed Systems | 2015

A Time Efficient Approach for Detecting Errors in Big Sensor Data on Cloud

Chi Yang; Chang Liu; Xuyun Zhang; Surya Nepal; Jinjun Chen

Big sensor data is prevalent in both industry and scientific research applications, where data is generated with such high volume and velocity that it is difficult to process using on-hand database management tools or traditional data processing applications. Cloud computing provides a promising platform for addressing this challenge, as it offers a flexible stack of massive computing, storage, and software services in a scalable manner at low cost. Some techniques have been developed in recent years for processing sensor data on cloud, such as sensor-cloud. However, these techniques do not efficiently support fast detection and location of errors in big sensor data sets. For fast data error detection in big sensor data sets, in this paper we develop a novel approach that exploits the full computation potential of the cloud platform and the network features of wireless sensor networks (WSN). First, a set of sensor data error types is classified and defined. Based on that classification, the network features of a clustered WSN are introduced and analyzed to support fast error detection and location. Specifically, in our proposed approach, error detection is based on the scale-free network topology, and most detection operations can be conducted in limited temporal or spatial data blocks instead of a whole big data set. Hence, the detection and location process can be dramatically accelerated. Furthermore, the detection and location tasks can be distributed to the cloud platform to fully exploit its computation power and massive storage. Experiments on our cloud computing platform, U-Cloud, demonstrate that the proposed approach can significantly reduce the time for error detection and location in big data sets generated by large-scale sensor network systems, with acceptable error detection accuracy.
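The idea of detecting errors within limited data blocks, rather than over the whole data set, can be sketched in Python as follows. This is an illustration under assumptions, not the paper's approach: the two error types (out-of-range and spike), the valid range, the spike threshold, and the block size are all hypothetical, the scale-free topology of the clustered WSN is not modeled, and a local thread pool stands in for distributing blocks to cloud nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median

VALID_RANGE = (-40.0, 85.0)  # hypothetical sensor operating range
SPIKE_THRESHOLD = 10.0       # hypothetical max deviation from the block median


def split_blocks(readings, block_size):
    """Cut one sensor's stream into (start_index, values) temporal blocks."""
    return [(start, readings[start:start + block_size])
            for start in range(0, len(readings), block_size)]


def detect_in_block(block):
    """Flag errors using only the data inside this block."""
    start, values = block
    centre = median(values)
    lo, hi = VALID_RANGE
    errors = []
    for offset, v in enumerate(values):
        if not lo <= v <= hi:
            errors.append((start + offset, "out_of_range"))
        elif abs(v - centre) > SPIKE_THRESHOLD:
            errors.append((start + offset, "spike"))
    return errors


def detect_errors(readings, block_size=4, workers=4):
    """Check blocks independently in parallel, then merge the located errors."""
    blocks = split_blocks(readings, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(detect_in_block, blocks)
    return sorted(e for block_errors in results for e in block_errors)


if __name__ == "__main__":
    readings = [20.0, 20.5, 21.0, 20.8, 20.9, 45.0,
                21.1, 21.0, 21.2, 120.0, 21.3, 21.1]
    print(detect_errors(readings))  # [(5, 'spike'), (9, 'out_of_range')]
```

Because each block is checked only against its own median and the fixed range, blocks can be processed in any order or on different machines, which is what makes this style of detection easy to parallelize.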

IEEE Transactions on Computers | 2015

Proximity-Aware Local-Recoding Anonymization with MapReduce for Scalable Big Data Privacy Preservation in Cloud

Xuyun Zhang; Wanchun Dou; Jian Pei; Surya Nepal; Chi Yang; Chang Liu; Jinjun Chen

Cloud computing provides promising scalable IT infrastructure to support the processing of a variety of big data applications in sectors such as healthcare and business. Data sets such as electronic health records in these applications often contain privacy-sensitive information, which can raise privacy concerns if the information is released or shared with third parties in cloud. A practical and widely adopted technique for data privacy preservation is to anonymize data via generalization to satisfy a given privacy model. However, most existing privacy-preserving approaches tailored to small-scale data sets fall short when applied to big data, owing to their insufficiency or poor scalability. In this paper, we investigate the local-recoding problem for big data anonymization against proximity privacy breaches and attempt to identify a scalable solution to this problem. Specifically, we present a proximity privacy model that allows semantic proximity of sensitive values and multiple sensitive attributes, and we model the problem of local recoding as a proximity-aware clustering problem. A scalable two-phase clustering approach, consisting of a t-ancestors clustering algorithm (similar to k-means) and a proximity-aware agglomerative clustering algorithm, is proposed to address this problem. We design the algorithms with MapReduce to gain high scalability by performing data-parallel computation in cloud. Extensive experiments on real-life data sets demonstrate that our approach significantly improves the capability of defending against proximity privacy breaches, as well as the scalability and time-efficiency of local-recoding anonymization, over existing approaches.

Computational Science and Engineering | 2013

Public Auditing for Big Data Storage in Cloud Computing -- A Survey

Chang Liu; Rajiv Ranjan; Xuyun Zhang; Chi Yang; Dimitrios Georgakopoulos; Jinjun Chen

Data integrity is an important factor to ensure in almost any data- and computation-related context. It serves not only as one of the qualities of service but also as an important part of data security and privacy. With the proliferation of cloud computing and the increasing need for big data analytics, verification of data integrity becomes increasingly important, especially for outsourced data. Therefore, research topics related to data integrity verification have attracted tremendous research interest. Among all the metrics, efficiency and security are the two that receive the most attention. In this paper, we provide an analysis of authenticator-based efficient data integrity verification. We analyze and survey the main aspects of this research problem, summarize the research motivations, methodologies, and main achievements of several representative approaches, and then try to bring forth a blueprint for possible future developments.

Archive | 2014

Privacy Preservation over Big Data in Cloud Systems

Xuyun Zhang; Chang Liu; Surya Nepal; Chi Yang; Jinjun Chen

Cloud computing and Big Data, two disruptive trends at present, have a significant influence on the current IT industry and research communities. Cloud computing provides massive computation power and storage capacity, which enable users to deploy applications without infrastructure investment

Trust, Security and Privacy in Computing and Communications | 2013

Combining Top-Down and Bottom-Up: Scalable Sub-tree Anonymization over Big Data Using MapReduce on Cloud

Xuyun Zhang; Chang Liu; Surya Nepal; Chi Yang; Wanchun Dou; Jinjun Chen

In big data applications, data privacy is one of the foremost concerns, because processing large-scale privacy-sensitive data sets often requires computation power provided by public cloud services. Sub-tree data anonymization, which achieves a good trade-off between data utility and distortion, is a widely adopted scheme for anonymizing data sets to preserve privacy. Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways to perform sub-tree anonymization. However, existing approaches to sub-tree anonymization lack parallelization capability and therefore lack scalability when handling big data on cloud. Moreover, both TDS and BUG perform poorly for certain values of the k-anonymity parameter when used individually. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce-based algorithms for the two components (TDS and BUG) to gain high scalability by exploiting the powerful computation capability of cloud. Experimental evaluations demonstrate that the hybrid approach significantly improves the scalability and efficiency of the sub-tree anonymization scheme over existing approaches.

International Conference on Cloud and Green Computing | 2013

A MapReduce Based Approach of Scalable Multidimensional Anonymization for Big Data Privacy Preservation on Cloud

Xuyun Zhang; Chi Yang; Surya Nepal; Chang Liu; Wanchun Dou; Jinjun Chen

The massive increase in computing power and data storage capacity provisioned by cloud computing, together with advances in big data mining and analytics, has expanded the scope of information available to businesses, government, and individuals by orders of magnitude. Meanwhile, privacy protection is one of the foremost concerns in big data and cloud applications, requiring strong preservation of customer privacy and attracting considerable attention from both the IT industry and academia. Data anonymization provides an effective way to preserve data privacy, and the multidimensional anonymization scheme is widely adopted among existing anonymization schemes. However, existing multidimensional anonymization approaches suffer from severe scalability or IT cost issues when handling big data, because they cannot fully leverage cloud resources or be cost-effectively adapted to cloud environments. As such, we propose a scalable multidimensional anonymization approach for big data privacy preservation using MapReduce on cloud. In this approach, a highly scalable median-finding algorithm that combines the idea of the median of medians with a histogram technique is proposed, and the recursion granularity is controlled to achieve cost-effectiveness. Corresponding MapReduce jobs are specifically designed, and experimental evaluations demonstrate that our approach significantly improves the scalability and cost-effectiveness of the multidimensional scheme over existing approaches.
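The histogram part of such a median-finding algorithm can be sketched in Python: mappers histogram their partitions, a reducer merges the histograms and keeps only the bin that contains the median's rank, and the search repeats inside that bin. This is a simplified stand-in rather than the paper's algorithm: it omits the median-of-medians component, assumes integer-valued attributes, and the bin count `NUM_BINS` and the cut-off `small` (a hypothetical analogue of controlling recursion granularity) are illustrative choices. Median-finding matters here because multidimensional anonymization typically splits partitions at the median of a chosen attribute.

```python
from collections import Counter

NUM_BINS = 16  # hypothetical number of histogram bins per round


def map_histogram(partition, lo, hi, width):
    """Mapper: histogram of one partition's values that fall in [lo, hi)."""
    return Counter((v - lo) // width for v in partition if lo <= v < hi)


def reduce_histograms(histograms):
    """Reducer: merge the partial histograms emitted by all mappers."""
    merged = Counter()
    for h in histograms:
        merged.update(h)
    return merged


def kth_smallest(partitions, k, lo, hi, small=8):
    """Return the k-th smallest (0-based) integer among the values in [lo, hi).

    Each round narrows [lo, hi) to the single bin that contains rank k.
    `small` is a hypothetical granularity cut-off: once few candidates remain,
    they are gathered and sorted instead of running another histogram round.
    """
    while True:
        if hi - lo == 1:
            return lo  # only one possible integer value remains
        width = -(-(hi - lo) // NUM_BINS)  # ceiling division, always >= 1
        merged = reduce_histograms(map_histogram(p, lo, hi, width) for p in partitions)
        if sum(merged.values()) <= small:
            candidates = sorted(v for p in partitions for v in p if lo <= v < hi)
            return candidates[k]
        for b in sorted(merged):
            if k < merged[b]:
                lo, hi = lo + b * width, min(hi, lo + (b + 1) * width)
                break
            k -= merged[b]  # skip every value in the lower bins


def distributed_median(partitions):
    """Lower median of integer data split across partitions."""
    n = sum(len(p) for p in partitions)
    lo = min(min(p) for p in partitions if p)
    hi = max(max(p) for p in partitions if p) + 1
    return kth_smallest(partitions, (n - 1) // 2, lo, hi)


if __name__ == "__main__":
    print(distributed_median([[1, 9], [3, 7, 5]]))  # 5
```

A smaller `small` means more histogram rounds over the full data but fewer values gathered at the end, and a larger one does the opposite, which is the kind of trade-off that controlling recursion granularity targets.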

Concurrency and Computation: Practice and Experience | 2013

SaC‐FRAPP: a scalable and cost‐effective framework for privacy preservation over big data on cloud

Xuyun Zhang; Chang Liu; Surya Nepal; Chi Yang; Wanchun Dou; Jinjun Chen

Big data and cloud computing are two disruptive trends nowadays, providing numerous opportunities to the current information technology industry and research communities while also posing significant challenges to them. Cloud computing provides powerful and economical infrastructural resources for cloud users to handle ever-increasing data sets in big data applications. However, processing or sharing privacy-sensitive data sets on cloud can engender severe privacy concerns because of multi-tenancy. Data encryption and anonymization are two widely adopted ways to combat privacy breaches. However, encryption is not suitable for data that are processed and shared frequently, and anonymizing big data and managing numerous anonymized data sets remain challenges for traditional anonymization approaches. As such, in this paper we propose a scalable and cost-effective framework for privacy preservation over big data on cloud. The key idea of the framework is that it leverages cloud-based MapReduce to conduct data anonymization and manage anonymized data sets before releasing data to others. The framework provides a holistic conceptual foundation for privacy preservation over big data. Further, a corresponding proof-of-concept prototype system is implemented. Empirical evaluations demonstrate that the framework can anonymize large-scale data sets and manage anonymized data sets in a highly flexible, scalable, efficient, and cost-effective fashion.