Aibo Song | Researchain

Publication

Featured research published by Aibo Song.

IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2011

BAR: An Efficient Data Locality Driven Task Scheduling Algorithm for Cloud Computing

Jiahui Jin; Junzhou Luo; Aibo Song; Fang Dong; Runqun Xiong

Large-scale data processing has become increasingly common in cloud computing systems such as MapReduce, Hadoop, and Dryad. In these systems, files are split into many small blocks, and every block is replicated over several servers. To process files efficiently, each job is divided into many tasks, and each task is allocated to a server to deal with a file block. Because network bandwidth is a scarce resource in these systems, enhancing task data locality (placing tasks on servers that contain their input blocks) is crucial for the job completion time. Although there have been many approaches to improving data locality, most of them are either greedy and ignore global optimization or suffer from high computational complexity. To address these problems, we propose a heuristic task scheduling algorithm called Balance-Reduce (BAR), in which an initial task allocation is produced first, and the job completion time is then reduced gradually by tuning that allocation. By taking a global view, BAR can adjust data locality dynamically according to network state and cluster workload. The simulation results show that BAR is able to deal with large problem instances in a few seconds and outperforms previous related algorithms in terms of job completion time.
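The two-phase idea in the abstract above (produce an initial allocation, then tune it to shorten the job completion time) can be illustrated with a toy sketch. This is not the BAR algorithm from the paper: the cost constants, the function names, and the simple "move one task off the busiest server" rule are all assumptions made for illustration.

```python
LOCAL_COST = 1   # assumed time units for a task whose input block is local
REMOTE_COST = 3  # assumed time units for a task that must fetch its block


def task_cost(task, server, replicas):
    """Cost of running `task` on `server`, cheaper when the block is local."""
    return LOCAL_COST if server in replicas[task] else REMOTE_COST


def server_loads(assignment, replicas, servers):
    """Total running time per server under a task -> server mapping."""
    loads = {s: 0 for s in servers}
    for task, server in assignment.items():
        loads[server] += task_cost(task, server, replicas)
    return loads


def makespan(assignment, replicas, servers):
    """Job completion time: the load of the busiest server."""
    return max(server_loads(assignment, replicas, servers).values())


def initial_allocation(replicas, servers):
    """Phase 1: place each task on its least-loaded replica holder (all local)."""
    assignment = {}
    loads = {s: 0 for s in servers}
    for task in sorted(replicas):
        best = min(sorted(replicas[task]), key=lambda s: loads[s])
        assignment[task] = best
        loads[best] += LOCAL_COST
    return assignment


def tune(assignment, replicas, servers):
    """Phase 2: move tasks off the busiest server while the makespan shrinks."""
    assignment = dict(assignment)
    improved = True
    while improved:
        improved = False
        loads = server_loads(assignment, replicas, servers)
        busiest = max(sorted(servers), key=lambda s: loads[s])
        current = loads[busiest]
        for task in sorted(t for t, s in assignment.items() if s == busiest):
            for target in sorted(servers):
                if target == busiest:
                    continue
                trial = dict(assignment)
                trial[task] = target
                # Accept the move only if it strictly lowers the makespan.
                if makespan(trial, replicas, servers) < current:
                    assignment = trial
                    improved = True
                    break
            if improved:
                break
    return assignment


if __name__ == "__main__":
    servers = ["s1", "s2", "s3"]
    # Hypothetical skewed input: every block is stored only on s1.
    replicas = {f"t{i}": {"s1"} for i in range(1, 7)}
    first = initial_allocation(replicas, servers)
    tuned = tune(first, replicas, servers)
    print(makespan(first, replicas, servers), makespan(tuned, replicas, servers))
```

In this skewed example the all-local allocation overloads one server, and giving up locality for two tasks shortens the overall completion time, which is the kind of trade-off the abstract describes.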

Cluster Computing | 2010

A context-aware personalized resource recommendation for pervasive learning

Junzhou Luo; Fang Dong; Jiuxin Cao; Aibo Song

Because it is difficult for learners to discover and obtain the most appropriate resources from massive education resources using the traditional keyword searching method, context-aware resource recommendation services have become a significant part of pervasive learning environments. At present, recommendation mechanisms are widely used in the e-commerce field, where content-based and collaborative filtering strategies are usually considered separately. However, these existing recommendation mechanisms neglect the dynamic interests and preferences of learners, the access pattern, and other attributes of pervasive learning environments (such as multi-mode connections and resource distribution). Thus, these mechanisms cannot effectively reflect learners’ actual preferences and cannot adapt well to pervasive learning environments. To address these problems, a context-aware resource recommendation model and a corresponding recommendation algorithm for pervasive learning environments are proposed. Taking into account the relevant contextual information, the calculation of the relevant degree between learners and resources is divided into two main parts: logic-based RRD (resource relevant degree) and situation-based RRD. In the first part, content-based and collaborative recommendation mechanisms are combined, and the individual preference tree (IPT) is introduced to take into account the multi-dimensional attributes of resources, learners’ rating matrix and the energy of access preference. Meanwhile, the learners’ historical sequential patterns of resource access are also considered to further improve the accuracy of recommendation. In the second part, in order to enhance the validity of recommendation, the connecting type relevance and time satisfaction degree are calculated according to other relevant contexts. Then, the candidate resources are filtered and sorted by combining these two parts to generate Top-N recommendation results. The simulations show that our newly proposed method outperforms other state-of-the-art algorithms on traditional and newly presented metrics, and it may also be more suitable for pervasive learning environments. Finally, a prototype system is implemented based on SEU-ESP to further demonstrate the recommendation process.

IEEE International Conference on Cloud Computing Technology and Science | 2010

A MapReduceMerge-based Data Cube Construction Method

Yuxiang Wang; Aibo Song; Junzhou Luo

The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems. However, as the size of data grows, the time it takes to construct data cubes becomes a significant performance bottleneck. Therefore, a parallel pre-computation approach is needed to further improve the performance of OLAP. Current parallel approaches can be grouped into two categories: work partitioning and data partitioning. The first cannot guarantee load balance among processors, and the second produces massive data movement between processors. This paper proposes a MapReduceMerge-based parallel data cube construction method with a read-optimized data storage strategy, which is more suitable for OLAP. Our method can ensure good load balancing and reduce the large amount of data movement compared with traditional approaches. MapReduceMerge is an extension of MapReduce, a programming model that enables easy development of parallel applications to process massive data on large clusters. MapReduce is the key element of Hadoop (a cloud computing framework), which is used to support the business of Facebook in a cloud environment. We modify the original MapReduceMerge framework to meet the needs of cuboid construction and show the implementation in detail through an example of 2-dimension cuboid construction. We also discuss optimizations for the construction of multi-dimension cuboids.
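To make the notion of cuboid construction concrete, the following minimal sketch pre-computes every cuboid of a small table with a plain map step and reduce step. It does not reproduce MapReduceMerge's merge phase or the read-optimized storage strategy from the paper; the function names and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_record(record, dimensions, measure):
    """Map step: emit one ((cuboid, key), value) pair per subset of dimensions."""
    for size in range(len(dimensions) + 1):
        for cuboid in combinations(dimensions, size):
            key = tuple(record[d] for d in cuboid)
            yield (cuboid, key), record[measure]


def reduce_pairs(pairs):
    """Reduce step: sum the measure for every (cuboid, key) group."""
    totals = defaultdict(int)
    for group, value in pairs:
        totals[group] += value
    return dict(totals)


def build_cube(records, dimensions, measure):
    """Run the map step over all records and reduce the emitted pairs."""
    pairs = (pair for r in records for pair in map_record(r, dimensions, measure))
    return reduce_pairs(pairs)


if __name__ == "__main__":
    # Hypothetical fact table with two dimensions and one measure.
    records = [
        {"region": "east", "product": "a", "amount": 10},
        {"region": "east", "product": "b", "amount": 5},
        {"region": "west", "product": "a", "amount": 7},
    ]
    cube = build_cube(records, ["region", "product"], "amount")
    for group, total in sorted(cube.items()):
        print(group, total)
```

Each record is emitted once per cuboid, so the number of intermediate pairs grows as 2^d for d dimensions. That growth is one reason the data movement mentioned in the abstract matters.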

International Symposium on Parallel and Distributed Processing and Applications | 2005

Grid Supporting Platform for AMS Data Processing

Junzhou Luo; Aibo Song; Ye Zhu; Xiaopeng Wang; Teng Ma; Zhiang Wu; Yaobin Xu; Liang Ge

The purpose of the AMS experiment is to look for the source of dark matter, the source of cosmic rays, and a universe made of antimatter. The AMS experiment is characterized by massive data and complicated computing. The data are frequently transmitted, retrieved and processed among computing nodes located in the USA, Europe and China. This paper introduces the grid platform at Southeast University, called SEUGrid, for AMS data processing and analysis. Some key technologies that SEUGrid adopts to fit AMS data processing, such as the scheduling strategy, data replica management and semantic access control, are described in the paper.

Journal of Computer Science and Technology | 2013

Partition-Based Online Aggregation with Shared Sampling in the Cloud

Yuxiang Wang; Junzhou Luo; Aibo Song; Fang Dong

Online aggregation is an attractive sampling-based technology that responds to aggregation queries with an estimate of the final result, with the confidence interval becoming tighter over time. It has been built into a MapReduce-based cloud system for big data analytics, which allows users to monitor the query progress and save money by killing the computation early once sufficient accuracy has been obtained. However, the gap between the current mechanism of the MapReduce paradigm and the requirements of online aggregation creates several limitations that restrict its performance, such as: 1) low sampling efficiency, because skewed data distribution is not considered for online aggregation in MapReduce, and 2) large redundant I/O cost, caused by the independent job execution mechanism of MapReduce. In this paper, we present OLACloud, a MapReduce-based cloud system that supports online aggregation for different data distributions and large-scale concurrent query processing. We propose a content-aware repartition method with a fair-allocation block placement strategy to increase the sampling efficiency and guarantee storage and computation load balancing simultaneously. We also develop a shared sampling method that shares sampling opportunities among multiple queries to reduce redundant I/O cost. We implement OLACloud in Hadoop and conduct an extensive experimental study on the TPC-H benchmark for skewed data distribution. Our results demonstrate the efficiency and effectiveness of OLACloud.
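The core behavior of online aggregation, an estimate whose confidence interval tightens as more samples arrive, can be sketched as a running mean. This is a generic illustration, not the OLACloud estimator: it assumes independent random samples, uses a normal-approximation interval, and the class name `OnlineMean` is hypothetical.

```python
import math


class OnlineMean:
    """Running mean with a normal-approximation confidence interval."""

    def __init__(self, z=1.96):  # z=1.96 is the usual 95% normal quantile
        self.z = z
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, x):
        """Fold one sampled value into the estimate (Welford's update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def half_width(self):
        """Half-width of the interval; infinite until two samples are seen."""
        if self.n < 2:
            return math.inf
        variance = self.m2 / (self.n - 1)
        return self.z * math.sqrt(variance / self.n)


if __name__ == "__main__":
    estimator = OnlineMean()
    batch = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sampled values
    for round_number in range(1, 5):
        for value in batch:
            estimator.add(value)
        # A user could stop the query once the half-width is small enough.
        print(round_number, estimator.mean, estimator.half_width())
```

Because the half-width shrinks roughly with the square root of the sample count, a query can be stopped early once the interval is narrow enough for the user's purpose.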

Distributed and Parallel Databases | 2014

OATS: online aggregation with two-level sharing strategy in cloud

Yuxiang Wang; Junzhou Luo; Aibo Song; Fang Dong

Online aggregation (OLA) is an attractive sampling-based technology that responds to aggregation queries with an approximate estimate of the final result, with the confidence interval becoming tighter over time. It has been built into the MapReduce-based cloud system for big data analytics, which allows users to monitor the query progress and save money by killing the computation early once sufficient accuracy has been obtained. However, a serious limitation restricts the performance of OLA: the lack of sharing when multiple OLA queries are processed. In the original MapReduce paradigm, each query is processed independently without considering potential sharing opportunities, leading to two major unnecessary execution costs: (1) a large redundant I/O cost, and (2) a repeated statistical computation cost. To eliminate these additional execution costs and improve the overall performance, we present online aggregation with two-level sharing strategy in cloud (OATS), based on the MapReduce framework, to effectively support online aggregation for large-scale concurrent query processing under skewed data distribution. In the first-level sharing, we propose a sample buffer management mechanism that shares sampling opportunities among multiple OLA queries to reduce redundant I/O cost. In the second-level sharing, we propose a heuristic algorithm (with good scalability for large input) that shares partial statistics calculation to decrease the number of final aggregation operations, reducing the statistical computation cost. Based on this two-level sharing strategy, we have implemented OATS in Hadoop and conducted an extensive experimental study on the TPC-H benchmark for skewed data distribution. Our results demonstrate the efficiency and effectiveness of OATS.

Computer Supported Cooperative Work in Design | 2007

A Trust Degree Based Access Control for Multi-domains in Grid Environment

Xudong Ni; Junzhou Luo; Aibo Song

Grid security focuses on implementing safe access to resources in different domains in a dynamic grid environment. Trust, as an important factor in grid security, is increasingly applied to security management. But research on applying trust to access control is rare and coarse. In this paper, we propose the concept of trust degree, which is a measurement of trust, and combine it with an access control framework. A fine-grained access control model has been realized in a single domain, and the trust degree based access control framework handles access to resources across multiple domains. Conversion between domains is correct and effective. Simulation results show that the access control model is practicable and credible.

Database Systems for Advanced Applications | 2012

Improving online aggregation performance for skewed data distribution

Yuxiang Wang; Junzhou Luo; Aibo Song; Jiahui Jin; Fang Dong

Online aggregation is a commonly used technique that responds to aggregation queries quickly with refined approximate answers (within an estimated confidence interval). However, we observe that low selectivity and inappropriate sample proportion significantly affect online aggregation performance when the data distribution is skewed. To overcome this problem, we propose a Partition-based Online Aggregation System called POAS. In POAS, the side effect of low selectivity is reduced by efficiently pruning unneeded data through the partition and shuffle strategies, and an appropriate sample proportion is approached as far as possible by drawing samples (tuples) from relevant partitions with a dynamic sample size. Moreover, POAS applies statistical approaches to calculate estimates from relevant partitions. We have implemented POAS and conducted an extensive experimental study on the TPC-H benchmark for skewed data distribution. Our results demonstrate the efficiency and effectiveness of POAS.

International Conference on Pervasive Computing | 2012

Multi-objective virtual machine selection for migrating in virtualized data centers

Aibo Song; Wei Fan; Wei Wang; Junzhou Luo; Yuchang Mo

With the increasing deployment of large-scale virtualized data centers, using virtual machine (VM) migration technology to consolidate VMs is becoming very important for improving data center efficiency. The primary prerequisite for VM consolidation is to determine the best candidate VM for migration, and most previous work optimizes only a single objective in VM selection. In this paper, we first propose a multi-objective optimization model based on a detailed analysis of the impact of CPU temperature, resource usage and power consumption in VM selection. We then develop a VM selection algorithm to optimize the synthesized effect of VM migration, which will ultimately improve the system performance of physical machines (PMs). We further evaluate our algorithm through comprehensive experiments based on the Xen VM monitor, and the results show that it can achieve the best tradeoffs among the resource usage, CPU temperature and power consumption of the data center.
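One common way to combine several objectives into a single selection rule is a weighted sum over normalized metrics. The sketch below uses that scalarization purely as an illustration; the abstract does not state the form of the paper's optimization model, so the metric names, the weights, and the "higher score means better migration candidate" convention are all assumptions.

```python
def normalize(values):
    """Min-max scale a list to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_vm(vms, weights):
    """Return the name of the VM with the highest weighted score."""
    names = sorted(vms)
    scores = {name: 0.0 for name in names}
    for metric, weight in weights.items():
        scaled = normalize([vms[name][metric] for name in names])
        for name, value in zip(names, scaled):
            scores[name] += weight * value
    return max(names, key=lambda name: scores[name])


if __name__ == "__main__":
    # Hypothetical per-VM contributions to the host's temperature, usage, power.
    vms = {
        "vm1": {"temp": 2.0, "usage": 0.3, "power": 20.0},
        "vm2": {"temp": 5.0, "usage": 0.6, "power": 35.0},
        "vm3": {"temp": 3.0, "usage": 0.9, "power": 25.0},
    }
    equal = {"temp": 1 / 3, "usage": 1 / 3, "power": 1 / 3}
    print(select_vm(vms, equal))
```

Changing the weights changes the chosen VM, which shows why a single-objective rule can pick a different candidate than a combined one.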

Tsinghua Science & Technology | 2016

Efficient Location-Aware Data Placement for Data-Intensive Applications in Geo-distributed Scientific Data Centers

Jinghui Zhang; Jian Chen; Junzhou Luo; Aibo Song

Recent developments in cloud computing and big data have spurred the emergence of data-intensive applications for which massive scientific datasets are stored in globally distributed scientific data centers and accessed frequently by scientists worldwide. Multiple associated data items distributed in different scientific data centers may be requested for one data processing task, and data placement decisions must respect the storage capacity limits of the scientific data centers. Therefore, optimizing data access cost when placing data items in globally distributed scientific data centers has become an increasingly important goal. Existing data placement approaches for geo-distributed data items are insufficient because they either cannot cope with the cost incurred by associated data access or overlook storage capacity limitations, which are a very practical constraint of scientific data centers. In this paper, inspired by applications in the field of high energy physics, we propose an integer-programming-based data placement model that addresses the above challenges as a Non-deterministic Polynomial-time (NP)-hard problem. In addition, we use a Lagrangian relaxation based heuristic algorithm to obtain ideal data placement solutions. Our simulation results demonstrate that our algorithm is effective and significantly reduces overall data access cost.
