Jiahui Jin | Researchain

Publication

Featured researches published by Jiahui Jin.

ieee/acm international symposium cluster, cloud and grid computing | 2011

BAR: An Efficient Data Locality Driven Task Scheduling Algorithm for Cloud Computing

Jiahui Jin; Junzhou Luo; Aibo Song; Fang Dong; Runqun Xiong

Large-scale data processing has become increasingly common in cloud computing systems such as MapReduce, Hadoop, and Dryad in recent years. In these systems, files are split into many small blocks and all blocks are replicated over several servers. To process files efficiently, each job is divided into many tasks and each task is allocated to a server to deal with a file block. Because network bandwidth is a scarce resource in these systems, enhancing task data locality (placing tasks on servers that contain their input blocks) is crucial for the job completion time. Although there have been many approaches to improving data locality, most of them either are greedy and ignore global optimization, or suffer from high computational complexity. To address these problems, we propose a heuristic task scheduling algorithm called Balance-Reduce (BAR), in which an initial task allocation is produced first, and the job completion time is then reduced gradually by tuning the initial task allocation. By taking a global view, BAR can adjust data locality dynamically according to network state and cluster workload. The simulation results show that BAR is able to deal with large problem instances in a few seconds and outperforms previous related algorithms in terms of the job completion time.
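
The two-phase idea described in this abstract (produce an initial allocation, then tune it to shorten the job completion time) can be illustrated with a toy sketch. This is not the published BAR algorithm: the cost model (a fixed `local_cost` and `remote_cost` per task), the function name `balance_reduce`, and the greedy move rule are all assumptions made for illustration.

```python
def balance_reduce(task_replicas, num_servers, local_cost=1.0, remote_cost=3.0):
    """Toy two-phase scheduler (illustrative only, not the paper's BAR).

    task_replicas maps each task to the set of servers holding its input block.
    Returns (placement, makespan), where placement maps task -> server.
    """
    def cost(task, server):
        # Assumed cost model: a task is cheaper on a server that has its block.
        return local_cost if server in task_replicas[task] else remote_cost

    load = [0.0] * num_servers
    placement = {}

    # Phase 1 (balance): put every task on its least-loaded local server.
    for task, replicas in task_replicas.items():
        server = min(replicas, key=lambda s: load[s])
        placement[task] = server
        load[server] += local_cost

    # Phase 2 (reduce): move work off the busiest server while that helps.
    while True:
        busiest = max(range(num_servers), key=lambda s: load[s])
        idlest = min(range(num_servers), key=lambda s: load[s])
        candidates = [t for t, s in placement.items() if s == busiest]
        if not candidates:
            break
        task = min(candidates, key=lambda t: cost(t, idlest))
        # Stop when the move would not leave the target below the current peak.
        if load[idlest] + cost(task, idlest) >= load[busiest]:
            break
        load[busiest] -= cost(task, busiest)
        load[idlest] += cost(task, idlest)
        placement[task] = idlest

    return placement, max(load)


if __name__ == "__main__":
    # Four tasks whose blocks all live on server 0, in a two-server cluster.
    tasks = {"t1": {0}, "t2": {0}, "t3": {0}, "t4": {0}}
    print(balance_reduce(tasks, num_servers=2))
```

In this toy instance the all-local allocation has a makespan of 4, and moving one task to the idle server as a remote task lowers it to 3, which shows why strict data locality is not always optimal.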

database systems for advanced applications | 2012

Improving online aggregation performance for skewed data distribution

Yuxiang Wang; Junzhou Luo; Aibo Song; Jiahui Jin; Fang Dong

Online aggregation is a commonly used technique for responding to aggregation queries quickly with refined approximate answers (within an estimated confidence interval). However, we observe that low selectivity and inappropriate sample proportion significantly affect the online aggregation performance when the data distribution is skewed. To overcome this problem, we propose a Partition-based Online Aggregation System called POAS. In POAS, the side effect of low selectivity can be reduced by efficient pruning of unneeded data through the partition and shuffle strategies, and the appropriate sample proportion can be approached as closely as possible by drawing samples (tuples) from relevant partitions with dynamic sample size. Moreover, POAS applies statistical approaches to calculate estimates from relevant partitions. We have implemented POAS and conducted an extensive experimental study on the TPC-H benchmark for skewed data distribution. Our results demonstrate the efficiency and effectiveness of POAS.
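
As a reminder of what generic online aggregation looks like, the sketch below refines a running AVG estimate and its confidence interval as more tuples are sampled. It does not implement POAS or its partition and shuffle strategies. The function name `online_mean`, the batch size, and the normal-approximation interval with `z = 1.96` are assumptions chosen for illustration.

```python
import math
import random


def online_mean(values, batch_size=100, z=1.96, seed=0):
    """Yield (samples_seen, estimate, half_width) after every batch.

    Tuples are consumed in random order, and the mean and variance are
    maintained incrementally with Welford's method. The interval is a
    plain normal approximation with no finite-population correction.
    """
    order = list(values)
    random.Random(seed).shuffle(order)

    n, mean, m2 = 0, 0.0, 0.0
    for i, x in enumerate(order, start=1):
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if i % batch_size == 0 or i == len(order):
            variance = m2 / (n - 1) if n > 1 else 0.0
            yield n, mean, z * math.sqrt(variance / n)


if __name__ == "__main__":
    for seen, estimate, half_width in online_mean(range(1000), batch_size=250):
        print(f"{seen:5d} tuples: {estimate:.1f} +/- {half_width:.1f}")
```

The interval narrows as more tuples are read, so a user can stop the query once the estimate is precise enough.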

high performance computing and communications | 2010

Resource Load Based Stochastic DAGs Scheduling Mechanism for Grid Environment

Fang Dong; Junzhou Luo; Aibo Song; Jiahui Jin

The dynamic feature is one of the most important differences between Grid and traditional heterogeneous distributed systems; thus, the most significant challenge for task scheduling in a Grid environment is how to relieve the resource performance dynamism effectively. However, existing scheduling algorithms usually assume that computation and communication times are deterministic and static, so they perform poorly in a practical Grid environment. To address this problem, a mechanism that estimates the probability distribution of task execution time based on resource load is proposed, and a Resource Load based Stochastic DAGs Scheduling algorithm for Grid environments is then introduced. The simulation results show that our mechanism can achieve a significant improvement in several metrics (such as normalized real schedule length) and can effectively relieve the influence of the dynamic nature of the Grid.

international world wide web conferences | 2015

Querying Web-Scale Information Networks Through Bounding Matching Scores

Jiahui Jin; Samamon Khemmarat; Lixin Gao; Junzhou Luo

Web-scale information networks containing billions of entities are common nowadays. Querying these networks can be modeled as a subgraph matching problem. Since information networks are incomplete and noisy in nature, it is important to discover answers that match exactly as well as answers that are similar to queries. Existing graph matching algorithms usually use graph indices to improve the efficiency of query processing. For web-scale information networks, it may not be feasible to build the graph indices due to the amount of work and the memory/storage required. In this paper, we propose an efficient algorithm for finding the best k answers for a given query without precomputing graph indices. The quality of an answer is measured by a matching score that is computed online. To speed up query processing, we propose a novel technique for bounding the matching scores during the computation. By using bounds, we can efficiently prune the answers that have low quality without having to evaluate all possible answers. The bounding technique can be implemented in a distributed environment, allowing our approach to efficiently answer the queries on web-scale information networks. We demonstrate the effectiveness and the efficiency of our approach through a series of experiments on real-world information networks. The results show that our bounding technique can reduce the running time by up to two orders of magnitude compared to an approach that does not use bounds.
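
The general idea of pruning with score bounds can be sketched as follows. This is a generic top-k illustration, not the paper's algorithm or its matching-score definition. It assumes, hypothetically, that an answer's score is the sum of per-component scores, each at most `max_component`, so a partially evaluated answer has an easy upper bound.

```python
import heapq


def top_k_with_bounds(candidates, k, max_component=1.0):
    """Return (top-k list of (score, name), number of components evaluated).

    candidates maps a name to its list of component scores. A candidate is
    abandoned as soon as its upper bound cannot beat the current k-th best.
    """
    heap = []  # min-heap holding the best k (score, name) pairs seen so far
    evaluated = 0
    for name, components in candidates.items():
        partial, pruned = 0.0, False
        for i, component in enumerate(components):
            partial += component
            evaluated += 1
            remaining = len(components) - i - 1
            upper_bound = partial + remaining * max_component
            if len(heap) == k and upper_bound <= heap[0][0]:
                pruned = True  # cannot enter the top k, stop evaluating
                break
        if pruned:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (partial, name))
        else:
            heapq.heappushpop(heap, (partial, name))
    return sorted(heap, reverse=True), evaluated


if __name__ == "__main__":
    answers = {
        "a": [0.9, 0.8, 0.9],
        "b": [0.1, 0.2, 0.9],
        "c": [0.7, 0.9, 0.8],
        "d": [0.0, 0.1, 0.3],
    }
    print(top_k_with_bounds(answers, k=2))
```

Here candidate `d` is dropped after its first component, because even perfect remaining components could not lift it above the current second-best score.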

international conference on parallel and distributed systems | 2014

A distributed approach for top-k star queries on massive information networks

Jiahui Jin; Samamon Khemmarat; Lixin Gao; Junzhou Luo

Massive information networks, such as the knowledge graph by Google, contain billions of labeled entities. Star queries, which aim to identify an entity given a set of related entities, are common on such networks. Answering star queries can be modeled as a graph pattern matching problem. Traditional approaches apply graph indices to accelerate query processing. Unfortunately, building indices on billion-node graphs is so costly that it is nearly infeasible, since the time or storage complexity of most indexing techniques is super-linear in the graph size. In this paper, we propose an algorithm to identify the top-k best answers for a star query. Instead of using expensive indices, our algorithm utilizes novel bounding techniques to derive the top-k best answers efficiently. Further, the algorithm can be implemented in a distributed manner, scaling to billions of entities and hundreds of machines. We demonstrate the effectiveness and the efficiency of our approach through a series of experiments on real-world information networks.

systems, man and cybernetics | 2012

Performance evaluation and analysis of SEU Cloud Computing Platform — Using general benchmarks and real world AMS application

Fang Dong; Junzhou Luo; Jiahui Jin; Yanhao Wang; Yanmin Zhu

Cloud computing, as a popular technique to support and achieve CSCW, has been gaining increasing importance in recent years, with virtualization becoming the key technique. However, although virtualization enables more efficient and flexible resource allocation, it may also come at the cost of increased system complexity and dynamics. To adapt effectively to performance fluctuations and ensure high performance, a generic approach to predicting the performance influences of cloud platforms is highly desirable. To address this need, in this paper the major factors that affect cloud performance and the relevant variation discipline are evaluated and analyzed thoroughly using a series of benchmarks on the SEU (Southeast University) Cloud Computing Platform, yielding both a general methodology for quantifying the performance influence and the most important impact factors. Moreover, we use a real-world application, the AMS experiment, to further evaluate the relevant performance.

Journal of Computational Science | 2017

Fast multi-resource allocation with patterns in large scale cloud data center

J. Y. Shi; Junzhou Luo; Fang Dong; Jiahui Jin; Jun Shen

How to achieve fast and efficient resource allocation is an important optimization problem of resource management in cloud data centers. On one hand, in order to ensure the user experience of resource requesting, the system has to achieve fast resource allocation to process resource requests in a timely manner; on the other hand, in order to ensure the efficiency of resource allocation, how to allocate multi-dimensional resource requests to servers needs to be optimized, such that servers' resource utilization can be improved. However, most existing approaches focus on finding the mapping of each specific resource request to each specific server. This makes the complexity of the resource allocation problem increase with the size of the data center. Thus, these approaches cannot achieve fast and efficient resource allocation for large-scale data centers. To address this problem, we propose a pattern-based resource allocation mechanism based on the following finding: in a real-world cloud environment, the resource requests are usually classified into a limited number of types. The mechanism first utilizes this feature to generate pattern information, which indicates which types of resource requests are suitable to be allocated together to a server. Then, the mechanism uses the pattern information as a guideline to make fast resource allocation decisions and fully utilize servers' multidimensional resources. Simulation experiments based on real and synthetic traces have shown that our mechanism significantly improves system resource utilization and reduces the overall number of used servers.
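
To make the notion of a "pattern" concrete, the sketch below enumerates which combinations of request types fit together on one server and ranks them by utilization. It is an illustration under assumed definitions, not the mechanism from the paper. The names `feasible_patterns` and `best_pattern`, the integer (CPU, memory) demand vectors, and the utilization score are all hypothetical.

```python
from itertools import product


def feasible_patterns(request_types, capacity):
    """List (counts, used) pairs where counts maps type -> number of requests.

    request_types maps a type name to its integer demand per resource
    dimension. Every type is assumed to need at least one resource.
    """
    names = sorted(request_types)
    # Upper limit per type: how many fit if the server held only that type.
    max_counts = [
        min(c // d for d, c in zip(request_types[name], capacity) if d > 0)
        for name in names
    ]
    patterns = []
    for counts in product(*(range(m + 1) for m in max_counts)):
        used = tuple(
            sum(n * request_types[name][dim] for n, name in zip(counts, names))
            for dim in range(len(capacity))
        )
        if any(counts) and all(u <= c for u, c in zip(used, capacity)):
            patterns.append((dict(zip(names, counts)), used))
    return patterns


def best_pattern(patterns, capacity):
    """Pick the pattern with the highest summed utilization across dimensions."""
    return max(patterns, key=lambda p: sum(u / c for u, c in zip(p[1], capacity)))


if __name__ == "__main__":
    types = {"small": (1, 2), "large": (4, 4)}  # (CPU cores, GB of memory)
    server = (8, 8)
    found = feasible_patterns(types, server)
    print(len(found), best_pattern(found, server))
```

Because the patterns depend only on the request types and the server shape, they can be computed once and reused for every allocation decision, which is what makes a pattern-guided approach fast in principle.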

Concurrency and Computation: Practice and Experience | 2018

Skew-aware online aggregation over joins through guided sampling

Yuxiang Wang; Jiahui Jin; Xiaoliang Xu; Longbin Zhang

Online aggregation is a query processing technique that returns approximate answers with error guarantees (in the form of confidence intervals) continuously during the query execution process. This approach offers users a suitable tradeoff between query efficiency and accuracy. The key issue of online aggregation is how to ensure the efficiency and effectiveness of random sample collection. However, the often‐used “blind” sampling method does not adequately consider dataset statistics and other useful information, leading to inefficient sampling and poor sample quality. This becomes a glaring performance issue for skewed data distribution over joins. To alleviate this problem, we utilize dataset statistics to propose a new “guided” sampling approach, which consists of a logic‐partition‐based weighted Gaussian sampling method tailored for the skewed join key, as well as a two‐level sample allocation method that applies to the skewed measured value. Extensive experiments using the TPC‐H benchmark for skewed data distribution demonstrate our solution's superior performance.

Concurrency and Computation: Practice and Experience | 2018

Cooperative storage by exploiting graph-based data placement algorithm for edge computing environment

Jiahui Jin; Yunhao Li; Junzhou Luo

Edge computing is a new computing paradigm that performs data processing at the edge of the network (i.e., edge servers) to lower data processing latency. Prior research largely focused on offloading tasks from terminals to edge servers, yet most ignored how to store tasks' necessary data (such as databases and pretrained machine‐learning models) on edge servers. Today, as data‐intensive tasks such as deep learning and augmented reality become common, large data storage and powerful computation resources are needed. This is a difficult challenge, because many lightweight edge servers have limited resources. If an edge server does not have a task's necessary data, then it needs to offload the task to cloud datacenters or download the necessary data from the cloud. Either case could increase data processing latency. To address this problem, this paper proposes an edge‐side collaborative storage framework called Edge‐side Cooperative Storage (ECS). In ECS, edge servers collaboratively store and process data‐intensive tasks' necessary data. Here, we particularly focus on how to effectively place data on ECS, using an approach that differs from existing works (that model data placement problems as linear/integer programming problems). Our work models cooperative storage as a graph and solves the data placement problem by using a graph‐based iterative algorithm. This algorithm easily extends to a distributed version, so distributed ECS works efficiently without a centralized scheduler. We also evaluate ECS's effectiveness and convergence through simulations. Simulation results show that ECS is 2× better than a traditional nonshared storage framework in terms of the cache hit rate.

World Wide Web | 2018

GStar: an efficient framework for answering top-k star queries on billion-node knowledge graphs

Jiahui Jin; Junzhou Luo; Samamon Khemmarat; Fang Dong; Lixin Gao

Massive knowledge graphs, such as Linked Open Data or Freebase, contain billions of labeled entities and relationships. Star queries aim to identify an entity given a set of related entities, and they are common with massive knowledge graphs. It is important to find the best way to answer star queries, and we can do this by treating it as a graph pattern-matching problem. Because knowledge graphs are noisy and incomplete in nature, we must find answers that match the star pattern closely, and extract a precise match if possible. Thus, here we propose GStar, a framework to identify the top-k best answers for a star query. GStar effectively and efficiently answers top-k star queries on billion-node graphs through a novel query model, an index-free query algorithm, and a distributed query system. We evaluate GStar through experiments on real-world knowledge graphs. Experimental results show that our query model effectively answers real-life star-pattern queries; our query algorithm can answer top-k queries in a near-real-time manner without requiring expensive graph indices; and the distributed system scales well with both the graph size and number of machines used for computation.
