Jizhong Han | Researchain

Publication

Featured research published by Jizhong Han.

international conference on cluster computing | 2009

Implementing WebGIS on Hadoop: A case study of improving small file I/O performance on HDFS

Xuhui Liu; Jizhong Han; Yunqin Zhong; Chengde Han; Xubin He

The Hadoop framework has been widely used in various clusters to build large-scale, high-performance systems. However, the Hadoop Distributed File System (HDFS) is designed to manage large files and suffers a performance penalty when managing a large number of small files. As a consequence, many web applications, such as WebGIS, may not benefit from Hadoop. In this paper, we propose an approach to optimize the I/O performance of small files on HDFS. The basic idea is to combine small files into large ones to reduce the number of files, and to build an index for each file. Furthermore, novel features such as grouping neighboring files and retaining several of the latest versions of data are considered to match the characteristics of WebGIS access patterns. Preliminary experimental results show that our approach achieves better performance.
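
The core idea of combining small files into one large file plus a per-file index can be illustrated with a minimal in-memory sketch. It is not the paper's implementation: the blob stands in for one large HDFS file, the tile naming scheme is hypothetical, and sorting by name is only a crude stand-in for "grouping neighboring files". Versioning is omitted.

```python
from typing import Dict, Tuple

Index = Dict[str, Tuple[int, int]]  # file name -> (offset, length) inside the combined file


def pack_small_files(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Combine many small files into one large blob and index each file's position."""
    blob = bytearray()
    index: Index = {}
    # Sorting by name keeps tiles with adjacent (hypothetical) row/column names
    # next to each other, a crude stand-in for grouping neighboring files.
    for name in sorted(files):
        data = files[name]
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index


def read_small_file(blob: bytes, index: Index, name: str) -> bytes:
    """Serve one small file with a single ranged read of the combined file."""
    offset, length = index[name]
    return blob[offset:offset + length]


tiles = {"tile_0_1.png": b"BB", "tile_0_0.png": b"AAA", "tile_1_0.png": b"C"}
blob, idx = pack_small_files(tiles)
print(idx["tile_0_1.png"])  # (3, 2)
```

The file system then tracks one large file instead of many small ones, and each small-file read becomes an index lookup plus one offset read.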

international conference on cluster computing | 2009

SJMR: Parallelizing spatial join with MapReduce on clusters

Shubin Zhang; Jizhong Han; Zhiyong Liu; Kai Wang; Zhiyong Xu

MapReduce is a widely used parallel programming model and computing platform. With MapReduce, it is easy to develop scalable parallel programs that process data-intensive applications on clusters of commodity machines. However, it does not directly support processing multiple related, heterogeneous data sets, which is common in operations such as spatial joins. This paper presents SJMR (Spatial Join with MapReduce), a novel parallel algorithm to address this problem. Its strategies include a strip-based plane-sweeping algorithm, a tile-based spatial partitioning function, and a duplicate-avoidance technique. We evaluated the performance of SJMR in various situations with real-world data sets. The results demonstrate the applicability of MapReduce to computation-intensive spatial applications on small-scale clusters.
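
Tile-based partitioning and duplicate avoidance can be sketched in a single process. Each rectangle is replicated into every grid tile it overlaps (the "map" side), and each tile is joined independently (the "reduce" side). A pair that is replicated into several tiles is reported only by the tile containing a reference point, here the lower-left corner of the overlap. This reference-point rule is a common duplicate-avoidance method and is only assumed to resemble SJMR's. The per-tile join uses a nested loop rather than the paper's strip-based plane sweep.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Tile = Tuple[int, int]


def tile_of(x: float, y: float, size: float) -> Tile:
    return (int(x // size), int(y // size))


def tiles_for(r: Rect, size: float) -> List[Tile]:
    """All grid tiles a rectangle overlaps (it is replicated into each one)."""
    (col0, row0), (col1, row1) = tile_of(r[0], r[1], size), tile_of(r[2], r[3], size)
    return list(product(range(col0, col1 + 1), range(row0, row1 + 1)))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_join(left: List[Rect], right: List[Rect], size: float) -> List[Tuple[int, int]]:
    # "Map" phase: route each rectangle to every tile it overlaps.
    buckets: Dict[Tile, Tuple[list, list]] = defaultdict(lambda: ([], []))
    for i, r in enumerate(left):
        for t in tiles_for(r, size):
            buckets[t][0].append((i, r))
    for j, s in enumerate(right):
        for t in tiles_for(s, size):
            buckets[t][1].append((j, s))
    # "Reduce" phase: join each tile independently (nested loop, not a plane sweep).
    pairs = []
    for tile, (ls, rs) in buckets.items():
        for (i, r), (j, s) in product(ls, rs):
            if not intersects(r, s):
                continue
            # Reference point = lower-left corner of the overlap; only the tile that
            # contains it reports the pair, so replicated pairs are emitted once.
            if tile_of(max(r[0], s[0]), max(r[1], s[1]), size) == tile:
                pairs.append((i, j))
    return sorted(pairs)


print(spatial_join([(0, 0, 5, 5)], [(1, 1, 4, 4)], 2))  # [(0, 0)]
```

Both rectangles in the usage line span nine shared tiles. Without the reference-point check, the same pair would be reported nine times.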

networking architecture and storages | 2010

Multi-dimensional Index on Hadoop Distributed File System

Haojun Liao; Jizhong Han; Jinyun Fang

In this paper, we present an approach to constructing built-in, block-based hierarchical index structures, such as R-trees, that organize data sets in one-, two-, or higher-dimensional space and improve query performance for common query types (e.g., point and range queries) on the Hadoop Distributed File System (HDFS). With such index structures, the query response time for data sets stored in HDFS can be significantly reduced by avoiding exhaustive search of the data. The basic idea is to adapt the conventional hierarchical structure to HDFS. Several issues, including index organization, index node size, buffer management, and the data transfer protocol, are considered in order to reduce query response time and network data transfer overhead. Experimental evaluation demonstrates that the built-in index structure can efficiently improve query performance and can serve as a cornerstone for structured or semi-structured data management.
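
The benefit of an index over block-organized data, skipping blocks whose bounding box cannot match, can be shown with a deliberately simplified sketch. It uses a flat, one-level list of block bounding boxes as a stand-in for the hierarchical R-tree described above. "Blocks" here are in-memory lists, not HDFS blocks, and the block size is an arbitrary assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]
MBR = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def build_block_index(points: List[Point], block_size: int) -> List[Tuple[MBR, List[Point]]]:
    """Cut x-sorted points into fixed-size 'blocks' and record each block's bounding box."""
    pts = sorted(points)
    index = []
    for start in range(0, len(pts), block_size):
        block = pts[start:start + block_size]
        xs = [p[0] for p in block]
        ys = [p[1] for p in block]
        index.append(((min(xs), min(ys), max(xs), max(ys)), block))
    return index


def range_query(index, q: MBR) -> Tuple[List[Point], int]:
    """Return matching points and how many blocks actually had to be read."""
    hits, blocks_read = [], 0
    for (x0, y0, x1, y1), block in index:
        if x1 < q[0] or x0 > q[2] or y1 < q[1] or y0 > q[3]:
            continue  # bounding box misses the query: the block is never read
        blocks_read += 1
        hits.extend(p for p in block if q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3])
    return hits, blocks_read


idx = build_block_index([(i, i) for i in range(100)], 10)
hits, n = range_query(idx, (15, 15, 25, 25))
print(len(hits), n)  # 11 2
```

Only 2 of the 10 blocks are scanned. A hierarchical index applies the same pruning recursively, so it scales to far more blocks than a flat list.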

international conference on parallel and distributed systems | 2010

Accelerating Spatial Data Processing with MapReduce

Kai Wang; Jizhong Han; Bibo Tu; Jiao Dai; Wei Zhou; Xuan Song

MapReduce is a key-value-based programming model and an associated implementation for processing large data sets. It has been adopted in various scenarios and appears promising. However, when spatial computation is expressed directly in this key-value-based model, difficulties arise from ill-suited features and performance degradation. In this paper, we present the following methods: (1) a splitting method for balancing workload; (2) a pending file structure and redundant data partitioning for handling relationships between spatial objects; and (3) a strip-based two-direction plane-sweeping algorithm for accelerating computation. Based on these methods, an ANN (all nearest neighbors) query and astronomical cross-certification are developed. Performance evaluation shows that the MapReduce-based spatial applications outperform their traditional DBMS-based counterparts.
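
A two-direction plane sweep for the all-nearest-neighbors (ANN) query can be sketched as follows. After sorting the points by x, each point scans right and then left, and each scan stops once the x-gap alone is at least the best distance found so far. This single-machine sketch omits the paper's strips, workload splitting, and MapReduce distribution, and it assumes Euclidean distance.

```python
import math
from typing import Dict, List, Optional, Tuple


def all_nearest_neighbors(points: List[Tuple[float, float]]) -> Dict[int, Optional[int]]:
    """Map each point index to the index of its nearest neighbor."""
    order = sorted(range(len(points)), key=lambda i: points[i])  # sweep order by x
    result: Dict[int, Optional[int]] = {}
    for pos, i in enumerate(order):
        px, py = points[i]
        best, best_d = None, math.inf
        for step in (1, -1):  # sweep to the right, then to the left
            k = pos + step
            while 0 <= k < len(order):
                j = order[k]
                qx, qy = points[j]
                if abs(qx - px) >= best_d:
                    break  # every later point in this direction is even farther in x
                d = math.hypot(qx - px, qy - py)
                if d < best_d:
                    best, best_d = j, d
                k += step
        result[i] = best
    return result


print(all_nearest_neighbors([(0, 0), (1, 0), (5, 0), (5, 1)]))  # {0: 1, 1: 0, 2: 3, 3: 2}
```

The early `break` is what makes the sweep efficient: the x-gap is a lower bound on the true distance, so once it exceeds the current best, the rest of that direction can be skipped.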

international conference on parallel and distributed systems | 2009

Accelerating MapReduce with Distributed Memory Cache

Shubin Zhang; Jizhong Han; Zhiyong Liu; Kai Wang; Shengzhong Feng

MapReduce is a partition-based parallel programming model and framework that enables easy development of scalable parallel programs on clusters of commodity machines. To let time-intensive applications benefit from MapReduce on small-scale clusters, this paper proposes a new method that improves MapReduce performance by using a distributed memory cache as a high-speed access path between map tasks and reduce tasks. Map outputs sent to the distributed memory cache can be retrieved by reduce tasks as soon as they are available. Experimental results show that our prototype performs much better than the original implementation on small-scale clusters. To our knowledge, this is the first effort to accelerate MapReduce with a distributed memory cache.
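
The data path, in which map tasks push partitioned output into a shared memory cache and reduce tasks read their partition from it directly, can be sketched in one process. `MemoryCache` is a hypothetical in-process dictionary standing in for a real distributed cache. The word-count job and the CRC32 partitioner are illustrative assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


class MemoryCache:
    """In-process stand-in for a distributed memory cache: reduce partition -> (key, value) pairs."""

    def __init__(self) -> None:
        self._store: Dict[int, List[Tuple[str, int]]] = defaultdict(list)

    def put(self, partition: int, pairs: List[Tuple[str, int]]) -> None:
        self._store[partition].extend(pairs)

    def get(self, partition: int) -> List[Tuple[str, int]]:
        return list(self._store.get(partition, []))


def partition_of(key: str, num_reducers: int) -> int:
    return zlib.crc32(key.encode()) % num_reducers  # deterministic hash partitioner


def map_task(text: str, num_reducers: int, cache: MemoryCache) -> None:
    buckets: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for word in text.split():
        buckets[partition_of(word, num_reducers)].append((word, 1))
    # Push map output straight into memory instead of spilling it to local disk.
    for partition, pairs in buckets.items():
        cache.put(partition, pairs)


def reduce_task(partition: int, cache: MemoryCache) -> Dict[str, int]:
    counts: Dict[str, int] = defaultdict(int)
    for word, n in cache.get(partition):
        counts[word] += n
    return dict(counts)
```

A usage pattern is to run `map_task` over each input split and then call `reduce_task(p, cache)` for every partition `p`. In a real deployment, the reduce side would poll the cache so that it could start fetching before all maps finish.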

international conference on cluster computing | 2012

SDM: A Stripe-Based Data Migration Scheme to Improve the Scalability of RAID-6

Chentao Wu; Xubin He; Jizhong Han; Huailiang Tan; Changsheng Xie

In large-scale data storage systems, RAID-6 has received increasing attention because it can tolerate concurrent failures of any two disks, providing a higher level of reliability. However, a challenging issue is its scalability, that is, how to efficiently add disks. The main cause of this problem is the fault-tolerance scheme used by most RAID-6 systems, Maximum Distance Separable (MDS) codes, which protect data against disk failures with optimal storage efficiency but are difficult to scale. To address this issue, we propose a novel Stripe-based Data Migration (SDM) scheme for large-scale RAID-6 storage systems that achieves higher scalability. SDM is a stripe-level scheme whose basic idea is to optimize data movements according to the future parity layout, which minimizes the overhead of data migration and parity modification. The SDM scheme also provides uniform data distribution and fast data addressing and migration. We have conducted extensive mathematical analysis of applying SDM to various popular RAID-6 coding methods, including RDP, P-Code, H-Code, HDP, X-Code, and EVENODD. The results show that, compared to existing scaling approaches, SDM reduces migration I/O operations by more than 72.7% and migration time by up to 96.9%, which speeds up the scaling process by a factor of up to 32.
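
The "up to 96.9%" time saving and the "factor of up to 32" speedup are related by simple arithmetic, assuming the speedup is defined as the ratio of baseline migration time to SDM migration time in the best case:

```latex
T_{\mathrm{SDM}} = (1 - 0.969)\,T_{\mathrm{base}} = 0.031\,T_{\mathrm{base}}
\quad\Longrightarrow\quad
S = \frac{T_{\mathrm{base}}}{T_{\mathrm{SDM}}} = \frac{1}{0.031} \approx 32.3
```

A 96.9% reduction in migration time is therefore consistent with a speedup of roughly 32.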

Cluster Computing | 2014

Predictively booting nodes to minimize performance degradation of a power-aware web cluster

Yuhui Deng; Yang Hu; Xiaohua Meng; Yifeng Zhu; Zhen Zhang; Jizhong Han

With the ever-increasing amount of dynamic and static web content, clusters have been widely used for large-scale web servers to improve system scalability. Dynamically switching cluster nodes between different power states is one effective approach to saving energy in such clusters, and many research efforts have designed power-aware clusters this way. However, booting a cluster node from a low-power state to an active state takes time, the amount depending on the configuration, and this process incurs significant performance degradation. Existing work normally trades some performance degradation for energy savings. This paper proposes a hybrid method to predict the number of requests per booting time of web workloads. A power-aware web cluster scheduler divides the cluster nodes into an active group and a low-power group. The scheduler attempts to minimize the active group and maximize the low-power group. By leveraging the prediction scheme, it boots nodes in the low-power group in advance to minimize or eliminate performance degradation. Furthermore, this paper integrates power awareness into conventional load balancers, including Least Connections, Deficit Round Robin, and Skew. Comprehensive experiments explore the potential opportunities to minimize or eliminate the performance degradation of the power-aware web cluster.
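
The scheduling idea can be sketched as follows: predict the requests expected during the next boot interval, convert that prediction into a required number of active nodes, and start booting the shortfall now so those nodes are ready when the load arrives. The abstract does not specify the paper's hybrid predictor. The blend of a moving average and exponential smoothing below, along with `window`, `alpha`, and `capacity_per_node`, are hypothetical choices for illustration only.

```python
import math
from typing import List


def predict_requests(history: List[float], window: int = 3, alpha: float = 0.5) -> float:
    """Hypothetical 'hybrid' predictor: average of a moving average and exponential smoothing.

    history[k] is the number of requests observed in the k-th boot-time interval.
    """
    recent = history[-window:]
    moving_avg = sum(recent) / len(recent)
    smoothed = history[0]
    for x in history[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return (moving_avg + smoothed) / 2


def nodes_to_boot(history: List[float], active_nodes: int, capacity_per_node: float) -> int:
    """Low-power nodes to wake now so they are active when the predicted load arrives."""
    predicted = predict_requests(history)
    needed = max(1, math.ceil(predicted / capacity_per_node))  # keep at least one node active
    return max(0, needed - active_nodes)


print(nodes_to_boot([100, 100, 100], active_nodes=2, capacity_per_node=30))  # 2
```

Because each history entry covers one boot interval, a node woken now becomes available at roughly the time the predicted load materializes.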

Journal of Network and Computer Applications | 2009

An efficient design for fast memory registration in RDMA

Li Ou; Xubin He; Jizhong Han

Remote Direct Memory Access (RDMA) improves network bandwidth and reduces latency by eliminating unnecessary copies from the network interface card to application buffers, but managing communication buffers to reduce memory registration and deregistration cost remains a significant challenge. Previous studies use a pin-down cache and batched deregistration, but they rely only on simple LRU as the replacement algorithm for managing cache space. In this paper, we evaluate the cost of memory registration in both user and kernel space. Based on this analysis, we reduce the overhead of communication buffer management in two complementary ways: we use a Memory Registration Region Cache (MRRC), and we optimize the RDMA communication process between clients and servers with a Fast RDMA Read and Write Process (FRRWP). MRRC manages memory in terms of memory regions and replaces old regions according to both their sizes and their recency. FRRWP overlaps memory registrations between a client and a server, and it allows applications to submit RDMA write operations without being blocked by message synchronization. We compare the performance of MRRC and FRRWP with traditional RDMA operations. The results show that our new design reduces the total cost of memory registration and the overall communication latency by up to 70%.
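
The following sketch shows a registration cache that evicts by both size and recency rather than pure LRU. It only counts registrations instead of calling a real RDMA verbs API. The exact MRRC policy is not given in the abstract, so the rule used here is a hypothetical example: among the older half of cached regions, evict the largest one.

```python
from collections import OrderedDict
from typing import Tuple


class RegistrationCache:
    """Hypothetical cache of registered memory regions keyed by (address, length)."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.regions: "OrderedDict[Tuple[int, int], int]" = OrderedDict()  # oldest use first
        self.registrations = 0  # counts the expensive pin+register operations

    def acquire(self, addr: int, size: int) -> bool:
        """Return True on a cache hit (no registration needed), False on a miss."""
        key = (addr, size)
        if key in self.regions:
            self.regions.move_to_end(key)  # mark as most recently used
            return True
        while self.used + size > self.capacity and self.regions:
            self._evict()
        self.registrations += 1  # stands in for a real memory-registration call
        self.regions[key] = size
        self.used += size
        return False

    def _evict(self) -> None:
        # Consider only the older half of the regions (recency), then evict the
        # largest of them (size), freeing the most space per deregistration.
        candidates = list(self.regions.items())[: (len(self.regions) + 1) // 2]
        victim, size = max(candidates, key=lambda kv: kv[1])
        del self.regions[victim]
        self.used -= size


cache = RegistrationCache(100)
for addr, size in [(0, 40), (100, 10), (200, 30), (0, 40), (300, 30)]:
    cache.acquire(addr, size)
print(list(cache.regions))  # [(100, 10), (0, 40), (300, 30)]
```

In the usage example, plain LRU would have evicted the small, older region `(100, 10)`, which would not free enough space on its own. The size-aware rule evicts the larger `(200, 30)` instead and keeps the small region registered.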

international conference on geoinformatics | 2012

A distributed geospatial data storage and processing framework for large-scale WebGIS

Yunqin Zhong; Jizhong Han; Tieying Zhang; Jinyun Fang

With the rapid growth of geospatial data and concurrent users, state-of-the-art WebGIS systems cannot support massive data storage and processing because of the poor scalability of their underlying centralized systems (e.g., native file systems and SDBMS). In this paper, we propose a novel distributed geospatial data storage and processing framework for large-scale WebGIS. Our proposal has three significant characteristics. First, a scalable cloud-based architecture is designed to provide the elastic storage and computation resources of a shared-nothing commodity cluster to WebGIS. Second, we present efficient geospatial data placement and geospatial data access refinement schemes to improve I/O efficiency. Third, we propose a MapReduce-based localized geospatial computing model for parallel processing of massive geospatial data, which improves geospatial computation performance. We have implemented a prototype named VegaCI on top of the Hadoop cloud platform. Comprehensive experiments demonstrate that our proposal is efficient and applicable to practical large-scale WebGIS.
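
The abstract does not describe VegaCI's placement scheme. One common way to make spatially neighboring tiles land in the same storage block, so that a map view touches few blocks, is to order tiles by a Z-order (Morton) key before cutting them into blocks. The sketch below uses that swapped-in technique for illustration. `tiles_per_block` and the 16-bit coordinate limit are assumptions.

```python
from typing import List, Tuple

TileXY = Tuple[int, int]


def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (x in even bit positions, y in odd ones)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def place_tiles(tiles: List[TileXY], tiles_per_block: int) -> List[List[TileXY]]:
    """Group tiles into storage blocks so that spatial neighbors tend to share a block."""
    ordered = sorted(tiles, key=lambda t: morton_key(*t))
    return [ordered[i:i + tiles_per_block] for i in range(0, len(ordered), tiles_per_block)]


grid = [(x, y) for x in range(4) for y in range(4)]
print(place_tiles(grid, 4)[0])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

With four tiles per block, each block of the 4x4 grid holds one 2x2 quadrant. A small map window therefore reads one or a few blocks instead of tiles scattered across many blocks.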

international conference on parallel processing | 2007

Collaborative Memory Pool in Cluster System

Nan Wang; Xuhui Liu; Jin He; Jizhong Han; Lisheng Zhang; Zhiyong Xu

With the development of network technologies, many mechanisms have been introduced to improve performance in cluster systems by exploiting remote idle memory. However, none of them satisfies the requirements of different applications: most methods improve the performance of only one particular type of application. One important reason is that they fail to provide unified interfaces. In this paper, we propose a collaborative memory pool (CMP) to solve this problem. CMP provides scalability and high performance. It has five features: (1) it provides malloc-like interfaces, block device interfaces, and a kernel API, which benefit both user-level and kernel-level applications; (2) it retains the traditional VM mechanism, so programmers and users are free to choose whether to use CMP; (3) it improves the performance of kernel applications by eliminating remote swapping; (4) it avoids the "loan while in debt" problem under dynamic workloads; and (5) it provides optional memory servers to further improve performance. On our testbed with CMP-based swap devices, Qsort achieves an 83.28% improvement compared with disk-based swap devices.
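
The malloc-like side of such a pool can be sketched under simplifying assumptions. Memory servers advertise idle bytes, a client allocation is served from the server with the most free memory, and an exhausted pool returns `None` so that the caller can fall back to ordinary local memory and the traditional VM path. The class, method names, and placement rule are hypothetical and do not reflect CMP's actual interfaces.

```python
from typing import Dict, Optional, Tuple


class CollaborativeMemoryPool:
    """Hypothetical sketch: memory servers lend idle memory, and clients allocate from the pool."""

    def __init__(self, server_free_bytes: Dict[str, int]) -> None:
        self.free_bytes = dict(server_free_bytes)  # server name -> idle bytes it lends
        self.allocations: Dict[int, Tuple[str, int]] = {}  # handle -> (server, size)
        self._next_handle = 0

    def malloc(self, size: int) -> Optional[int]:
        """Allocate `size` bytes of remote memory; None means 'use local memory instead'."""
        server = max(self.free_bytes, key=self.free_bytes.get, default=None)
        if server is None or self.free_bytes[server] < size:
            return None  # pool exhausted: the caller keeps using the normal VM path
        self.free_bytes[server] -= size
        handle = self._next_handle
        self._next_handle += 1
        self.allocations[handle] = (server, size)
        return handle

    def release(self, handle: int) -> None:
        """Return a block to the server that lent it."""
        server, size = self.allocations.pop(handle)
        self.free_bytes[server] += size


pool = CollaborativeMemoryPool({"node1": 100, "node2": 50})
h = pool.malloc(80)       # served by node1, which has the most idle memory
print(pool.malloc(60))    # None: no single server has 60 bytes left
```

Returning `None` instead of failing keeps the pool optional, in the spirit of feature (2): applications that cannot get remote memory simply continue on the traditional VM mechanism.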
