Li Zha
Chinese Academy of Sciences
Publication
Featured research published by Li Zha.
grid and cooperative computing | 2009
Chao Tian; Haojie Zhou; Yongqiang He; Li Zha
MapReduce is an important programming model for building data centers containing tens of thousands of nodes. In a practical data center of that scale, I/O-bound jobs and CPU-bound jobs, which demand different resources, commonly run simultaneously in the same cluster. The MapReduce framework has not addressed the parallelization of these two kinds of jobs. In this paper, we give a new view of the MapReduce model and classify MapReduce workloads into three categories based on their CPU and I/O utilization. Using this workload classification, we design a new dynamic MapReduce workload prediction mechanism, MR-Predict, which detects the workload type on the fly. We propose a Triple-Queue Scheduler based on the MR-Predict mechanism. The Triple-Queue Scheduler can improve the usage of both CPU and disk I/O resources and can improve Hadoop throughput by about 30% under heterogeneous workloads.
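The idea in the abstract above, classifying jobs by CPU and I/O utilization and keeping one queue per class, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give the classification rule or name the three categories, so the thresholds, the labels `cpu_bound`, `io_bound`, and `mixed`, and the round-robin dispatch are hypothetical and are not the MR-Predict mechanism or the paper's scheduler.

```python
from collections import deque

# Hypothetical thresholds; the abstract does not state the real rule.
CPU_THRESHOLD = 0.7
IO_THRESHOLD = 0.7


def classify(cpu_util, io_util):
    """Return an illustrative label from sampled utilization in [0, 1]."""
    if io_util >= IO_THRESHOLD and cpu_util < CPU_THRESHOLD:
        return "io_bound"
    if cpu_util >= CPU_THRESHOLD and io_util < IO_THRESHOLD:
        return "cpu_bound"
    return "mixed"  # assumed third category


class TripleQueue:
    """Three FIFO queues, one per assumed workload class."""

    def __init__(self):
        self._order = ["cpu_bound", "io_bound", "mixed"]
        self.queues = {label: deque() for label in self._order}
        self._next = 0  # index of the queue to try first

    def submit(self, job_name, cpu_util, io_util):
        label = classify(cpu_util, io_util)
        self.queues[label].append(job_name)
        return label

    def next_job(self):
        """Round-robin over the queues so CPU- and I/O-heavy jobs interleave."""
        for offset in range(len(self._order)):
            idx = (self._next + offset) % len(self._order)
            label = self._order[idx]
            if self.queues[label]:
                self._next = (idx + 1) % len(self._order)
                return self.queues[label].popleft()
        return None  # all queues are empty
```

Submitting two I/O-heavy jobs and one CPU-heavy job and then calling `next_job()` repeatedly alternates between the classes while both have work, which is the interleaving effect the abstract attributes to scheduling by workload type.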
international parallel and distributed processing symposium | 2014
Xiaoyi Lu; Fan Liang; Bing Wang; Li Zha; Zhiwei Xu
MPI has been widely used in High Performance Computing. In contrast, such efficient communication support is lacking in the field of Big Data Computing, where communication is realized by time consuming techniques such as HTTP/RPC. This paper takes a step in bridging these two fields by extending MPI to support Hadoop-like Big Data Computing jobs, where processing and communication of a large number of key-value pair instances are needed through distributed computation models such as MapReduce, Iteration, and Streaming. We abstract the characteristics of key-value communication patterns into a bipartite communication model, which reveals four distinctions from MPI: Dichotomic, Dynamic, Data-centric, and Diversified features. Utilizing this model, we propose the specification of a minimalistic extension to MPI. An open source communication library, DataMPI, is developed to implement this specification. Performance experiments show that DataMPI has significant advantages in performance and flexibility, while maintaining high productivity, scalability, and fault tolerance of Hadoop.
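As a rough illustration of the key-value communication pattern described above, the sketch below moves key-value pairs from one group of tasks to another so that all values for a key arrive at the same receiver. It is a plain-Python simulation under stated assumptions: the function names `partition` and `shuffle` and the CRC32-based partitioning are hypothetical, and nothing here uses or reproduces the DataMPI API.

```python
import zlib
from collections import defaultdict


def partition(key, num_receivers):
    """Map a key to a receiver index with a deterministic hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_receivers


def shuffle(sender_outputs, num_receivers):
    """Deliver every (key, value) pair emitted by any sender to one receiver.

    sender_outputs: list with one list of (key, value) pairs per sender task.
    Returns one dict per receiver, mapping each key to its list of values.
    """
    inboxes = [defaultdict(list) for _ in range(num_receivers)]
    for pairs in sender_outputs:
        for key, value in pairs:
            inboxes[partition(key, num_receivers)][key].append(value)
    return inboxes
```

Because the partition function depends only on the key, values for the same key emitted by different senders are grouped at a single receiver, which is the property MapReduce-style jobs rely on.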
network and parallel computing | 2005
Li Zha; Wei Li; Haiyan Yu; Xianghui Xie; Nong Xiao; Zhiwei Xu
The China National Grid project developed and deployed a suite of grid system software called CNGrid Software. This paper presents the features and implementation of the software suite from the viewpoints of grid system deployment, grid application developers, grid resource providers, grid system administrators, and the end users.
semantics, knowledge and grid | 2010
Weisong Hu; Chao Tian; Xiaowei Liu; Hongwei Qi; Li Zha; Huaming Liao; Yuezhuo Zhang; Jie Zhang
MapReduce clusters are emerging as a solution for data-intensive scalable computing. The open source implementation Hadoop has already been adopted for building clusters containing thousands of nodes. Such cloud infrastructure is used to process many different jobs simultaneously, each depending on different hardware resources such as memory, CPU, disk I/O, and network I/O. If the scheduling policy does not consider the heterogeneity of the running jobs’ resource utilization types, resource contention may occur. In this paper, we analyze this multiple-job parallelization problem in MapReduce and propose the multiple-job optimization (MJO) scheduler. Our scheduler detects each job’s resource utilization type on the fly and improves hardware utilization by running different kinds of jobs in parallel. We give two scenarios, “same plan” and “same job”, to illustrate multiple-job submission traces in MapReduce clusters. Our experiments show that in these scenarios, the MJO scheduler can reduce the makespan by about 20%.
networking architecture and storages | 2010
Jian Lin; Xiaoyi Lu; Lin Yu; Yongqiang Zou; Li Zha
In a virtual-cluster-based Cloud Computing environment, the sharing of infrastructure introduces two problems for user management: usability and security. Meanwhile, we observe that most conventional user management frameworks in the network environment are not suited to the scale expansion and interconnection of dynamic virtualization environments. In this paper, we propose VegaWarden, a uniform user management system, to solve these problems. VegaWarden supplies a global user space for different virtual infrastructures and application services in one Cloud, and allows user system interconnection among homogeneous Cloud instances. A uniform authentication model enables the security isolation of different administrative domains, and a decentralized architecture ensures its scalability. We have implemented VegaWarden in an experimental Cloud-oriented infrastructure and a production Grid Computing environment. The functionality and performance of VegaWarden have been demonstrated.
Frontiers of Computer Science in China | 2013
Jian Lin; Li Zha; Zhiwei Xu
In the cloud age, heterogeneous application modes on large-scale infrastructures pose challenges of resource utilization and manageability for data centers. Many resource and runtime management systems have been developed or have evolved to address these challenges and related problems from different perspectives. This paper tries to identify the main motivations, key concerns, common features, and representative solutions of such systems through a survey and analysis. A typical kind of these systems is generalized as the consolidated cluster system, whose design goal is identified as reducing overall costs while preserving quality of service. A survey of this kind of system is given, and the critical issues such systems face are summarized as resource consolidation and runtime coordination. These two issues are analyzed and classified according to the design styles and external characteristics abstracted from the surveyed work. Five representative consolidated cluster systems from both academia and industry are illustrated and compared in detail based on the analysis and classifications. We hope this survey and analysis will be conducive to both the design and implementation and the technology selection of this kind of system, in response to the constantly emerging challenges of infrastructure and application management in data centers.
Proceedings of the second international workshop on Data intensive computing in the clouds | 2011
Xiao Wei Wang; Jie Zhang; Hua Ming Liao; Li Zha
MapReduce is gaining increasing popularity as a parallel programming model for large-scale data processing. However, we find that some traditional MapReduce platforms perform poorly in terms of cluster resource utilization, because the traditional multi-phase parallel model and some existing scheduling policies used in the cluster environment have drawbacks. We address these problems through our experience in designing a Dynamic Split Model of resource utilization, which contains two technologies: Dynamic Resource Allocation, which considers phase priority as well as job requirements when allocating resources, and Resource Usage Pipeline, which can assign tasks dynamically. We verify our optimization on top of Hadoop, and the results show that these technologies can improve throughput by 21.72%, with an average wall time gain of 55.83%. We also improve the percentage of user CPU utilization by 12.93% and reduce the percentages of iowait CPU and idle CPU utilization by 6.61% and 6.73%, respectively. The upstream and downstream speeds are increased by 11.3% and 23.5%, respectively. Moreover, we relieve the disk I/O bottleneck by 30.3%.
parallel and distributed computing: applications and technologies | 2009
Xiaoyi Lu; Yongqiang Zou; Fei Xiong; Jian Lin; Li Zha
In theory, invoking web services from multi-language clients is no longer a problem thanks to XML-based interface descriptions in WSDL, but the reality is not so good. Some implementation-level difficulties still exist when invoking web services from clients in different programming languages. These difficulties are caused by complex data structures in the service interface, additional information carried in SOAP messages such as WS-Security headers, missing language features such as reflection in C/C++, and so on, which make large-scale multi-language SOA application development time-consuming and error-prone. This paper proposes a new complexity measure, ICOMC (Invocation Complexity Of Multi-language Clients), to quantify these difficulties, introduces implementation cost and runtime performance metrics for ICOMC, and identifies three factors dominating ICOMC: service interface, message context, and language feature. Consequently, the problem is formulated as finding the correlation of the three factors with ICOMC. To simplify the problem, web services are classified into four categories, SISM, SICM, CISM, and CICM, according to service interface complexity and message context complexity. Furthermore, micro-benchmark experiments are conducted in C/C++/Java for all four categories. This paper also takes the GOS System Software of the China National Grid as a real large-scale application, implementing its C/C++ client APIs and comparing them with the original Java APIs. Evaluations based on micro-benchmarks and the real application show the correlations between the factors and ICOMC. Our results benefit web service interface design, appropriate language adoption, and estimation of implementation cost and runtime performance.
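The four-way classification above can be written as a two-flag lookup. This is a sketch under an assumption: the abstract does not expand the acronyms, so reading them as Simple/Complex Interface and Simple/Complex Message is an inference from the two axes it names, and the function `categorize` is hypothetical.

```python
def categorize(complex_interface, complex_message):
    """Return the category label for a web service.

    Assumed reading of the acronyms (not stated in the abstract):
    SI/CI = simple/complex service interface,
    SM/CM = simple/complex message context.
    """
    interface = "CI" if complex_interface else "SI"
    message = "CM" if complex_message else "SM"
    return interface + message
```

Under this reading, a service with plain parameter types but security headers in its messages would fall under `categorize(False, True)`, that is, SICM.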
parallel and distributed computing: applications and technologies | 2008
Yongqiang Zou; Li Zha; Xiaoning Wang; Haojie Zhou; Peixu Li
Virtual organizations (VOs) are widely accepted in grid and other distributed computing environments. However, there are few effective VO implementations. This paper presents a layered architecture to construct Agora, an implementation of VO. Agora manages users, resources, and agora instances, provides policies to support a DAC/MAC-hybrid cross-domain access control mechanism, and maintains the context of operations. The Agora architecture consists of three layers. At the bottom is the physical layer containing external resources; an abstraction, RController, is introduced to manipulate them. Above the physical layer, all the involved entities, including users, resources, and agoras, are abstracted as GNodes, and a naming layer is introduced to manage these GNodes. At the top, the logic layer implements all the Agora functionalities. This architecture has been implemented in Vega GOS and applied in the China National Grid and other grid platforms. The evaluation shows that the architecture provides minimal but sufficient VO functionalities while maintaining decentralization, flexibility, simplicity, and effectiveness.
international conference on web services | 2007
Qiang Yue; Zhiwei Xu; Haiyan Yu; Wei Li; Li Zha
In this paper, we first introduce some issues encountered in building a service debugger and briefly describe our approach to addressing them. Next, we outline some debugging modes and components of a simple composite debugger. Then, we describe its message-based front-end and back-end, which are co-existing, self-identifying, and non-intrusive. Finally, we present some experimental results from our latest prototype.