
Jianjiang Li

University of Science and Technology Beijing

Publication

Featured research published by Jianjiang Li.

Science in China Series F: Information Sciences | 2010

OpenMP compiler for distributed memory architectures

Jue Wang; Changjun Hu; Jilin Zhang; Jianjiang Li

OpenMP is an emerging industry standard for shared memory architectures. While OpenMP offers ease of use and incremental programming, message passing is still the most widely used programming model for distributed memory architectures. How to effectively extend OpenMP to distributed memory architectures has been an active research topic. This paper proposes an OpenMP system, called KLCoMP, for distributed memory architectures. Based on the “partially replicating shared arrays” memory model, we propose an algorithm for shared array recognition based on inter-procedural analysis, an optimization technique based on the producer/consumer relationship, and a communication generation technique for nonlinear references. We evaluate the performance on nine benchmarks, which cover computational fluid dynamics, integer sorting, molecular dynamics, earthquake simulation, and computational chemistry. The average scalability achieved by the KLCoMP version is close to that achieved by the MPI version. We compare the performance of our translated programs with that of versions generated for Omni+SCASH, LLCoMP, and OpenMP(Purdue), and find that parallel applications (especially irregular applications) translated by KLCoMP achieve better performance than the other versions.

Network and Parallel Computing | 2008

Automatic Transformation for Overlapping Communication and Computation

Changjun Hu; Yewei Shao; Jue Wang; Jianjiang Li

Message passing is a predominant programming paradigm for distributed memory systems. RDMA networks such as InfiniBand and Myrinet reduce communication overhead by overlapping communication with computation. To make the overlap more effective, we propose a source-to-source transformation scheme that automatically restructures message-passing codes. Extensions to the control-flow graph make it possible to analyze the message-passing program accurately and to perform data-flow analysis effectively. This analysis identifies the minimal region between producer and consumer that contains message-passing function calls. Using inter-procedural data-flow analysis, the transformation scheme enables the overlap of communication with computation. Experiments on the well-known NAS Parallel Benchmarks show that, for distributed memory systems, versions employing communication-computation overlap are faster than the original programs.

Parallel Computing in Electrical Engineering | 2006

Notice of Violation of IEEE Publication Principles: Communication Generation for Irregular Parallel Applications

Changjun Hu; Jing Li; Jue Wang; Yonghong Li; Liang Ding; Jianjiang Li

Irregular computing significantly influences the performance of large-scale parallel applications. How to efficiently generate the local memory access sequence and the communication set for an irregular parallel application is an important issue in compiling a data parallel language into single program multiple data (SPMD) code for distributed-memory machines. In this paper, we propose a hybrid approach that combines the advantages of the algebra method and the integer lattice method. Our approach derives an algebraic solution of communication set enumeration at compile time for irregular array references in nested loops. Based on the integer lattice, we develop our method for global-to-local and local-to-global index translations in the case of alignment and cyclic(k) distribution. We then present our algorithm for the corresponding SPMD code generation, which adopts some communication optimization techniques. In our method, when the parameters are known, the communication set generation, the global-to-local and local-to-global index translations, and the SPMD code generation can all be completed at compile time, without a run-time inspector phase.
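The index translations mentioned in this abstract can be illustrated with the textbook formulas for a cyclic(k) (block-cyclic) distribution. The following is a minimal sketch, not the paper's integer-lattice method: it assumes an identity alignment, 0-based indices, and hypothetical function names `global_to_local` and `local_to_global`.

```python
def global_to_local(g, k, P):
    """Map global index g to (owner, local index) under cyclic(k) over P processors.

    Assumes identity alignment and 0-based indexing (illustrative assumptions).
    """
    block = g // k                      # which block of size k holds g
    owner = block % P                   # blocks are dealt out round-robin
    local = (block // P) * k + g % k    # blocks already held by owner, plus offset
    return owner, local


def local_to_global(p, l, k, P):
    """Inverse mapping: local index l on processor p back to the global index."""
    local_block = l // k                # position of the block within processor p
    return (local_block * P + p) * k + l % k


print(global_to_local(13, 2, 3))        # (0, 5)
print(local_to_global(0, 5, 2, 3))      # 13
```

Because both mappings are closed-form in `k` and `P`, they can be evaluated ahead of time whenever those parameters are known, which is the property the abstract relies on to avoid a run-time inspector.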

Future Generation Computer Systems | 2017

A data-check based distributed storage model for storing hot temporary data

Jianjiang Li; Peng Zhang; Yuance Li; Wei Chen; Yajun Liu; Lizhe Wang

To ensure data security, traditional systems have widely used redundancy backup to store multiple copies of data. Multiple-copy technology is highly reliable, but it has the disadvantages of high storage redundancy and low space utilization. In contrast, EC (Erasure Coding) technology uses storage space efficiently, but the overhead of coding, decoding and data reconstruction is high. This paper therefore presents a data backup method based on an XOR checksum that is suitable for storing hot temporary data. It first splits the data into two parts and then XORs the two parts to generate a third part. Finally, the three data parts are stored on different nodes. The checksum not only ensures the security of data but also saves storage space, thus improving read and write performance. This strategy achieves mutual backup among the three nodes in order to ensure data security. Because there is only one copy of the original data in the system, this model resolves the data inconsistency problem and simplifies the data version control required by the redundancy backup model. Test results on real data show that, compared with the current mainstream Cassandra redundant backup model, the XOR-checksum-based data backup model proposed and implemented in this paper performs significantly better: read performance improves by an average of 10%, and write performance by an average of 30%.
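The split-and-XOR scheme described in this abstract can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function names `encode` and `decode`, the zero-byte padding of the second half, and the stored original length are all assumptions made here so that the example is self-contained.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes):
    """Split data into halves A and B and compute the checksum C = A xor B.

    The second half is zero-padded to the length of the first (an assumption
    of this sketch); the original length is returned so padding can be removed.
    """
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\0")
    return [a, b, xor_bytes(a, b)], len(data)


def decode(parts, length):
    """Rebuild the data when at most one of the three parts is missing (None)."""
    a, b, c = parts
    if [a, b, c].count(None) > 1:
        raise ValueError("cannot recover from the loss of more than one part")
    if a is None:
        a = xor_bytes(b, c)   # A = B xor C
    elif b is None:
        b = xor_bytes(a, c)   # B = A xor C
    return (a + b)[:length]


parts, n = encode(b"hot temporary data")
# Simulate the loss of each storage node in turn.
for lost in range(3):
    damaged = [None if i == lost else part for i, part in enumerate(parts)]
    assert decode(damaged, n) == b"hot temporary data"
```

In this sketch the three parts together occupy about 1.5 times the size of the original data, compared with 3 times for three full replicas, which is the space saving the abstract refers to.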

Future Generation Computer Systems | 2010

Message scheduling for array re-decomposition on distributed memory systems

Jue Wang; Changjun Hu; Jilin Zhang; Jianjiang Li

For many parallel applications on distributed memory systems, array re-decomposition is usually required to enhance data locality and reduce communication overhead. How to effectively schedule messages to improve the performance of array re-decomposition has received much attention in recent years. This paper develops efficient scheduling algorithms using the compile-time information provided by array distribution patterns, array alignment patterns and the periodic property of array accesses. Our algorithms not only avoid inter-processor contention, but also reduce the real communication cost and the communication generation time. The experimental results show that the performance of array re-decomposition can be significantly improved using our algorithms.
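To make "avoiding inter-processor contention" concrete, the sketch below builds a classic shifted (ring) schedule for an all-to-all exchange: in step s, processor p sends to processor (p + s) mod P, so no processor receives from two senders in the same step. This is a generic textbook schedule used for illustration, not the algorithm proposed in the paper, and the name `ring_schedule` is hypothetical.

```python
def ring_schedule(P):
    """Return P - 1 communication steps for an all-to-all exchange.

    Each step is a list of (sender, receiver) pairs in which every processor
    sends exactly one message and receives exactly one message.
    """
    return [[(p, (p + s) % P) for p in range(P)] for s in range(1, P)]


for step, pairs in enumerate(ring_schedule(4), start=1):
    print(step, pairs)
```

A scheduler that exploits distribution and alignment patterns, as the abstract describes, could additionally skip the pairs that have no data to exchange; the sketch above always schedules every pair.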

International Conference on Parallel Architectures and Compilation Techniques | 2007

A New Parallel Gauss-Seidel Method by Iteration Space Alternate Tiling

Changjun Hu; Jilin Zhang; Jue Wang; Jianjiang Li; Liang Ding

To take advantage of supercomputing resources with multiple processors, several parallel versions of the Gauss-Seidel (SOR) method have been proposed. In the present study, a new parallel Gauss-Seidel algorithm is developed, based on domain decomposition and a convergence iteration space alternate tiling method, for solving systems of linear equations arising from the finite difference discretization of partial differential equations. The goal of this method is to improve three performance aspects: inter-iteration data locality, intra-iteration data locality and parallelism. Intra-iteration locality refers to cache locality upon data reuse within a convergence iteration, and inter-iteration locality refers to cache locality upon data reuse between convergence iterations.
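For reference, the baseline that such parallel variants start from is the plain sequential Gauss-Seidel sweep. The sketch below applies it to a 1D Poisson problem, -u'' = f on (0, 1) with zero boundary values, discretized by finite differences. It is not the alternate tiling method of the paper; the function name `gauss_seidel_poisson_1d` and the choice of test problem are assumptions made for illustration.

```python
def gauss_seidel_poisson_1d(f, sweeps):
    """Run Gauss-Seidel sweeps for -u'' = f on (0, 1) with u(0) = u(1) = 0.

    f holds the right-hand side at the n interior grid points; the returned
    list has n + 2 entries, including the two boundary values.
    """
    n = len(f)
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)
    for _ in range(sweeps):              # convergence iterations
        for i in range(1, n + 1):        # one sweep over the grid
            # u[i - 1] was already updated in this sweep, while u[i + 1] still
            # holds the value from the previous sweep.
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i - 1])
    return u


# With f = 2 the exact solution is u(x) = x * (1 - x).
u = gauss_seidel_poisson_1d([2.0] * 9, sweeps=500)
print(max(abs(u[i] - (i / 10) * (1 - i / 10)) for i in range(11)))
```

The inner loop reuses neighbouring values within one sweep (intra-iteration reuse), and the outer loop revisits the whole array on every sweep (inter-iteration reuse); these are the two kinds of locality the abstract distinguishes.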

International Conference on Move to Meaningful Internet Systems | 2007

Transforming the adaptive irregular out-of-core applications for hiding communication and disk I/O

Changjun Hu; Guangli Yao; Jue Wang; Jianjiang Li

In adaptive irregular out-of-core applications, communication and heavy disk I/O occupy a large portion of the overall execution time. This paper presents a program transformation scheme that enables the overlap of communication, computation and disk I/O in this kind of application. We take programs in the inspector-executor model as the starting point and transform them into a pipelined form. By decomposing the inspector phase and reordering iterations, more overlap opportunities are exploited. In the experiments, our techniques are applied to two important applications: a partial differential equation solver and a molecular dynamics problem. For these applications, versions employing our techniques are almost 30% faster than the inspector-executor versions.

Future Generation Computer Systems | 2017

Research and implementation of a distributed transaction processing middleware

Jianjiang Li; Qian Ge; Jie Wu; Yue Li; Xiaolei Yang; Zhanning Ma

Currently, a growing number of transactional requests require high-performance transaction processing systems as support. The performance of a distributed transaction processing system is superior to that of a traditional single-node transaction processing system, and its multi-node nature means that a distributed transaction processing system should pay more attention to availability. For example, among traditional single-node systems, Berkeley DB performs well, but its lack of support for parallel writing across multiple nodes weakens its availability and scalability in a distributed environment. This paper designs and implements a middleware-level distributed transaction processing system called POST, which comprises a distributed database system called POSTBOX, based on Berkeley DB and data partitioning, and a distributed transaction processing middleware called POSTMAN. POSTBOX inherits the availability of highly available Berkeley DB and extends it with data partitioning. Through the Partition Replication Body (PRB), POSTBOX overcomes the native weakness of highly available Berkeley DB, namely that it does not support parallel writing across multiple nodes. POSTMAN is a middleware adapted to PRB. POSTMAN monitors POSTBOX in real time via the Partition Replication Body State Array (PRBSA) and ensures the correctness of transaction processing and transaction migration in the case of node failure. Test results show that POST is highly available and clearly improves write performance compared with highly available Berkeley DB.

International Workshop on OpenMP | 2007

OpenMP Extensions for Irregular Parallel Applications on Clusters

Jue Wang; Changjun Hu; Jilin Zhang; Jianjiang Li

Many researchers have focused on developing techniques for the situation where data arrays are indexed through indirection arrays. However, these techniques may be ineffective for nonlinear indexing. In this paper, we propose extensions to OpenMP directives so that irregular OpenMP codes, including those with nonlinear indexing, can be executed efficiently in parallel. Furthermore, some optimization techniques for irregular computing are presented. These techniques include the generation of communication sets and SPMD code, a communication scheduling strategy, and a low-overhead locality transformation scheme. Finally, experimental results are presented to validate our extensions and optimization techniques.

International Workshop on OpenMP | 2007

OpenMP Implementation of Parallel Linear Solver for Reservoir Simulation

Changjun Hu; Jilin Zhang; Jue Wang; Jianjiang Li

In this paper, we discuss an OpenMP implementation of an evolutionary LSOR method, the MBLSOR method, for solving systems of linear equations related to reservoir simulation on SMPs. The MBLSOR method not only improves data locality through a spatial computational domain decomposition technique, but can also parallelize the sub-blocks that have no data dependence. We compare the performance of different parallel LSOR methods in terms of efficiency and data locality. Numerical results on SMPs indicate that the MBLSOR algorithm is more efficient.
