Chenggang Lai | Researchain


Publication

Featured researches published by Chenggang Lai.

International Journal of Geographical Information Science | 2016

A hybrid parallel cellular automata model for urban growth simulation over GPU/CPU heterogeneous architectures

Qingfeng Guan; Xuan Shi; Miaoqing Huang; Chenggang Lai

As an important spatiotemporal simulation approach and an effective tool for developing and examining spatial optimization strategies (e.g., land allocation and planning), geospatial cellular automata (CA) models often require multiple data layers and consist of complicated algorithms in order to deal with the complex dynamic processes of interest and the intricate relationships and interactions between the processes and their driving factors. Also, massive amounts of data may be used in CA simulations as high-resolution geospatial and non-spatial data are widely available. Thus, geospatial CA models can be both computationally intensive and data intensive, demanding long computing times and vast memory space. Based on a hybrid parallelism that combines processes with discrete memory and threads with global memory, we developed a parallel geospatial CA model for urban growth simulation over a heterogeneous computer architecture composed of multiple central processing units (CPUs) and graphics processing units (GPUs). Experiments with the datasets of California showed that the overall computing time for a 50-year simulation dropped from 13,647 seconds on a single CPU to 32 seconds using 64 GPU/CPU nodes. We conclude that the hybrid parallelism of geospatial CA over the emerging heterogeneous computer architectures provides scalable solutions for enabling complex simulations and optimizations with massive amounts of data that were previously infeasible, sometimes impossible, using individual computing approaches.
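The process-level half of the hybrid parallelism described above can be illustrated with a small sketch. It is not the authors' model: the growth rule (a cell becomes urban once enough of its eight neighbours are urban), the function names `step` and `step_decomposed`, and the row-block split are all assumptions chosen for illustration, and the blocks are processed one after another rather than by real processes or GPU threads. The point is only that each block, padded with one halo row from its neighbours, can be updated independently and still reproduce the global result.

```python
def step(grid, threshold=3):
    """One synchronous update of a toy growth rule on a 2-D list of 0/1 cells.

    Assumed rule (illustrative only): a non-urban cell becomes urban when at
    least `threshold` of its 8 neighbours are urban; urban cells stay urban.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                continue
            urban = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        urban += grid[rr][cc]
            if urban >= threshold:
                new[r][c] = 1
    return new


def step_decomposed(grid, n_blocks, threshold=3):
    """Same update, computed block by block with one halo row per side.

    Each row block stands in for the sub-domain owned by one process; the
    halo rows stand in for the data a process would receive from its
    neighbours before updating.
    """
    rows = len(grid)
    result = []
    for i in range(n_blocks):
        lo = i * rows // n_blocks
        hi = (i + 1) * rows // n_blocks
        if lo == hi:
            continue  # more blocks than rows: nothing to do for this block
        top = max(lo - 1, 0)        # include halo row above, if any
        bottom = min(hi + 1, rows)  # include halo row below, if any
        local = [row[:] for row in grid[top:bottom]]
        updated = step(local, threshold)
        # Keep only the rows this block owns; halo rows are discarded.
        result.extend(updated[lo - top: lo - top + (hi - lo)])
    return result
```

In a real distributed setting the halo rows would be exchanged between processes after every time step, and the inner cell loop is the part that would map naturally onto GPU threads.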

Archive | 2013

Accelerating Mean Shift Segmentation Algorithm on Hybrid CPU/GPU Platforms

Miaoqing Huang; Liang Men; Chenggang Lai

Image segmentation is a very important step in many GIS applications. Mean shift is an advanced and versatile technique for clustering-based segmentation, and is favored in many cases because it is non-parametric. However, mean shift is very computationally intensive compared with simpler methods such as k-means. In this work, we present a hybrid design of the mean shift algorithm on a computer platform consisting of both CPUs and GPUs. By taking advantage of the massive parallelism and the advanced memory hierarchy on Nvidia’s Fermi GPU and Kepler GPU, the hybrid design achieves a 30× speedup compared with the pure CPU implementation on the filtering step when dealing with images bigger than 4096 × 4096 pixels.
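For readers unfamiliar with mean shift, the following is a minimal one-dimensional sketch of the filtering idea, assuming a flat (uniform) kernel. It is not the hybrid CPU/GPU design from the paper; the function name `mean_shift_1d` and its parameters are hypothetical. Each point is moved repeatedly to the mean of the points within `bandwidth` of its current position until it stops moving, and every point is processed independently of the others, which is why the filtering step suits massively parallel hardware.

```python
def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Shift each point to a local density mode using a flat kernel.

    Illustrative sketch only: 1-D data, uniform kernel, brute-force
    neighbour search. Returns the converged position for every input point.
    """
    modes = []
    for x in points:
        for _ in range(max_iter):
            # All samples inside the window centred on the current position.
            window = [p for p in points if abs(p - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes
```

Points that converge to the same mode form one cluster; in image segmentation the same procedure runs on joint spatial and color features of every pixel.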

GIScience & Remote Sensing | 2014

Unsupervised image classification over supercomputers Kraken, Keeneland and Beacon

Xuan Shi; Miaoqing Huang; Haihang You; Chenggang Lai; Zhong Chen

The iterative self-organizing data analysis technique algorithm (ISODATA) was implemented over supercomputers Kraken, Keeneland and Beacon to explore scalable and high-performance solutions for image processing and analytics using emerging advanced computer architectures. When 10 classes are extracted from one 18-GB image tile, the calculation can be reduced from several hours to no more than 90 seconds when 100 CPU, GPU or MIC processors are utilized. High-performance scalability tests were further implemented over Kraken using 10,800 processors to extract various numbers of classes from 12 image tiles totalling 216 gigabytes. As the first geospatial computations over GPU clusters (Keeneland) and MIC clusters (Beacon), the success of this research illustrates a solid foundation for exploring the potential of scalable and high-performance geospatial computation for next-generation cyber-enabled image analytics.

High Performance Computing and Communications | 2013

Accelerating Geospatial Applications on Hybrid Architectures

Chenggang Lai; Miaoqing Huang; Xuan Shi; Haihang You

Accelerators have become critical in developing supercomputers with exascale computing capability. In this work, we examine the potential of two of the latest acceleration technologies, the Nvidia K20 Kepler GPU and the Intel Many Integrated Core (MIC) Architecture, for accelerating geospatial applications. We first apply a set of benchmarks under three different configurations, i.e., MPI+CPU, MPI+GPU, and MPI+MIC. This set of benchmarks includes an embarrassingly parallel application, a loosely communicating application, and an intensely communicating application. It is found that the straightforward MPI implementation on MIC cores can achieve the same performance speedup as the hybrid MPI+GPU implementation when the same number of processors is used. Further, we demonstrate the potential of hardware accelerators for advancing scientific research using an urban sprawl simulation application. The parallel implementation of the urban sprawl simulation using 16 Tesla M2090 GPUs can realize a 155× speedup compared with the single-node implementation, while achieving good strong scalability.

High Performance Computing and Communications | 2013

Accelerating Applications Using GPUs on Embedded Systems and Mobile Devices

Miaoqing Huang; Chenggang Lai

Graphics processing units (GPUs) are capable of achieving remarkable performance improvements for a broad range of applications. However, they have not been widely adopted as accelerators in embedded systems and mobile devices, mainly due to their relatively higher power consumption compared with embedded microprocessors. In this work, we conduct a comprehensive analysis regarding the feasibility and potential of accelerating applications using GPUs in low-power domains. We use two different categories of benchmarks: (1) the Level 3 BLAS subroutines, and (2) computer vision algorithms, i.e., mean shift image segmentation and scale-invariant feature transform (SIFT). We carried out our experiments on the Nvidia CARMA development kit, which consists of an Nvidia Tegra 3 quad-core CPU and an Nvidia Quadro 1000M GPU. It is found that the GPU can deliver a remarkable performance speedup compared with the CPU while using significantly less energy for most benchmarks. Further, we propose a hybrid approach to developing applications on platforms with GPU accelerators. This approach optimally distributes workload between the parallel GPU and the sequential CPU to achieve the best performance while using the least energy.

International Journal of High Performance Computing Applications | 2017

Study of parallel programming models on computer clusters with Intel MIC coprocessors

Miaoqing Huang; Chenggang Lai; Xuan Shi; Zhijun Hao; Haihang You

Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve the parallelism. In this work, we conduct a detailed study on the performance and scalability of the MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) The native MPI programming model on the MIC processors is typically better than the offload programming model, which offloads the workload to MIC cores using OpenMP. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve the performance for parallel applications on computer clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule these MPI processes to as few MIC processors as possible to reduce the cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed to both MIC cores and CPU cores, can outperform the native MPI programming model.

International Parallel and Distributed Processing Symposium | 2014

Comparison of Parallel Programming Models on Intel MIC Computer Cluster

Chenggang Lai; Zhijun Hao; Miaoqing Huang; Xuan Shi; Haihang You

Coprocessors based on the Intel Many Integrated Core (MIC) Architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve the parallelism. In this work, we conduct a detailed study on the performance and scalability of the MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) The native MPI programming model on the MIC processors is typically better than the offload programming model, which offloads the workload to MIC cores using OpenMP, on the Beacon computer cluster. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve the performance for parallel applications on computer clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule these MPI processes to as few MIC processors as possible to reduce the cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed to both MIC cores and CPU cores, can outperform the native MPI programming model.

International Conference on Signal and Information Processing | 2014

Parallelizing computer vision algorithms on acceleration technologies: A SIFT case study

Miaoqing Huang; Chenggang Lai

Computer vision algorithms, such as scale-invariant feature transform (SIFT), are used in many important applications, e.g., autonomous vehicles and computer-human interaction. They are typically computation-intensive and require a long processing time on traditional single-core processors. In this work, we present the methodologies to parallelize the SIFT algorithm on various acceleration technologies and multicore processors, such as field-programmable gate arrays (FPGAs), graphics processing units (GPUs), and the Intel Many Integrated Core (MIC) Architecture. The results show that all acceleration technologies can significantly improve the performance compared with the single-thread implementation on an Intel Core i7 processor. In particular, the Nvidia Tesla K20 GPU is capable of providing a 10× speedup. Furthermore, it is found that the performance of the Intel MIC coprocessor is in the same range as that of the multicore CPU, while the optimal implementation of SIFT does not necessarily use all the computing resources on these two platforms.

Big Earth Data | 2018

Efficient utilization of multi-core processors and many-core co-processors on supercomputer Beacon for scalable geocomputation and geo-simulation over big earth data

Chenggang Lai; Xuan Shi; Miaoqing Huang

Digital earth science data originating from sensors aboard satellites and platforms such as airplanes, UAVs, and mobile systems are increasingly available at high spectral, spatial, vertical, and temporal resolution. When such big earth science data are processed and analyzed via geocomputation solutions, or utilized in geospatial simulation or modeling, considerable computing power and resources are necessary to complete the tasks. While classic computer clusters equipped with central processing units (CPUs) and the new computing resources of graphics processing units (GPUs) have been deployed in handling big earth data, coprocessors based on Intel’s Many Integrated Core (MIC) Architecture are emerging and have been adopted in many high-performance computer clusters. This paper introduces how to efficiently utilize Intel’s Xeon Phi multicore processors and MIC coprocessors for scalable geocomputation and geo-simulation by implementing two algorithms, Maximum Likelihood Classification (MLC) and Cellular Automata (CA), on supercomputer Beacon, a cluster of MICs. Four different programming models are examined, including (1) the native model, (2) the offload model, (3) the symmetric model, and (4) the hybrid-offload model. It can be concluded that while different kinds of parallel programming models can enable big data handling efficiently, the hybrid-offload model can achieve the best performance and scalability. These different programming models can be applied and extended to other types of geocomputation to handle big earth data.

SIGSPATIAL Special | 2017

Accelerating the calculation of minimum set of viewpoints for maximum coverage over digital elevation model data by hybrid computer architecture and systems

Chenggang Lai; Miaoqing Huang; Xuan Shi

This paper introduces how to accelerate the calculation of the minimum set of viewpoints for the maximum coverage over digital elevation model data using Intel’s Xeon Phi and a computer cluster equipped with Intel’s Many-Integrated-Core (MIC) coprocessors. This data- and computation-intensive process consists of a series of geocomputation tasks, including 1) the automatic generation of control viewpoints through map algebra calculation and hydrological modeling approaches; 2) the creation of the joint viewshed derived from the viewshed of all viewpoints to establish the maximum viewshed coverage of the given digital elevation model (DEM) data; and 3) the identification of a minimum set of viewpoints that cover the maximum terrain area of the joint viewshed. The parallel implementation on the hybrid computer cluster was able to achieve more than 100× performance speedup in comparison to the sequential implementation. The outcome of the computation has broad societal impacts since the research questions and solutions can be applied to real-world applications and decision-making practice.
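Tasks 2) and 3) above amount to a set-cover problem, which can be sketched as follows. The abstract does not say which selection method the authors use, so the greedy heuristic here is an assumption, as are the function name `greedy_viewpoint_cover` and the representation of each viewshed as a set of visible cell ids. The sketch builds the joint viewshed as the union of all viewsheds, then repeatedly picks the viewpoint that covers the most still-uncovered cells. Greedy selection yields a small cover, not a guaranteed minimum one.

```python
def greedy_viewpoint_cover(viewsheds):
    """Pick viewpoints that together cover the joint viewshed.

    viewsheds: dict mapping a viewpoint id to the set of cell ids visible
    from it (hypothetical input format). Returns (chosen ids, joint viewshed).
    Greedy heuristic: an assumed stand-in, not the method from the paper.
    """
    # Task 2: the joint viewshed is the union of every individual viewshed.
    joint = set().union(*viewsheds.values())
    uncovered = set(joint)
    chosen = []
    # Task 3 (approximated): add the most useful viewpoint until all is covered.
    while uncovered:
        # sorted() makes tie-breaking deterministic: the smallest id wins.
        best = max(sorted(viewsheds), key=lambda v: len(viewsheds[v] & uncovered))
        gain = viewsheds[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, joint
```

For example, with viewsheds `{"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {1, 6}}` the sketch selects `["A", "C"]`, which together see all six cells.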
