Po Chen
University of Wyoming
Publication
Featured research published by Po Chen.
Journal of Geophysical Research | 2014
En Jui Lee; Po Chen; Thomas H. Jordan; Phillip B. Maechling; Marine A. Denolle; Gregory C. Beroza
We have successfully applied full-3-D tomography (F3DT) based on a combination of the scattering-integral method (SI-F3DT) and the adjoint-wavefield method (AW-F3DT) to iteratively improve a 3-D starting model, the Southern California Earthquake Center (SCEC) Community Velocity Model version 4.0 (CVM-S4). In F3DT, the sensitivity (Frechet) kernels are computed using numerical solutions of the 3-D elastodynamic equation, and the nonlinearity of the structural inversion problem is accounted for through an iterative tomographic navigation process. More than half a million misfit measurements made on about 38,000 earthquake seismograms and 12,000 ambient-noise correlagrams have been assimilated into our inversion. After 26 F3DT iterations, synthetic seismograms computed using our latest model, CVM-S4.26, fit observed seismograms at frequencies below 0.2 Hz substantially better than those computed using our 3-D starting model CVM-S4 or the other SCEC CVM, CVM-H11.9, which was improved through 16 iterations of AW-F3DT. CVM-S4.26 reveals strong crustal heterogeneities throughout Southern California, some of which are absent from CVM-S4 and CVM-H11.9 but appear in models from previous crustal-scale 2-D active-source refraction tomography studies. At shallow depths, our model correlates strongly with sedimentary basins and reveals velocity contrasts across major mapped strike-slip and dip-slip faults. At middle to lower crustal depths, structural features in our model may provide new insights into regional tectonics. When combined with physics-based seismic hazard analysis tools, we expect our model to provide more accurate estimates of seismic hazards in Southern California.
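To make the iterative structure of F3DT concrete, here is a minimal, purely schematic sketch of the refinement loop: at each iteration, residuals between observed and synthetic data are mapped through a sensitivity-kernel matrix into a damped model update. Every operator and dimension below is a toy stand-in, not the paper's actual wavefield machinery.

```python
# Schematic sketch of an iterative tomographic refinement loop; all operators
# here are toy stand-ins for the paper's 3-D wavefield simulations and kernels.
import numpy as np

rng = np.random.default_rng(0)
m_true = rng.normal(size=50)       # "true" model perturbations (toy)
G = rng.normal(size=(200, 50))     # sensitivity (Frechet) kernel matrix (toy)
d_obs = G @ m_true                 # observed misfit measurements (toy)

m = np.zeros(50)                   # starting model (analogous to CVM-S4)
for it in range(26):               # the paper reports 26 F3DT iterations
    r = d_obs - G @ m                           # residuals for the current model
    dm, *_ = np.linalg.lstsq(G, r, rcond=None)  # least-squares model update
    m += 0.5 * dm                               # damped step to keep the loop stable
    print(f"iteration {it:2d}: residual norm {np.linalg.norm(d_obs - G @ m):.3e}")
```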
IEEE Transactions on Parallel and Distributed Systems | 2014
Ping Guo; Liqiang Wang; Po Chen
This paper presents a performance modeling and optimization analysis tool to predict and optimize the performance of sparse matrix-vector multiplication (SpMV) on GPUs. We make the following contributions: 1) We present an integrated analytical and profile-based performance model to accurately predict the kernel execution times of CSR, ELL, COO, and HYB SpMV kernels. Our approach is general: it is neither tied to particular GPU programming languages nor restricted to specific GPU architectures. In this paper, we use CUDA-based SpMV kernels and an NVIDIA Tesla C2050 for our performance modeling and experiments. In our experiments, for 77 out of 82 test cases, the differences between predicted and measured execution times are less than 9 percent; for the remaining five test cases, the differences are between 9 and 10 percent. For the CSR, ELL, COO, and HYB SpMV CUDA kernels, the average differences are 6.3, 4.4, 2.2, and 4.7 percent, respectively. 2) Based on the performance model, we design a dynamic-programming-based auto-selection algorithm that automatically reports an optimal solution (i.e., optimal storage strategy, storage format(s), and execution time) for a target sparse matrix. In our experiments, the optimal solutions improve performance by 41.1, 49.8, and 37.9 percent on average over NVIDIA's CSR, COO, and HYB CUDA kernels, respectively.
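As an illustration of the selection idea (not the paper's calibrated model), the sketch below scores each storage format with a toy cost function built from simple matrix statistics and picks the cheapest; the coefficients are invented placeholders, whereas the paper derives its predictions from integrated analytical and profile-based modeling.

```python
# Toy model-driven SpMV format selection: predict a relative cost for each
# format from matrix statistics and choose the minimum. The coefficients are
# invented placeholders, not the paper's calibrated performance model.
import numpy as np
import scipy.sparse as sp

def predict_costs(A):
    A = A.tocsr()
    n_rows, nnz = A.shape[0], A.nnz
    row_nnz = np.diff(A.indptr)              # non-zeros per row
    k = int(np.percentile(row_nnz, 90))      # ELL width for a HYB-style split
    coo_part = int(np.maximum(row_nnz - k, 0).sum())
    return {
        "CSR": 1.0 * nnz + 0.5 * n_rows,     # row-parallel; long rows serialize
        "ELL": 1.2 * n_rows * row_nnz.max(), # padded to the longest row
        "COO": 1.5 * nnz,                    # per-element work with reductions
        "HYB": 1.2 * n_rows * k + 1.5 * coo_part,
    }

A = sp.random(2000, 2000, density=0.005, format="csr", random_state=0)
costs = predict_costs(A)
print("chosen format:", min(costs, key=costs.get), costs)
```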
IEEE International Conference on Cloud Computing Technology and Science | 2010
Vedaprakash Subramanian; Liqiang Wang; En Jui Lee; Po Chen
Currently, numerically simulated synthetic seismograms are widely used by seismologists for seismological inference. Generating these synthetic seismograms requires a large amount of computing resources, and maintaining the observed seismograms requires massive storage. Traditional high-performance computing platforms handle such applications inefficiently, because the workload demands both rapid computation and the maintenance of large-scale datasets. The emerging cloud computing platform provides an efficient alternative. In this paper, we describe our experience implementing a computational platform for rapidly computing and delivering synthetic seismograms on Windows Azure. Our experiments show that cloud computing is an ideal platform for this kind of application.
TeraGrid Conference | 2011
Ping Guo; He Huang; Qichang Chen; Liqiang Wang; En Jui Lee; Po Chen
Sparse matrix-vector multiplication (SpMV) is a common kernel in scientific computing. The Graphics Processing Unit (GPU) has recently emerged as a high-performance computing platform due to its massive processing capability. This paper presents an innovative performance-model-driven approach for partitioning a sparse matrix into appropriately formatted pieces and auto-tuning the configurations of CUDA kernels to improve SpMV performance on GPUs. This paper makes the following contributions: (1) we propose an empirical CUDA performance model to predict the execution time of SpMV CUDA kernels; (2) we design and implement a model-driven partitioning framework that predicts how to split the target sparse matrix into one or more partitions and transforms each partition into an appropriate storage format, exploiting the fact that the storage format of a sparse matrix can significantly affect SpMV performance; (3) we integrate the model-driven partitioning with our previous auto-tuning framework to automatically adjust CUDA-specific parameters to optimize performance on specific GPUs. Compared to NVIDIA's existing implementations, our approach shows substantial performance improvements: on average, 222 percent for the CSR vector kernel, 197 percent for the ELL kernel, and 33 percent for the HYB kernel.
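The sketch below illustrates the partitioning idea in a HYB-like form: rows are split between a regular ELL part (good for coalesced GPU access) and a COO overflow part. It is a serial NumPy rendering under an assumed fixed ELL width, not the paper's model-driven framework or its CUDA code.

```python
# HYB-style partitioning sketch: the first k entries of each CSR row go into
# dense ELL arrays, the overflow goes into COO triplets. A model-driven
# framework like the paper's would choose the split; here k is given.
import numpy as np
import scipy.sparse as sp

def hyb_partition(A, k):
    A = A.tocsr()
    n = A.shape[0]
    ell_vals = np.zeros((n, k))
    ell_cols = np.zeros((n, k), dtype=np.int64)
    coo_r, coo_c, coo_v = [], [], []
    for i in range(n):
        lo, hi = A.indptr[i], A.indptr[i + 1]
        take = min(k, hi - lo)
        ell_vals[i, :take] = A.data[lo:lo + take]
        ell_cols[i, :take] = A.indices[lo:lo + take]
        coo_r += [i] * (hi - lo - take)
        coo_c += list(A.indices[lo + take:hi])
        coo_v += list(A.data[lo + take:hi])
    coo = (np.array(coo_r, dtype=np.int64),
           np.array(coo_c, dtype=np.int64),
           np.array(coo_v))
    return (ell_vals, ell_cols), coo

def hyb_spmv(ell, coo, x):
    (vals, cols), (r, c, v) = ell, coo
    y = (vals * x[cols]).sum(axis=1)   # ELL part: uniform work per row
    np.add.at(y, r, v * x[c])          # COO part: scattered accumulation
    return y

A = sp.random(500, 500, density=0.01, format="csr", random_state=0)
x = np.random.default_rng(0).normal(size=500)
ell, coo = hyb_partition(A, k=4)
assert np.allclose(hyb_spmv(ell, coo, x), A @ x)
```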
International Conference on Computational Science | 2011
Steve R. Diersen; En Jui Lee; Diana Spears; Po Chen; Liqiang Wang
We examine the plausibility of using an Artificial Neural Network (ANN) and an Importance-Aided Neural Network (IANN) to refine the structural model used to create full-wave tomography images. Specifically, we apply these machine learning techniques to classify paired segments of observed and synthetic seismograms as either usable or not usable for iteratively refining the structural model. A pair of observed and synthetic segments is considered usable, i.e., a match, if the two are not too different, a heuristic judgment previously made by a human expert. Using the ANN and the IANN for this classification removes the human cost of the classification process and the need for an expert to oversee every classification. Our experiments on seismic data for Southern California show this technique to be promising, both in classification accuracy and in reducing the time required to classify observed/synthetic segment matches.
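A minimal sketch of such a classification setup, assuming two simple pair features (normalized cross-correlation and a log amplitude ratio) and scikit-learn's MLPClassifier as a stand-in for the paper's ANN/IANN; the features, labels, and synthetic training data below are illustrative assumptions, not the paper's configuration.

```python
# Toy classifier for observed/synthetic window pairs: "usable" pairs are
# similar, "not usable" pairs are noisy. Features and data are invented
# stand-ins for the paper's expert-labeled seismogram segments.
import numpy as np
from sklearn.neural_network import MLPClassifier

def window_features(obs, syn):
    cc = np.dot(obs, syn) / (np.linalg.norm(obs) * np.linalg.norm(syn) + 1e-12)
    amp = np.log((np.linalg.norm(obs) + 1e-12) / (np.linalg.norm(syn) + 1e-12))
    return [cc, amp]

rng = np.random.default_rng(1)
X, y = [], []
for _ in range(500):
    syn = rng.normal(size=200)
    usable = rng.random() < 0.5
    noise = 0.2 if usable else 2.0          # "usable" pairs stay close to the synthetic
    obs = syn + noise * rng.normal(size=200)
    X.append(window_features(obs, syn))
    y.append(int(usable))

X, y = np.array(X), np.array(y)
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X[:400], y[:400])
print("held-out accuracy:", clf.score(X[400:], y[400:]))
```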
World Congress on Services | 2011
Vedaprakash Subramanian; Hongyi Ma; Liqiang Wang; En Jui Lee; Po Chen
With its rapid development, cloud computing has been increasingly adopted by scientists for large-scale scientific computation. Compared to traditional computing platforms such as clusters and supercomputers, cloud computing is more elastic in supporting real-time computation and more powerful in managing large-scale datasets. This paper presents our experience designing and implementing seismic source inversion on both a cluster (specifically, MPI-based) and cloud platforms (specifically, Amazon EC2 and Microsoft Windows Azure). Our experiments show that applying cloud computing to seismic source inversion is feasible and has advantages. We also observe that both the cluster and Amazon EC2 clearly outperform Windows Azure. Cloud computing is well suited to real-time scientific applications, but it (especially Azure) does not work well for tightly coupled applications.
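The cluster side of such an inversion is naturally parallel: candidate source parameters can be evaluated independently across MPI ranks. Below is a minimal mpi4py sketch of that structure with a toy one-parameter misfit; it stands in for, and is far simpler than, the paper's actual MPI implementation.

```python
# Minimal sketch of an MPI-parallel grid-search source inversion: each rank
# evaluates a slice of candidate parameters; rank 0 collects the best fit.
# The misfit function and 1-D parameter grid are toy stand-ins.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

candidates = np.linspace(0.0, 10.0, 1000)   # toy grid over one source parameter
def misfit(p):                              # stand-in for a waveform misfit
    return (p - 3.7) ** 2

local = candidates[rank::size]              # cyclic distribution of candidates
local_best = min((misfit(p), p) for p in local)
all_best = comm.gather(local_best, root=0)  # tuples compare misfit-first
if rank == 0:
    best_misfit, best_p = min(all_best)
    print(f"best misfit {best_misfit:.4f} at parameter {best_p:.3f}")
```

Run with, e.g., `mpiexec -n 4 python invert.py`; each rank does 1/4 of the grid.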
International Conference on Computational Science | 2012
He Huang; Liqiang Wang; En Jui Lee; Po Chen
LSQR (Sparse Equations and Least Squares) is a widely used Krylov subspace method for solving large-scale linear systems in seismic tomography. This paper presents a parallel MPI-CUDA implementation of the LSQR solver. At the CUDA level, our contributions are: (1) using CUBLAS and CUSPARSE to compute the major steps in LSQR; (2) optimizing memory copies between host and device memory; (3) developing a CUDA kernel that performs transpose SpMV without transposing the matrix in memory or keeping an additional copy. At the MPI level, our contributions are: (1) decomposing both the matrix and the vectors to increase parallelism; (2) designing a static load-balancing strategy. In our experiments, the single-GPU code achieves up to 17.6x speedup with 15.7 GFlops in single precision and 15.2x speedup with 12.0 GFlops in double precision over the original serial CPU code. The MPI-GPU code achieves up to 3.7x speedup with 268 GFlops in single precision and 3.8x speedup with 223 GFlops in double precision on 135 MPI tasks over the corresponding MPI-CPU code. The MPI-GPU code scales well in both strong and weak scaling tests. In addition, our parallel implementations outperform the LSQR subroutine in the PETSc library.
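Contribution (3) at the CUDA level is worth unpacking: y = A^T x can be computed from A's CSR arrays directly by scattering each row's contribution into the output, so no transposed copy is ever built. The sketch below is a serial NumPy rendering of that access pattern; the actual kernel performs the same scatter with atomic adds on the GPU.

```python
# Transpose SpMV from CSR without materializing A.T: walk each row of A once
# and scatter its entries, scaled by x[row], into the output vector.
import numpy as np
import scipy.sparse as sp

def transpose_spmv(A_csr, x):
    y = np.zeros(A_csr.shape[1])
    for i in range(A_csr.shape[0]):
        lo, hi = A_csr.indptr[i], A_csr.indptr[i + 1]
        cols = A_csr.indices[lo:hi]
        np.add.at(y, cols, A_csr.data[lo:hi] * x[i])  # scatter (atomic add on GPU)
    return y

A = sp.random(300, 200, density=0.02, format="csr", random_state=0)
x = np.random.default_rng(0).normal(size=300)
assert np.allclose(transpose_spmv(A, x), A.T @ x)
```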
International Conference on Computational Science | 2013
He Huang; John M. Dennis; Liqiang Wang; Po Chen
The Least Squares with QR-factorization (LSQR) method is a widely used Krylov subspace algorithm for solving the sparse rectangular linear systems that arise in tomographic problems. Traditional parallel implementations of LSQR can, depending on the non-zero structure of the matrix, incur significant communication cost, which can dramatically limit the scalability of the algorithm at large core counts. We describe a scalable parallel LSQR algorithm that exploits the particular non-zero structure of matrices arising in tomographic problems. In particular, we treat separately the kernel component of the matrix, which is relatively dense with a random structure, and the damping component, which is very sparse and highly structured. The resulting algorithm has a scalable communication volume with a bounded number of communication neighbors regardless of core count. We present scaling studies on real seismic tomography datasets that illustrate good scalability up to O(10,000) cores on a Cray XT cluster.
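A minimal sketch of the split treatment: if A stacks a kernel block K on a damping block D, LSQR only ever touches A through the products Ax and A^T y, so the two blocks can be stored and applied separately, each with the layout (and, in the parallel code, communication pattern) that suits it. The dimensions and damping weight below are arbitrary.

```python
# Split application of A = [K; D]: the dense-ish kernel block and the very
# sparse, structured damping block are stored and applied separately. LSQR
# sees only the two matrix-vector products, so the split is transparent.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n = 500
K = sp.random(300, n, density=0.3, format="csr", random_state=0)  # kernel rows
D = sp.identity(n, format="csr") * 0.1                            # damping rows

def apply_A(x):
    return np.concatenate([K @ x, D @ x])        # forward product [K; D] x

def apply_At(y):
    yk, yd = y[:K.shape[0]], y[K.shape[0]:]
    return K.T @ yk + D.T @ yd                   # transpose product

x = rng.normal(size=n)
A_full = sp.vstack([K, D]).tocsr()
assert np.allclose(apply_A(x), A_full @ x)
assert np.allclose(apply_At(apply_A(x)), A_full.T @ (A_full @ x))
```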
Computers & Geosciences | 2013
Dawei Mu; Po Chen; Liqiang Wang
We have successfully ported an arbitrary high-order discontinuous Galerkin (ADER-DG) method for solving the three-dimensional elastic seismic wave equation on unstructured tetrahedral meshes to an NVIDIA Tesla C2075 GPU using the NVIDIA CUDA programming model. On average, our single-precision GPU code obtained a speedup factor of about 24.3 and our double-precision GPU code a speedup factor of about 12.8, compared with the double-precision serial CPU code running on one Intel Xeon W5880 core. Compared with the parallel CPU code running on two, four, and eight cores, the speedup factor of our single-precision GPU code is around 12.9, 6.8, and 3.6, respectively. In this article, we give a brief summary of the ADER-DG method, a short introduction to the CUDA programming model, and a description of our CUDA implementation and optimization of the ADER-DG method on the GPU. To our knowledge, this is the first study to explore the potential of accelerating the ADER-DG method for seismic wave-propagation simulations using a GPU.
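Part of why DG-type schemes map well to GPUs is that each element's update is an independent small dense operation, so thousands of elements can be processed in a batched, data-parallel way. Below is a toy NumPy illustration of that structure only; the actual ADER-DG update involves real mass, stiffness, and flux matrices plus Cauchy-Kovalewski time integration, none of which are modeled here.

```python
# Toy illustration of the element-local, batched structure that makes DG
# methods GPU-friendly. M_inv_K is a random stand-in for a real
# (mass^-1 * stiffness) element operator; no fluxes or ADER time
# integration are modeled.
import numpy as np

n_elem, n_basis = 4096, 20                     # e.g. order-3 tetrahedral basis
rng = np.random.default_rng(0)
u = rng.normal(size=(n_elem, n_basis))         # degrees of freedom per element
M_inv_K = rng.normal(size=(n_basis, n_basis))  # toy element update operator
dt = 1.0e-3

# One explicit update applied to all elements at once; on a GPU this same
# contraction runs as a single kernel or one batched GEMM.
u = u + dt * (u @ M_inv_K.T)
print(u.shape)   # (4096, 20): every element updated independently
```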
Computers & Geosciences | 2013
En Jui Lee; He Huang; John M. Dennis; Po Chen; Liqiang Wang
The LSQR algorithm developed by Paige and Saunders (1982) is considered one of the most efficient and stable methods for solving large, sparse, and ill-posed linear (or linearized) systems. In seismic tomography, the LSQR method has been widely used to solve linearized inversion problems. As the amount of seismic data increases and tomographic techniques advance, the size of inversion problems grows accordingly. A few parallel LSQR solvers are available for solving large problems on supercomputers, but their scalability is generally limited by the significant communication cost among processors. In this paper, we present the details of our optimizations of the LSQR code for, but not limited to, seismic tomographic inversions. The optimizations implemented in our LSQR code include: reordering the damping matrix to reduce its bandwidth, which simplifies the communication pattern and reduces the amount of communication during calculations; adopting sparse matrix storage formats to store and partition matrices efficiently; using MPI I/O functions to parallelize the data reading and result writing; and providing different data partitioning strategies to use computational resources efficiently. A large seismic tomographic inversion problem, the full-3D waveform tomography for Southern California, is used to explain the details of our optimizations and to examine performance on the Yellowstone supercomputer at the NCAR-Wyoming Supercomputing Center (NWSC). The results show that the wall time required by our code for this inversion problem is much less than that of the LSQR solver from the PETSc library (Balay et al., 1997).
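To illustrate the first optimization, bandwidth reduction by reordering, here is a small sketch using SciPy's reverse Cuthill-McKee permutation. RCM is a standard bandwidth-reducing ordering used purely as an illustration; the paper's actual reordering of the damping matrix may differ.

```python
# Bandwidth reduction by symmetric permutation: a narrower band means each
# process only needs to exchange data with a few neighbors during SpMV.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    A = A.tocoo()
    return int(np.abs(A.row - A.col).max())

A = sp.random(400, 400, density=0.01, format="csr", random_state=0)
A = (A + A.T).tocsr()                       # symmetrize for the graph ordering
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_perm = A[perm, :][:, perm]                # apply the permutation to rows/cols
print("bandwidth before:", bandwidth(A), "after:", bandwidth(A_perm))
```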