En Jui Lee
University of Wyoming
Publication
Featured research published by En Jui Lee.
Journal of Geophysical Research | 2014
En Jui Lee; Po Chen; Thomas H. Jordan; Phillip B. Maechling; Marine A. Denolle; Gregory C. Beroza
We have successfully applied full-3-D tomography (F3DT) based on a combination of the scattering-integral method (SI-F3DT) and the adjoint-wavefield method (AW-F3DT) to iteratively improve a 3-D starting model, the Southern California Earthquake Center (SCEC) Community Velocity Model version 4.0 (CVM-S4). In F3DT, the sensitivity (Frechet) kernels are computed using numerical solutions of the 3-D elastodynamic equation, and the nonlinearity of the structural inversion problem is accounted for through an iterative tomographic navigation process. More than half a million misfit measurements made on about 38,000 earthquake seismograms and 12,000 ambient-noise correlagrams have been assimilated into our inversion. After 26 F3DT iterations, synthetic seismograms computed using our latest model, CVM-S4.26, show substantially better fits to observed seismograms at frequencies below 0.2 Hz than those computed using our 3-D starting model CVM-S4 and the other SCEC CVM, CVM-H11.9, which was improved through 16 iterations of AW-F3DT. CVM-S4.26 has revealed strong crustal heterogeneities throughout Southern California, some of which are completely missing in CVM-S4 and CVM-H11.9 but are present in models obtained from previous crustal-scale 2-D active-source refraction tomography. At shallow depths, our model shows strong correlation with sedimentary basins and reveals velocity contrasts across major mapped strike-slip and dip-slip faults. At middle to lower crustal depths, structural features in our model may provide new insights into regional tectonics. When combined with physics-based seismic hazard analysis tools, we expect our model to provide more accurate estimates of seismic hazards in Southern California.
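To make the iterative update concrete, the following is a minimal Python/SciPy sketch of one linearized tomographic step: a sparse sensitivity (Frechet) matrix G and a vector of misfit measurements d are combined in a damped least-squares solve for a model perturbation. The matrix sizes, damping weight, and starting model are placeholders for illustration, not values from the CVM-S4.26 inversion.

```python
# Minimal sketch of one linearized tomographic iteration (illustrative only):
# given a sparse sensitivity (Frechet) matrix G and misfit measurements d,
# solve a damped least-squares system for a model perturbation dm and update
# the current model. Problem sizes and damping are arbitrary placeholders.
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

n_data, n_model = 5000, 1200          # toy problem size, not the real inversion
G = sparse_random(n_data, n_model, density=0.01, format="csr", random_state=0)
d = np.random.default_rng(0).standard_normal(n_data)   # synthetic misfits

damping = 0.1                          # regularization weight (assumed)
dm = lsqr(G, d, damp=damping)[0]       # model perturbation from damped LSQR

m_current = np.zeros(n_model)          # placeholder starting model
m_updated = m_current + dm             # one iteration of the model update
```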
IEEE International Conference on Cloud Computing Technology and Science | 2010
Vedaprakash Subramanian; Liqiang Wang; En Jui Lee; Po Chen
Numerically simulated synthetic seismograms are now widely used by seismologists for seismological inference. Generating these synthetic seismograms requires a large amount of computing resources, and maintaining the observed seismograms requires massive storage. Traditional high-performance computing platforms are inefficient for these applications because rapid computation is needed and large-scale datasets must be maintained. The emerging cloud computing platform provides an efficient alternative. In this paper, we describe our experience implementing a computational platform for rapidly computing and delivering synthetic seismograms on Windows Azure. Our experiments show that cloud computing is an ideal platform for this kind of application.
TeraGrid Conference | 2011
Ping Guo; He Huang; Qichang Chen; Liqiang Wang; En Jui Lee; Po Chen
Sparse Matrix-Vector Multiplication (SpMV) is a common operation in scientific computing. The Graphics Processing Unit (GPU) has recently emerged as a high-performance computing platform due to its massive processing capability. This paper presents an innovative performance-model-driven approach for partitioning a sparse matrix into appropriate formats and auto-tuning the configurations of CUDA kernels to improve the performance of SpMV on GPUs. This paper makes the following contributions: (1) we propose an empirical CUDA performance model to predict the execution time of SpMV CUDA kernels; (2) we design and implement a model-driven partitioning framework that predicts how to split the target sparse matrix into one or more partitions and transforms each partition into an appropriate storage format, based on the fact that the storage format of a sparse matrix can significantly affect SpMV performance; (3) we integrate the model-driven partitioning with our previous auto-tuning framework to automatically adjust CUDA-specific parameters and optimize performance on specific GPUs. Compared to NVIDIA's existing implementations, our approach shows a substantial performance improvement, achieving average improvements of 222%, 197%, and 33% for the CSR vector, ELL, and HYB kernels, respectively.
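As an illustration of the format-selection idea only, the sketch below uses SciPy to inspect the nonzeros-per-row distribution of a CSR matrix and suggests an ELL-like or CSR/HYB-style layout before performing the multiplication; the variance heuristic and threshold are invented stand-ins for the paper's empirical CUDA performance model.

```python
# Illustrative sketch (not the paper's model): pick a storage format for SpMV
# from a simple row-length statistic, then run the multiplication with SciPy.
# The real work uses an empirical CUDA performance model and custom kernels;
# here the "model" is just a variance heuristic on nonzeros per row.
import numpy as np
import scipy.sparse as sp

def choose_format(A_csr):
    row_nnz = np.diff(A_csr.indptr)              # nonzeros per row
    # Uniform rows suit ELL-style padding; irregular rows suit CSR/HYB.
    return "ELL-like" if row_nnz.std() < 0.25 * row_nnz.mean() else "CSR/HYB"

A = sp.random(10000, 10000, density=0.001, format="csr", random_state=1)
x = np.ones(A.shape[1])
print("suggested format:", choose_format(A))
y = A @ x                                        # the SpMV itself
```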
International Conference on Conceptual Structures | 2011
Steve R. Diersen; En Jui Lee; Diana Spears; Po Chen; Liqiang Wang
We examine the plausibility of using an Artificial Neural Network (ANN) and an Importance-Aided Neural Network (IANN) to refine the structural model used to create full-wave tomography images. Specifically, we apply these machine learning techniques to classifying segments of observed and synthetic waveform seismograms as either usable or not usable for iteratively refining the structural model. Segments of observed and synthetic seismograms are considered usable, i.e. a match, if they are not too different, a heuristic judgment normally made by a human expert. Using the ANN and the IANN to classify the waveform segments removes the human cost of the classification process and removes the need for an expert to oversee all such classifications. Our experiments on seismic data for Southern California show this technique to be promising both in classification accuracy and in reducing the time required to classify matches between observed and synthetic waveform segments.
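A minimal sketch of the classification step, assuming window-level features (e.g. correlation coefficient, amplitude ratio, and time lag) have already been extracted for each observed/synthetic pair; the synthetic features, labels, and small feedforward network below are illustrative and do not reproduce the IANN's importance-aided training.

```python
# Minimal sketch: label each observed/synthetic window pair as "usable" (1)
# or "not usable" (0) with a small feedforward network. The features, labels,
# and network size are assumptions, not the paper's actual ANN/IANN setup.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 3))            # [correlation, amp ratio, lag]
y = (X[:, 0] > 0.2).astype(int)               # toy "usable" labels

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X[:1500], y[:1500])                   # train on most of the windows
print("held-out accuracy:", clf.score(X[1500:], y[1500:]))
```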
World Congress on Services | 2011
Vedaprakash Subramanian; Hongyi Ma; Liqiang Wang; En Jui Lee; Po Chen
With its rapid development, cloud computing has been increasingly adopted by scientists for large-scale scientific computation. Compared to traditional computing platforms such as clusters and supercomputers, cloud computing is more elastic in supporting real-time computation and more powerful in managing large-scale datasets. This paper presents our experience designing and implementing seismic source inversion on both a cluster (specifically, MPI-based) and cloud computing platforms (specifically, Amazon EC2 and Microsoft Windows Azure). Our experiments show that applying cloud computing to seismic source inversion is feasible and has its advantages. In addition, we observe that both the cluster and Amazon EC2 have clearly better performance than Windows Azure. Cloud computing is well suited to real-time scientific applications, but it (especially Azure) does not work well for tightly coupled applications.
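For the cluster side, a hedged sketch of an MPI-style, embarrassingly parallel search over trial source parameters is shown below using mpi4py; the misfit function and parameter grid are placeholders rather than the paper's actual source inversion.

```python
# Hedged sketch of the cluster (MPI) side: an embarrassingly parallel grid
# search over trial source parameters, with each rank scoring its share of
# candidates and rank 0 collecting the best fit. The misfit function and
# parameter grid are placeholders, not the paper's inversion.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

candidates = np.linspace(0.0, 360.0, 720)        # e.g. trial strike angles
my_candidates = candidates[rank::size]           # round-robin work split

def misfit(strike):                              # placeholder misfit
    return (np.sin(np.radians(strike)) - 0.3) ** 2

my_best = min(my_candidates, key=misfit)
best = comm.gather((misfit(my_best), my_best), root=0)
if rank == 0:
    print("best-fitting strike:", min(best)[1])
```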
International Conference on Conceptual Structures | 2012
He Huang; Liqiang Wang; En Jui Lee; Po Chen
LSQR (Sparse Equations and Least Squares) is a widely used Krylov subspace method for solving large-scale linear systems in seismic tomography. This paper presents a parallel MPI-CUDA implementation of the LSQR solver. At the CUDA level, our contributions are: (1) we utilize CUBLAS and CUSPARSE to compute the major steps in LSQR; (2) we optimize memory copies between host memory and device memory; (3) we develop a CUDA kernel to perform the transpose SpMV without transposing the matrix in memory or keeping an additional copy. At the MPI level, our contributions are: (1) we decompose both the matrix and the vectors to increase parallelism; (2) we design a static load-balancing strategy. In our experiments, the single-GPU code achieves up to a 17.6x speedup with 15.7 GFlops in single precision and a 15.2x speedup with 12.0 GFlops in double precision compared with the original serial CPU code. The MPI-GPU code achieves up to a 3.7x speedup with 268 GFlops in single precision and a 3.8x speedup with 223 GFlops in double precision on 135 MPI tasks compared with the corresponding MPI-CPU code. The MPI-GPU code scales well in both strong and weak scaling tests. In addition, our parallel implementations outperform the LSQR subroutine in the PETSc library.
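The transpose-SpMV idea can be illustrated outside CUDA: LSQR needs both A x and A^T y in every iteration, and the sketch below computes A^T y directly from the CSR arrays of A with a scatter-add, so no transposed copy is ever stored. This is a conceptual NumPy/SciPy illustration, not the paper's GPU kernel.

```python
# Sketch of "transpose SpMV without an explicit transpose": reuse the CSR
# arrays of A to form A.T @ y by scattering each row's contributions into the
# output. Conceptual illustration only; the paper implements this as a CUDA
# kernel operating on device memory.
import numpy as np
import scipy.sparse as sp

A = sp.random(8000, 3000, density=0.002, format="csr", random_state=2)
y = np.ones(A.shape[0])

def csr_transpose_matvec(A_csr, y):
    """Compute A.T @ y from CSR storage, without building the transpose."""
    out = np.zeros(A_csr.shape[1])
    row_nnz = np.diff(A_csr.indptr)
    rows = np.repeat(np.arange(A_csr.shape[0]), row_nnz)   # row of each nnz
    np.add.at(out, A_csr.indices, A_csr.data * y[rows])    # scatter-add
    return out

assert np.allclose(csr_transpose_matvec(A, y), A.T @ y)
```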
Computers & Geosciences | 2013
En Jui Lee; He Huang; John M. Dennis; Po Chen; Liqiang Wang
The LSQR algorithm developed by Paige and Saunders (1982) is considered one of the most efficient and stable methods for solving large, sparse, and ill-posed linear (or linearized) systems. In seismic tomography, the LSQR method has been widely used to solve linearized inversion problems. As the amount of seismic observations increases and tomographic techniques advance, the size of inversion problems grows accordingly. A few parallel LSQR solvers are available for solving large problems on supercomputers, but their scalability is generally limited by the significant communication cost among processors. In this paper, we present the details of our optimizations of the LSQR code for, but not limited to, seismic tomographic inversions. The optimizations we have implemented in our LSQR code include: reordering the damping matrix to reduce its bandwidth, which simplifies the communication pattern and reduces the amount of communication during calculations; adopting sparse matrix storage formats for efficiently storing and partitioning matrices; using MPI I/O functions to parallelize the data reading and result writing processes; and providing different data partitioning strategies for efficient use of computational resources. A large seismic tomographic inversion problem, the full-3D waveform tomography for Southern California, is used to explain the details of our optimizations and to examine the performance on the Yellowstone supercomputer at the NCAR-Wyoming Supercomputing Center (NWSC). The results show that the wall time required by our code for the same inversion problem is much less than that of the LSQR solver from the PETSc library (Balay et al., 1997).
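As an illustration of the bandwidth-reduction step, the sketch below reorders a sparse matrix with SciPy's reverse Cuthill-McKee routine and reports the bandwidth before and after; RCM is used here only as a familiar stand-in, since the paper does not specify this particular ordering, and the matrix is a random placeholder for the damping matrix.

```python
# Hedged sketch of the bandwidth-reduction idea: reorder a sparse matrix so
# its nonzeros sit closer to the diagonal, which in a parallel row-block
# decomposition shrinks the communication footprint between neighbouring
# ranks. RCM is a stand-in ordering, not necessarily the paper's choice.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

D = sp.random(2000, 2000, density=0.002, format="csr", random_state=3)
D = (D + D.T).tocsr()                            # symmetric sparsity pattern

def bandwidth(M):
    coo = M.tocoo()
    return int(np.abs(coo.row - coo.col).max())

perm = reverse_cuthill_mckee(D, symmetric_mode=True)
D_reordered = D[perm, :][:, perm]
print("bandwidth before:", bandwidth(D), "after:", bandwidth(D_reordered))
```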
Computers & Geosciences | 2016
Peter Lindstrom; Po Chen; En Jui Lee
Full-3D seismic waveform tomography (F3DT) is the latest seismic tomography technique that can assimilate broadband, multi-component seismic waveform observations into high-resolution 3D subsurface seismic structure models. The main drawback of the current F3DT implementation, in particular the scattering-integral implementation (F3DT-SI), is the high disk storage cost and the associated I/O overhead of archiving the 4D space-time wavefields of the receiver- or source-side strain tensors. The strain tensor fields are needed for computing the data sensitivity kernels, which are used for constructing the Jacobian matrix in the Gauss-Newton optimization algorithm. In this study, we have successfully integrated a lossy compression algorithm into our F3DT-SI workflow to significantly reduce the disk space for storing the strain tensor fields. The compressor supports a user-specified tolerance for bounding the error and can be integrated into the finite-difference wave-propagation simulation code used for computing the strain fields. The decompressor can be integrated into the kernel calculation code that reads the strain fields from disk and computes the data sensitivity kernels. During the wave-propagation simulations, we compress the strain fields before writing them to disk. To compute the data sensitivity kernels, we read the compressed strain fields from disk and decompress them before using them in kernel calculations. Experiments using a realistic dataset in our California statewide F3DT project have shown that we can reduce the strain-field disk storage by at least an order of magnitude with acceptable loss, and also significantly improve the overall I/O performance of the entire F3DT-SI workflow. The integration of the lossy online compressor may open up the possibility of wide adoption of F3DT-SI in routine seismic tomography practice in the near future.
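To illustrate the error-bounded idea in isolation, the sketch below quantizes a field with a user-specified absolute tolerance and then applies a generic lossless byte compressor; this toy scheme is not the dedicated floating-point compressor integrated into the F3DT-SI workflow, and the random "strain" array is a placeholder (real strain fields are much smoother and compress far better).

```python
# Minimal error-bounded lossy compression sketch: uniform quantization with a
# user-specified absolute tolerance, followed by a standard lossless byte
# compressor. Illustrates the error-bound idea only; not the actual compressor
# used in the F3DT-SI workflow.
import numpy as np
import zlib

def compress(field, tol):
    q = np.round(field / (2.0 * tol)).astype(np.int32)   # |error| <= tol
    return zlib.compress(q.tobytes()), field.shape

def decompress(blob, shape, tol):
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int32).reshape(shape)
    return q * (2.0 * tol)

strain = np.random.default_rng(4).standard_normal((64, 64, 64))  # placeholder
tol = 1e-3
blob, shape = compress(strain, tol)
recovered = decompress(blob, shape, tol)
print("max abs error:", float(np.max(np.abs(recovered - strain))))
print("compression ratio:", strain.nbytes / len(blob))
```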
Archive | 2015
Po Chen; En Jui Lee
This book introduces a methodology for solving the seismic inverse problem using purely numerical solutions built on 3D wave equations. The approach is free of the approximations and simplifications common in classical seismic inversion methodologies and is therefore applicable to arbitrary 3D geological media and seismic source models. The source codes provided allow readers to experiment with the calculations demonstrated and to explore their own applications.
Computers & Geosciences | 2017
Dawei Mu; En Jui Lee; Po Chen
The template-matching algorithm (TMA) has been widely adopted for improving the reliability of earthquake detection. The TMA is based on calculating the normalized cross-correlation coefficient (NCC) between a collection of selected template waveforms and the continuous waveform recordings of seismic instruments. In realistic applications, the computational cost of the TMA is much higher than that of traditional techniques. In this study, we provide an analysis of the TMA and show how the GPU architecture provides an almost ideal environment for accelerating the TMA and NCC-based pattern recognition algorithms in general. So far, our best-performing GPU code has achieved a speedup factor of more than 800 with respect to a common sequential CPU code. We demonstrate the performance of our GPU code using seismic waveform recordings from the ML 6.6 Meinong earthquake sequence in Taiwan.
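A plain NumPy reference for the NCC computation at the core of the TMA is sketched below: a template slides over a continuous trace and the normalized cross-correlation is evaluated at every lag. This is the kind of CPU baseline the GPU kernels accelerate; the trace, template, and detection threshold are synthetic placeholders.

```python
# CPU reference sketch of the NCC at the core of the TMA: slide a template
# over a continuous trace and compute the normalized cross-correlation at each
# lag. Detections would be flagged where the NCC exceeds a chosen threshold.
import numpy as np

def ncc_trace(template, trace):
    n = template.size
    t = (template - template.mean()) / (template.std() * n)
    windows = np.lib.stride_tricks.sliding_window_view(trace, n)
    w_mean = windows.mean(axis=1, keepdims=True)
    w_std = windows.std(axis=1)
    return ((windows - w_mean) @ t) / np.where(w_std > 0, w_std, np.inf)

rng = np.random.default_rng(5)
trace = rng.standard_normal(50_000)              # synthetic continuous data
template = trace[20_000:20_200].copy()           # plant a known template
ncc = ncc_trace(template, trace)
print("best match at sample", int(np.argmax(ncc)), "ncc =", float(ncc.max()))
```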