Weichung Wang | Researchain

Publication

Featured research published by Weichung Wang.

Computers in Education | 2002

How Computers Facilitate English Foreign Language Learners Acquire English Abstract Words

Wenli Tsou; Weichung Wang; Hung Yi Li

This study investigated computer-assisted learning (CAL) of foreign-language abstract words. A total of 13 commonly encountered abstract words at the elementary school level were chosen for study in the abstract word learning system. Following CAL theory, the system was designed to provide context for language learning as well as flexibility in learning time, paths, and modes. A total of 38 sixth graders learned with the system. Students who learned with the system learned significantly more abstract words than students in a regular language learning class. This paper also provides suggestions for further improvements to the CAL system used in this study.

Numerical Linear Algebra With Applications | 2005

Jacobi-Davidson methods for cubic eigenvalue problems

Tsung Min Hwang; Wen-Wei Lin; Jinn-Liang Liu; Weichung Wang

This study investigates the isothermal crystallization behavior of polypropylene-polyethylene-(1-butene) terpolymer and of adiabatically expanded polyolefin structured foams. For this purpose, butane gas was used as a physical blowing agent. The Avrami equation was used to interpret the experimental results obtained by either DSC or polarized optical microscopy. It is believed that elongation-induced crystallization occurring during the adiabatic expansion process increased the crystallization rate, eventually leading to a faster growth rate of spherulites and an increase in the nucleation density. An analysis of the foam by SEM images showed that the structure of the foam is uniform (closed cells below 30 μm in diameter). In addition, the thermal conductivity and the compressive strength of the polyolefin structured foams were measured. The thermal conductivity of the foamed resin, which has excellent insulation characteristics, is reduced compared with the unfoamed resin. The compressive strength decreases as the expansion ratio increases.

Medical Physics | 2011

A fast forward projection using multithreads for multirays on GPUs in medical image reconstruction.

Cheng-Ying Chou; Yi-Yen Chuo; Yukai Hung; Weichung Wang

PURPOSE: Iterative reconstruction techniques hold great potential to mitigate the effects of data noise and/or incompleteness, and hence can facilitate patient dose reduction. However, they are not suitable for routine clinical practice due to their long reconstruction times. In this work, the authors accelerated the computations by taking full advantage of the highly parallel computational power of single and multiple graphics processing units (GPUs). In particular, the forward projection algorithm, which is not included in the closed-form formulas, is accelerated and optimized on the GPU.
METHODS: The main contribution is a novel forward projection algorithm that uses multithreads to handle the computations associated with a group of adjacent rays simultaneously. The proposed algorithm is free of divergence and bank conflicts on the GPU, and benefits from data locality and data reuse. It achieves its efficiency by (i) employing a tiled algorithm with three-level parallelization, (ii) optimizing thread block size, (iii) maximizing data reuse on constant memory and shared memory, and (iv) exploiting the built-in texture memory interpolation capability. In addition, to accelerate the iterative algorithms and the Feldkamp-Davis-Kress (FDK) algorithm on the GPU, the authors apply batched fast Fourier transforms (FFTs) to expedite the filtering process in FDK and use projection bundling parallelism during backprojection to shorten the execution times of FDK and expectation-maximization (EM).
RESULTS: Numerical experiments conducted on an NVIDIA Tesla C1060 GPU demonstrated the superiority of the proposed algorithms in computational time saving. The forward projection, filtering, and backprojection times for generating a volume image of 512 x 512 x 512 with 360 projection data of 512 x 512 using one GPU are about 4.13, 0.65, and 2.47 s (including distance weighting), respectively. In particular, the proposed forward projection algorithm is ray-driven and its parallelization strategy evolves from single-thread-for-single-ray (38.56 s), through multithreads-for-single-ray (26.05 s), to multithreads-for-multirays (4.13 s). For the voxel-driven backprojection, the use of texture memory reduces the reconstruction time from 4.95 to 3.35 s. By applying the projection bundle technique, the computation time is further reduced to 2.47 s. When employing multiple GPUs, near-perfect speedups were observed as the number of GPUs increased. For example, by using four GPUs, the times for the forward projection, filtering, and backprojection are further reduced to 1.11, 0.18, and 0.66 s. The results obtained by the GPU-based algorithms are virtually indistinguishable from those obtained on the CPU.
CONCLUSIONS: The authors have proposed a highly optimized GPU-based forward projection algorithm, as well as GPU-based FDK and expectation-maximization reconstruction algorithms. The compute unified device architecture (CUDA) codes provide exceedingly fast forward projection and backprojection that outperform implementations using shading languages, the Cell Broadband Engine architecture, and previous CUDA implementations. The reconstruction times of the FDK and EM algorithms were considerably shortened, which can facilitate their routine use in a variety of applications such as image quality improvement and dose reduction.
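As a rough illustration of what "ray-driven" forward projection means, the sketch below approximates a line integral through a 2D image by sampling along one ray with bilinear interpolation. It is a CPU sketch in Python under simplifying assumptions (2D instead of 3D, midpoint sampling, hypothetical function names `bilinear` and `project_ray`); it is not the authors' CUDA kernel and does not model tiling, shared memory, or texture hardware.

```python
import math

def bilinear(image, x, y):
    """Bilinearly interpolate image (a list of rows) at (x, y); 0 outside the grid."""
    rows, cols = len(image), len(image[0])
    if x < 0 or y < 0 or x > cols - 1 or y > rows - 1:
        return 0.0
    # Clamp the cell index so that x0 + 1 and y0 + 1 stay inside the grid.
    x0, y0 = min(int(x), cols - 2), min(int(y), rows - 2)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x0 + 1]
    bottom = (1 - fx) * image[y0 + 1][x0] + fx * image[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom

def project_ray(image, start, end, n_samples=200):
    """Approximate the line integral of image along the segment start -> end."""
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    step = length / n_samples
    total = 0.0
    for k in range(n_samples):
        t = (k + 0.5) / n_samples  # midpoint of the k-th sub-segment
        total += bilinear(image, x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return total * step

# A ray crossing 7 units of a unit-valued image should integrate to about 7.
image = [[1.0] * 8 for _ in range(8)]
value = project_ray(image, (0.0, 3.5), (7.0, 3.5))
```

Each ray is computed independently of the others, which is why assigning rays (or groups of adjacent rays) to GPU threads is a natural parallelization.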

Journal of Computational Physics | 2003

Numerical methods for semiconductor heterostructures with band nonparabolicity

Weichung Wang; Tsung Min Hwang; Wen-Wei Lin; Jinn-Liang Liu

This article presents numerical methods for computing bound state energies and associated wave functions of three-dimensional semiconductor heterostructures, with special interest in the numerical treatment of the effect of band nonparabolicity. A nonuniform finite difference method is presented to approximate a model of a cylindrical semiconductor quantum dot embedded in another semiconductor matrix. A matrix reduction method is then proposed to dramatically reduce huge eigenvalue systems to much smaller subsystems. Moreover, the nonparabolic band structure results in cubic nonlinear eigenvalue problems, for which a cubic Jacobi-Davidson method with an explicit nonequivalence deflation method is proposed to compute all the desired eigenpairs. Numerical results are given to illustrate the spectrum of energy levels and the corresponding wave functions in some detail.
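For readers unfamiliar with the term, a cubic eigenvalue problem can be written in the generic form below. The coefficient matrices A_0, ..., A_3 are placeholder symbols assumed for illustration and are not taken from the article. The second equation is the standard companion linearization, shown only to make the structure concrete; the abstract describes a Jacobi-Davidson method applied to the cubic problem itself, not this linearization.

```latex
% Cubic eigenvalue problem: find \lambda and x \neq 0 such that
\left( \lambda^{3} A_{3} + \lambda^{2} A_{2} + \lambda A_{1} + A_{0} \right) x = 0 .

% Equivalent generalized linear eigenvalue problem of three times the size,
% with y = (x, \lambda x, \lambda^{2} x)^{T}:
\begin{pmatrix} 0 & I & 0 \\ 0 & 0 & I \\ -A_{0} & -A_{1} & -A_{2} \end{pmatrix}
\begin{pmatrix} x \\ \lambda x \\ \lambda^{2} x \end{pmatrix}
= \lambda
\begin{pmatrix} I & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & A_{3} \end{pmatrix}
\begin{pmatrix} x \\ \lambda x \\ \lambda^{2} x \end{pmatrix} .
```

The first two block rows are identities, and the third block row reproduces the cubic equation.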

Statistics and Computing | 2013

Optimizing Latin hypercube designs by particle swarm

Ray Bing Chen; Dai Ni Hsieh; Ying Hung; Weichung Wang

Latin hypercube designs (LHDs) are widely used in many applications. As the number of design points or factors becomes large, the total number of LHDs grows exponentially. The large number of feasible designs makes the search for optimal LHDs a difficult discrete optimization problem. To tackle this problem, we propose a new population-based algorithm named LaPSO that is adapted from the standard particle swarm optimization (PSO) and customized for LHDs. Moreover, we accelerate LaPSO via a graphics processing unit (GPU). According to extensive comparisons, the proposed LaPSO is more stable than existing approaches and is capable of improving known results.
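To make the search space concrete, the following sketch builds random LHDs (each column is a permutation of the levels 0..n-1) and keeps the one with the largest minimum pairwise distance. Both the maximin distance criterion and the plain random search are assumptions chosen for illustration; this is not LaPSO, and the abstract does not state which optimality criterion the paper uses.

```python
import itertools
import math
import random

def random_lhd(n_points, n_factors, rng):
    """Return an n_points x n_factors LHD: every column is a permutation of 0..n-1."""
    columns = []
    for _ in range(n_factors):
        col = list(range(n_points))
        rng.shuffle(col)
        columns.append(col)
    # Transpose the columns into rows, one row per design point.
    return [list(row) for row in zip(*columns)]

def min_pairwise_distance(design):
    """Smallest Euclidean distance between any two design points (larger is better)."""
    return min(math.dist(a, b) for a, b in itertools.combinations(design, 2))

def is_lhd(design):
    """Check the defining property: each factor uses every level exactly once."""
    n = len(design)
    return all(sorted(col) == list(range(n)) for col in zip(*design))

# Naive baseline: sample 200 random 8-point, 3-factor designs and keep the best.
rng = random.Random(1)
candidates = [random_lhd(8, 3, rng) for _ in range(200)]
best = max(candidates, key=min_pairwise_distance)
```

Random sampling like this covers only a tiny fraction of the feasible designs as the number of points grows, which is the motivation for a guided search.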

Numerical Algorithms | 2000

Adaptive use of iterative methods in predictor–corrector interior point methods for linear programming

Weichung Wang; Dianne P. O'Leary

In this work we devise efficient algorithms for finding the search directions for interior point methods applied to linear programming problems. There are two innovations. The first is the use of updating of preconditioners computed for previous barrier parameters. The second is an adaptive automated procedure for determining whether to use a direct or iterative solver, whether to reinitialize or update the preconditioner, and how many updates to apply. These decisions are based on predictions of the cost of using the different solvers to determine the next search direction, given costs in determining earlier directions. We summarize earlier results using a modified version of the OB1-R code of Lustig, Marsten, and Shanno, and we present results from a predictor–corrector code PCx modified to use adaptive iteration. If a direct method is appropriate for the problem, then our procedure chooses it, but when an iterative procedure is helpful, substantial gains in efficiency can be obtained.

Optimization Methods & Software | 2012

Accelerating parallel particle swarm optimization via GPU

Yukai Hung; Weichung Wang

Particle swarm optimization (PSO) is a population-based, stochastic, derivative-free method that has been used to solve various optimization problems due to its simplicity and efficiency. When solving high-dimensional or complicated problems, PSO requires a large number of particles to explore the problem domain and consequently incurs high computational costs. In this paper, we focus on accelerating PSO for box-constrained, load-balanced optimization problems by parallelization on a graphics processing unit (GPU). We propose a GPU-accelerated PSO (GPSO) algorithm that uses a thread pool model and implement GPSO on a GPU. Numerical results show that the GPU architecture fits the PSO framework well by reducing computation time, achieving high parallel efficiency, and finding better solutions by using a large number of particles. For example, when solving 100-dimensional test problems with 65,536 (16×2^12) particles, GPSO achieved up to 280X and 83X speedups on an NVIDIA Tesla C1060 1.30 GHz GPU relative to an Intel Xeon-X5450 3.00 GHz central processing unit running in single- and quad-core mode, respectively. GPSO provides a promising method for tackling high-dimensional and difficult optimization problems using a low-cost, many-core GPU system.
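For orientation, here is a minimal serial sketch of a basic global-best PSO on a box-constrained problem. The inertia and acceleration coefficients (`w`, `c1`, `c2`), the clamping rule, and the sphere test function are common textbook choices assumed here; this is not the GPSO algorithm or its thread pool model.

```python
import random

def pso_minimize(f, lower, upper, n_particles=30, n_iters=200, seed=0,
                 w=0.72, c1=1.49, c2=1.49):
    """Minimize f over the box [lower, upper] with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(lower)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus pulls toward the personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the coordinate back into the box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(xi * xi for xi in x)

best_x, best_val = pso_minimize(sphere, [-5.0] * 4, [5.0] * 4)
```

The per-particle work inside the loop over `i` is largely independent, which is the part a GPU implementation would distribute across threads.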

Parallel Computing | 2011

A CPU-GPU hybrid approach for the unsymmetric multifrontal method

Chenhan D. Yu; Weichung Wang; Dan’l Pierce

The multifrontal method is an efficient direct method for solving large-scale sparse and unsymmetric linear systems. The method transforms a large sparse matrix factorization process into a sequence of factorizations involving smaller dense frontal matrices. Some of these dense operations can be accelerated by using a graphics processing unit (GPU). We analyze the unsymmetric multifrontal method from both an algorithmic and an implementation perspective to see how a GPU, in particular the NVIDIA Tesla C2070, can be used to accelerate the computations. Our main acceleration strategies include (i) performing BLAS on both the CPU and the GPU, (ii) improving the communication efficiency between the CPU and GPU by using page-locked memory, zero-copy memory, and asynchronous memory copy, and (iii) a modified algorithm that reuses memory between different GPU tasks and sets thresholds to determine whether certain tasks should be performed on the GPU. The proposed acceleration strategies are implemented by modifying UMFPACK, an unsymmetric multifrontal linear system solver. Numerical results show that the CPU-GPU hybrid approach can accelerate the unsymmetric multifrontal solver, especially for computationally expensive problems.

Statistics and Computing | 2015

Minimax optimal designs via particle swarm optimization methods

Ray Bing Chen; Shin Perng Chang; Weichung Wang; Heng Chih Tung; Weng Kee Wong

Particle swarm optimization (PSO) techniques are widely used in applied fields to solve challenging optimization problems but they do not seem to have made an impact in mainstream statistical applications hitherto. PSO methods are popular because they are easy to implement and use, and seem increasingly capable of solving complicated problems without requiring any assumption on the objective function to be optimized. We modify PSO techniques to find minimax optimal designs, which have been notoriously challenging to find to date even for linear models, and show that the PSO methods can readily generate a variety of minimax optimal designs in a novel and interesting way, including adapting the algorithm to generate standardized maximin optimal designs.

Journal of Computational Physics | 2010

A parallel additive Schwarz preconditioned Jacobi-Davidson algorithm for polynomial eigenvalue problems in quantum dot simulation

Feng-Nan Hwang; Zih Hao Wei; Tsung Ming Huang; Weichung Wang

We develop a parallel Jacobi-Davidson approach for finding a partial set of eigenpairs of large sparse polynomial eigenvalue problems with application in quantum dot simulation. A Jacobi-Davidson eigenvalue solver is implemented based on the Portable, Extensible Toolkit for Scientific Computation (PETSc). The eigensolver thus inherits PETSc's efficient and varied parallel operations, linear solvers, and preconditioning schemes, as well as its ease of use. The parallel eigenvalue solver is then used to solve higher-degree polynomial eigenvalue problems arising in numerical simulations of three-dimensional quantum dots governed by Schrödinger's equations. We find that the parallel restricted additive Schwarz preconditioner in conjunction with a parallel Krylov subspace method (e.g., GMRES) can solve the correction equations, the most costly step in the Jacobi-Davidson algorithm, very efficiently in parallel. In addition, the overall performance is quite satisfactory. We have observed near-perfect superlinear speedup by using up to 320 processors. The parallel eigensolver can find all target interior eigenpairs of a quintic polynomial eigenvalue problem with more than 32 million variables within 12 minutes by using 272 Intel 3.0 GHz processors.
