Hung-Lung Allen Huang

Publication

Featured research published by Hung-Lung Allen Huang.

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 2011

GPU-Accelerated Multi-Profile Radiative Transfer Model for the Infrared Atmospheric Sounding Interferometer

Jarno Mielikainen; Bormin Huang; Hung-Lung Allen Huang

In this paper, we develop a novel Graphics Processing Unit (GPU)-based high-performance Radiative Transfer Model (RTM) for the Infrared Atmospheric Sounding Interferometer (IASI), launched in 2006 onboard the first European meteorological polar-orbiting satellite, METOP-A. The proposed GPU RTM processes more than one profile at a time in order to gain a significant speedup over processing just one profile at a time. The performance of radiative transfer models in operational numerical weather prediction systems still limits the number of hyperspectral sounder channels they can use to only a few hundred. To take full advantage of such high-resolution infrared observations, a computationally efficient radiative transfer model is needed. Our GPU-based IASI radiative transfer model is developed to run on a low-cost personal supercomputer with 4 NVIDIA Tesla C1060 GPUs with a total of 960 cores, delivering nearly 4 TFlops of theoretical peak performance. The model exhibited linear scaling with the number of graphics processing units. Computing 10 IASI radiance spectra simultaneously on a GPU, we reached a 763x speedup for 1 GPU and a 3024x speedup for all 4 GPUs, both with respect to the original single-threaded Fortran CPU code. The 3024x speedup means that the proposed GPU-based high-performance forward model is able to compute one day's amount of 1,296,000 IASI spectra within 6 minutes, whereas the original CPU-based version would impractically take more than 10 days. The GPU-based high-performance IASI radiative transfer model is suitable for the assimilation of IASI radiance observations into the operational numerical weather forecast model.
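
The throughput figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: the variable names are ours, and it assumes that the 6-minute figure refers to the 4-GPU run and that speedup is simply CPU wall time divided by GPU wall time.

```python
# Figures quoted in the abstract; names are illustrative, not from the paper.
SPECTRA_PER_DAY = 1_296_000
SPEEDUP_1_GPU = 763.0
SPEEDUP_4_GPU = 3024.0
GPU_MINUTES_4_GPU = 6.0  # upper bound ("within 6 minutes"), assumed to be the 4-GPU run

# Implied single-threaded CPU time if the GPU run took the full 6 minutes.
cpu_days_upper = GPU_MINUTES_4_GPU * SPEEDUP_4_GPU / (60 * 24)

# Fraction of ideal 4x scaling reached when going from 1 to 4 GPUs.
scaling_efficiency = SPEEDUP_4_GPU / (4 * SPEEDUP_1_GPU)

# Sustained throughput needed to finish one day's spectra in 6 minutes.
spectra_per_second = SPECTRA_PER_DAY / (GPU_MINUTES_4_GPU * 60)

print(f"implied CPU time: up to {cpu_days_upper:.1f} days")
print(f"4-GPU scaling efficiency: {scaling_efficiency:.3f}")
print(f"required throughput: {spectra_per_second:.0f} spectra/s")
```

Under these assumptions the implied CPU time is at most about 12.6 days, which is consistent with the "more than 10 days" quoted, and the scaling efficiency of about 0.99 matches the reported linear scaling.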

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 2012

Improved GPU/CUDA Based Parallel Weather and Research Forecast (WRF) Single Moment 5-Class (WSM5) Cloud Microphysics

Jarno Mielikainen; Bormin Huang; Hung-Lung Allen Huang; Mitchell D. Goldberg

The Weather Research and Forecasting (WRF) model is an atmospheric simulation system designed for both operational and research use. WRF is currently in operational use at the National Oceanic and Atmospheric Administration (NOAA)'s National Weather Service as well as at the Air Force Weather Agency and meteorological services worldwide. Getting weather predictions in time using the latest advances in atmospheric sciences is a challenge even on the fastest supercomputers. Timely weather predictions are particularly useful for severe weather events when lives and property are at risk. Microphysics is a crucial but computationally intensive part of WRF. The WRF Single Moment 5-class (WSM5) microphysics scheme represents fallout of various types of precipitation, condensation and thermodynamic effects of latent heat release. Therefore, to expedite the computation process, Graphics Processing Units (GPUs) appear to be an attractive alternative to traditional CPU architectures. In this paper, we accelerate the WSM5 microphysics scheme on GPUs and obtain a considerable speedup, thereby significantly reducing the processing time. Such high-performance and computationally efficient GPUs allow us to use higher resolution WRF forecasts. The use of high-resolution WRF enables us to compute microphysical processes for increasingly small clouds and water droplets. To implement the WSM5 scheme on GPUs, the WRF code was rewritten into CUDA C, a high-level data-parallel programming language used on NVIDIA GPUs. We observed a reduction in processing time from 16928 ms on a CPU to 43.5 ms on a GPU. We obtained a speedup of 389× without I/O using a single GPU. Taking I/O transfer times into account, the speedup obtained is 206×. The speedup was further increased by using four GPUs, reaching 1556× without I/O and 357× with I/O.
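
The relationship between the timings and speedups in this abstract can be reproduced in a few lines. This is a minimal sketch: the helper names are hypothetical, and the efficiency measure (multi-GPU speedup divided by the ideal N times the single-GPU speedup) is our choice of metric, not one stated in the abstract.

```python
def speedup(cpu_ms: float, gpu_ms: float) -> float:
    """Ratio of CPU wall time to GPU wall time."""
    return cpu_ms / gpu_ms


def scaling_efficiency(multi_gpu_speedup: float,
                       single_gpu_speedup: float,
                       n_gpus: int) -> float:
    """Fraction of ideal N-fold scaling actually achieved."""
    return multi_gpu_speedup / (n_gpus * single_gpu_speedup)


# Timings and speedups quoted in the abstract.
single_no_io = speedup(16928.0, 43.5)
eff_no_io = scaling_efficiency(1556.0, 389.0, 4)
eff_with_io = scaling_efficiency(357.0, 206.0, 4)

print(f"single-GPU speedup without I/O: {single_no_io:.0f}x")
print(f"4-GPU efficiency without I/O:   {eff_no_io:.2f}")
print(f"4-GPU efficiency with I/O:      {eff_with_io:.2f}")
```

By this measure the compute-only figures scale ideally across four GPUs, while the figures that include I/O reach well under half of ideal scaling, which suggests that data transfer dominates the multi-GPU runs.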

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 2012

GPU Acceleration of the Updated Goddard Shortwave Radiation Scheme in the Weather Research and Forecasting (WRF) Model

Jarno Mielikainen; Bormin Huang; Hung-Lung Allen Huang; Mitchell D. Goldberg

The Weather Research and Forecasting (WRF) model, a next-generation mesoscale numerical weather prediction system, is designed for dual use in forecasting and research. WRF offers multiple physics options that can be combined in any way. One of the physics options is radiance computation. The major source of energy for the Earth's climate is solar radiation. Thus, it is imperative to accurately model the horizontal and vertical distribution of the heating. The Goddard solar radiative transfer model includes the absorption due to water vapor, O3, O2, CO2, clouds and aerosols. The model computes the interactions among the absorption and scattering by clouds, aerosols, molecules and the surface. Finally, fluxes are integrated over the entire shortwave spectrum from 0.175 μm to 10 μm. In this paper, we develop an efficient graphics processing unit (GPU) based Goddard shortwave radiative scheme. The GPU-based Goddard shortwave scheme was compared to a CPU-based single-threaded counterpart on a computational domain of 422 × 297 horizontal grid points with 34 vertical levels. Both the original FORTRAN code on the CPU and the CUDA C code on the GPU use double precision floating point values for computation. Processing time for Goddard shortwave radiance on the CPU is 22106 ms. GPU-accelerated Goddard shortwave radiance on 4 GPUs can be computed in 208.8 ms and 157.1 ms with and without I/O, respectively. Thus, the speedups are 116× with data I/O and 141× without I/O on two NVIDIA GTX 590s. Using single precision arithmetic and less accurate arithmetic modes, the speedups are increased to 536× and 259×, with and without I/O, respectively.
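
To give a sense of the problem size behind these timings, the following sketch works out the number of grid columns and the memory footprint of one 3-D field on the stated domain. The names are illustrative, and treating each vertical column as an independent unit of parallel work is an assumption on our part; the abstract does not describe how the work is split across GPU threads.

```python
NX, NY, NZ = 422, 297, 34  # horizontal grid points and vertical levels from the abstract

columns = NX * NY           # vertical columns (assumed independent units of work)
grid_points = columns * NZ  # total points in one 3-D field


def field_megabytes(points: int, bytes_per_value: int) -> float:
    """Size of one 3-D field in megabytes (1 MB = 10**6 bytes)."""
    return points * bytes_per_value / 1e6


mb_double = field_megabytes(grid_points, 8)  # double precision
mb_single = field_megabytes(grid_points, 4)  # single precision

print(f"columns: {columns}, grid points: {grid_points}")
print(f"one field: {mb_double:.1f} MB (double), {mb_single:.1f} MB (single)")
```

Switching from double to single precision halves the bytes moved per field, which is one plausible contributor to the larger speedups reported for single precision.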

Journal of Atmospheric and Oceanic Technology | 2013

Speeding Up the Computation of WRF Double-Moment 6-Class Microphysics Scheme with GPU

Jarno Mielikainen; Bormin Huang; Hung-Lung Allen Huang; Mitchell D. Goldberg; Ajay Mehta

The Weather Research and Forecasting model (WRF) double-moment 6-class microphysics scheme (WDM6) implements a double-moment bulk microphysical parameterization of clouds and precipitation and is applicable in mesoscale and general circulation models. WDM6 extends the WRF single-moment 6-class microphysics scheme (WSM6) by incorporating the number concentrations for cloud and rainwater along with a prognostic variable of cloud condensation nuclei (CCN) number concentration. Moreover, it predicts the mixing ratios of six water species (water vapor, cloud droplets, cloud ice, snow, rain, and graupel), similar to WSM6. This paper describes improving the computational performance of WDM6 by exploiting its inherent fine-grained parallelism using the NVIDIA graphics processing unit (GPU). Compared to the single-threaded CPU, a single GPU implementation of WDM6 obtains a speedup of 150× with the input/output (I/O) transfer and 206× without the I/O transfer. Using four GPUs, the speedup reaches 347× and 7...

Journal of Applied Remote Sensing | 2014

Assimilation of clear sky Atmospheric Infrared Sounder radiances in short-term regional forecasts using community models

Agnes H. N. Lim; James A. Jung; Hung-Lung Allen Huang; Steven A. Ackerman; Jason A. Otkin

Regional assimilation experiments of clear-sky Atmospheric Infrared Sounder (AIRS) radiances were performed using the gridpoint statistical interpolation three-dimensional variational assimilation system coupled to the weather research and forecasting model. The data assimilation system and forecast model used in this study are separate community models; it cannot be assumed that the coupled systems work optimally. Tuning was performed on the data assimilation system and forecast model. Components tuned included the background error covariance matrix, the satellite radiance bias correction, the quality control procedures for AIRS radiances, the forecast model resolution, and the infrared channel selection. Assimilation metrics and diagnostics from the assimilation system were used to identify problems when combining separate systems. Forecasts initiated from analyses after assimilation were verified with model analyses, rawinsondes, nonassimilated satellite radiances, and 24 h–accumulated precipitation. Assimilation of clear-sky AIRS radiances showed the largest improvement in temperature and radiance brightness temperature bias when compared with rawinsondes and satellite observations, respectively. Precipitation skill scores displayed minor changes with AIRS radiance assimilation. The 00 and 12 coordinated universal time (UTC) forecasts were typically of better quality than the 06 and 18 UTC forecasts, possibly due to the amount of AIRS data available for each assimilation cycle.

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 2014

GPU-Accelerated Longwave Radiation Scheme of the Rapid Radiative Transfer Model for General Circulation Models (RRTMG)

Erik Price; Jarno Mielikainen; Melin Huang; Bormin Huang; Hung-Lung Allen Huang; Tsengdar Lee

Atmospheric radiative transfer models calculate radiative transfer of electromagnetic radiation through a planetary atmosphere. One such model is the rapid radiative transfer model (RRTM), which evaluates longwave and shortwave atmospheric radiative fluxes and heating rates. The RRTM for general circulation models (GCMs), RRTMG, is an accelerated version based on the single-column reference of RRTM. The longwave radiation scheme of RRTM for GCMs (RRTMG_LW) is one model that utilizes the correlated-k approach to calculate longwave fluxes and heating rates for application to GCMs. In this paper, the feasibility of using graphics processing units (GPUs) to accelerate RRTMG_LW in the weather research and forecasting (WRF) model is examined. GPUs allow a substantial performance improvement in RRTMG_LW with a large number of parallel compute cores at low cost and power. Our GPU version of RRTMG_LW yields outputs that are bit-exact with those of its original Fortran code. Our results show that NVIDIA's K40 GPU achieves a speedup of x as compared to its CPU counterpart running on one CPU core of Intel Xeon E5-2603, whereas the speedup for one CPU socket (4 cores) of the Xeon E5-2603 with respect to one CPU core is only 3.2×.

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 2012

GPU Implementation of Stony Brook University 5-Class Cloud Microphysics Scheme in the WRF

Jarno Mielikainen; Bormin Huang; Hung-Lung Allen Huang; Mitchell D. Goldberg

The Weather Research and Forecasting (WRF) model is a next-generation mesoscale numerical weather prediction system. It is designed to serve the needs of both operational forecasting and atmospheric research for a broad spectrum of applications across scales ranging from meters to thousands of kilometers. Microphysics plays an important role in weather and climate prediction. Microphysics includes explicitly resolved water vapor, cloud, and precipitation processes. Several bulk water microphysics schemes are available within the WRF, with different numbers of simulated hydrometeor classes and methods for estimating their size, fall speeds, distributions and densities. The Stony Brook University scheme is a 5-class scheme with riming intensity predicted to account for the mixed-phase processes. In this paper, we develop an efficient Graphics Processing Unit (GPU) based Stony Brook University scheme. The GPU-based Stony Brook University scheme was compared to a CPU-based single-threaded counterpart on a computational domain of 422 × 297 horizontal grid points with 34 vertical levels. The original Fortran code was first rewritten into standard C code. After that, the C code was verified against the Fortran code, and CUDA C extensions were added for data-parallel execution on GPUs. On a single GPU, we achieved a speedup of 213× with data I/O and 896× without I/O on an NVIDIA GTX 590. Using multiple GPUs, a speedup of 352× is achieved with I/O for 4 GPUs. We also discuss how data I/O would be less cumbersome if the complete WRF model were run on GPUs.
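
The gap between the two single-GPU speedups gives a rough estimate of how much of the GPU wall time goes to data transfer. The sketch below assumes that the "with I/O" time is the compute time plus a non-overlapped transfer time; that assumption, and the function name, are ours rather than the paper's.

```python
def io_fraction(speedup_with_io: float, speedup_without_io: float) -> float:
    """Share of GPU wall time spent on I/O, assuming no compute/transfer overlap.

    T_total   = T_cpu / speedup_with_io
    T_compute = T_cpu / speedup_without_io
    I/O share = 1 - T_compute / T_total = 1 - speedup_with_io / speedup_without_io
    """
    return 1.0 - speedup_with_io / speedup_without_io


# Single-GPU speedups quoted in the abstract.
share = io_fraction(213.0, 896.0)
print(f"estimated I/O share of GPU wall time: {share:.1%}")
```

Under that assumption roughly three quarters of the single-GPU run is data transfer, which fits the abstract's remark that I/O would matter less if the complete WRF model ran on GPUs.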

Journal of Applied Remote Sensing | 2011

Accelerating the RTTOV-7 IASI and AMSU-A radiative transfer models on graphics processing units: evaluating central processing unit/graphics processing unit-hybrid and pure-graphics processing unit approaches

Jarno Mielikainen; Bormin Huang; Hung-Lung Allen Huang; Roger Saunders

The radiative transfer for television operational vertical sounder (RTTOV) is a widely used radiative transfer model (RTM) for calculation of radiances for satellite infrared and microwave sensors, including the 8461-channel infrared atmospheric sounding interferometer (IASI) and the 15-band Advanced Microwave Sounding Unit-A (AMSU-A). In the era of hyperspectral sounders with thousands of spectral channels, the computation of the RTM becomes more time-consuming. The RTM performance in operational numerical weather prediction systems still limits the number of used channels in hyperspectral sounders to only a few hundred. To take full advantage of such high-resolution infrared observations, a computationally efficient radiative transfer model is needed to facilitate satellite data assimilation. In this paper, we develop a parallel implementation of the RTTOV-7 IASI and AMSU-A RTMs that runs the predictor module on CPUs in a pipeline with the transmittance and radiance modules on NVIDIA many-core graphics processing units (GPUs). We show that concurrent execution of the RTTOV-7 IASI RTM on CPU and GPU, in addition to asynchronous data transfer from CPU to GPU, allows the GPU-accelerated code running on the 240-core NVIDIA Tesla C1060 to reach a speedup of 461× and 1793× for 1- and 4-GPU configurations, respectively. To compute one day's amount of 1,296,000 IASI spectra, the CPU code running on one core of the host 3.0 GHz AMD Phenom II X4 940 CPU takes 2.8 days. GPU acceleration reduces the running time to 8.75 and 2.25 min on 1- and 4-GPU configurations, respectively. Speedup for the RTTOV AMSU-A RTM varied from 29× to 75× for 1 and 4 GPUs, respectively. To further boost the speedup of a multispectral RTM, we developed a novel pure-GPU version of the RTTOV AMSU-A RTM where the predictor module also runs on GPUs to achieve a 96% reduction in the host-to-device data transfer. The speedups for the pure-GPU AMSU-A RTM are significantly increased to 56× and 125× for 1- and 4-GPU configurations, respectively.
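
The running times quoted for the IASI case follow directly from the CPU time and the speedups. The short sketch below redoes that conversion; the names are illustrative, and it assumes speedup is defined as CPU wall time divided by GPU wall time.

```python
CPU_DAYS = 2.8                    # single-core CPU time quoted in the abstract
cpu_minutes = CPU_DAYS * 24 * 60  # about 4032 minutes


def gpu_minutes(speedup: float) -> float:
    """GPU wall time in minutes implied by a given speedup over the CPU run."""
    return cpu_minutes / speedup


one_gpu = gpu_minutes(461.0)
four_gpus = gpu_minutes(1793.0)

print(f"1 GPU:  {one_gpu:.2f} min")
print(f"4 GPUs: {four_gpus:.2f} min")
```

Rounded to two decimals these come out at 8.75 and 2.25 minutes, in agreement with the figures in the abstract.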

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 2015

Optimizing Total Energy–Mass Flux (TEMF) Planetary Boundary Layer Scheme for Intel’s Many Integrated Core (MIC) Architecture

Jarno Mielikainen; Bormin Huang; Hung-Lung Allen Huang

In order to make use of ever-improving microprocessor performance, applications must be modified to take advantage of the parallelism of today's microprocessors. One such application that needs to be modernized is the weather research and forecasting (WRF) model, which is designed for numerical weather prediction and atmospheric research. The WRF software infrastructure consists of several components such as dynamic solvers and physics schemes. Numerical models are used to resolve the large-scale flow. However, subgrid-scale parameterizations are used to estimate small-scale properties (e.g., boundary layer turbulence and convection, clouds, radiation). Those have a significant influence on the resolved scale due to the complex nonlinear nature of the atmosphere. For the cloudy planetary boundary layer (PBL), it is fundamental to parameterize vertical turbulent fluxes and subgrid-scale condensation in a realistic manner. A parameterization based on the total energy-mass flux (TEMF) that unifies turbulence and moist convection components produces a better result than other PBL schemes. Thus, we present our optimization results for the TEMF PBL scheme. Those optimizations included vectorization of the code to utilize multiple vector units inside each processor core. The optimizations improved the performance of the original TEMF code on Xeon Phi 7120P by a factor of 25.9×. Furthermore, the same optimizations improved the performance of the TEMF on a dual-socket configuration of eight-core Intel Xeon E5-2670 CPUs by a factor of 8.3× compared to the original TEMF code.

Frontiers of Earth Science in China | 2013

Analysis of air quality variability in Shanghai using AOD and API data in the recent decade

Qing Zhao; Wei Gao; Weining Xiang; Runhe Shi; Chaoshun Liu; Tianyong Zhai; Hung-Lung Allen Huang; Liam E. Gumley; Kathleen I. Strabala

We use the aerosol optical depth (AOD) measured by the moderate resolution imaging spectrometer (MODIS) onboard the Terra satellite, air pollution index (API) daily data measured by the Shanghai Environmental Monitoring Center (SEMC), and the ensemble empirical mode decomposition (EEMD) method to analyze the air quality variability in Shanghai in the recent decade. The results indicate that a trend with an amplitude of 1.0 is a dominant component of the AOD variability in the recent decade. During the World Expo 2010, the average AOD level fell by 30% in comparison to the long-term trend. Two dominant annual components decreased by 80% and 100%. This implies that the air quality in Shanghai was remarkably improved, and that environmental initiatives and comprehensive actions for reducing air pollution are effective. AOD and API variability analysis results indicate that semi-annual and annual signals are dominant components, implying that the monsoon weather is a dominant factor in modulating the AOD and API variability. The variability of AOD and API in selected districts located in both downtown and suburban areas shows similar trends; i.e., in 2000 the AOD began a monotonic increase, reached a maximum around 2006, then monotonically decreased to 2011, and from around 2006 the API started to decrease until 2011. This indicates that the air quality in the entire Shanghai area, whether urban or suburban, has improved remarkably. The AOD improved degrees (IDS) in all the selected districts are (8.6±1.9)%, and API IDS are (9.2±7.1)%, ranging from a minimum value of 1.5% for Putuo District to a maximum value of 22% for Xuhui District.
