Bormin Huang

University of Wisconsin-Madison

Publication

Featured research published by Bormin Huang.

Archive | 2011

Satellite Data Compression

Bormin Huang

Satellite Data Compression covers recent progress in compression techniques for multispectral, hyperspectral and ultraspectral data. A survey of recent advances in the fields of satellite communications, remote sensing and geographical information systems is included. Satellite Data Compression, contributed by leaders in this field, is the first book available on satellite data compression. It covers onboard compression methodology and hardware developments in several space agencies. Case studies are presented on recent advances in satellite data compression techniques via various prediction-based, lookup-table-based, transform-based, clustering-based, and projection-based approaches. This book provides valuable information on state-of-the-art satellite data compression technologies for professionals and students who are interested in this topic. Satellite Data Compression is designed for a professional audience composed of computer scientists working in satellite communications, sensor system design, remote sensing, data receiving, airborne imaging and geographical information systems (GIS). Advanced-level students and academic researchers will also benefit from this book.

ieee international conference on high performance computing data and analytics | 2014

Optimizing zonal advection of the Advanced Research WRF (ARW) dynamics for Intel MIC

Jarno Mielikainen; Bormin Huang; Allen Huang

The Weather Research and Forecasting (WRF) model is the most widely used community weather forecast and research model in the world. There are two distinct varieties of WRF. The Advanced Research WRF (ARW) is an experimental, advanced research version featuring very high resolution. The WRF Nonhydrostatic Mesoscale Model (WRF-NMM) has been designed for forecasting operations. WRF consists of dynamics code and several physics modules. The WRF-ARW core is based on an Eulerian solver for the fully compressible nonhydrostatic equations. In this paper, we use the Intel Many Integrated Core (MIC) architecture to substantially increase the performance of a zonal advection subroutine, which is one of the most time-consuming routines in the ARW dynamics core. Advection advances the explicit perturbation horizontal momentum equations by adding in the large-timestep tendency along with the small-timestep pressure gradient tendency. We describe the challenges we met during the development of a high-speed dynamics code subroutine for the MIC architecture. Furthermore, lessons learned from the code optimization process are discussed. The results show that the optimizations improved performance of the original code on Xeon Phi 5110P by a factor of 2.4x.

ieee international conference on high performance computing data and analytics | 2014

Intel Many Integrated Core (MIC) architecture optimization strategies for a memory-bound Weather Research and Forecasting (WRF) Goddard microphysics scheme

Jarno Mielikainen; Bormin Huang; Allen Huang

The Goddard cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. WRF is a weather prediction system that is widely used around the world. Its development is a collaborative effort around the globe. The Goddard microphysics scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Goddard scheme incorporates a large number of improvements. Thus, we have optimized the code of this important part of WRF. In this paper, we present our results of optimizing the Goddard microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The Intel MIC is capable of executing a full operating system and entire programs rather than just kernels as GPUs do. The MIC coprocessor supports all important Intel development tools. Thus, the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques. Those optimization techniques are discussed in this paper. The results show that the optimizations improved performance of the original code on Xeon Phi 7120P by a factor of 4.7x. Furthermore, the same optimizations improved performance on a dual socket Intel Xeon E5-2670 system by a factor of 2.8x compared to the original code.
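The statement that there are no interactions among horizontal grid points means each vertical column can be processed on its own, in any order. The following is a minimal Python sketch of that property only; it is not the authors' Fortran code. The function `update_column` and its arithmetic are hypothetical placeholders for a per-column physics update, and Python threads are used purely to show that the work items are independent, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def update_column(column):
    """Hypothetical stand-in for a per-column microphysics update.

    It reads and writes only the values of its own vertical column.
    """
    return [0.5 * value + 1.0 for value in column]


def update_grid(columns, workers=4):
    """Apply update_column to every horizontal grid point independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with columns.
        return list(pool.map(update_column, columns))


# Three horizontal grid points, each holding a two-level vertical column.
grid = [[0.0, 2.0], [4.0, 6.0], [8.0, 10.0]]
serial = [update_column(c) for c in grid]
parallel = update_grid(grid)
```

Because no column depends on another, the serial and parallel results are identical regardless of how the columns are distributed across workers.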

data compression communications and processing | 2014

Using Intel Xeon Phi to accelerate the WRF TEMF planetary boundary layer scheme

Jarno Mielikainen; Bormin Huang; Allen Huang

The Weather Research and Forecasting (WRF) model is designed for numerical weather prediction and atmospheric research. The WRF software infrastructure consists of several components such as dynamic solvers and physics schemes. Numerical models resolve the large-scale flow, whereas subgrid-scale parameterizations estimate small-scale properties (e.g., boundary layer turbulence and convection, clouds, radiation). These small-scale properties have a significant influence on the resolved scale due to the complex nonlinear nature of the atmosphere. For the cloudy planetary boundary layer (PBL), it is fundamental to parameterize vertical turbulent fluxes and subgrid-scale condensation in a realistic manner. A parameterization based on the Total Energy – Mass Flux (TEMF) that unifies turbulence and moist convection components produces a better result than the other PBL schemes. For that reason, the TEMF scheme is chosen as the PBL scheme we optimized for Intel Many Integrated Core (MIC), which ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our optimization results for the TEMF planetary boundary layer scheme. The optimizations that were performed were quite generic in nature. Those optimizations included vectorization of the code to utilize vector units inside each CPU. Furthermore, memory access was improved by scalarizing some of the intermediate arrays. The results show that the optimization improved MIC performance by 14.8x. Furthermore, the optimizations increased CPU performance by 2.6x compared to the original multi-threaded code on a quad core Intel Xeon E5-2603 running at 1.8 GHz. Compared to the optimized code running on a single CPU socket, the optimized MIC code is 6.2x faster.
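Scalarizing an intermediate array, as mentioned in the abstract above, generally means replacing a temporary array that is written in one loop and read in the next with a single scalar inside one fused loop. The sketch below illustrates the transformation in Python under stated assumptions: the abstract does not say which arrays were scalarized, so the arrays `a`, `b`, `tmp` and the arithmetic are hypothetical, and the original code is Fortran rather than Python.

```python
def with_intermediate_array(a, b):
    """Original form: the intermediate result is stored in a full array."""
    n = len(a)
    tmp = [0.0] * n  # intermediate array, written once and then re-read
    out = [0.0] * n
    for i in range(n):
        tmp[i] = a[i] * b[i]
    for i in range(n):
        out[i] = tmp[i] + a[i]
    return out


def scalarized(a, b):
    """Scalarized form: loops fused, the intermediate lives in one scalar."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        t = a[i] * b[i]  # scalar replaces tmp[i]
        out[i] = t + a[i]
    return out


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
result_array = with_intermediate_array(a, b)
result_scalar = scalarized(a, b)
```

Both forms compute the same values. In a compiled language the scalarized form avoids writing `tmp` to memory and reading it back, which is the kind of memory access improvement the abstract refers to.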

ieee international conference on high performance computing data and analytics | 2015

Revisiting Intel Xeon Phi optimization of Thompson cloud microphysics scheme in Weather Research and Forecasting (WRF) model

Jarno Mielikainen; Bormin Huang; Allen Huang

The Thompson cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Thompson scheme incorporates a large number of improvements. Thus, we have optimized the speed of this important part of WRF. Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the Thompson microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools. Thus, the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques. New optimizations for an updated Thompson scheme are discussed in this paper. The optimizations improved the performance of the original Thompson code on Xeon Phi 7120P by a factor of 1.8x. Furthermore, the same optimizations improved the performance of the Thompson scheme on a dual socket configuration of eight core Intel Xeon E5-2670 CPUs by a factor of 1.8x compared to the original Thompson code.

data compression communications and processing | 2015

Optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme for Intel Many Integrated Core (MIC) architecture

Jarno Mielikainen; Bormin Huang; Allen Huang

Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools. Thus, the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of Xeon Phi requires some novel optimization techniques. Those optimization techniques are discussed in this paper. The results show that the optimizations improved performance of the original code on Xeon Phi 7120P by a factor of 1.3x.

data compression communications and processing | 2014

Optimizing Weather and Research Forecast (WRF) Thompson cloud microphysics on Intel Many Integrated Core (MIC)

Jarno Mielikainen; Bormin Huang; Allen Huang

The Thompson cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Thompson scheme incorporates a large number of improvements. Thus, we have optimized the speed of this important part of WRF. Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the Thompson microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools. Thus, the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques. Those optimization techniques are discussed in this paper. The results show that the optimization improved MIC performance by 3.4x. Furthermore, the optimized MIC code is 7.0x faster than the optimized multi-threaded code on the four CPU cores of a single socket Intel Xeon E5-2603 running at 1.8 GHz.

ieee international conference on high performance computing data and analytics | 2013

GPU acceleration experience with RRTMG long wave radiation model

Erik Price; Jarno Mielikainen; Bormin Huang; Hung-Lung Allen Huang; Tsengdar Lee

An atmospheric radiative transfer model calculates radiative transfer of electromagnetic radiation through a planetary atmosphere. Both shortwave radiance and longwave radiance parameterizations in an atmospheric model calculate radiation fluxes and heating rates in the earth-atmospheric system. One radiative transfer model is the rapid radiative transfer model (RRTM), which calculates longwave and shortwave atmospheric radiative fluxes and heating rates. RRTMG, a longwave broadband radiative transfer code for general circulation model (GCM) applications, is based on the single-column reference code, RRTM. The RRTMG is a validated, correlated k-distribution band model for the calculation of longwave and shortwave atmospheric radiative fluxes and heating rates. The focus of this paper is on the RRTMG long wave (RRTMG_LW) model. In order to improve computational efficiency, RRTMG_LW incorporates several modifications compared to RRTM. In RRTM_LW there are 16 g points in each of the spectral bands for a total of 256 g points. In RRTMG_LW, the number of g points in each spectral band varies from 2 to 16 depending on the absorption in each band. RRTMG_LW employs a computationally efficient correlated-k method for radiative transfer calculations. It contains 16 spectral bands with a varying number of quadrature points (g points) in each of the bands. In total, there are 140 g points. The radiative effects of all significant atmospheric gases are included in RRTMG_LW. Active gas absorbers include H2O, O3, CO2, CH4, N2O, O2 and four types of halocarbons: CFC-11, CFC-12, CFC-22, and CCl4. RRTMG_LW also treats the absorption and scattering from liquid and ice clouds and aerosols. For cloudy-sky radiative transfer, a maximum-random cloud overlapping scheme is used. Small scale cloud variability, such as cloud fraction and the vertical overlap of clouds, can be represented using a statistical technique in RRTMG_LW.
Due to its accuracy, RRTMG_LW has been implemented operationally in many weather forecast and climate models. RRTMG_LW is in operational use in the ECMWF weather forecast system, the NCEP global forecast system, the ECHAM5 climate model, the Community Earth System Model (CESM) and the Weather Research and Forecasting (WRF) model. RRTMG_LW has also been evaluated for use in the GFDL climate model. In this paper, we examine the feasibility of using graphics processing units (GPUs) to accelerate the RRTMG_LW as used by the WRF. GPUs can provide a substantial improvement in RRTMG speed by supporting the parallel computation of large numbers of independent radiative calculations. Furthermore, using commodity GPUs to accelerate RRTMG_LW delivers much higher computational performance at a lower price point than traditional CPUs. In addition, power and cooling costs are significantly reduced by using GPUs. A GPU-compatible version of RRTMG was implemented and thorough testing was performed to ensure that the original level of accuracy is retained. Our results show that GPUs can provide significant speedup over conventional CPUs. In particular, Nvidia’s GTX 680 GPU card can provide a speedup of 69x compared to its single-threaded Fortran counterpart running on an Intel Xeon E5-2603 CPU.
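The g-point counts quoted in the abstract above can be checked with a few lines of arithmetic. This short Python sketch uses only the numbers stated there (16 bands, 16 g points per band in RRTM_LW, 140 g points in total in RRTMG_LW); the variable names are illustrative. The resulting fraction describes the reduction in the number of g points only and is not a measured runtime saving.

```python
bands = 16
g_points_per_band_rrtm = 16

# RRTM_LW: the same number of g points in every spectral band.
rrtm_lw_g_points = bands * g_points_per_band_rrtm

# RRTMG_LW: total stated in the abstract (2 to 16 g points per band).
rrtmg_lw_g_points = 140

reduction = rrtm_lw_g_points - rrtmg_lw_g_points
fraction = reduction / rrtm_lw_g_points

print(rrtm_lw_g_points, reduction, f"{fraction:.1%}")
```

RRTMG_LW therefore uses 116 fewer g points than RRTM_LW, a reduction of about 45 percent in the number of quadrature points.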

ieee international conference on high performance computing data and analytics | 2015

Performance tuning Weather Research and Forecasting (WRF) Goddard longwave radiative transfer scheme on Intel Xeon Phi

Jarno Mielikainen; Bormin Huang; Allen Huang

The Weather Research and Forecasting (WRF) model is a next-generation mesoscale numerical weather prediction system designed for dual use in forecasting and research. WRF offers multiple physics options that can be combined in any way. One of the physics options is radiance computation. The major source of energy for the Earth's climate is solar radiation. Thus, it is imperative to accurately model the horizontal and vertical distribution of the heating. The Goddard solar radiative transfer model includes the absorption due to water vapor, ozone, oxygen, carbon dioxide, clouds and aerosols. The model computes the interactions among the absorption and scattering by clouds, aerosols, molecules and surface. Finally, fluxes are integrated over the entire longwave spectrum. In this paper, we present our results of optimizing the Goddard longwave radiative transfer scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools. Thus, the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques. Those optimization techniques are discussed in this paper. The optimizations improved the performance of the original Goddard longwave radiative transfer scheme on Xeon Phi 7120P by a factor of 2.2x. Furthermore, the same optimizations improved the performance of the Goddard longwave radiative transfer scheme on a dual socket configuration of eight core Intel Xeon E5-2670 CPUs by a factor of 2.1x compared to the original Goddard longwave radiative transfer scheme code.

ieee international conference on high performance computing data and analytics | 2014

Initial results on computational performance of Intel Many Integrated Core (MIC) architecture: implementation of the Weather and Research Forecasting (WRF) Purdue-Lin microphysics scheme

Jarno Mielikainen; Bormin Huang; Allen Huang

The Purdue-Lin scheme is a relatively sophisticated microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme includes six classes of hydrometeors: water vapor, cloud water, rain, cloud ice, snow and graupel. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. In this paper, we accelerate the Purdue-Lin scheme using Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi is a high performance coprocessor consisting of up to 61 cores. The Xeon Phi is connected to a CPU via the PCI Express (PCIe) bus. In this paper, we discuss in detail the code optimization issues encountered while tuning the Purdue-Lin microphysics Fortran code for Xeon Phi. In particular, getting good performance required utilizing multiple cores and the wide vector operations, and making efficient use of memory. The results show that the optimizations improved performance of the original code on Xeon Phi 5110P by a factor of 4.2x. Furthermore, the same optimizations improved performance on an Intel Xeon E5-2603 CPU by a factor of 1.2x compared to the original code.
