Is this you? Create Your Porfile

Fang Wang

National University of Defense Technology

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Fang Wang is active.

Explore More

Publication

Featured researches published by Fang Wang.

International Journal of Modern Physics: Conference Series | 2016

CPU/GPU COMPUTING FOR AN IMPLICIT MULTI-BLOCK COMPRESSIBLE NAVIER-STOKES SOLVER ON HETEROGENEOUS PLATFORM

Liang Deng; Hanli Bai; Fang Wang; Qingxin Xu

CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software on heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern MPI-OpenMP-CUDA that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on heterogeneous platform.

Journal of Visualization | 2018

A CNN-based vortex identification method

Liang Deng; Yueqing Wang; Yang Liu; Fang Wang; Sikun Li; Jie Liu

Vortex identification and visualization are important for understanding the underlying physical mechanism of the flow field and have been intensively studied recently. Local vortex identification methods could provide results in a rapid way, but they require the choice of a suitable criterion and threshold, which leads to poor robustness. Global vortex identification methods could obtain reliable results, while they require considerable user input and are computationally intractable for large-scale data sets. To address the problems described above, we present a novel vortex identification method based on the convolutional neural network (CNN). The proposed method integrates the advantages of both the local and global vortex identification methods to achieve higher precision and recall efficiently. In specific, the proposed method firstly obtains the labels of all grid points using a global and objective vortex identification method and then samples local patches around each point in the velocity field as the inputs of CNN. After that it trains the CNN to decide whether the central points of these patches belong to vortices. By this way, our method converts the vortex identification task to a binary classification problem, which could detect vortices quickly from the flow field in an objective and robust way. Extensive experimental results demonstrate the efficacy of our proposed method, and we expect this method can replace or supplement existing traditional methods.Graphical abstract

Journal of Visualization | 2018

UIA: a uniform integrated advection algorithm for steady and unsteady piecewise linear flow field on structured and unstructured grids

Fang Wang; Yang Liu; Dan Zhao; Liang Deng; Sikun Li

Integration-based geometric method is widely used in vector field visualization. To improve the efficiency of integration advection-based visualization, we propose a uniform integrated advection (UIA) algorithm on steady and unsteady vector field according to common piecewise linear field data set analysis. UIA employs cell gradient-based interpolation along spatial and temporal direction, and transforms multi-step advection into single-step advection in association with fourth-order Runge–Kutta advection process. UIA can significantly reduce computational load, and is applicable on arbitrary grid type with cell-center/cell-vertex data structure. The experiments are performed on steady/unsteady vector fields with two-dimensional cell-center unstructured grids and three-dimensional cell-vertex grids, and also on unsteady field from fluid dynamics numerical simulation. The result shows that the proposed algorithm can significantly improve advection efficiency and reduce visualization computational time compared with fourth-order Runge–Kutta.Graphical abstract

Journal of Visualization | 2018

A novel in situ compression method for CFD data based on generative adversarial network

Yang Liu; Yueqing Wang; Liang Deng; Fang Wang; Fang Liu; Yutong Lu; Sikun Li

Abstract As one of the main technologies of in situ visualization, data compression plays a key role in solving I/O bottleneck and has been intensively studied. However, existing methods take too much compression time to meet the requirement of in situ processing on computational fluid dynamics (CFD) flow field data. To address this problem, we introduce deep learning into CFD data compression and propose a novel in situ compression method based on generative adversarial network (GAN) in this paper. In specific, the proposed method samples small patches from CFD data and trains a GAN which includes two convolutional neural networks: the discriminative network and the generative network. The discriminative network is responsible for compressing data on compute nodes, while the generative network is used to reconstruct data on visualization nodes. Compared with the existing CFD data compression methods, our method has great advantages in compression time and manages to adjust compression ratio according to acceptable reconstruction effect, showing its applicability for loosely coupled in situ visualization. Extensive experimental results demonstrate the good generalization of the proposed method on many datasets.Graphical Abstract

Journal of Visualization | 2017

An accurate vortex feature extraction method for Lagrangian vortex visualization on high-order flow field data

Fang Wang; Dan Zhao; Liang Deng; Sikun Li

As vortex is of great significance to structure analysis and mechanism research in flow field, vortex feature extraction has always been a hot research topic in flow field visualization. We investigate the data precision and numerical algorithm accuracy impact on vortex feature area extraction of Lagrangian Averaged Vorticity Deviation (LAVD). Then, an LAVD-based high-order accurate vortex extraction algorithm is proposed, which incorporates vorticity computation of cell-vertexed data by Weighted Essentially Non-Oscillatory (WENO) scheme, vorticity and velocity computation of off-grid point by candidate stencil weight-based high-order polynomial interpolation method, flow map computation by 4th-order Runge–Kutta (RK) method and integration by compound Simpson rule. We perform vortex feature extraction and visualization on both analytical flow field and high-order unsteady flow field. The experiment results demonstrate that the proposed algorithm can practically reflect the vortex feature of high-order flow field and accurately describe small scale vortex structure, which is unavailable by low-order method.Graphical Abstract

trust, security and privacy in computing and communications | 2016

Evaluating Multi-core and Many-Core Architectures through Parallelizing a High-Order WENO Solver

Liang Deng; Hanli Bai; Dan Zhao; Fang Wang

This paper studies the implementation and optimization of a high-order weighted essentially non-oscillatory (WENO) solver to the solution of the Euler equations on the multi-core and many-core architectures (Intel Ivy Bridge CPU, Intel Xeon Phi 7110P coprocessor and NVIDIA Kepler K20c GPU). The implementation of up to ninth-order accurate WENO schemes is used in the solver. For the GPU platform, both the OpenACC-based and CUDA-based versions of different WENO schemes are developed. To achieve high performance, various optimizatin techniques are used. For Ivy Bridge CPU and MIC, we focus on three categories of optimization techniques: thread parallelism for multi-/many-core scaling, data parallelism to exploit the SIMD mechanism and improving on-chip data reuse, to maximize the performance. Also, we provide an in-depth analysis on the performance differences between Ivy Bridge and MIC. The numerical experiments show that the OpenACC performance can reach up to 84% in contrast to CUDA performance with careful manual optimizations, and the proposed CUDA-based version can achieve a speedup of 9.0 on a Kepler GPU in comparison with the sequential run. We also notice that the speedups of different WENO schemes roughly reach 15.9 and 192.2 on the two Ivy Bridge CPUs and the MIC, respectively. Besides, we conduct a systematic comparison of the three platforms in three aspects: performance, programmability, and power efficiency. Our insights facilitate the programmers to select the right platform with a suitable programming model according to their target applications.

international symposium on parallel and distributed computing | 2016

Evaluating Multi-core and Many-Core Architectures through Accelerating an Alternating Direction Implicit CFD Solver

Liang Deng; Jianbin Fang; Fang Wang; Hanli Bai

In this paper, we accelerate a double-precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house computational fluid dynamics (CFD) software on the latest multi-core and many-core architectures (Intel Ivy Bridge CPU, Intel Xeon Phi 7110P coprocessor and NVIDIA Kepler K20c GPU). For the GPU platform, both the OpenACC-based and the CUDA-based versions of the ADI solver are developed. To achieve high performance, we use a series of optimizatin techniques. For the Ivy Bridge CPU and Xeon Phi, we focus on three categories of optimization techniques: thread parallelism for multi-/many-core scaling, data parallelism to exploit the SIMD mechanism and improving on-chip data reuse, to maximize the performance. Also, we provide an in-depth analysis on the performance differences between Ivy Bridge and Xeon Phi. Our numerical experiments show that the proposed CUDA-based ADI solver can achieve a speedup of 9.7 on a Kepler GPU in contrast to a single naive serial version and our optimization techniques can improve the performance of the ADI solver by 2.5x on two Ivy Bridge CPUs and 1.7x on the Intel Xeon Phi coprocessor. We also notice that the OpenACC-based version runs around 29% slower than the CUDA-based one with careful manual optimizations. Besides, we systematically evaluate the programmability of the three platforms. Our insights facilitate the programmers to select a right platform with a suitable programming model according to their target applications.

international conference on virtual reality and visualization | 2016

An Efficient Preprocessing and Composition Based Finite-Time Lyapunov Exponent Visualization Algorithm for Unsteady Flow Field

Fang Wang; Liang Deng; Dan Zhao; Sikun Li

Based on piecewise linear hypothesis that both the velocity and flow map gradient are constant in grid cell and adjacent time intervals, we design a time-slicing flow map preprocessing and composition based Finite-Time Lyapunov Exponent (FTLE) visualization algorithm for unsteady flow field. This algorithm is capable of reducing the computation overhead from duplicated flow map calculation and multiple spatiotemporal variable interpolation. The FTLE computation error and speedup ratios with different field size are obtained from the experiments performed on 2- and 3-dimensional unsteady flow fields. The results show that this algorithm could dramatically improve visualization efficiency and reduce execution time compared with FTLE visualization by 4th-order Runge-Kutta (RK4) based flow map and other related algorithms.

international conference on e-learning and games | 2016

UIA: A Uniform Integrated Advection Algorithm for Steady and Unsteady Piecewise Linear Flow Field on Structured and Unstructured Grids

Fang Wang; Yang Liu; Dan Zhao; Liang Deng; Sikun Li

Integration-Based geometric method is widely used in vector field visualization. In order to improve visualization efficiency based on integration advection, we propose a unified advection algorithm on steady and unsteady vector field according to common piecewise linear field data set analysis. The algorithm interpolates along spatial and temporal direction using cell gradient based method combined with advection process of 4th-order Runge-Kutta algorithm, which transforms multi-step advection into single-step advection. The algorithm can dramatically reduce computational load, and is applicable on any grid type and cell-centered/cell-vertexed data structure. The experiments are per- formed on steady/unsteady vector fields on 2-dimensional cell-centered unstructured grids and 3-dimensional cell-vertexed format grids. The result shows that the proposed algorithm can significantly improve advection efficiency and reduce visualization computational time compared with 4th-order Runge-Kutta.

international conference on computer science and network technology | 2015

Acceleration of PDE-based FTLE calculations on Intel multi-core and many-core architectures

Fang Wang; Liang Deng; Dan Zhao; Sikun Li

Finite-time Lyapunov exponent (FTLE) is widely used to extract coherent structure of unsteady flow. However, the calculation of FTLE can be highly time-consuming, which greatly limits the applications performance efficiency. In this paper, we accelerate a double precision PDE-based FTLE application for two- and three-dimensional analytical flow field on Intel multi-core and many-core architectures such as Intel Sandy Bridge and Intel Many Integrated Core (MIC)coprocessor. Through analysis of the calculation processes of FTLE and the characteristics of Intel multi-core and many-core architectures, we employ three categories of optimization techniques, namely, thread parallelism for multi-/many-core scaling, data parallelism to exploit SIMD (single-instruction multiple-data) mechanism and improving on-chip data reuse, to maximize the performance. Also, the hardware performance metrics through an open source performance analysis tool, in order to explain performance difference between Sandy Bridge and MIC, are discussed. The experiment results show that our MIC-enabled FTLE achieves about 1.8× speed-ups relative to a parallel computation on two Intel Sandy Bridge CPUs, and perfect parallel efficiency is also observed from the experiment results.

Explore More