Yuefan Deng
Stony Brook University
Publication
Featured research published by Yuefan Deng.
IEEE International Conference on High Performance Computing, Data and Analytics | 2001
Osman Yasar; Yuefan Deng; R. E. Tuzun; D. Saltz
This paper describes the ATLAS (Automatically Tuned Linear Algebra Software) project, as well as the fundamental principles that underlie it. ATLAS is an instantiation of a new paradigm in high performance library production and maintenance, which we term AEOS (Automated Empirical Optimization of Software); this style of library management has been created in order to allow software to keep pace with the incredible rate of hardware advancement inherent in Moore's Law. ATLAS is the application of this new paradigm to linear algebra software, with the present emphasis on the Basic Linear Algebra Subprograms (BLAS), a widely used, performance-critical, linear algebra kernel library. This work was supported in part by: U.S. Department of Energy under contract number DE-AC0596OR22464; National Science Foundation Science and Technology Center Cooperative Agreement No. CCR-8809615; University of California, Los Alamos National Laboratory, subcontract # B76680017-3Z; Department of Defense Raytheon E-Systems, subcontract # AA23, prime contract # DAHC94-96-C-0010; Department of Defense Nichols Research Corporation, subcontract #s NRC CR-96-0011 (ASC) and prime contract # DAHC-94-96-C-0005; Department of Defense Nichols Research Corporation, subcontract #s NRC CR-96-0011 (CEWES); prime contract # DAHC-94-96-C-0002. Affiliations: Dept. of Computer Sciences, Univ. of TN, Knoxville, TN 37996; Mathematical Sciences Section, ORNL, Oak Ridge, TN 37831.
Computer Physics Communications | 2007
Bin Fang; Yuefan Deng; Glenn J. Martyna
QCDOC is a massively parallel supercomputer with tens of thousands of nodes distributed on a six-dimensional torus network. The 6D structure of the network provides the needed communication resources for many communication-intensive applications. In this paper, we present a parallel algorithm for three-dimensional Fast Fourier Transform and its implementation for a 4096-node QCDOC prototype. Two techniques have been used to increase its parallel performance: simultaneous multi-dimensional communication and communication-and-computation overlapping. Benchmarking experiments suggest that 3D FFTs of size 128 × 128 × 128 can scale well on such platforms up to 4096 nodes. Our performance results suggest stronger scalability on QCDOC than on IBM BlueGene/L supercomputer.
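As a rough illustration of why a 3D transform maps onto a multi-dimensional network, the sketch below computes a 3D discrete Fourier transform as three passes of 1D transforms, one per axis. It is a serial, pure-Python sketch and not the QCDOC implementation; the names `dft1d` and `dft3d` are hypothetical, and a naive O(n^2) DFT stands in for a real FFT kernel. In a distributed version, the passes along non-local axes are typically where inter-node communication occurs.

```python
import cmath


def dft1d(values):
    """Naive 1D discrete Fourier transform (stand-in for a real FFT kernel)."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft3d(a):
    """3D DFT of a nested list a[x][y][z], done as 1D passes along z, y, then x."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    out = [[[complex(a[x][y][z]) for z in range(nz)] for y in range(ny)]
           for x in range(nx)]
    # Pass 1: transform every line along the z axis.
    for x in range(nx):
        for y in range(ny):
            out[x][y] = dft1d(out[x][y])
    # Pass 2: transform every line along the y axis.
    for x in range(nx):
        for z in range(nz):
            line = dft1d([out[x][y][z] for y in range(ny)])
            for y in range(ny):
                out[x][y][z] = line[y]
    # Pass 3: transform every line along the x axis.
    for y in range(ny):
        for z in range(nz):
            line = dft1d([out[x][y][z] for x in range(nx)])
            for x in range(nx):
                out[x][y][z] = line[x]
    return out


# A constant 2 x 2 x 2 input: all of the signal lands in the zero-frequency bin.
ones = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
spectrum = dft3d(ones)
```

Because the three passes are independent along their own axis, each pass can in principle be distributed across nodes and overlapped with the data exchange needed for the next pass.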
Physics Letters A | 1992
Yuefan Deng; Chen Ning Yang
Exploiting the symmetry of the molecule C60, we obtain the precise algebraic and numerical expressions for the eigenvalues and eigenfunctions of the Hückel problem for C60.
Computers & Mathematics With Applications | 1995
C.-C. Chou; Yuefan Deng; G. Li; Y. Wang
We present a parallel method for matrix multiplication on distributed-memory MIMD architectures based on Strassen's method. Our timing tests, performed on a 56-node Intel Paragon, demonstrate the realization of the potential of Strassen's method, with a complexity of $4.7M^{2.807}$, at the system level rather than at the node level, on which several earlier works have focused. The parallel efficiency is nearly perfect when the number of processors is a power of 7. The parallelized Strassen's method appears to be consistently faster than the traditional matrix multiplication methods, whose complexity is $2M^{3}$, coupled with the BMR method and the Ring method at the system level. The speed gain depends on the matrix order M: 20% for M ≈ 1000 and more than 100% for M ≈ 5000.
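For readers unfamiliar with Strassen's method, the following is a minimal serial sketch in pure Python, assuming square matrices whose order is a power of 2. It is not the paper's distributed implementation; the function names and the `cutoff` parameter are hypothetical. Each recursion level replaces eight block products with seven, which gives the exponent log2(7) ≈ 2.807 and suggests why processor counts that are powers of 7 divide the work evenly.

```python
def mat_add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def naive_mul(a, b):
    """Conventional triple-loop product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def split(m):
    """Split a square matrix of even order into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b, cutoff=2):
    """Strassen product; the order of a and b is assumed to be a power of 2."""
    n = len(a)
    if n <= cutoff:
        return naive_mul(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # The seven block products that replace the usual eight.
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22), cutoff)
    m2 = strassen(mat_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, mat_sub(b12, b22), cutoff)
    m4 = strassen(a22, mat_sub(b21, b11), cutoff)
    m5 = strassen(mat_add(a11, a12), b22, cutoff)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12), cutoff)
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22), cutoff)
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    # Reassemble the quadrants into one matrix.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


A = [[i * 4 + j for j in range(4)] for i in range(4)]
B = [[(i + 1) * (j + 2) for j in range(4)] for i in range(4)]
C = strassen(A, B, cutoff=1)
```

In practice the recursion is cut off at a block size where the conventional product is faster, which is the role the hypothetical `cutoff` parameter plays here.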
Journal of Computational Physics | 2015
Peng Zhang; Na Zhang; Yuefan Deng; Danny Bluestein
We developed a multiple time-stepping (MTS) algorithm for multiscale modeling of the dynamics of platelets flowing in viscous blood plasma. This MTS algorithm considerably improves computational efficiency without significant loss of accuracy. This study of the dynamic properties of flowing platelets employs a combination of the dissipative particle dynamics (DPD) and the coarse-grained molecular dynamics (CGMD) methods to describe the dynamic microstructures of deformable platelets in response to extracellular flow-induced stresses. The disparate spatial scales between the two methods are handled by a hybrid force field interface. However, the disparity in temporal scales between DPD and CGMD, which require time stepping at microseconds and nanoseconds respectively, represents a computational challenge that may become prohibitive. Classical MTS algorithms manage to improve computing efficiency by multi-stepping within DPD or CGMD for up to one order of magnitude of scale differential. In order to handle the 3-4 orders of magnitude disparity in the temporal scales between DPD and CGMD, we introduce a new MTS scheme hybridizing DPD and CGMD by utilizing four different time stepping sizes. We advance the fluid system at the largest time step, the fluid-platelet interface at a middle time step size, and the nonbonded and bonded potentials of the platelet structural system at the two smallest time step sizes. Additionally, we introduce parameters to study the relationship of accuracy versus computational complexity. The numerical experiments demonstrated a 3000x reduction in computing time over standard MTS methods for solving the multiscale model. This MTS algorithm establishes a computationally feasible approach for solving a particle-based system at multiple scales for performing efficient multiscale simulations.
Computer Physics Communications | 2007
Guowen Han; Yuefan Deng; James Glimm; Glenn J. Martyna
Molecular dynamics simulations of biomolecules performed using multiple time-step integration methods are hampered by resonance instabilities. We analyze the properties of a simple 1D linear system integrated with the symplectic reference system propagator MTS (r-RESPA) technique following earlier work by others. A closed form expression for the time step dependent Hamiltonian which corresponds to r-RESPA integration of the model is derived. This permits us to present an analytic formula for the dependence of the integration accuracy on short-range force cutoff range. A detailed analysis of the force decomposition for the standard Ewald summation method is then given as the Ewald method is a good candidate to achieve high scaling on modern massively parallel machines. We test the new analysis on a realistic system, a protein in water. Under Langevin dynamics with a weak friction coefficient (ζ = 1 ps⁻¹) to maintain temperature control and using the SHAKE algorithm to freeze out high frequency vibrations, we show that the 5 fs resonance barrier present when all degrees of freedom are unconstrained is postponed to ≈ 12 fs. An iso-error boundary with respect to the short-range cutoff range and multiple time step size agrees well with the analytical results which are valid due to dominance of the high frequency modes in determining integrator accuracy. Using r-RESPA to treat the long range interactions results in a 6× increase in efficiency for the decomposition described in the text.
Parallel Computing | 2013
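The r-RESPA idea can be sketched on a toy 1D linear system: a particle subject to a stiff "fast" spring and a soft "slow" spring, where the slow force is applied as half-kicks around a loop of small velocity-Verlet steps for the fast force. This is an illustrative sketch only; the spring constants, step sizes, and function names are assumptions and are not taken from the paper.

```python
def respa_step(x, v, dt_outer, n_inner, k_fast, k_slow, mass=1.0):
    """One r-RESPA step: slow half-kick, n_inner fast Verlet steps, slow half-kick."""
    v += 0.5 * dt_outer * (-k_slow * x) / mass       # slow force, first half-kick
    dt_inner = dt_outer / n_inner
    for _ in range(n_inner):                          # fast force on the small step
        v += 0.5 * dt_inner * (-k_fast * x) / mass
        x += dt_inner * v
        v += 0.5 * dt_inner * (-k_fast * x) / mass
    v += 0.5 * dt_outer * (-k_slow * x) / mass       # slow force, second half-kick
    return x, v


def total_energy(x, v, k_fast, k_slow, mass=1.0):
    """Kinetic plus potential energy of the two-spring model."""
    return 0.5 * mass * v * v + 0.5 * (k_fast + k_slow) * x * x


def max_relative_energy_error(n_steps=2000, dt_outer=0.05, n_inner=5,
                              k_fast=100.0, k_slow=1.0):
    """Largest relative deviation of the energy from its initial value."""
    x, v = 1.0, 0.0
    e0 = total_energy(x, v, k_fast, k_slow)
    worst = 0.0
    for _ in range(n_steps):
        x, v = respa_step(x, v, dt_outer, n_inner, k_fast, k_slow)
        worst = max(worst, abs(total_energy(x, v, k_fast, k_slow) - e0) / e0)
    return worst


error = max_relative_energy_error()
```

With these assumed parameters the outer step is well below half the period of the fast mode, so the energy error stays bounded; raising `dt_outer` toward that half period is one way to observe the resonance behavior the abstract refers to.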
Yuefan Deng; Peng Zhang; Carlos Marques; Reid Powell; Li Zhang
The biannual TOP500 list of the highest performing supercomputers has chronicled, and even fostered, the development of recent supercomputing platforms. Coupled with the GREEN500 list that launched in November 2007, the TOP500 list has enabled analysis of multiple aspects of supercomputer design. In this comparative and retrospective study, we examine all of the available data contained in these two lists through November 2012 and propose a novel representation and analysis of the data, highlighting several major evolutionary trends.
Physics of Fluids | 1993
Yupin Chen; Yuefan Deng; James Glimm; Gang Li; Qiang Zhang; David H. Sharp
Computational solutions to the Rayleigh–Taylor fluid mixing problem, as modeled by the two-fluid two-dimensional Euler equations, are presented. Data from these solutions are analyzed from the point of view of Reynolds averaged equations, using scaling laws derived from a renormalization group analysis. The computations, carried out with the front tracking method on an Intel iPSC/860, are highly resolved and statistical convergence of ensemble averages is achieved. The computations are consistent with the experimentally observed growth rates for nearly incompressible flows. The dynamics of the interior portion of the mixing zone is simplified by the use of scaling variables. The size of the mixing zone suggests fixed-point behavior. The profiles of statistical quantities within the mixing zone exhibit self-similarity under fixed-point scaling to a limited degree. The effect of compressibility is also examined. It is found that, for even moderate compressibility, the growth rates fail to satisfy universal s...
International Journal for Numerical Methods in Biomedical Engineering | 2015
Seetha Pothapragada; Peng Zhang; Jawaad Sheriff; Mark Livelli; Marvin J. Slepian; Yuefan Deng; Danny Bluestein
We developed a phenomenological three-dimensional platelet model to characterize the filopodia formation observed during early stage platelet activation. Departing from continuum mechanics based approaches, this coarse-grained molecular dynamics (CGMD) particle-based model can deform to emulate the complex shape change and filopodia formation that platelets undergo during activation. The platelet peripheral zone is modeled with a two-layer homogeneous elastic structure represented by spring-connected particles. The structural zone is represented by a cytoskeletal assembly comprising a filamentous core and filament bundles supporting the platelet's discoid shape, also modeled by spring-connected particles. The interior organelle zone is modeled by homogeneous cytoplasm particles that facilitate the platelet deformation. Nonbonded interactions among the discrete particles of the membrane, the cytoskeletal assembly, and the cytoplasm are described using the Lennard-Jones potential with empirical constants. By exploring the parameter space of this CGMD model, we have successfully simulated the dynamics of varied filopodia formations. Comparative analyses of length and thickness of filopodia show that our numerical simulations are in agreement with experimental measurements of flow-induced activated platelets.
IEEE International Conference on High Performance Computing, Data and Analytics | 1999
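For reference, the standard 12-6 Lennard-Jones form mentioned above can be written as a short function. This is a generic sketch in reduced units; the default values of `epsilon` and `sigma` are placeholders, not the empirical constants used in the platelet model, and the function names are hypothetical.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair potential U(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lennard_jones_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dU/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


# The potential minimum sits at r = 2**(1/6) * sigma, where U = -epsilon.
r_min = 2.0 ** (1.0 / 6.0)
well_depth = lennard_jones(r_min)
```

The potential crosses zero at r = sigma and the force vanishes at the minimum, which makes these two points convenient sanity checks when tuning the constants.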
R. Alan McCoy; Yuefan Deng
Thin-film growth by sputter deposition is a manufacturing process that is well suited for study by particle simulation methods. The authors report on the development of a high performance, parallel, molecular-dynamics software package that simulates atomic metal systems under sputter deposition conditions. The package combines advanced techniques for parallel molecular dynamics with specialized schemes for the simulation of sputtered atoms impinging on thin films and substrates. The features of the package include asynchronous message passing, dynamic load balancing, mechanisms for data caching, and efficient memory management. For classical, semiempirical force calculations, the authors employ a modified version of the embedded-atom method with improved efficiency. Enhancements for the simulation of sputter deposition include an adjustable temperature control algorithm, the detection and ray tracing of emitted particles, and a Langevin localization procedure that restricts the dynamics computations to regions undergoing kinetic energy transfer. The authors describe in detail the features of the package, discuss its performance behavior, and also present some results from sputter deposition simulations.