Publications


Featured research published by Amir Gholami.


SIAM Journal on Scientific Computing | 2016

FFT, FMM, or Multigrid? A Comparative Study of State-of-the-Art Poisson Solvers for Uniform and Nonuniform Grids in the Unit Cube

Amir Gholami; Dhairya Malhotra; Hari Sundar; George Biros

From molecular dynamics and quantum chemistry, to plasma physics and computational astrophysics, Poisson solvers in the unit cube are used in many applications in computational science and engineering. In this work, we benchmark and discuss the performance of the scalable methods for the Poisson problem which are used widely in practice: the fast Fourier transform (FFT), the fast multipole method (FMM), the geometric multigrid (GMG), and algebraic multigrid (AMG). Our focus is on solvers supporting high-order, highly nonuniform discretizations. To allow comparisons with standard libraries we also compare adaptive solvers with solvers specialized for problems on regular grids, that is, FFT and regular-stencil multigrid, since both are very popular algorithms for several practical applications. For the multigrid, we use the finite element variant of a high-performance geometric multigrid (HPGMG) benchmark. In total we compare five different codes, three of which are developed in our group. Our FFT, GMG, and...
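
To make the FFT approach named in this abstract concrete, here is a minimal sketch of a spectral Poisson solve on a uniform grid. It is not the authors' code: it assumes periodic boundary conditions on the unit cube, a cubic grid with n points per side, and a zero-mean right-hand side. The function names `solve_poisson_periodic` and `manufactured_problem` are hypothetical, and NumPy's serial FFT stands in for a distributed FFT library.

```python
import numpy as np


def solve_poisson_periodic(f):
    """Solve -laplace(u) = f on the periodic unit cube, returning zero-mean u.

    Assumes f is a real array of shape (n, n, n) sampled on a uniform grid.
    """
    n = f.shape[0]
    # Angular wavenumbers 2*pi*k for a domain of length 1 sampled at n points.
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2

    f_hat = np.fft.fftn(f)
    k2[0, 0, 0] = 1.0          # placeholder to avoid dividing by zero
    u_hat = f_hat / k2         # in Fourier space, -laplace becomes |k|^2
    u_hat[0, 0, 0] = 0.0       # pin the mean of u to zero
    return np.real(np.fft.ifftn(u_hat))


def manufactured_problem(n):
    """Return (f, u_exact) for a smooth periodic test solution."""
    x = np.arange(n) / n
    X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
    u_exact = np.sin(2 * np.pi * X) * np.cos(4 * np.pi * Y) * np.sin(6 * np.pi * Z)
    # -laplace(u_exact) = (2*pi)^2 * (1^2 + 2^2 + 3^2) * u_exact
    f = (2 * np.pi) ** 2 * 14.0 * u_exact
    return f, u_exact


if __name__ == "__main__":
    f, u_exact = manufactured_problem(32)
    u = solve_poisson_periodic(f)
    print("max error:", np.max(np.abs(u - u_exact)))
```

For a band-limited right-hand side like the one above, the solve is exact up to rounding. The nonuniform, adaptive discretizations that the paper focuses on are outside the scope of this sketch.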


IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

A Volume Integral Equation Stokes Solver for Problems with Variable Coefficients

Dhairya Malhotra; Amir Gholami; George Biros

We present a novel numerical scheme for solving the Stokes equation with variable coefficients in the unit box. Our scheme is based on a volume integral equation formulation. Compared to finite element methods, our formulation decouples the velocity and pressure, generates velocity fields that are by construction divergence free to high accuracy, and its performance does not depend on the order of the basis used for discretization. In addition, we employ a novel adaptive fast multipole method for volume integrals to obtain a scheme that is algorithmically optimal. Our scheme supports non-uniform discretizations and is spectrally accurate. To increase per-node performance, we have integrated our code with both NVIDIA and Intel accelerators. In our largest scalability test, we solved a problem with 20 billion unknowns, using a 14th-order approximation for the velocity, on 2048 nodes of the Stampede system at the Texas Advanced Computing Center. We achieved 0.656 petaFLOPS for the overall code (23% efficiency) and one petaFLOPS for the volume integrals (33% efficiency). As an application example, we simulate Stokes flow in a porous medium with highly complex pore structure using a penalty formulation to enforce the no-slip condition.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2016

Distributed-memory large deformation diffeomorphic 3D image registration

Andreas Mang; Amir Gholami; George Biros

We present a parallel distributed-memory algorithm for large deformation diffeomorphic registration of volumetric images that produces large isochoric deformations (locally volume preserving). Image registration is a key technology in medical image analysis. Our algorithm uses a partial differential equation constrained optimal control formulation. Finding the optimal deformation map requires the solution of a highly nonlinear problem that involves pseudo-differential operators, biharmonic operators, and pure advection operators both forward and backward in time. A key issue is the time to solution, which poses the demand for efficient optimization methods as well as an effective utilization of high performance computing resources. To address this problem we use a preconditioned, inexact, Gauss-Newton-Krylov solver. Our algorithm integrates several components: a spectral discretization in space, a semi-Lagrangian formulation in time, analytic adjoints, different regularization functionals (including volume-preserving ones), a spectral preconditioner, a highly optimized distributed Fast Fourier Transform, and a cubic interpolation scheme for the semi-Lagrangian time-stepping. We demonstrate the scalability of our algorithm on images with resolution of up to 1024³ on the “Maverick” and “Stampede” systems at the Texas Advanced Computing Center (TACC). The critical problem in the medical imaging application domain is strong scaling, that is, solving registration problems of a moderate size of 256³, a typical resolution for medical images. We are able to solve the registration problem for images of this size in less than five seconds on 64 x86 nodes of TACC's “Maverick” system.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2017

A Framework for Scalable Biophysics-Based Image Analysis

Amir Gholami; Andreas Mang; Klaudius Scheufele; Christos Davatzikos; Miriam Mehl; George Biros

We present SIBIA (Scalable Integrated Biophysics-based Image Analysis), a framework for coupling biophysical models with medical image analysis. It provides solvers for an image-driven inverse brain tumor growth model and an image registration problem, the combination of which can eventually help in diagnosis and prognosis of brain tumors. The two main computational kernels of SIBIA are a Fast Fourier Transformation (FFT) implemented in the library AccFFT to discretize differential operators, and a cubic interpolation kernel for semi-Lagrangian based advection. We present efficiency and scalability results for the computational kernels, the inverse tumor solver and image registration on two x86 systems, Lonestar 5 at the Texas Advanced Computing Center and Hazel Hen at the Stuttgart High Performance Computing Center. We showcase results that demonstrate that our solver can be used to solve registration problems of unprecedented scale, 4096³, resulting in ∼200 billion unknowns, a problem size that is 64× larger than the state-of-the-art. For problem sizes of clinical interest, SIBIA is about 8× faster than the state-of-the-art.


Optimization and Engineering | 2018

PDE-constrained optimization in medical image analysis

Andreas Mang; Amir Gholami; Christos Davatzikos; George Biros

PDE-constrained optimization problems find many applications in medical image analysis, for example, neuroimaging, cardiovascular imaging, and oncologic imaging. We review the related literature and give examples of the formulation, discretization, and numerical solution of PDE-constrained optimization problems for medical imaging. We discuss three examples. The first is image registration, the second is data assimilation for brain tumor patients, and the third is data assimilation in cardiovascular imaging. The image registration problem is a classical task in medical image analysis and seeks to find pointwise correspondences between two or more images. Data assimilation problems use a PDE-constrained formulation to link a biophysical model to patient-specific data obtained from medical images. The associated optimality systems turn out to be sets of nonlinear, multicomponent PDEs that are challenging to solve in an efficient way. The ultimate goal of our work is the design of inversion methods that integrate complementary data, and rigorously follow mathematical and physical principles, in an attempt to support clinical decision making. This requires reliable, high-fidelity algorithms with a short time-to-solution. This task is complicated by model and data uncertainties, and by the fact that PDE-constrained optimization problems are ill-posed in nature, and in general yield high-dimensional, severely ill-conditioned systems after discretization. These features make regularization, effective preconditioners, and iterative solvers that, in many cases, have to be implemented on distributed-memory architectures to be practical, a prerequisite. We showcase state-of-the-art techniques in scientific computing to tackle these challenges.


arXiv: Distributed, Parallel, and Cluster Computing | 2015

AccFFT: A library for distributed-memory FFT on CPU and GPU architectures

Amir Gholami; Judith Hill; Dhairya Malhotra; George Biros


Archive | 2014

FFT, FMM, or Multigrid? A Comparative Study of State-of-the-Art Poisson Solvers.

Amir Gholami; Dhairya Malhotra; Hari Sundar; George Biros


arXiv: Neural and Evolutionary Computing | 2018

SqueezeNext: Hardware-Aware Neural Network Design.

Amir Gholami; Kiseok Kwon; Bichen Wu; Zizheng Tai; Xiangyu Yue; Peter H. Jin; Sicheng Zhao; Kurt Keutzer


Archive | 2017

Coupling Brain-Tumor Biophysical Models and Diffeomorphic Image Registration

Klaudius Scheufele; Andreas Mang; Amir Gholami; Christos Davatzikos; George Biros; Miriam Mehl


arXiv: Optimization and Control | 2018

CLAIRE: A distributed-memory solver for constrained large deformation diffeomorphic image registration.

Andreas Mang; Amir Gholami; Christos Davatzikos; George Biros

Collaboration


Top Co-Authors

George Biros
University of Texas at Austin

Dhairya Malhotra
University of Texas at Austin

Kurt Keutzer
University of California

Peter H. Jin
University of California

Sicheng Zhao
University of California

Xiangyu Yue
University of California