George Biros | Researchain

Publication

Featured research published by George Biros.

SIAM Journal on Scientific Computing | 2005

Parallel Lagrange–Newton–Krylov–Schur Methods for PDE-Constrained Optimization. Part I: The Krylov–Schur Solver

George Biros; Omar Ghattas

Large-scale optimization of systems governed by partial differential equations (PDEs) is a frontier problem in scientific computation. Reduced quasi-Newton sequential quadratic programming (SQP) methods are state-of-the-art approaches for such problems. These methods take full advantage of existing PDE solver technology and parallelize well. However, their algorithmic scalability is questionable; for certain problem classes they can be very slow to converge. In this two-part article we propose a new method for steady-state PDE-constrained optimization, based on the idea of using a full-space Newton solver combined with an approximate reduced-space quasi-Newton SQP preconditioner. The basic components of the method are Newton solution of the first-order optimality conditions that characterize stationarity of the Lagrangian function; Krylov solution of the Karush–Kuhn–Tucker (KKT) linear systems arising at each Newton iteration using a symmetric quasi-minimum residual method; and preconditioning of the KKT system using an approximate state/decision variable decomposition that replaces the forward PDE Jacobians by their own preconditioners, and the decision space Schur complement (the reduced Hessian) by a BFGS approximation initialized by a two-step stationary method. Accordingly, we term the new method Lagrange–Newton–Krylov–Schur (LNKS). It is fully parallelizable, exploits the structure of available parallel algorithms for the PDE forward problem, and is locally quadratically convergent. In part I of this two-part article, we investigate the effectiveness of the KKT linear system solver. We test our method on two optimal control problems in which the state constraints are described by the steady-state Stokes equations. The objective is to minimize dissipation or the deviation from a given velocity field; the control variables are the boundary velocities. Numerical experiments on up to 256 Cray T3E processors and on an SGI Origin 2000 include scalability and performance assessment of the LNKS algorithm and comparisons with reduced SQP for up to 1,000,000 state and 50,000 decision variables. In part II of the article, we address globalization and inexactness issues, and apply LNKS to the optimal control of the steady incompressible Navier–Stokes equations.

Conference on High Performance Computing (Supercomputing) | 2003

High Resolution Forward and Inverse Earthquake Modeling on Terascale Computers

Volkan Akcelik; Jacobo Bielak; George Biros; Ioannis Epanomeritakis; Antonio Fernandez; Omar Ghattas; Eui Joong Kim; Julio Lopez; David R. O'Hallaron; Tiankai Tu; John Urbanic

For earthquake simulations to play an important role in the reduction of seismic risk, they must be capable of high resolution and high fidelity. We have developed algorithms and tools for earthquake simulation based on multiresolution hexahedral meshes. We have used this capability to carry out 1 Hz simulations of the 1994 Northridge earthquake in the LA Basin using 100 million grid points. Our wave propagation solver sustains 1.21 teraflop/s for 4 hours on 3000 AlphaServer processors at 80% parallel efficiency. Because of uncertainties in characterizing earthquake source and basin material properties, a critical remaining challenge is to invert for source and material parameter fields for complex 3D basins from records of past earthquakes. Towards this end, we present results for material and source inversion of high-resolution models of basins undergoing antiplane motion using parallel scalable inversion algorithms that overcome many of the difficulties particular to inverse heterogeneous wave propagation problems.

Journal of Computational Physics | 2009

A boundary integral method for simulating the dynamics of inextensible vesicles suspended in a viscous fluid in 2D

Shravan Veerapaneni; Denis Zorin; George Biros

We present a new method for the evolution of inextensible vesicles immersed in a Stokesian fluid. We use a boundary integral formulation for the fluid that results in a set of nonlinear integro-differential equations for the vesicle dynamics. The motion of the vesicles is determined by balancing the non-local hydrodynamic forces with the elastic forces due to bending and tension. Numerical simulations of such vesicle motions are quite challenging. On one hand, explicit time-stepping schemes suffer from a severe stability constraint due to the stiffness related to high-order spatial derivatives and a milder constraint due to a transport-like stability condition. On the other hand, an implicit scheme can be expensive because it requires the solution of a set of nonlinear equations at each time step. We present two semi-implicit schemes that circumvent the severe stability constraints on the time step and whose computational cost per time step is comparable to that of an explicit scheme. We discretize the equations by using a spectral method in space and a multistep third-order accurate scheme in time. We use the fast multipole method (FMM) to efficiently compute vesicle-vesicle interaction forces in a suspension with a large number of vesicles. We report results from numerical experiments that demonstrate the convergence and algorithmic complexity properties of our scheme.

IEEE International Conference on High Performance Computing Data and Analytics | 2010

Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures

Abtin Rahimian; Ilya Lashuk; Shravan Veerapaneni; Aparna Chandramowlishwaran; Dhairya Malhotra; Logan Moon; Rahul S. Sampath; Aashay Shringarpure; Jeffrey S. Vetter; Richard W. Vuduc; Denis Zorin; George Biros

We present a fast, petaflop-scalable algorithm for Stokesian particulate flows. Our goal is the direct simulation of blood, which we model as a mixture of a Stokesian fluid (plasma) and red blood cells (RBCs). Directly simulating blood is a challenging multiscale, multiphysics problem. We report simulations with up to 200 million deformable RBCs. The largest simulation amounts to 90 billion unknowns in space. In terms of the number of cells, we improve the state of the art by several orders of magnitude: the previous largest simulation, at the same physical fidelity as ours, resolved the flow of O(1,000–10,000) RBCs. Our approach has three distinct characteristics: (1) we faithfully represent the physics of RBCs by using nonlinear solid mechanics to capture the deformations of each cell; (2) we accurately resolve the long-range, N-body, hydrodynamic interactions between RBCs (which are caused by the surrounding plasma); and (3) we allow for the highly non-uniform distribution of RBCs in space. The new method has been implemented in the software library MOBO (for “Moving Boundaries”). We designed MOBO to support parallelism at all levels, including inter-node distributed memory parallelism, intra-node shared memory parallelism, data parallelism (vectorization), and fine-grained multithreading for GPUs. We have implemented and optimized the majority of the computation kernels on both Intel/AMD x86 and NVIDIA's Tesla/Fermi platforms in single and double floating-point precision. Overall, the code has scaled on 256 CPU-GPUs on the TeraGrid's Lincoln cluster and on 200,000 AMD cores of the Oak Ridge National Laboratory's Jaguar PF system. In our largest simulation, we have achieved 0.7 petaflop/s of sustained performance on Jaguar.

SIAM Journal on Scientific Computing | 2005

Parallel Lagrange–Newton–Krylov–Schur Methods for PDE-Constrained Optimization. Part II: The Lagrange–Newton Solver and Its Application to Optimal Control of Steady Viscous Flows

George Biros; Omar Ghattas

In part I of this article, we proposed a Lagrange–Newton–Krylov–Schur (LNKS) method for the solution of optimization problems that are constrained by partial differential equations. LNKS uses Krylov iterations to solve the linearized Karush–Kuhn–Tucker system of optimality conditions in the full space of states, adjoints, and decision variables, but invokes a preconditioner inspired by reduced space sequential quadratic programming (SQP) methods. The discussion in part I focused on the (inner, linear) Krylov solver and preconditioner. In part II, we discuss the (outer, nonlinear) Lagrange–Newton solver and address globalization, robustness, and efficiency issues, including line search methods, safeguarding Newton with quasi-Newton steps, parameter continuation, and inexact Newton ideas. We test the full LNKS method on several large-scale three-dimensional configurations of a problem of optimal boundary control of incompressible Navier–Stokes flow with a dissipation objective functional. Results of numerical experiments on up to 256 Cray T3E-900 processors demonstrate very good scalability of the new method. Moreover, LNKS is an order of magnitude faster than quasi-Newton reduced SQP, and we are able to solve previously intractable problems of up to 800,000 state and 5,000 decision variables at about 5 times the cost of a single forward flow solution.

Conference on High Performance Computing (Supercomputing) | 2002

Parallel Multiscale Gauss-Newton-Krylov Methods for Inverse Wave Propagation

Volkan Akcelik; George Biros; Omar Ghattas

IEEE Transactions on Medical Imaging | 2012

GLISTR: Glioma Image Segmentation and Registration

Ali Gooya; Kilian M. Pohl; Michel Bilello; L. Cirillo; George Biros; Elias R. Melhem; Christos Davatzikos

SIAM Journal on Scientific Computing | 2008

Bottom-Up Construction and 2:1 Balance Refinement of Linear Octrees in Parallel

Hari Sundar; Rahul S. Sampath; George Biros

In this article, we propose new parallel algorithms for the construction and 2:1 balance refinement of large linear octrees on distributed memory machines. Such octrees are used in many problems in computational science and engineering, e.g., object representation, image analysis, unstructured meshing, finite elements, adaptive mesh refinement, and N-body simulations. Fixed-size scalability and isogranular analyses of the algorithms, using an MPI-based parallel implementation, were performed on a variety of input data and demonstrated good scalability for different processor counts (1 to 1024 processors) on the Pittsburgh Supercomputing Center's TCS-1 AlphaServer. The results are consistent for different data distributions. Octrees with over a billion octants were constructed and balanced in less than a minute on 1024 processors. Like other existing algorithms for constructing and balancing octrees, our algorithms have …
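The octree abstract rests on two ideas: a linear octree stores only its leaves, sorted along a space-filling (Morton, or Z-order) curve, and a 2:1 balanced octree requires that leaves which touch differ by at most one refinement level. The paper's algorithms are parallel and MPI-based. What follows is only a minimal sequential Python sketch of the data structure and the balance condition, not the authors' construction or refinement algorithm. The tuple layout `(x, y, z, level)`, `MAX_DEPTH`, and every function name are hypothetical choices made for illustration.

```python
from itertools import combinations

# Hypothetical conventions for this sketch: an octant is a tuple
# (x, y, z, level) whose anchor (x, y, z) lies on an integer grid of
# side 2**MAX_DEPTH; level 0 is the whole domain.
MAX_DEPTH = 4


def morton_key(x, y, z):
    """Interleave the bits of x, y, z (Z-order); x goes to the lowest bit."""
    key = 0
    for bit in range(MAX_DEPTH):
        key |= ((x >> bit) & 1) << (3 * bit)
        key |= ((y >> bit) & 1) << (3 * bit + 1)
        key |= ((z >> bit) & 1) << (3 * bit + 2)
    return key


def side(level):
    """Edge length of an octant at `level`, in finest-grid units."""
    return 1 << (MAX_DEPTH - level)


def children(octant):
    """The eight children of `octant`, one level finer, in Morton order."""
    x, y, z, level = octant
    h = side(level + 1)
    return [(x + dx * h, y + dy * h, z + dz * h, level + 1)
            for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]


def linearize(octants):
    """Sort by Morton key of the anchor; ties put the coarser octant first."""
    return sorted(octants, key=lambda o: (morton_key(o[0], o[1], o[2]), o[3]))


def touching(a, b):
    """True if octants a and b share a face, edge, or corner.

    Assumes a and b do not overlap, which holds for the leaves of an octree.
    """
    sa, sb = side(a[3]), side(b[3])
    return all(a[i] <= b[i] + sb and b[i] <= a[i] + sa for i in range(3))


def is_2to1_balanced(leaves):
    """Brute-force O(N^2) check that touching leaves differ by <= 1 level."""
    return all(abs(a[3] - b[3]) <= 1
               for a, b in combinations(leaves, 2) if touching(a, b))


# Usage: refine one of the 8 level-1 octants once (balanced), then refine
# the corner child of that octant again so that level-3 leaves touch
# level-1 leaves (unbalanced).
level1 = children((0, 0, 0, 0))
leaves = level1[1:] + children(level1[0])
print(is_2to1_balanced(leaves))   # True

level2 = children(level1[0])
leaves2 = level1[1:] + level2[:7] + children(level2[7])
print(is_2to1_balanced(leaves2))  # False
print(linearize(leaves2)[0], linearize(leaves2)[-1])  # (0, 0, 0, 2) (8, 8, 8, 1)
```

The pairwise check is quadratic in the number of leaves. It only makes the 2:1 condition concrete and says nothing about the complexity of the published parallel algorithms.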

Journal of Computational Physics | 2006