Publication


Featured research published by Chi-Chung Lam.


Proceedings of the IEEE | 2005

Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models

Gerald Baumgartner; Alexander A. Auer; David E. Bernholdt; Alina Bibireata; Venkatesh Choppella; Daniel Cociorva; Xiaoyang Gao; Robert J. Harrison; So Hirata; Sriram Krishnamoorthy; Sandhya Krishnan; Chi-Chung Lam; Qingda Lu; Marcel Nooijen; Russell M. Pitzer; J. Ramanujam; P. Sadayappan; Alexander Sibiryakov

This paper provides an overview of a program synthesis system for a class of quantum chemistry computations. These computations are expressible as a set of tensor contractions and arise in electronic structure modeling. The input to the system is a high-level specification of the computation, from which the system can synthesize high-performance parallel code tailored to the characteristics of the target architecture. Several components of the synthesis system are described, focusing on the performance optimization issues that they address.


Molecular Physics | 2006

Automatic code generation for many-body electronic structure methods: the tensor contraction engine

Alexander A. Auer; Gerald Baumgartner; David E. Bernholdt; Alina Bibireata; Venkatesh Choppella; Daniel Cociorva; Xiaoyang Gao; Robert J. Harrison; Sriram Krishnamoorthy; Sandhya Krishnan; Chi-Chung Lam; Qingda Lu; Marcel Nooijen; Russell M. Pitzer; J. Ramanujam; P. Sadayappan; Alexander Sibiryakov

As both electronic structure methods and the computers on which they are run become increasingly complex, the task of producing robust, reliable, high-performance implementations of methods at a rapid pace becomes increasingly daunting. In this paper we present an overview of the Tensor Contraction Engine (TCE), a unique effort to address issues of both productivity and performance through automatic code generation. The TCE is designed to take equations for many-body methods in a convenient high-level form and acts like an optimizing compiler, producing an implementation tuned to the target computer system and even to the specific chemical problem of interest. We provide examples to illustrate the TCE approach, including the ability to target different parallel programming models, and the effects of particular optimizations.


Conference on High Performance Computing (Supercomputing) | 2002

A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry

Gerald Baumgartner; David E. Bernholdt; Daniel Cociorva; Robert J. Harrison; So Hirata; Chi-Chung Lam; Marcel Nooijen; Russell M. Pitzer; J. Ramanujam; P. Sadayappan

This paper discusses an approach to the synthesis of high-performance parallel programs for a class of computations encountered in quantum chemistry and physics. These computations are expressible as a set of tensor contractions and arise in electronic structure modeling. An overview is provided of the synthesis system, which transforms a high-level specification of the computation into high-performance parallel code tailored to the characteristics of the target architecture. An example from computational chemistry is used to illustrate how different code structures are generated under different assumptions about the memory available on the target computer.


Programming Language Design and Implementation | 2002

Space-time trade-off optimization for a class of electronic structure calculations

Daniel Cociorva; Gerald Baumgartner; Chi-Chung Lam; P. Sadayappan; J. Ramanujam; Marcel Nooijen; David E. Bernholdt; Robert J. Harrison

The accurate modeling of the electronic structure of atoms and molecules is very computationally intensive. Many models of electronic structure, such as the Coupled Cluster approach, involve collections of tensor contractions. There are usually a large number of alternative ways of implementing the tensor contractions, representing different trade-offs between the space required for temporary intermediates and the total number of arithmetic operations. In this paper, we present an algorithm that starts with an operation-minimal form of the computation and systematically explores the possible space-time trade-offs to identify the form with lowest cost that fits within a specified memory limit. Its utility is demonstrated by applying it to a computation representative of a component in the CCSD(T) formulation in the NWChem quantum chemistry suite from Pacific Northwest National Laboratory.


Parallel Processing Letters | 1997

On Optimizing a Class of Multi-Dimensional Loops with Reduction for Parallel Execution

Chi-Chung Lam; P. Sadayappan; Rephael Wenger

This paper addresses the compile-time optimization of a form of nested-loop computation that is motivated by a computational physics application. The computations involve multi-dimensional surface and volume integrals where the integrand is a product of a number of array terms. Besides the issue of optimal distribution of the arrays among the processors, there is also scope for reordering of the operations using the commutativity and associativity properties of addition and multiplication, and the application of the distributive law to significantly reduce the number of operations executed. A formalization of the operation minimization problem and proof of its NP-completeness is provided. A pruning search strategy for determination of an optimal form is developed. An analysis of the communication requirements and a polynomial-time algorithm for determination of optimal distribution of the arrays are also provided.
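
The operation reduction described in this abstract can be illustrated with a toy case. The sketch below is not the paper's formalization or its pruning search; it only shows, for an assumed double summation of the form sum_i sum_j a[i]*b[j], how the distributive law cuts the multiplication count. The function names `naive_sum` and `factored_sum` are hypothetical.

```python
def naive_sum(a, b):
    """Evaluate sum_i sum_j a[i]*b[j] term by term, counting multiplications."""
    total, mults = 0, 0
    for x in a:
        for y in b:
            total += x * y
            mults += 1
    return total, mults


def factored_sum(a, b):
    """Apply the distributive law: (sum_i a[i]) * (sum_j b[j]), one multiplication."""
    return sum(a) * sum(b), 1


# Both forms give the same value; only the operation count differs.
print(naive_sum([1, 2, 3], [4, 5]))     # (54, 6)
print(factored_sum([1, 2, 3], [4, 5]))  # (54, 1)
```

For arrays of lengths N and M the naive form needs N*M multiplications while the factored form needs one, which is the kind of saving the reordering aims for.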


Computer Languages, Systems & Structures | 2011

Memory-optimal evaluation of expression trees involving large objects

Chi-Chung Lam; Thomas Rauber; Gerald Baumgartner; Daniel Cociorva; P. Sadayappan

The need to evaluate expression trees involving large objects arises in scientific computing applications such as electronic structure calculations. Often, the tree node objects are so large that only a subset of them can fit into memory at a time. This paper addresses the problem of finding an evaluation order of the nodes in a given expression tree that uses the least amount of memory. We present an algorithm that finds an optimal evaluation order in Θ(n log² n) time for an n-node expression tree and prove its correctness. We demonstrate the utility of our algorithm using representative equations from quantum chemistry.
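
To make the problem statement concrete, here is a brute-force sketch that tries every evaluation order of a tiny tree. It is not the algorithm from the paper, and it is exponential in the number of nodes. It assumes a simple memory model that the abstract does not spell out: a node's result is allocated while its children are still resident, and the children are freed immediately afterwards. The tree, the sizes, and the function names are hypothetical.

```python
from itertools import permutations


def peak_memory(order, children, size):
    """Peak memory for evaluating nodes in `order`, or None if the order is invalid."""
    live, current, peak = set(), 0, 0
    for node in order:
        kids = children.get(node, ())
        if any(c not in live for c in kids):
            return None              # a child has not been computed yet
        current += size[node]        # allocate the node's result
        peak = max(peak, current)
        for c in kids:               # children are no longer needed
            live.discard(c)
            current -= size[c]
        live.add(node)
    return peak


def min_memory_order(children, size):
    """Exhaustively search all orders; only feasible for very small trees."""
    best = None
    for order in permutations(size):
        peak = peak_memory(order, children, size)
        if peak is not None and (best is None or peak < best[0]):
            best = (peak, order)
    return best


# Hypothetical tree: R = f(X, Y), X = g(x1), Y = h(y1).
children = {"R": ("X", "Y"), "X": ("x1",), "Y": ("y1",)}
size = {"x1": 20, "X": 1, "y1": 5, "Y": 10, "R": 1}
print(min_memory_order(children, size))  # (21, ('x1', 'X', 'y1', 'Y', 'R'))
```

In this example, evaluating the subtree with the large leaf first gives a peak of 21 units, while starting with the other subtree gives 31, so the order matters even for five nodes.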


International Conference on Supercomputing | 2001

Loop optimization for a class of memory-constrained computations

Daniel Cociorva; John W. Wilkins; Chi-Chung Lam; Gerald Baumgartner; J. Ramanujam; P. Sadayappan

Compute-intensive multi-dimensional summations that involve products of several arrays arise in the modeling of electronic structure of materials. Sometimes several alternative formulations of a computation, representing different space-time trade-offs, are possible. By computing and storing some intermediate arrays, reduction of the number of arithmetic operations is possible, but the size of intermediate temporary arrays may be prohibitively large. Loop fusion can be applied to reduce memory requirements, but that could impede effective tiling to minimize memory access costs. This paper develops an integrated model combining loop tiling for enhancing data reuse, and loop fusion for reduction of memory for intermediate temporary arrays. An algorithm is presented that addresses the selection of tile sizes and choice of loops for fusion, with the objective of minimizing cache misses while keeping the total memory usage within a given limit. Experimental results are reported that demonstrate the effectiveness of the combined loop tiling and fusion transformations performed by using the developed framework.


International Conference on Computational Science | 2005

Automated operation minimization of tensor contraction expressions in electronic structure calculations

Albert Hartono; Alexander Sibiryakov; Marcel Nooijen; Gerald Baumgartner; David E. Bernholdt; So Hirata; Chi-Chung Lam; Russell M. Pitzer; J. Ramanujam; P. Sadayappan

Complex tensor contraction expressions arise in accurate electronic structure models in quantum chemistry, such as the Coupled Cluster method. Transformations using algebraic properties of commutativity and associativity can be used to significantly decrease the number of arithmetic operations required for evaluation of these expressions, but the optimization problem is NP-hard. Operation minimization is an important optimization step for the Tensor Contraction Engine, a tool being developed for the automatic transformation of high-level tensor contraction expressions into efficient programs. In this paper, we develop an effective heuristic approach to the operation minimization problem, and demonstrate its effectiveness on tensor contraction expressions for coupled cluster equations.
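
As a rough illustration of why contraction order matters, the sketch below exhaustively searches pairwise contraction orders for a product of tensors and reports the smallest multiplication count. It is not the heuristic developed in the paper, and exhaustive search scales poorly. The cost model (one multiplication per iteration of the loops over all indices of the two operands) and the name `best_order` are assumptions made for this example.

```python
from itertools import combinations
from math import prod


def best_order(terms, extents, output):
    """Minimum multiplications over all pairwise contraction orders.

    terms:   list of frozensets of index names, one per tensor
    extents: dict mapping index name -> range size
    output:  frozenset of indices kept in the final result
    """
    if len(terms) == 1:
        return 0
    best = None
    for i, j in combinations(range(len(terms)), 2):
        rest = [t for k, t in enumerate(terms) if k not in (i, j)]
        # Indices still needed later: output indices and those of other terms.
        needed = set(output).union(*rest)
        loop_indices = terms[i] | terms[j]
        cost = prod(extents[x] for x in loop_indices)
        result = frozenset(loop_indices & needed)   # summed-out indices vanish
        total = cost + best_order(rest + [result], extents, output)
        if best is None or total < best:
            best = total
    return best


# S[a,d] = sum over b, c of A[a,b] * B[b,c] * C[c,d], every index of extent 10.
terms = [frozenset("ab"), frozenset("bc"), frozenset("cd")]
extents = {"a": 10, "b": 10, "c": 10, "d": 10}
print(best_order(terms, extents, frozenset("ad")))  # 2000
```

Evaluating the same expression as a single four-deep loop nest would take 2 * 10^4 = 20000 multiplications, so forming an intermediate such as T[a,c] first reduces the count by a factor of ten in this toy case.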


International Parallel and Distributed Processing Symposium | 2003

Global communication optimization for tensor contraction expressions under memory constraints

Daniel Cociorva; Xiaoyang Gao; Sandhya Krishnan; Gerald Baumgartner; Chi-Chung Lam; P. Sadayappan; J. Ramanujam

The accurate modeling of the electronic structure of atoms and molecules involves computationally intensive tensor contractions involving large multi-dimensional arrays. The efficient computation of complex tensor contractions usually requires the generation of temporary intermediate arrays. These intermediates could be extremely large, but they can often be generated and used in batches through appropriate loop fusion transformations. To optimize the performance of such computations on parallel computers, the total amount of inter-processor communication must be minimized, subject to the available memory on each processor. In this paper we address the memory-constrained communication minimization problem in the context of this class of computations. Based on a framework that models the relationship between loop fusion and memory usage, we develop an approach to identify the best combination of loop fusion and data partitioning that minimizes inter-processor communication cost without exceeding the per-processor memory limit. The effectiveness of the developed optimization approach is demonstrated on a computation representative of a component used in quantum chemistry suites.


Languages and Compilers for Parallel Computing | 1999

Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals

Chi-Chung Lam; Daniel Cociorva; Gerald Baumgartner; P. Sadayappan

Multi-dimensional integrals of products of several arrays arise in certain scientific computations. In the context of these integral calculations, this paper addresses a memory usage minimization problem. Based on a framework that models the relationship between loop fusion and memory usage, we propose an algorithm for finding a loop fusion configuration that minimizes memory usage. A practical example shows the performance improvement obtained by our algorithm on an electronic structure computation.
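
The link between loop fusion and memory usage mentioned in this abstract can be seen in a minimal sketch. The computation below (a row sum followed by a weighted sum) is a made-up stand-in, not the paper's electronic structure example, and the function names are hypothetical. Fusing the two loops over i lets the intermediate shrink from an array to a scalar.

```python
def unfused(a, b):
    """Two separate loop nests: the intermediate t needs len(a) elements."""
    n = len(a)
    t = [0] * n                      # intermediate array, one entry per row
    for i in range(n):
        for j in range(len(a[i])):
            t[i] += a[i][j]
    s = 0
    for i in range(n):
        s += t[i] * b[i]
    return s


def fused(a, b):
    """Loops over i fused: the intermediate t is reduced to a single scalar."""
    s = 0
    for i in range(len(a)):
        t = 0                        # scalar replaces the array
        for j in range(len(a[i])):
            t += a[i][j]
        s += t * b[i]
    return s


a = [[1, 2], [3, 4]]
b = [10, 100]
print(unfused(a, b), fused(a, b))    # 730 730
```

Both versions perform the same arithmetic; only the storage for the intermediate changes, which is the quantity a fusion configuration is chosen to minimize.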

Collaboration


Dive into Chi-Chung Lam's collaborations.

Top Co-Authors


J. Ramanujam

Louisiana State University


David E. Bernholdt

Oak Ridge National Laboratory
