Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Glenn R. Luecke is active.

Publication


Featured research published by Glenn R. Luecke.


Concurrency and Computation: Practice and Experience | 2003

MPI-CHECK: a tool for checking Fortran 90 MPI programs

Glenn R. Luecke; Hua Chen; James Coyle; Jim Hoekstra; Marina Kraeva; Yan Zou

MPI is commonly used to write parallel programs for distributed memory parallel computers. MPI-CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. MPI-CHECK provides automatic compile-time and run-time checking of MPI programs. MPI-CHECK automatically detects the following problems in the use of MPI routines: (i) mismatch in argument type, kind, rank or number; (ii) messages which exceed the bounds of the source/destination array; (iii) negative message lengths; (iv) illegal MPI calls before MPI_INIT or after MPI_FINALIZE; (v) inconsistencies between the declared type of a message and its associated DATATYPE argument; and (vi) actual arguments which violate the INTENT attribute.
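As a concrete illustration (my own sketch, not an example from the paper), the fragment below contains an error of class (ii) above: the send and receive specify a count of 200 elements for a buffer that holds only 100, the kind of defect MPI-CHECK's run-time checking is designed to report.

```fortran
program bounds_bug
   use mpi
   implicit none
   integer :: ierr, rank, status(MPI_STATUS_SIZE)
   real    :: buf(100)                      ! buffer holds only 100 reals

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   if (rank == 0) then
      ! Count of 200 exceeds the declared bounds of buf: error class (ii).
      call MPI_Send(buf, 200, MPI_REAL, 1, 0, MPI_COMM_WORLD, ierr)
   else if (rank == 1) then
      call MPI_Recv(buf, 200, MPI_REAL, 0, 0, MPI_COMM_WORLD, status, ierr)
   end if

   call MPI_Finalize(ierr)
end program bounds_bug
```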


Concurrency and Computation: Practice and Experience | 2002

Deadlock detection in MPI programs

Glenn R. Luecke; Yan Zou; James Coyle; Jim Hoekstra; Marina Kraeva

The Message-Passing Interface (MPI) is commonly used to write parallel programs for distributed memory parallel computers. MPI-CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. This paper presents the methods used in MPI-CHECK 2.0 to detect many situations where actual and potential deadlocks occur when using blocking and non-blocking point-to-point routines as well as when using collective routines.
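A minimal Fortran sketch (mine, not from the paper) of the classic potential deadlock such a tool must recognize: both ranks post a blocking send before their receive, so the program hangs whenever the messages are too large for the MPI implementation to buffer internally.

```fortran
program send_send_deadlock
   use mpi
   implicit none
   integer :: ierr, rank, other, status(MPI_STATUS_SIZE)
   real    :: sendbuf(250000), recvbuf(250000)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   other = 1 - rank                         ! assumes exactly two ranks

   ! Both ranks block in MPI_Send; neither receive is ever posted if the
   ! sends cannot complete, so this is a potential rather than certain deadlock.
   call MPI_Send(sendbuf, size(sendbuf), MPI_REAL, other, 0, MPI_COMM_WORLD, ierr)
   call MPI_Recv(recvbuf, size(recvbuf), MPI_REAL, other, 0, MPI_COMM_WORLD, status, ierr)

   call MPI_Finalize(ierr)
end program send_send_deadlock
```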


Journal of Electroanalytical Chemistry | 1986

Pulsed amperometric detection of sulfur compounds: Part II. Dependence of response on adsorption time

Theresa Z. Polta; Dennis C. Johnson; Glenn R. Luecke

A mathematical model is developed for the response of PAD and is applied to data from the study of Ip as a function of tads for evaluation of the adsorption rate constants and the maximum molar surface coverage for thiourea at a Pt electrode. The results are, respectively: k1 = 4.1 × 10^4 M^−1 s^−1, k−1 = 1.9 s^−1, and Γ0 = 1.3 × 10^−10 mol cm^−2. The calculated adsorption equilibrium constant (k1/k−1) is 2.1 × 10^4 M^−1, compared to 4.9 × 10^4 M^−1 calculated from the plot of 1/Ip vs. 1/cb for cb > 1.0 × 10^−4 M and tads = 8500 ms. Analytical calibration procedures are examined; linear plots of 1/Ip vs. 1/cb cannot be expected for cases of mixed transport-isotherm control of detector response.
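As a quick arithmetic check on the figures quoted above (my addition, not part of the abstract), the equilibrium constant follows directly from the two rate constants:

```latex
\[
K \;=\; \frac{k_{1}}{k_{-1}}
  \;=\; \frac{4.1 \times 10^{4}\ \mathrm{M^{-1}\,s^{-1}}}{1.9\ \mathrm{s^{-1}}}
  \;\approx\; 2.2 \times 10^{4}\ \mathrm{M^{-1}}
\]
```

which agrees with the reported 2.1 × 10^4 M^−1 to within rounding of the quoted rate constants.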


Concurrency and Computation: Practice and Experience | 2004

The performance and scalability of SHMEM and MPI-2 one-sided routines on a SGI Origin 2000 and a Cray T3E-600

Glenn R. Luecke; Silvia Spanoyannis; Marina Kraeva

This paper compares the performance and scalability of SHMEM and MPI-2 one-sided routines on different communication patterns for an SGI Origin 2000 and a Cray T3E-600. The communication tests were chosen to represent commonly used communication patterns, ranging from low contention (accessing distant messages, a circular right shift, a binary tree broadcast) to high contention (a 'naive' broadcast and an all-to-all). For all the tests and for small message sizes, the SHMEM implementation significantly outperformed the MPI-2 implementation on both the SGI Origin 2000 and the Cray T3E-600.
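A minimal Fortran sketch (my illustration, not the benchmark code) of the MPI-2 one-sided style being measured: rank 0 writes directly into a window exposed by rank 1, with MPI_Win_fence calls delimiting the access epoch; in the SHMEM versions a put routine such as shmem_put plays the analogous role.

```fortran
program one_sided_put
   use mpi
   implicit none
   integer :: ierr, rank, win
   integer (kind=MPI_ADDRESS_KIND) :: winsize, disp
   real    :: buf(1000)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   ! Every rank exposes its buffer in a window object (4-byte default reals assumed).
   winsize = 4 * size(buf)
   call MPI_Win_create(buf, winsize, 4, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)

   call MPI_Win_fence(0, win, ierr)          ! open the access epoch
   if (rank == 0) then
      disp = 0
      ! Write 1000 reals straight into rank 1's window; no receive is needed.
      call MPI_Put(buf, 1000, MPI_REAL, 1, disp, 1000, MPI_REAL, win, ierr)
   end if
   call MPI_Win_fence(0, win, ierr)          ! close the epoch; data is now visible

   call MPI_Win_free(win, ierr)
   call MPI_Finalize(ierr)
end program one_sided_put
```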


Concurrency and Computation: Practice and Experience | 2006

A survey of systems for detecting serial run-time errors

Glenn R. Luecke; James Coyle; Jim Hoekstra; Marina Kraeva; Ying Li; Olga Taborskaia; Yanmei Wang

This paper evaluates the ability of a variety of commercial and non-commercial software products to detect serial run-time errors in C and C++ programs, to issue meaningful messages, and to give the line in the source code where the error occurred. The commercial products Insure++ and Purify performed the best of all the software products we evaluated. Error messages were usually better and clearer when using Insure++ than when using Purify. Our evaluation shows that the overall run-time error detection capability of the non-commercial products is significantly lower than that of both Purify and Insure++. Of all the non-commercial products evaluated, Mpatrol provided the best overall capability to detect run-time errors in C and C++ programs.


Stochastic Processes and their Applications | 1978

Strongly ergodic Markov chains and rates of convergence using spectral conditions

Dean Isaacson; Glenn R. Luecke

For finite Markov chains the eigenvalues of P can be used to characterize the chain and also determine the geometric rate at which P^n converges to Q in case P is ergodic. For infinite Markov chains the spectrum of P plays the analogous role. It follows from Theorem 3.1 that ||P^n − Q|| ≤ Cβ^n if and only if P is strongly ergodic. The best possible rate for β is the spectral radius of P − Q, which in this case is the same as sup{|λ| : λ ∈ σ(P), λ ≠ 1}. The question of when this best rate equals δ(P) is considered for both discrete and continuous time chains. Two characterizations of strong ergodicity are given using spectral properties of P − Q (Theorem 3.5) and spectral properties of a submatrix of P (Theorem 3.16).
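Restated in display form (my transcription of the abstract's notation, with Q the limiting matrix, σ(P) the spectrum of P, and ρ the spectral radius):

```latex
\[
\|P^{n} - Q\| \le C\beta^{n} \ \text{for some } C < \infty,\ \beta < 1
\quad\Longleftrightarrow\quad
P \ \text{is strongly ergodic},
\]
\[
\beta_{\min} \;=\; \rho(P - Q) \;=\; \sup\{\, |\lambda| : \lambda \in \sigma(P),\ \lambda \neq 1 \,\}.
\]
```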


Concurrency and Computation: Practice and Experience | 2001

Scalability and performance of OpenMP and MPI on a 128‐processor SGI Origin 2000

Glenn R. Luecke; Wei-Hua Lin

The purpose of this paper is to investigate the scalability and performance of seven simple OpenMP test programs and to compare their performance with equivalent MPI programs on an SGI Origin 2000. Data distribution directives were used to make sure that the OpenMP implementation had the same data distribution as the MPI implementation. For the matrix-times-vector (test 5) and the matrix-times-matrix (test 7) tests, OpenMP 1.1 syntax does not allow OpenMP compilers to generate efficient code, since the reduction clause is not allowed for arrays. (This problem is corrected in OpenMP 2.0.) For the remaining five tests, the OpenMP version performed and scaled significantly better than the corresponding MPI implementation, except for the right shift test (test 2) for a small message.
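A minimal Fortran sketch (my own, not one of the seven test programs) of the matrix-times-vector pattern discussed above, written with the whole-array reduction that OpenMP 2.0 permits and OpenMP 1.1 did not:

```fortran
program mxv_reduction
   implicit none
   integer, parameter :: n = 1000
   real    :: a(n,n), x(n), y(n)
   integer :: i, j

   call random_number(a)
   call random_number(x)
   y = 0.0

   ! OpenMP 2.0 accepts the array y in a reduction clause; under OpenMP 1.1
   ! this had to be rewritten by hand (e.g. per-thread partial vectors combined
   ! afterwards), which is the inefficiency the abstract refers to.
!$omp parallel do private(i) reduction(+:y)
   do j = 1, n
      do i = 1, n
         y(i) = y(i) + a(i,j) * x(j)
      end do
   end do
!$omp end parallel do

   print *, 'y(1) =', y(1)
end program mxv_reduction
```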


ICWC 99. IEEE Computer Society International Workshop on Cluster Computing | 1999

Comparing the communication performance and scalability of a Linux and a NT cluster of PCs, a Cray Origin 2000, an IBM SP and a Cray T3E-600

Glenn R. Luecke; Bruno Raffin; James Coyle

The paper presents scalability and communication performance results for a cluster of PCs running Linux with the GM communication library, a cluster of PCs running Windows NT with the HPVM communication library, a Cray T3E-600, an IBM SP and a Cray Origin 2000. Both PC clusters used a Myrinet network. Six communication tests using MPI routines were run for a variety of message sizes and numbers of processors. The tests were chosen to represent commonly used communication patterns, ranging from low contention (a ping-pong between processors, a right shift, a binary tree broadcast and a synchronization barrier) to high contention (a naive broadcast and an all-to-all). For most of the tests, the T3E provides the best performance and scalability. For an 8 byte message the NT cluster performs about the same as the T3E for most of the tests. For all the tests but one, the T3E, the Origin and the SP outperform the two clusters for the largest message size (10 Kbytes or 1 Mbyte).
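A minimal Fortran sketch (my illustration, not the benchmark code) of the ping-pong pattern that anchors such test suites: rank 0 sends a message to rank 1 and waits for it to return, and half the averaged round-trip time is taken as the one-way time for that message size.

```fortran
program pingpong
   use mpi
   implicit none
   integer, parameter :: nbytes = 10240      ! e.g. the 10 Kbyte case
   integer, parameter :: reps   = 1000
   integer :: ierr, rank, k, status(MPI_STATUS_SIZE)
   character (len=nbytes) :: msg
   double precision :: t0, t1

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   msg = ' '

   t0 = MPI_Wtime()
   do k = 1, reps
      if (rank == 0) then
         call MPI_Send(msg, nbytes, MPI_CHARACTER, 1, 0, MPI_COMM_WORLD, ierr)
         call MPI_Recv(msg, nbytes, MPI_CHARACTER, 1, 0, MPI_COMM_WORLD, status, ierr)
      else if (rank == 1) then
         call MPI_Recv(msg, nbytes, MPI_CHARACTER, 0, 0, MPI_COMM_WORLD, status, ierr)
         call MPI_Send(msg, nbytes, MPI_CHARACTER, 0, 0, MPI_COMM_WORLD, ierr)
      end if
   end do
   t1 = MPI_Wtime()

   if (rank == 0) print *, 'average one-way time (s):', (t1 - t0) / (2 * reps)

   call MPI_Finalize(ierr)
end program pingpong
```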


Concurrency and Computation: Practice and Experience | 2004

Performance and scalability of MPI on PC clusters

Glenn R. Luecke; Marina Kraeva; Jing Yuan; Silvia Spanoyannis

The purpose of this paper is to compare the communication performance and scalability of MPI communication routines on a Windows Cluster, a Linux Cluster, a Cray T3E-600, and an SGI Origin 2000. All tests in this paper were run using various numbers of processors and two message sizes. Although the Cray T3E-600 is about 7 years old, it performed best of all machines for most of the tests. The Linux Cluster with the Myrinet interconnect and Myricom's MPI performed and scaled quite well and, in most cases, performed better than the Origin 2000, and in some cases better than the T3E. The Windows Cluster using the Giganet Full Interconnect and MPI/Pro's MPI performed and scaled poorly for small messages compared with all of the other machines.


International Journal of High Performance Computing Applications | 2018

High-performance epistasis detection in quantitative trait GWAS

Nathan T. Weeks; Glenn R. Luecke; Brandon M. Groth; Marina Kraeva; Li Ma; Luke M. Kramer; James E. Koltes; James M. Reecy

epiSNP is a program for identifying pairwise single nucleotide polymorphism (SNP) interactions (epistasis) in quantitative-trait genome-wide association studies (GWAS). A parallel MPI version (EPISNPmpi) was created in 2008 to address this computationally expensive analysis on large data sets with many quantitative traits and SNP markers. However, the falling cost of genotyping has led to an explosion of large-scale GWAS data sets that challenge EPISNPmpi’s ability to compute results in a reasonable amount of time. Therefore, we optimized epiSNP for modern multi-core and highly parallel many-core processors to efficiently handle these large data sets. This paper describes the serial optimizations, dynamic load balancing using MPI-3 RMA operations, and shared-memory parallelization with OpenMP to further enhance load balancing and allow execution on the Intel Xeon Phi coprocessor (MIC). For a large GWAS data set, our optimizations provided a 38.43× speedup over EPISNPmpi on 126 nodes using 2 MICs on TACC’s Stampede Supercomputer. We also describe a Coarray Fortran (CAF) version that demonstrates the suitability of PGAS languages for problems with this computational pattern. We show that the Coarray version performs competitively with the MPI version on the NERSC Edison Cray XC30 supercomputer. Finally, the performance benefits of hyper-threading for this application on Edison (average 1.35× speedup) are demonstrated.
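A minimal Fortran sketch (my own, with a hypothetical block count; not code from epiSNP) of dynamic load balancing with MPI-3 RMA in the spirit described above: a global counter of the next unprocessed block of SNP pairs lives in a window on rank 0, and each rank atomically fetches and increments it to claim its next chunk of work.

```fortran
program rma_load_balance
   use mpi
   implicit none
   integer, parameter :: nblocks = 10000     ! hypothetical number of work blocks
   integer :: ierr, rank, win, myblock, counter, one
   integer (kind=MPI_ADDRESS_KIND) :: winsize, disp

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   counter = 0
   winsize = 0
   if (rank == 0) winsize = 4                ! rank 0 exposes one 4-byte integer
   call MPI_Win_create(counter, winsize, 4, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)

   one  = 1
   disp = 0
   do
      ! Atomically read the shared counter on rank 0 and add 1 to it.
      call MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win, ierr)
      call MPI_Fetch_and_op(one, myblock, MPI_INTEGER, 0, disp, MPI_SUM, win, ierr)
      call MPI_Win_unlock(0, win, ierr)
      if (myblock >= nblocks) exit            ! no work left
      ! ... process block number myblock (pairwise SNP tests would go here) ...
   end do

   call MPI_Win_free(win, ierr)
   call MPI_Finalize(ierr)
end program rma_load_balance
```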

Collaboration


Dive into Glenn R. Luecke's collaboration.

Top Co-Authors

Ying Li

Iowa State University

Chao Yang

Lawrence Berkeley National Laboratory
