Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Valerie E. Taylor is active.

Publication


Featured research published by Valerie E. Taylor.


Job Scheduling Strategies for Parallel Processing | 1998

Predicting Application Run Times Using Historical Information

Warren Smith; Ian T. Foster; Valerie E. Taylor

We present a technique for deriving predictions for the run times of parallel applications from the run times of “similar” applications that have executed in the past. The novel aspect of our work is the use of search techniques to determine those application characteristics that yield the best definition of similarity for the purpose of making predictions. We use four workloads recorded from parallel computers at Argonne National Laboratory, the Cornell Theory Center, and the San Diego Supercomputer Center to evaluate the effectiveness of our approach. We show that on these workloads our techniques achieve predictions that are between 14 and 60 percent better than those achieved by other researchers; our approach achieves mean prediction errors that are between 40 and 59 percent of mean application run times.
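
To illustrate the core idea, here is a minimal sketch of similarity-based run-time prediction: jobs are described by categorical attributes, a prediction is the mean run time of past jobs that match on a chosen attribute subset (a "template"), and a search picks the template with the lowest historical error. The attribute names, records, and leave-one-out search below are hypothetical simplifications, not the paper's actual workloads or search procedure.

```python
from itertools import combinations
from statistics import mean

# Hypothetical historical job records: categorical attributes plus observed run time (s).
history = [
    {"user": "alice", "queue": "batch", "app": "cfd", "runtime": 3600},
    {"user": "alice", "queue": "batch", "app": "cfd", "runtime": 4100},
    {"user": "bob",   "queue": "debug", "app": "md",  "runtime": 300},
    {"user": "alice", "queue": "debug", "app": "cfd", "runtime": 700},
]
ATTRS = ("user", "queue", "app")

def predict(job, template, records):
    """Predict run time as the mean over past jobs matching `job` on every
    attribute in `template`; fall back to the global mean if none match."""
    similar = [r["runtime"] for r in records
               if all(r[a] == job[a] for a in template)]
    return mean(similar) if similar else mean(r["runtime"] for r in records)

def best_template(records):
    """Search attribute subsets for the template with the lowest
    leave-one-out mean absolute prediction error."""
    best, best_err = ATTRS, float("inf")
    for k in range(1, len(ATTRS) + 1):
        for template in combinations(ATTRS, k):
            errs = [abs(predict(r, template, records[:i] + records[i + 1:]) - r["runtime"])
                    for i, r in enumerate(records)]
            if mean(errs) < best_err:
                best, best_err = template, mean(errs)
    return best

template = best_template(history)
new_job = {"user": "alice", "queue": "batch", "app": "cfd"}
print("template:", template, "predicted run time:", predict(new_job, template, history))
```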


International Parallel and Distributed Processing Symposium | 2000

Scheduling with advanced reservations

Warren Smith; Ian T. Foster; Valerie E. Taylor

Some computational grid applications have very large resource requirements and need simultaneous access to resources from more than one parallel computer. Current scheduling systems do not provide mechanisms to gain such simultaneous access without the help of human administrators of the computer systems. In this work, we propose and evaluate several algorithms for supporting advanced reservation of resources in supercomputing scheduling systems. These advanced reservations allow users to request resources from scheduling systems at specific times. We find that the wait times of applications submitted to the queue increase when reservations are supported, and that the size of the increase depends on how reservations are supported. Further, we find that the best performance is achieved when we assume that applications can be terminated and restarted, backfilling is performed, and relatively accurate run-time predictions are used.
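
A minimal sketch of the feasibility check a scheduler might perform before accepting an advanced reservation, assuming running jobs release nodes at their predicted end times. The data structures and field names are illustrative; the paper evaluates several richer algorithms, including variants that terminate and restart jobs and that backfill around reservations.

```python
def nodes_free_at(t, total_nodes, running):
    """Nodes available at time t, assuming each running job (nodes, predicted_end)
    releases its nodes at its predicted end time."""
    return total_nodes - sum(n for n, end in running if end > t)

def can_reserve(start, duration, nodes, total_nodes, running):
    """An advanced reservation fits if enough nodes are free at the reservation's
    start and at every predicted release event inside its window."""
    events = [start] + [end for _, end in running if start < end < start + duration]
    return all(nodes_free_at(t, total_nodes, running) >= nodes for t in events)

running = [(16, 100.0), (32, 250.0)]   # (nodes in use, predicted end time in s)
print(can_reserve(start=120.0, duration=60.0, nodes=48, total_nodes=64, running=running))
print(can_reserve(start=300.0, duration=60.0, nodes=48, total_nodes=64, running=running))
```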


Job Scheduling Strategies for Parallel Processing | 1999

Using Run-Time Predictions to Estimate Queue Wait Times and Improve Scheduler Performance

Warren Smith; Valerie E. Taylor; Ian T. Foster

On many computers, a request to run a job is not serviced immediately but instead is placed in a queue and serviced only when resources are released by preceding jobs. In this paper, we build on run-time prediction techniques that we developed in previous research to explore two problems. The first problem is to predict how long applications will wait in a queue until they receive resources. We develop run-time estimates that result in more accurate wait-time predictions than other run-time prediction techniques. The second problem we investigate is improving scheduling performance. We use run-time predictions to improve the performance of the least-work-first and backfill scheduling algorithms. We find that using our run-time predictor results in lower mean wait times for the workloads with higher offered loads and for the backfill scheduling algorithm.
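
A minimal sketch of wait-time estimation by replaying a first-come-first-served schedule with predicted run times: the estimated wait of a queued job is the simulated time at which enough nodes become free for it. The job tuples and the FCFS assumption are illustrative; the paper applies run-time predictions to least-work-first and backfill schedulers.

```python
import heapq

def estimate_wait(queue, running, total_nodes):
    """Estimate the wait time of the last job in `queue` by replaying a
    first-come-first-served schedule with predicted run times.
    running: list of (nodes, predicted_remaining); queue: list of (nodes, predicted_runtime).
    Names and structures are illustrative, not the paper's interface."""
    now = 0.0
    free = total_nodes - sum(n for n, _ in running)
    finishing = [(rem, n) for n, rem in running]   # min-heap of (finish time, nodes released)
    heapq.heapify(finishing)
    start = now
    for nodes, runtime in queue:
        # Advance simulated time until enough nodes are free to start this job.
        while free < nodes:
            now, released = heapq.heappop(finishing)
            free += released
        start = now
        free -= nodes
        heapq.heappush(finishing, (start + runtime, nodes))
    return start   # start time of the last queued job = its estimated wait

running = [(32, 50.0), (16, 120.0)]    # (nodes, predicted remaining run time in s)
queue = [(40, 200.0), (24, 30.0)]      # (nodes requested, predicted run time in s)
print("estimated wait of last queued job (s):", estimate_wait(queue, running, total_nodes=64))
```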


Measurement and Modeling of Computer Systems | 2003

Prophesy: an infrastructure for performance analysis and modeling of parallel and grid applications

Valerie E. Taylor; Xingfu Wu; Rick Stevens

Performance is an important issue with any application, especially grid applications. Efficient execution of applications requires insight into how the system features impact the performance of the applications. This insight generally results from significant experimental analysis and possibly the development of performance models. This paper presents the Prophesy system, whose novel component is the model development. In particular, this paper discusses the use of our coupling parameter (i.e., a metric that attempts to quantify the interaction between kernels that compose an application) to develop application models. We discuss how this modeling technique can be used in the analysis of grid applications.
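
The abstract does not give the coupling parameter's formula, so the sketch below is only one plausible reading: the ratio of the measured time of two kernels executed together to the sum of their isolated times, used to scale per-kernel times into an application-level estimate. Treat the functions and numbers as illustrative assumptions, not Prophesy's actual model.

```python
def coupling(t_i, t_j, t_ij):
    """Hypothetical coupling metric: measured time of kernels i and j executed
    together divided by the sum of their isolated times. Values below 1 suggest
    constructive interaction (e.g., cache reuse); above 1, destructive interaction.
    This is an illustrative reading of the abstract, not Prophesy's exact formula."""
    return t_ij / (t_i + t_j)

def predict_app_time(kernel_times, pair_times):
    """Predict whole-application time by scaling each isolated kernel time with
    the average coupling of the pairs it participates in (illustrative only)."""
    total = 0.0
    for k, t in kernel_times.items():
        cs = [coupling(kernel_times[a], kernel_times[b], t_ab)
              for (a, b), t_ab in pair_times.items() if k in (a, b)]
        total += t * (sum(cs) / len(cs) if cs else 1.0)
    return total

kernel_times = {"stencil": 2.0, "fft": 3.0, "io": 1.0}        # seconds, kernels run in isolation
pair_times = {("stencil", "fft"): 5.5, ("fft", "io"): 3.6}    # seconds, kernels run together
print("predicted application time (s):", predict_app_time(kernel_times, pair_times))
```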


Journal of Parallel and Distributed Computing | 2004

Predicting application run times with historical information

Warren Smith; Ian T. Foster; Valerie E. Taylor

We present a technique for predicting the run times of parallel applications based upon the run times of “similar” applications that have executed in the past. The novel aspect of our work is the use of search techniques to determine those application characteristics that yield the best definition of similarity for the purpose of making predictions. We use four workloads recorded from parallel computers at Argonne National Laboratory, the Cornell Theory Center, and the San Diego Supercomputer Center to evaluate the effectiveness of our approach. We show that on these workloads our techniques achieve predictions that are between 21 and 64 percent better than those achieved by other techniques; our approach achieves mean prediction errors that are between 29 and 59 percent of mean application run times.


Computer Science - Research and Development | 2012

Power-aware predictive models of hybrid (MPI/OpenMP) scientific applications on multicore systems

Charles W. Lively; Xingfu Wu; Valerie E. Taylor; Shirley Moore; Hung-Ching Chang; Chun-Yi Su; Kirk W. Cameron

Predictive models enable a better understanding of the performance characteristics of applications on multicore systems. Previous work has utilized performance counters in a system-centered approach to model power consumption for the system, CPU, and memory components. Often, these approaches use the same group of counters across different applications. In contrast, we develop application-centric models (based upon performance counters) for the runtime and power consumption of the system, CPU, and memory components. Our work analyzes four Hybrid (MPI/OpenMP) applications: the NAS Parallel Multizone Benchmarks (BT-MZ, SP-MZ, LU-MZ) and a Gyrokinetic Toroidal Code, GTC. Our models show that cache utilization (L1/L2), branch instructions, TLB data misses, and system resource stalls affect the performance of each application and performance component differently. We show that the L2 total cache hits counter affects performance across all applications. The models are validated for the system and component power measurements with an error rate less than 3%.
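
As a rough illustration of an application-centric, counter-based model, the sketch below fits an ordinary-least-squares mapping from performance-counter rates to measured CPU power. The counter set mirrors the ones the abstract highlights, but the data, units, and model form are assumptions; the paper's models and validation methodology are not reproduced here.

```python
import numpy as np

# Hypothetical training runs: performance-counter rates (events per cycle) and
# measured CPU power (W). The counters mirror those the abstract highlights
# (L2 cache hits, branches, TLB data misses, resource stalls); values are made up.
counters = np.array([
    [0.12, 0.08, 0.001, 0.20],
    [0.30, 0.05, 0.004, 0.10],
    [0.22, 0.07, 0.002, 0.15],
    [0.18, 0.09, 0.003, 0.25],
    [0.25, 0.06, 0.002, 0.12],
])
cpu_power = np.array([85.0, 92.0, 88.0, 90.0, 89.0])

# Ordinary least squares with an intercept term: power is modeled as X @ beta.
X = np.hstack([np.ones((counters.shape[0], 1)), counters])
beta, *_ = np.linalg.lstsq(X, cpu_power, rcond=None)

new_run = np.array([1.0, 0.20, 0.07, 0.002, 0.18])  # leading 1.0 is the intercept term
print("predicted CPU power (W):", float(new_run @ beta))
```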


Journal of Parallel and Distributed Computing | 2002

A novel dynamic load balancing scheme for parallel systems

Zhiling Lan; Valerie E. Taylor; Greg L. Bryan

Adaptive mesh refinement (AMR) is a type of multiscale algorithm that achieves high resolution in localized regions of dynamic, multidimensional numerical simulations. One of the key issues related to AMR is dynamic load balancing (DLB), which allows large-scale adaptive applications to run efficiently on parallel systems. In this paper, we present an efficient DLB scheme for structured AMR (SAMR) applications. This scheme interleaves a grid-splitting technique with direct grid movements (e.g., direct movement from an overloaded processor to an underloaded processor), for which the objective is to efficiently redistribute workload among all the processors so as to reduce the parallel execution time. The potential benefits of our DLB scheme are examined by incorporating our techniques into a SAMR cosmology application, the ENZO code. Experiments show that by using our scheme, the parallel execution time can be reduced by up to 57% and the quality of load balancing can be improved by a factor of six, as compared to the original DLB scheme used in ENZO.
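
A minimal sketch of the interleaved direct-movement and grid-splitting idea: grids are moved whole from the most loaded to the least loaded processor when that fits the surplus, and split otherwise. The data structures, threshold, and termination bound are invented for illustration and do not reproduce ENZO's DLB implementation.

```python
def balance(grids, n_procs, threshold=1.05):
    """Illustrative DLB sketch: `grids` maps processor -> list of grid workloads.
    Repeatedly move (or split and move) a grid from the most loaded processor
    to the least loaded one until loads are within `threshold` of the average."""
    avg = sum(sum(g) for g in grids.values()) / n_procs
    for _ in range(100):                       # safety bound on iterations
        loads = {p: sum(g) for p, g in grids.items()}
        hi = max(loads, key=loads.get)
        lo = min(loads, key=loads.get)
        if loads[hi] <= threshold * avg:
            break                              # balanced enough
        surplus = loads[hi] - avg
        grid = max(grids[hi])                  # candidate grid to move
        grids[hi].remove(grid)
        if grid <= surplus:
            grids[lo].append(grid)             # direct movement of the whole grid
        else:
            grids[lo].append(surplus)          # split: move only the surplus part
            grids[hi].append(grid - surplus)   # keep the remainder in place
    return grids

print(balance({0: [40.0, 30.0], 1: [10.0], 2: [5.0]}, n_procs=3))
```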


International Conference on Parallel Processing | 2001

Dynamic load balancing for structured adaptive mesh refinement applications

Zhiling Lan; Valerie E. Taylor; Greg L. Bryan

Adaptive Mesh Refinement (AMR) is a type of multiscale algorithm that achieves high resolution in localized regions of dynamic, multidimensional numerical simulations. One of the key issues related to AMR is dynamic load balancing (DLB), which allows large-scale adaptive applications to run efficiently on parallel systems. In this paper, we present an efficient DLB scheme for structured AMR (SAMR) applications. Our DLB scheme combines a grid-splitting technique with direct grid movements (e.g., direct movement from an overloaded processor to an underloaded processor), for which the objective is to efficiently redistribute workload among all the processors so as to reduce the parallel execution time. The potential benefits of our DLB scheme are examined by incorporating our techniques into a parallel, cosmological application that uses SAMR techniques. Experiments show that by using our scheme, the parallel execution time can be reduced by up to 47% and the quality of load balancing can be improved by a factor of four.


IEEE Transactions on Parallel and Distributed Systems | 2002

Mesh partitioning for efficient use of distributed systems

Jian Chen; Valerie E. Taylor

Mesh partitioning for homogeneous systems has been studied extensively; however, mesh partitioning for distributed systems is a relatively new area of research. To ensure efficient execution on a distributed system, the heterogeneities in the processor and network performance must be taken into consideration in the partitioning process; equal size subdomains and small cut set size, which results from conventional mesh partitioning, are no longer the primary goals. In this paper, we address various issues related to mesh partitioning for distributed systems. These issues include the metric used to compare different partitions, efficiency of the application executing on a distributed system, and the advantage of exploiting heterogeneity in network performance. We present a tool called PART, for automatic mesh partitioning for distributed systems. The novel feature of PART is that it considers heterogeneities in the application and the distributed system. Simulated annealing is used in PART to perform the backtracking search for desired partitions. While it is well-known that simulated annealing is computationally intensive, we describe the parallel version of simulated annealing that is used with PART. The results of the parallelization exhibit superlinear speedup in most cases and nearly perfect speedup for the remaining cases. Experimental results are also presented for partitioning regular and irregular finite element meshes for an explicit, nonlinear finite element application, called WHAMS2D, executing on a distributed system consisting of two IBM SPs with different processors. The results from the regular problems indicate a 33 to 46 percent increase in efficiency when processor performance is considered as compared to the conventional even partitioning. The results indicate a 5 to 15 percent increase in efficiency when network performance is considered as compared to considering only processor performance; this is significant given that the optimal improvement is 15 percent for this application. The results from the irregular problem indicate up to 36 percent increase in efficiency when processor and network performance are considered as compared to even partitioning.
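
A minimal sketch of heterogeneity-aware partitioning by simulated annealing: elements are assigned to processors, and the cost combines the slowest processor's compute time (work divided by relative speed) with a simple edge-cut communication penalty. The cost function, cooling schedule, and mesh are assumptions for illustration, not PART's actual formulation or its parallel annealing.

```python
import math
import random

def cost(assign, work, speed, edges, comm_weight=1.0):
    """Estimated execution time: the slowest processor's compute time (assigned
    work divided by its relative speed) plus an edge-cut communication penalty.
    Illustrative only, not PART's actual cost model."""
    per_proc = [0.0] * len(speed)
    for elem, p in enumerate(assign):
        per_proc[p] += work[elem] / speed[p]
    cut = sum(1 for a, b in edges if assign[a] != assign[b])
    return max(per_proc) + comm_weight * cut

def anneal(work, speed, edges, steps=5000, t0=1.0, seed=0):
    """Simulated annealing over element-to-processor assignments."""
    rng = random.Random(seed)
    assign = [rng.randrange(len(speed)) for _ in work]      # random initial partition
    cur_cost = cost(assign, work, speed, edges)
    best, best_cost = assign[:], cur_cost
    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-6                # linear cooling schedule
        elem, new_p = rng.randrange(len(work)), rng.randrange(len(speed))
        old_p = assign[elem]
        assign[elem] = new_p
        new_cost = cost(assign, work, speed, edges)
        # Accept improving moves always, worsening moves with Boltzmann probability.
        if new_cost <= cur_cost or rng.random() < math.exp((cur_cost - new_cost) / t):
            cur_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = assign[:], new_cost
        else:
            assign[elem] = old_p                            # undo rejected move
    return best, best_cost

work = [1.0] * 8                                            # uniform element weights
speed = [1.0, 2.0]                                          # heterogeneous processor speeds
edges = [(i, i + 1) for i in range(7)]                      # a simple 1-D mesh
print(anneal(work, speed, edges))
```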


IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

Energy and performance characteristics of different parallel implementations of scientific applications on multicore systems

Charles W. Lively; Xingfu Wu; Valerie E. Taylor; Shirley Moore; Hung-Ching Chang; Kirk W. Cameron

Energy consumption is a major concern with high-performance multicore systems. In this paper, we explore the energy consumption and performance (execution time) characteristics of different parallel implementations of scientific applications. In particular, the experiments focus on message-passing interface (MPI)-only versus hybrid MPI/OpenMP implementations of the NAS (NASA Advanced Supercomputing) BT (Block Tridiagonal) benchmark (strong scaling), a Lattice Boltzmann application (strong scaling), and a Gyrokinetic Toroidal Code, GTC (weak scaling), as well as central processing unit (CPU) frequency scaling. Experiments were conducted on a system instrumented to obtain power information; this system consists of eight nodes with four cores per node. The results indicate that, for runs on 16 or fewer cores, whether the MPI-only or the hybrid implementation performs best depends on the application. For the case of 32 cores, the results were consistent: the hybrid implementation resulted in lower execution time and energy. With CPU frequency scaling, the best case for energy saving was not the best case for execution time.
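
A small sketch of how energy and energy-delay product might be compared across configurations, assuming power is sampled at a fixed interval during a run. The configuration names, power levels, and run times below are made up; they are not the paper's measurements.

```python
def energy_joules(power_samples_w, interval_s):
    """Integrate sampled power (W) over time to get energy (J),
    assuming a fixed sampling interval."""
    return sum(power_samples_w) * interval_s

# Hypothetical per-configuration runs: (power samples in W, sampling interval in s).
runs = {
    "MPI-only, 32 cores":       ([210.0] * 120, 1.0),   # 120 s at ~210 W
    "Hybrid MPI/OpenMP, 32":    ([225.0] * 100, 1.0),   # 100 s at ~225 W
}
for name, (samples, dt) in runs.items():
    time_s = len(samples) * dt
    e = energy_joules(samples, dt)
    print(f"{name}: time={time_s:.0f}s energy={e:.0f}J EDP={e * time_s:.0f}")
```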

Collaboration


Dive into Valerie E. Taylor's collaborations.

Top Co-Authors

Rick Stevens, Argonne National Laboratory
Jian Chen, Northwestern University
Zhiling Lan, Illinois Institute of Technology
Jonathan Geisler, Argonne National Laboratory
Warren Smith, Argonne National Laboratory