Nick Trebon
University of Chicago
Publications
Featured research published by Nick Trebon.
Grid-Based Problem Solving Environments | 2007
Peter H. Beckman; Suman Nadella; Nick Trebon; Ivan Beschastnikh
Modeling and simulation using high-performance computing are playing an increasingly important role in decision making and prediction. For time-critical emergency decision support applications, such as influenza modeling and severe weather prediction, late results may be useless. A specialized infrastructure is needed to provide computational resources quickly. This paper describes the architecture and implementation of SPRUCE, a system for supporting urgent computing on both traditional supercomputers and distributed computing Grids. Currently deployed on the TeraGrid, SPRUCE provides users with “right-of-way tokens” that can be activated from a Web-based portal or Web service invocation in the event of an urgent computing need. Tokens are transferable and can be restricted to specific resource sets and priority levels. Once a session is activated, job submissions may request elevated priority. Based on local policy, computing resources can respond, for example, by preempting active jobs or raising the job’s priority in the queue. This paper also explores the strengths and weaknesses of the SPRUCE architecture and token-based activation for urgent computing applications.
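As a rough illustration of the token workflow described above, the sketch below models activation, resource restriction, and a priority ceiling. The class and function names, the three-level priority scale, and the resource identifier are assumptions for illustration, not the actual SPRUCE interface:

```python
from dataclasses import dataclass

PRIORITY_RANK = {"normal": 0, "high": 1, "critical": 2}

@dataclass
class RightOfWayToken:
    """Hypothetical right-of-way token: a resource set and a priority ceiling."""
    token_id: str
    resources: frozenset
    max_priority: str
    active: bool = False

def activate(token: RightOfWayToken) -> None:
    """Activating a token opens an urgent-computing session."""
    token.active = True

def submit_urgent_job(token: RightOfWayToken, resource: str, priority: str) -> str:
    """Accept a job only if the session is active, the resource is covered,
    and the requested priority does not exceed the token's ceiling."""
    if not token.active:
        raise PermissionError("token has not been activated")
    if resource not in token.resources:
        raise PermissionError(f"token is not valid on {resource}")
    if PRIORITY_RANK[priority] > PRIORITY_RANK[token.max_priority]:
        raise PermissionError("requested priority exceeds token ceiling")
    # How the resource honors this (next-to-run, preemption, ...) is local policy.
    return f"job queued on {resource} at {priority} priority"

token = RightOfWayToken("tg-1234", frozenset({"mercury.teragrid"}), "high")
activate(token)
print(submit_urgent_job(token, "mercury.teragrid", "high"))
```

The key point the abstract makes is that a token grants only the right to request elevated priority; how a resource honors the request, whether next-to-run status or preemption, remains a local-policy decision.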
Grid Computing | 2004
Allen D. Malony; Sameer Shende; Robert Bell; Kai Li; Li Li; Nick Trebon
To address the increasing complexity in parallel and distributed systems and software, advances in performance technology towards more robust tools and broader, more portable implementations are needed. In doing so, new challenges for performance instrumentation, measurement, analysis, and visualization arise to address evolving requirements for how performance phenomena are observed and how performance data are used. This paper presents recent advances in the TAU performance system in four areas where improvements in performance technology are important: instrumentation control, performance mapping, performance interaction and steering, and performance databases. In the area of instrumentation control, we are concerned with the removal of instrumentation in cases of high measurement overhead. Our approach applies rule-based analysis of performance data in an iterative instrumentation process. Work on performance mapping focuses on measuring performance with respect to dynamic calling paths when the static callgraph cannot be determined prior to execution. We describe an online performance data access, analysis, and visualization system that will form the basis of a large-scale performance interaction and steering system. Finally, we describe our approach to the management of performance data in a database framework that supports multi-experiment analysis.
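The instrumentation-control idea, removing probes whose measurement overhead swamps the events they measure, can be sketched as a simple rule applied to profile data. The record layout, the per-call threshold, and the function name below are illustrative assumptions, not TAU's actual rules or API:

```python
def prune_instrumentation(events, min_time_per_call_us=10.0):
    """Disable instrumentation for events whose per-call inclusive time is so
    small that the measurement overhead likely dominates the measurement."""
    keep, disable = [], []
    for name, calls, inclusive_us in events:
        if calls > 0 and inclusive_us / calls < min_time_per_call_us:
            disable.append(name)   # too cheap per call to instrument usefully
        else:
            keep.append(name)
    return keep, disable

# One iteration: profile, prune, re-instrument, and profile again.
profile = [
    ("main", 1, 5_000_000.0),
    ("MPI_Send", 120_000, 9_600_000.0),
    ("vec_norm", 4_000_000, 8_000_000.0),   # 2 us/call: below the threshold
]
keep, disable = prune_instrumentation(profile)
print("re-instrument without:", disable)
```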
International Parallel and Distributed Processing Symposium | 2009
Jason Cope; Nick Trebon; Henry M. Tufo; Peter H. Beckman
Distributed urgent computing workflows often require data to be staged between multiple computational resources. Since these workflows execute in shared computing environments where users compete for resource usage, it is necessary to allocate resources that can meet the deadlines associated with time-critical workflows and can tolerate interference from other users. In this paper, we evaluate the use of robust resource selection and scheduling heuristics to improve the execution of tasks and workflows in urgent computing environments that are dependent on the availability of data resources and impacted by interference from less urgent tasks.
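A minimal sketch of the kind of robust selection heuristic evaluated here: prefer the resource whose historical times, padded by a variability margin to absorb interference from other users, still fit the deadline. The margin rule (mean plus two standard deviations) and the sample data are assumptions for illustration:

```python
import statistics

def pick_robust_resource(history, deadline_s, k=2.0):
    """history maps resource name -> observed staging+execution times (seconds).
    Return the resource with the smallest padded bound that fits the deadline."""
    best, best_bound = None, float("inf")
    for resource, times in history.items():
        bound = statistics.mean(times) + k * statistics.pstdev(times)
        if bound <= deadline_s and bound < best_bound:
            best, best_bound = resource, bound
    return best, best_bound

history = {
    "clusterA": [620, 700, 680, 900],   # fast on average but noisy (shared load)
    "clusterB": [840, 860, 850, 855],   # slower but highly predictable
}
print(pick_robust_resource(history, deadline_s=1000))
```

Note that the sketch prefers the slower but more predictable resource, which is the essence of robustness under interference from less urgent tasks.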
International Parallel and Distributed Processing Symposium | 2004
Jaideep Ray; Nick Trebon; Robert C. Armstrong; Sameer Shende; Allen D. Malony
We present a case study of performance measurement and modeling of a CCA (Common Component Architecture) component-based application in a high performance computing environment. Component-based HPC applications make it possible to create component-level performance models and synthesize them into application performance models. However, they impose the restriction that performance measurement/monitoring needs to be done in a nonintrusive manner and at a fairly coarse-grained level. We propose a performance measurement infrastructure for HPC based loosely on recent work done for grid environments. A prototypical implementation of the infrastructure is used to collect data for three components in a scientific application and construct their performance models. Both computational and message-passing performance are addressed.
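A toy example of the modeling step: fit a coarse-grained performance model for one component from a handful of nonintrusive timings. The linear-in-problem-size model form and the data points are assumptions made for the sketch; the paper does not prescribe them:

```python
import numpy as np

# Observed (problem size, wall time) pairs for one component.
sizes = np.array([1e4, 5e4, 1e5, 5e5])
times = np.array([0.021, 0.094, 0.183, 0.912])  # seconds

# Fit T(n) = a + b*n by least squares; a component with different scaling
# behavior would need a different model form.
b, a = np.polyfit(sizes, times, 1)

def predict(n):
    return a + b * n

print(f"predicted time at n=2e5: {predict(2e5):.3f} s")
```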
Concurrency and Computation: Practice and Experience | 2005
Allen D. Malony; Sameer Shende; Nick Trebon; Jaideep Ray; Robert C. Armstrong; Craig Edward Rasmussen; Matthew J. Sottile
This work targets the emerging use of software component technology for high‐performance scientific parallel and distributed computing. While component software engineering will benefit the construction of complex science applications, its use presents several challenges to performance measurement, analysis, and optimization. The performance of a component application depends on the interaction (possibly nonlinear) of the composed component set. Furthermore, a component is a ‘binary unit of composition’ and the only information users have is the interface the component provides to the outside world. A performance engineering methodology and development approach is presented to address evaluation and optimization issues in high‐performance component environments. We describe a prototype implementation of a performance measurement infrastructure for the Common Component Architecture (CCA) system. A case study demonstrating the use of this technology for integrated measurement, monitoring, and optimization in CCA component‐based applications is given.
Concurrency and Computation: Practice and Experience | 2007
Nick Trebon; Alan Morris; Jaideep Ray; Sameer Shende; Allen D. Malony
A parallel component environment places constraints on performance measurement and modeling. For instance, it must be possible to instrument the application without access to the source code. In addition, a component may admit multiple implementations, based on the choice of algorithm, data structure, parallelization strategy, etc., leaving the user with the problem of choosing the ‘correct’ implementation and achieving an optimal (fastest) component assembly. Under the assumption that an empirical performance model exists for each implementation of each component, simply choosing the optimal implementation of each component does not guarantee an optimal component assembly, since components interact with each other. An optimal solution may be obtained by evaluating the performance of all possible realizations of a component assembly given the components and all of their implementations, but the exponential complexity renders the approach infeasible as the number of components and their implementations grows. This paper describes a non‐intrusive, coarse‐grained performance monitoring system that allows the user to gather performance data through the use of proxies. In addition, a simple optimization library that identifies a nearly optimal configuration is proposed. Finally, some experimental results are presented that illustrate the measurement and optimization strategies.
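The exponential blow-up and the appeal of a cheap heuristic can be seen in a small sketch. The cost model below, standalone costs plus a pairwise interaction term, is an assumption made to illustrate why a per-component greedy choice can miss the optimal assembly:

```python
from itertools import product

solo = {                      # standalone cost of each implementation
    "solver":  {"cg": 10.0, "gmres": 12.0},
    "precond": {"ilu": 4.0, "jacobi": 6.0},
}
interaction = {("gmres", "jacobi"): -5.0}   # some pairs compose unusually well

def assembly_cost(choice):
    cost = sum(solo[c][impl] for c, impl in choice.items())
    return cost + interaction.get(tuple(sorted(choice.values())), 0.0)

# Exhaustive search: |implementations|^|components| assemblies to evaluate.
assemblies = [dict(zip(solo, combo)) for combo in product(*(solo[c] for c in solo))]
best = min(assemblies, key=assembly_cost)

# Greedy per-component choice ignores interactions and misses the optimum here.
greedy = {c: min(impls, key=impls.get) for c, impls in solo.items()}
print("exhaustive:", best, assembly_cost(best))
print("greedy:    ", greedy, assembly_cost(greedy))
```

With m implementations for each of n components, exhaustive evaluation examines m^n assemblies, while the greedy pass is linear in n but blind to component interactions, which is exactly the trade-off the paper's near-optimal approach targets.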
International Conference on Cluster Computing | 2008
Nick Trebon; Peter H. Beckman
Scientific simulation and modeling often aid in making critical decisions in such diverse fields as city planning, severe weather prediction and influenza modeling. In some of these situations the computations operate under strict deadlines, after which the results may have very little value. In these cases of urgent computing, it is imperative that these computations begin execution as quickly as possible. The special priority and urgent compute environment (SPRUCE) is a framework designed to enable these high priority computations to quickly access computational grid resources through elevated batch queue priority. However, participating resources are allowed to decide locally how to respond to urgent requests. For instance, some may offer next-to-run status while others may preempt currently executing jobs to clear off the necessary nodes. The user is then still faced with the problem of resource selection: which resource (and corresponding urgent computing policy) provides the best probability of meeting a given deadline? This paper introduces a set of methodologies and heuristics aimed at generating an empirically based probabilistic upper bound on the total turnaround time for an urgent computation. These upper bounds can then be used to guide the user in selecting a resource with greater confidence that their deadline will be met.
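A minimal sketch of an empirical upper bound in this spirit: bound each phase of the turnaround (queue wait, data staging, execution) at a high quantile of its observed history and sum the per-phase bounds. The phase breakdown, the quantile choice, and the sample data are illustrative assumptions, not the paper's exact heuristics:

```python
def quantile(samples, q):
    """Empirical quantile by rank; conservative for small samples."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(q * len(s)))]

def turnaround_bound(waits, stage_times, run_times, q=0.95):
    """Bound each phase at quantile q of its history, then sum the bounds."""
    return quantile(waits, q) + quantile(stage_times, q) + quantile(run_times, q)

waits  = [30, 45, 60, 50, 400, 55, 48]          # queue waits (s) under urgent policy
stages = [120, 130, 110, 125, 140, 118, 122]    # data staging times (s)
runs   = [900, 910, 905, 930, 920, 915, 925]    # execution times (s)
print("bound on turnaround:", turnaround_bound(waits, stages, runs), "s")
```

Summing per-phase quantiles is deliberately conservative: the result over-covers the target probability, which suits a deadline-driven selection where missing the deadline is far costlier than padding the estimate.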
IEEE International Conference on High Performance Computing, Data and Analytics | 2006
Peter H. Beckman; Ivan Beschastnikh; Suman Nadella; Nick Trebon
Archive | 2011
Ian T. Foster; Nick Trebon
Archive | 2005
Nick Trebon; Alan Morris; Jaideep Ray; Sameer Shende; Allen D. Malony