Tia Newhall
Swarthmore College
Publications
Featured research published by Tia Newhall.
IEEE Computer | 1995
Barton P. Miller; M.D. Callaghan; Jonathan M. Cargille; Jeffrey K. Hollingsworth; R.B. Irvin; Karen L. Karavanic; Krishna Kunchithapadam; Tia Newhall
Paradyn is a tool for measuring the performance of large-scale parallel programs. Our goal in designing a new performance tool was to provide detailed, flexible performance information without incurring the space (and time) overhead typically associated with trace-based tools. Paradyn achieves this goal by dynamically instrumenting the application and automatically controlling this instrumentation in search of performance problems. Dynamic instrumentation lets us defer insertion until the moment it is needed (and remove it when it is no longer needed); Paradyn's Performance Consultant decides when and where to insert instrumentation.
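The core idea, inserting measurement code while the program runs and removing it later, can be shown with a much-simplified sketch. Paradyn actually patches machine code in the running process; the hypothetical version below only approximates the effect with an indirect call site that is retargeted at runtime.

```c
/* Simplified illustration of dynamic instrumentation (hypothetical;
 * Paradyn patches binary code in place rather than using function
 * pointers). A call site is retargeted at runtime to an instrumented
 * wrapper, then retargeted back when measurement is no longer needed. */
#include <stdio.h>

static double work(double x) { return x * x; }   /* code under study */

static long   call_count = 0;
static double work_instrumented(double x) {      /* wrapper acts as instrumentation */
    call_count++;
    return work(x);
}

static double (*work_site)(double) = work;       /* patchable call site */

int main(void) {
    work_site(2.0);                /* uninstrumented: no overhead      */
    work_site = work_instrumented; /* "insert" instrumentation         */
    for (int i = 0; i < 1000; i++) work_site(i);
    work_site = work;              /* "remove" it when no longer needed */
    printf("calls observed while instrumented: %ld\n", call_count);
    return 0;
}
```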
European Conference on Parallel Processing | 2003
Tia Newhall; Sean Finney; Kuzman Ganchev; Michael Spiegel
Cluster applications that process large amounts of data, such as parallel scientific or multimedia applications, are likely to cause swapping on individual cluster nodes. These applications will perform better on clusters with network swapping support. Network swapping allows any cluster node with over-committed memory to use idle memory of a remote node as its backing store and to “swap” its pages over the network. As the disparity between network speeds and disk speeds continues to grow, network swapping will be faster than traditional swapping to local disk. We present Nswap, a network swapping system for heterogeneous Linux clusters and networks of Linux machines. Nswap is implemented as a loadable kernel module for version 2.4 of the Linux kernel. It is a space-efficient and time-efficient implementation that transparently performs network swapping. Nswap scales to larger clusters, supports migration of remotely swapped pages, and supports dynamic growing and shrinking of the Nswap cache (the amount of RAM available to store remote pages) in response to a node’s local memory needs. Results comparing Nswap running on an eight-node Linux cluster with a 100BaseT Ethernet interconnect and faster disk show that Nswap is comparable to swapping to local, faster disk: depending on the workload, Nswap’s performance ranges from up to 1.7 times faster than disk to between 1.3 and 4.6 times slower than disk. We show that with faster networking technology, Nswap will outperform swapping to disk.
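One small piece of such a system can be sketched concretely: choosing, from advertised idle-memory figures, which remote node should receive a swapped-out page. The names and the "most idle memory" policy below are hypothetical; Nswap's actual kernel-module implementation is far more involved.

```c
/* Hypothetical sketch: pick a target node for a swapped-out page from
 * advertised idle-RAM figures. Nswap itself runs as a Linux kernel
 * module; this user-level code shows only the placement decision. */
#include <stdio.h>

struct node_ad {
    int           node_id;
    unsigned long idle_pages;   /* pages the node offers for remote swap */
};

static int pick_swap_target(const struct node_ad *ads, int n) {
    int best = -1;
    unsigned long best_idle = 0;
    for (int i = 0; i < n; i++) {
        if (ads[i].idle_pages > best_idle) {
            best_idle = ads[i].idle_pages;
            best = ads[i].node_id;
        }
    }
    return best;   /* -1 means no node has spare RAM; fall back to disk */
}

int main(void) {
    struct node_ad ads[] = { {1, 4096}, {2, 128}, {3, 9000} };
    printf("swap page to node %d\n", pick_swap_target(ads, 3));
    return 0;
}
```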
Technical Symposium on Computer Science Education | 2014
Tia Newhall; Lisa Meeden; Andrew Danner; Ameet Soni; Frances Ruiz; Richard Wicentowski
In line with institutions across the United States, the Computer Science Department at Swarthmore College has faced the challenge of maintaining a demographic composition of students that matches the student body as a whole. To address this challenge, our department has made a concerted effort to revamp our introductory course sequence to both attract and retain more women and minority students. The focus of this paper is the changes instituted in our Introduction to Computer Science course (i.e., CS1) intended for both majors and non-majors. In addition to changing the content of the course, we introduced a new student mentoring program that is managed by a full-time coordinator and consists of undergraduate students who have recently completed the course. This paper describes these efforts in detail, including the extension of these changes to our CS2 course and the associated costs required to maintain these efforts. We measure the impact of these changes by tracking student enrollment and performance over 13 academic years. We show that, unlike national trends, enrollment from underrepresented groups has increased dramatically over this time period. Additionally, we show that the student mentoring program has increased both performance and retention of students, particularly from underrepresented groups, at statistically significant levels.
International Conference on Cluster Computing | 2008
Tia Newhall; Daniel Amato; Alexandr Pshenichkin
We present reliability solutions for adaptable network RAM systems running on general-purpose clusters. Network RAM allows nodes with over-committed memory to swap pages over the network, storing them in the idle RAM of other nodes and avoiding swapping to slow, local disk. An adaptable network RAM system adjusts the amount of RAM currently available for storing remotely swapped pages in response to changes in nodes’ local RAM usage. It is important that network RAM systems provide reliability for remotely swapped page data. Without reliability, a single node failure can result in failure of unrelated processes running on other nodes by losing their remotely swapped pages. Adaptable network RAM systems pose extra difficulties in providing reliability because each node’s capacity for storing remotely swapped pages changes over time, and because pages may move from node to node in response to these changes. Our novel dynamic RAID-based reliability solutions use idle RAM for storing page and reliability data, avoiding using slow disk for reliability. They are designed to work with the adaptive nature of our network RAM system (Nswap), allowing page and reliability data to migrate from node to node and allowing pages to be added to or removed from different parity groups. Additionally, page recovery runs concurrently with cluster applications, so that cluster applications do not have to wait until all data from a failed node is recovered before resuming execution. We present results comparing Nswap to disk swapping for a set of benchmarks running on our gigabit cluster. Our results show that reliable Nswap is up to 32 times faster than swapping to disk, and that there is virtually no impact on the performance of applications as they run concurrently with page recovery.
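The RAID-style idea behind such reliability schemes can be shown concretely: a parity page is the XOR of the pages in its parity group, so any one lost page can be rebuilt from the parity plus the survivors. This sketch illustrates the general XOR-parity technique on tiny 8-byte "pages", not Nswap's actual parity-group management.

```c
/* XOR parity over a group of pages: parity = p0 ^ p1 ^ ... ^ pk.
 * If one page is lost, XOR-ing the parity with the surviving pages
 * reproduces it (the RAID-4/5-style technique). */
#include <stdio.h>
#include <string.h>

#define PAGE  8
#define GROUP 3

static void xor_into(unsigned char *dst, const unsigned char *src) {
    for (int i = 0; i < PAGE; i++) dst[i] ^= src[i];
}

int main(void) {
    unsigned char pages[GROUP][PAGE] = { "node-A!", "node-B!", "node-C!" };
    unsigned char parity[PAGE] = {0};

    for (int g = 0; g < GROUP; g++) xor_into(parity, pages[g]);

    /* The node holding page 1 fails: rebuild it from parity + survivors. */
    unsigned char rebuilt[PAGE];
    memcpy(rebuilt, parity, PAGE);
    xor_into(rebuilt, pages[0]);
    xor_into(rebuilt, pages[2]);

    printf("recovered: %s\n", (char *)rebuilt);   /* prints "node-B!" */
    return 0;
}
```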
European Conference on Parallel Processing | 1998
Tia Newhall; Barton P. Miller
In an interpreted execution there is an interdependence between the interpreter’s execution and the interpreted application’s execution; the implementation of the interpreter determines how the application is executed, and the application triggers certain activities in the interpreter. We present a representational model for describing performance data from an interpreted execution that explicitly represents the interaction between the interpreter and the application, from both the interpreter developer’s and the application developer’s views of the execution. We present results of a prototype implementation of a performance tool for interpreted Java programs that is based on our model. Our prototype uses two techniques, dynamic instrumentation and transformational instrumentation, to measure Java programs starting with unmodified Java class files and an unmodified Java virtual machine. We use performance data from our tool to tune a Java program, and as a result, improve its performance by more than a factor of three.
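The model's key point, that one measured cost has both an interpreter-level and an application-level description, can be sketched as performance data tagged with a pair of foci. This is illustrative only; the paper's actual representation is richer.

```c
/* Illustrative only: each sample carries both the interpreter
 * developer's view (which VM routine ran) and the application
 * developer's view (which Java method was being interpreted), so
 * either audience can aggregate the same data. */
#include <stdio.h>
#include <string.h>

struct sample {
    const char *vm_routine;    /* interpreter's view  */
    const char *java_method;   /* application's view  */
    double      cpu_seconds;
};

int main(void) {
    struct sample s[] = {
        { "invoke_bytecode_loop", "Matrix.multiply", 0.41 },
        { "gc_mark_and_sweep",    "Matrix.multiply", 0.09 },
        { "invoke_bytecode_loop", "IO.readLine",     0.12 },
    };
    /* Application view: total cost attributed to Matrix.multiply. */
    double t = 0;
    for (int i = 0; i < 3; i++)
        if (strcmp(s[i].java_method, "Matrix.multiply") == 0)
            t += s[i].cpu_seconds;
    printf("Matrix.multiply: %.2f s (including VM work on its behalf)\n", t);
    return 0;
}
```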
International Conference on Parallel Processing | 2012
Sam White; Niels Verosky; Tia Newhall
We present a hybrid CUDA-MPI sorting algorithm that makes use of GPU clusters to sort large data sets. Our algorithm has two phases. In the first phase each node sorts a portion of the data on its GPU using a parallel bitonic sort. In the second phase the sorted subsequences are merged together in parallel using a reduction sorting network implemented in MPI across the cluster nodes. Performance results comparing our sorting algorithm to sequential quicksort yield speed-up values of up to 9.8 for sorting 4GB of data on a 32-node GPU cluster. We anticipate even better speed-up values using our algorithm on larger data sets and larger sized clusters.
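The two-phase structure can be sketched in MPI: each rank sorts its local slice (the paper uses a GPU bitonic sort; plain qsort stands in here), then ranks merge pairwise up a reduction tree so that rank 0 ends with the fully sorted data. Slice sizes and message tags are illustrative.

```c
/* Two-phase cluster sort sketch (a stand-in for the paper's CUDA
 * bitonic sort + MPI reduction sorting network):
 *   phase 1: every rank sorts its local slice;
 *   phase 2: log2(P) rounds of pairwise merges up a reduction tree. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static int *merge(const int *a, int na, const int *b, int nb) {
    int *out = malloc((na + nb) * sizeof *out);
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return out;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, P, n = 1 << 10;                 /* toy per-node slice size */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &P);

    int *data = malloc(n * sizeof *data);
    srand(rank + 1);
    for (int i = 0; i < n; i++) data[i] = rand();
    qsort(data, n, sizeof *data, cmp_int);    /* phase 1 (GPU in the paper) */

    for (int step = 1; step < P; step *= 2) { /* phase 2: reduction tree */
        if (rank % (2 * step) == 0 && rank + step < P) {
            int m;
            MPI_Recv(&m, 1, MPI_INT, rank + step, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            int *buf = malloc(m * sizeof *buf);
            MPI_Recv(buf, m, MPI_INT, rank + step, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            int *merged = merge(data, n, buf, m);
            free(data); free(buf);
            data = merged; n += m;
        } else if (rank % (2 * step) == step) {
            MPI_Send(&n, 1, MPI_INT, rank - step, 0, MPI_COMM_WORLD);
            MPI_Send(data, n, MPI_INT, rank - step, 1, MPI_COMM_WORLD);
            break;                            /* this rank's data is merged upward */
        }
    }
    if (rank == 0) printf("sorted %d ints across %d ranks\n", n, P);
    free(data);
    MPI_Finalize();
    return 0;
}
```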
Proceedings of the ACM 1999 Conference on Java Grande | 1999
Tia Newhall; Barton P. Miller
With the development of dynamic compilers for Java, Java's performance promises to rival that of equivalent C/C++ binary executions. This should ensure that Java will become the platform of choice for ubiquitous Web-based supercomputing. Therefore, being able to build performance tools for dynamically compiled Java executions will become increasingly important. In this paper we discuss those aspects of dynamically compiled Java executions that make performance measurement difficult: (1) some Java application methods may be transformed from byte-code to native code at run-time; and (2) even in native form, application code may interact with the Java virtual machine. We describe Paradyn-J, an experimental version of the Paradyn Parallel Performance Tool that addresses this environment by describing performance data from dynamically compiled executions in terms of the multiple execution forms (interpreted byte-code and directly executed native code) of a method, costs of the dynamic compilation, and costs of residual dependencies of the application on the virtual machine. We use performance data from Paradyn-J to tune a Java application method, improving its interpreted byte-code execution by 11% and its native form execution by 10%. As a result of tuning just one method, we improve the application's total execution time by 11% when run under Sun's ExactVM (included in the Platform2 release of the JDK). The results of our work are a guide for virtual machine designers as to what type of performance data should be available through Java VM performance tool APIs.
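The accounting the paper describes, a method's cost split across its execution forms plus the cost of transforming it, amounts to a breakdown like the following sketch. All field names and numbers here are invented for illustration.

```c
/* Hypothetical cost breakdown for one dynamically compiled method,
 * in the spirit of Paradyn-J's multiple-execution-form data
 * (all numbers invented for illustration). */
#include <stdio.h>

struct method_cost {
    double interpreted_s;   /* time spent as interpreted byte-code */
    double compile_s;       /* one-time dynamic compilation cost   */
    double native_s;        /* time spent as compiled native code  */
    double residual_vm_s;   /* VM work done on the method's behalf */
};

int main(void) {
    struct method_cost m = { 1.20, 0.05, 0.40, 0.10 };
    printf("total attributable cost: %.2f s\n",
           m.interpreted_s + m.compile_s + m.native_s + m.residual_vm_s);
    return 0;
}
```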
IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013
Andrew Danner; Tia Newhall
We present changes to our undergraduate computer science curriculum for a small liberal arts college. The changes are designed to incorporate parallel and distributed computing topics into all levels of our curriculum, with the goal of ensuring that all graduating CS majors have exposure to, and experience with, parallel and distributed computing. Our effort is motivated by the ACM/IEEE Ironman Curriculum, which includes an increased focus on these important topics. In addition, we use the NSF/IEEE-TCPP model curriculum as a guide in our effort. Because of the small size of our department, and the breadth constraints of a liberal arts college, we face some unique challenges. Our multi-year effort involves at least six courses in our curriculum. Of these courses, one is a new introductory-level course, while the others are existing courses whose content has been modified to include more focus on parallel and distributed computing. We present our curricular changes and we discuss an initial evaluation of the first implementation of these changes.
Journal of Parallel and Distributed Computing | 2017
Tia Newhall; Andrew Danner; Kevin C. Webb
We present a model for incorporating parallel and distributed computing (PDC) throughout an undergraduate CS curriculum. Our curriculum is designed to introduce students early to parallel and distributed computing topics and to expose students to these topics repeatedly in the context of a wide variety of CS courses. The key to our approach is the development of a required intermediate-level course that serves as an introduction to computer systems and parallel computing. It serves as a requirement for every CS major and minor and is a prerequisite to upper-level courses that expand on parallel and distributed computing topics in different contexts. With the addition of this new course, we are able to easily make room in upper-level courses to add and expand parallel and distributed computing topics. The goal of our curricular design is to ensure that every graduating CS major has exposure to parallel and distributed computing, with both a breadth and depth of coverage. Our curriculum is particularly designed for the constraints of a small liberal arts college; however, much of its design and many of its ideas are applicable to any undergraduate CS curriculum.
Proceedings of the Second International Symposium on Memory Systems | 2016
Tia Newhall; E. Ryerson Lehman-Borer; Benjamin Marks
To support data intensive cluster computing, it is increasingly important that node virtual memory (VM) systems make effective use of available fast storage devices for swap or temporary file space. Nswap2L is a novel system that transparently manages a heterogeneous set of storage options commonly found in clusters, including node RAM, disk, flash SSD, PCM, or network storage devices. Nswap2L implements a two-level device driver interface. At the top level, it appears to node operating systems (OSs) as a single, fast, random access device that can be added as a swap partition on cluster nodes. It transparently manages the underlying heterogeneous storage devices, including its own implementation of Network RAM, to which swapped-out data are stored. It implements data placement, migration, and prefetching policies that choose which underlying physical devices store swapped-out page data. Its policies incorporate information about device capacity, system load, and the strengths of different physical storage media. By moving device-specific knowledge into Nswap2L, VM policies in the OS can be based solely on typical application access patterns and not on characteristics of underlying physical storage media. Nswap2L's policy decisions are abstracted from the OS, freeing the OS from having to implement specialized policies for different combinations of cluster storage; Nswap2L requires no changes to the OS's VM system. Results of our benchmark tests show that data-intensive applications perform up to 6 times faster on Nswap2L-enabled clusters, and show that our two-level device driver design adds minimal I/O latency to the underlying devices that Nswap2L manages. In addition, we found that even though Nswap2L's Network RAM is faster than any other backing store, its prefetching policy that distributes data over multiple devices results in increased I/O parallelism and can lead to better performance than swapping only to a single underlying device.
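The flavor of such a placement policy can be sketched as follows: score each backing device by speed, remaining capacity, and current load, and route the next swapped-out page to the best one. The device names, weights, and scoring rule are hypothetical, not Nswap2L's actual policy.

```c
/* Hypothetical placement-policy sketch for a heterogeneous set of
 * swap devices. Nswap2L's real policies are more sophisticated; this
 * shows only the shape of the decision. */
#include <stdio.h>

struct device {
    const char *name;
    double speed;      /* relative bandwidth, higher is better  */
    double free_frac;  /* fraction of capacity still free, 0..1 */
    double load;       /* outstanding I/O, higher is worse      */
};

static const struct device *place_page(const struct device *d, int n) {
    const struct device *best = NULL;
    double best_score = -1.0;
    for (int i = 0; i < n; i++) {
        if (d[i].free_frac <= 0) continue;   /* device is full */
        double score = d[i].speed * d[i].free_frac / (1.0 + d[i].load);
        if (score > best_score) { best_score = score; best = &d[i]; }
    }
    return best;   /* NULL if every device is full */
}

int main(void) {
    struct device devs[] = {
        { "network-ram", 10.0, 0.30, 2.0 },
        { "flash-ssd",    4.0, 0.80, 0.5 },
        { "disk",         1.0, 0.95, 0.1 },
    };
    printf("place page on: %s\n", place_page(devs, 3)->name);
    return 0;
}
```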