
Publication


Featured research published by Edith Schonberg.


Communications of the ACM | 1992

Factoring: a method for scheduling parallel loops

Susan Flynn Hummel; Edith Schonberg; Lawrence E. Flynn

To take advantage of the capabilities of parallel machines, application programs must contain sufficient parallelism, and this parallelism must be effectively scheduled on multiple processors. Loops without dependences among their iterations are a rich source of parallelism in scientific code. Restructuring compilers for sequential programs have been particularly successful in determining when loop iterations are independent and can be executed in parallel. Because of the prevalence of parallel loops, optimal parallel-loop scheduling has received considerable attention in both academic and industrial communities. The fundamental trade-off in scheduling parallel-loop iterations is that of maintaining balanced processor workloads vs. minimizing scheduling overhead. Consider, for example, a loop with N iterations that contain an IF statement. Depending on whether the body of the IF statement is executed, an iteration has LONG or SHORT execution time. If we naively schedule the iterations on P processors in chunks of N/P iterations, a strategy called static chunking (SC), a chunk of one processor may consist of iterations that are all LONG, while a chunk of another processor may consist of iterations that are all SHORT. Hence, different processors may finish at widely different times. Since the loop finishing time is equal to the latest finishing time of any of the processors executing the loop, the overall finishing time may be greater than optimal with SC. Alternatively, if we (also naively) schedule the iterations one at a time, a strategy called self-scheduling (SS), then there will be N scheduling operations. With SS, a processor obtains a new iteration whenever it becomes idle, so the processors finish at nearly the same time and the workload is balanced. Because of the scheduling overhead, however, the overall finishing time may be greater than optimal. The characteristics of the iterations determine which scheme performs better. For instance, variable-length, coarse-grained iterations favor SS, while constant-length, fine-grained iterations favor SC. Even when iterations do not contain conditional statements, their running times are likely to be variable because of interference from their environment (other iterations, the operating system, and other programs). The scheduling schemes just discussed are extremes; between the two lie schemes that attempt to minimize the cumulative contribution of uneven processor finishing times and of scheduling overhead. Such schemes schedule iterations in chunks of sizes greater than one but less than N/P, where size is the number of iterations in the chunk. Both fixed-size and variable-size chunking schemes have been proposed. In …
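
The factoring scheme named in the title sits between these two extremes: it hands out iterations in rounds of P equal chunks, shrinking the chunk size as work runs out. Below is a minimal sketch of that schedule in Java, assuming the commonly cited chunk-size rule of ceil(remaining / (2P)); the class and method names are illustrative, and the exact formula in the paper may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a factoring-style schedule: each round allocates P chunks whose
// combined size is roughly half of the remaining iterations. Illustrative
// only; chunk = ceil(remaining / (2 * P)) is an assumed instantiation.
public class FactoringSchedule {

    /** Returns chunk sizes in the order idle processors would receive them. */
    static List<Integer> chunkSizes(int iterations, int processors) {
        List<Integer> chunks = new ArrayList<>();
        int remaining = iterations;
        while (remaining > 0) {
            int chunk = Math.max(1, (remaining + 2 * processors - 1) / (2 * processors));
            for (int p = 0; p < processors && remaining > 0; p++) {
                int size = Math.min(chunk, remaining);
                chunks.add(size);
                remaining -= size;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // N = 100 iterations on P = 4 processors:
        // [13, 13, 13, 13, 6, 6, 6, 6, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1]
        System.out.println(chunkSizes(100, 4));
    }
}
```

With this rule, the large early chunks amortize scheduling overhead while the small late chunks smooth out uneven finishing times.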


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1990

An empirical comparison of monitoring algorithms for access anomaly detection

Anne Dinning; Edith Schonberg

One of the major disadvantages of parallel programming with shared memory is the nondeterministic behavior caused by uncoordinated access to shared variables, known as access anomalies. Monitoring program execution to detect access anomalies is a promising and relatively unexplored approach to this problem. We present a new algorithm, referred to as task recycling, for detecting anomalies, and compare it to an existing algorithm. Empirical results indicate several significant conclusions: (i) While space requirements are bounded by O(T × V), where T is the maximum number of threads that may potentially execute in parallel and V is the number of variables monitored, for typical programs space requirements are on average O(V). (ii) Task recycling is more efficient in terms of space requirements and often in performance. (iii) The general approach of monitoring to detect access anomalies is practical.
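
As context for the monitoring approach the abstract evaluates, here is a simplified Java sketch of on-the-fly access monitoring: each monitored variable remembers which threads have read or written it since the last synchronization point, and conflicting accesses by unordered threads are reported. This is an illustration of the general idea only, not the task-recycling algorithm; the AccessMonitor class, its method names, and the reset-at-synchronization convention are assumptions.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified on-the-fly access monitor: one history record per monitored
// variable (O(V) entries). Ordering supplied by the program's coordination is
// assumed to be handled by the caller, e.g. by calling reset() at
// synchronization points.
public class AccessMonitor {

    private static final class History {
        final Set<Long> readers = ConcurrentHashMap.newKeySet();
        final Set<Long> writers = ConcurrentHashMap.newKeySet();
    }

    private final Map<String, History> histories = new ConcurrentHashMap<>();

    public void onRead(String variable) {
        History h = histories.computeIfAbsent(variable, v -> new History());
        long me = Thread.currentThread().getId();
        // A read conflicts with a write by a different, unordered thread.
        if (h.writers.stream().anyMatch(t -> t != me)) {
            System.err.println("Potential anomaly: read/write conflict on " + variable);
        }
        h.readers.add(me);
    }

    public void onWrite(String variable) {
        History h = histories.computeIfAbsent(variable, v -> new History());
        long me = Thread.currentThread().getId();
        // A write conflicts with any access by a different, unordered thread.
        if (h.readers.stream().anyMatch(t -> t != me)
                || h.writers.stream().anyMatch(t -> t != me)) {
            System.err.println("Potential anomaly: conflicting access on " + variable);
        }
        h.writers.add(me);
    }

    /** Called where all earlier accesses are ordered before later ones. */
    public void reset() {
        histories.clear();
    }
}
```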


Communications of the ACM | 2000

Measuring success

Edith Schonberg; Thomas Anthony Cofino; Robert Hoch; Mark Podlaseck; Susan L. Spraragen

The structure of the Web is rapidly evolving from a loose collection of Web sites into organized marketplaces. The phenomena of aggregation, portals, large enterprise sites, and business-to-business applications are resulting in centralized, virtual places, through which millions of visitors pass. With this development, it becomes possible to gather unprecedented amounts of data about individuals. Data sources capturing purchase histories, casual browsing habits, financial activities, credit histories, and demographics can be combined to construct highly detailed personal profiles. Not only is it possible to collect vast amounts of data, it is vital for e-businesses to be able to exploit the data effectively. In the Internet environment, products and services are constantly in danger of becoming commodities, shoppers can explore competing Web sites without leaving their chairs, and bots and agents …


Workshop on Parallel & Distributed Debugging | 1991

Detecting access anomalies in programs with critical sections

Anne Dinning; Edith Schonberg

The paper presents an efficient on-the-fly method for detecting access anomalies in programs that contain critical section coordination. For a large class of programs, a single execution instance is sufficient to determine the existence of an access anomaly for a given input when the proposed method is used. In contrast, for the same class of programs, previous on-the-fly methods for handling critical sections can fail to detect anomalies for a given input, and can require N! execution instances to find an anomaly, where N is the degree of parallelism. An algorithm for statically determining which programs are in this class is described.


Conference on High Performance Computing (Supercomputing) | 1991

Factoring: a practical and robust method for scheduling parallel loops

Susan Flynn Hummel; Edith Schonberg; Lawrence E. Flynn

No abstract available


Conference on High Performance Computing (Supercomputing) | 1995

An HPF Compiler for the IBM SP2

Manish Gupta; Samuel P. Midkiff; Edith Schonberg; Ven Seshadri; David Shields; Ko-Yang Wang; Wai-Mee Ching; Ton Ngo

We describe pHPF, a research prototype HPF compiler for the IBM SP series of parallel machines. The compiler accepts as input Fortran 90 and Fortran 77 programs, augmented with HPF directives; sequential loops are automatically parallelized. The compiler supports symbolic analysis of expressions. This allows parameters such as the number of processors to be unknown at compile time without significantly affecting performance. Communication schedules and computation guards are generated in a parameterized form at compile time. Several novel optimizations and improved versions of well-known optimizations have been implemented in pHPF to exploit parallelism and reduce communication costs. These optimizations include elimination of redundant communication using data-availability analysis; use of collective communication; new techniques for mapping scalar variables; coarse-grain wavefronting; and communication reduction in multi-dimensional shift communications. We present experimental results for some well-known benchmark routines. The results show the effectiveness of the compiler in generating efficient code for HPF programs.
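
As a small illustration of the parameterized, owner-computes style of code such a compiler targets (not pHPF output), the Java sketch below derives local loop bounds for a one-dimensional BLOCK-distributed array while leaving the processor count unknown until run time; the ceil(N/P) block size is the usual HPF BLOCK convention and is assumed here.

```java
// Hand-written illustration of owner-computes bounds for a BLOCK distribution;
// not output of the pHPF compiler described above.
public class BlockBounds {

    /** Inclusive global index range {lo, hi} owned by processor p (empty if lo > hi). */
    static int[] localBounds(int n, int numProcs, int p) {
        int block = (n + numProcs - 1) / numProcs;  // ceil(N / P), the usual BLOCK size
        int lo = p * block;
        int hi = Math.min(lo + block, n) - 1;       // trailing processors may own less
        return new int[] { lo, hi };
    }

    public static void main(String[] args) {
        int n = 1000;
        int numProcs = Integer.getInteger("procs", 8);  // P fixed only at run time
        for (int p = 0; p < numProcs; p++) {
            int[] b = localBounds(n, numProcs, p);
            if (b[0] <= b[1]) {
                // Computation guard: processor p executes only the iterations it owns.
                System.out.printf("proc %d owns iterations %d..%d%n", p, b[0], b[1]);
            }
        }
    }
}
```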


International Semantic Web Conference | 2007

Matching patient records to clinical trials using ontologies

Chintan Patel; James J. Cimino; Julian Dolby; Achille Fokoue; Aditya Kalyanpur; Aaron Kershenbaum; Li Ma; Edith Schonberg; Kavitha Srinivas

This paper describes a large case study that explores the applicability of ontology reasoning to problems in the medical domain. We investigate whether it is possible to use such reasoning to automate common clinical tasks that are currently labor intensive and error prone, and focus our case study on improving cohort selection for clinical trials. An obstacle to automating such clinical tasks is the need to bridge the semantic gulf between raw patient data, such as laboratory tests or specific medications, and the way a clinician interprets this data. Our key insight is that matching patients to clinical trials can be formulated as a problem of semantic retrieval. We describe the technical challenges to building a realistic case study, which include problems related to scalability, the integration of large ontologies, and dealing with noisy, inconsistent data. Our solution is based on the SNOMED CT® ontology, and scales to one year of patient records (approx. 240,000 patients).
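
To make the semantic-retrieval formulation concrete, the toy Java sketch below matches a patient's recorded concepts against a trial criterion by walking is-a edges in a tiny hand-made hierarchy. The concept names, the SubsumptionMatcher class, and its methods are illustrative assumptions and stand in for reasoning over a real ontology such as SNOMED CT.

```java
import java.util.*;

// Toy subsumption check: a patient matches a criterion if one of the
// patient's recorded concepts is the criterion or a transitive descendant
// of it in the is-a hierarchy. Illustration only.
public class SubsumptionMatcher {

    // child concept -> direct parent concepts (is-a edges)
    private final Map<String, Set<String>> parents = new HashMap<>();

    void addIsA(String child, String parent) {
        parents.computeIfAbsent(child, c -> new HashSet<>()).add(parent);
    }

    /** True if 'concept' equals the criterion or is a transitive descendant of it. */
    boolean subsumedBy(String concept, String criterion) {
        if (concept.equals(criterion)) return true;
        for (String p : parents.getOrDefault(concept, Set.of())) {
            if (subsumedBy(p, criterion)) return true;
        }
        return false;
    }

    boolean matchesCriterion(Collection<String> patientConcepts, String criterion) {
        return patientConcepts.stream().anyMatch(c -> subsumedBy(c, criterion));
    }

    public static void main(String[] args) {
        SubsumptionMatcher m = new SubsumptionMatcher();
        m.addIsA("type 2 diabetes mellitus", "diabetes mellitus");
        m.addIsA("diabetes mellitus", "endocrine disorder");

        List<String> patientRecord = List.of("type 2 diabetes mellitus");
        // The trial criterion is stated at a more general level than the raw record.
        System.out.println(m.matchesCriterion(patientRecord, "diabetes mellitus")); // true
    }
}
```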


Programming Language Design and Implementation | 2010

Finding low-utility data structures

Guoqing Xu; Nick Mitchell; Matthew Arnold; Atanas Rountev; Edith Schonberg; Gary Sevitsky

Many opportunities for easy, big-win program optimizations are missed by compilers. This is especially true in highly layered Java applications. Often at the heart of these missed optimization opportunities lie computations that, with great expense, produce data values that have little impact on the program's final output. Constructing a new date formatter to format every date, or populating a large set full of expensively constructed structures only to check its size: these involve costs that are out of line with the benefits gained. This disparity between the formation costs and accrued benefits of data structures is at the heart of much runtime bloat. We introduce a run-time analysis to discover these low-utility data structures. The analysis employs dynamic thin slicing, which naturally associates costs with value flows rather than raw data flows. It constructs a model of the incremental, hop-to-hop costs and benefits of each data structure. The analysis then identifies suspicious structures based on imbalances between their incremental costs and benefits. To decrease the memory requirements of slicing, we introduce abstract dynamic thin slicing, which performs thin slicing over bounded abstract domains. We have modified the IBM J9 commercial JVM to implement this approach. We demonstrate two client analyses: one that finds objects that are expensive to construct but are not necessary for the forward execution, and a second that pinpoints ultimately-dead values. We have successfully applied them to large-scale and long-running Java applications. We show that these analyses are effective at detecting operations that have unbalanced costs and benefits.
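
The abstract's own example, populating a large set of expensively constructed objects only to check its size, looks roughly like the first method below; the second produces the same answer without the wasted formation cost. The ParsedRecord class and method names are illustrative, not taken from the paper's benchmarks.

```java
import java.util.HashSet;
import java.util.Set;

// Hand-made illustration of a low-utility data structure: the set's only
// "benefit" ever extracted is its size, while its formation cost is high.
public class LowUtilityExample {

    record ParsedRecord(String normalizedKey) {
        static ParsedRecord parse(String line) {
            // Stand-in for expensive construction (parsing, formatting, copying).
            return new ParsedRecord(line.trim().toLowerCase());
        }
    }

    // High cost, low utility: builds full ParsedRecord objects just to count distinct keys.
    static int countDistinctCostly(Iterable<String> lines) {
        Set<ParsedRecord> seen = new HashSet<>();
        for (String line : lines) {
            seen.add(ParsedRecord.parse(line));
        }
        return seen.size();   // the only value the program ever takes from 'seen'
    }

    // Same output, far less formation cost: keep only what the final answer needs.
    static int countDistinctCheap(Iterable<String> lines) {
        Set<String> keys = new HashSet<>();
        for (String line : lines) {
            keys.add(line.trim().toLowerCase());
        }
        return keys.size();
    }
}
```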


IEEE Transactions on Parallel and Distributed Systems | 1996

A unified framework for optimizing communication in data-parallel programs

Manish Gupta; Edith Schonberg; Harini Srinivasan

This paper presents a framework, based on global array data-flow analysis, to reduce communication costs in a program being compiled for a distributed memory machine. We introduce available section descriptor, a novel representation of communication involving array sections. This representation allows us to apply techniques for partial redundancy elimination to obtain powerful communication optimizations. With a single framework, we are able to capture optimizations like (1) vectorizing communication, (2) eliminating communication that is redundant on any control flow path, (3) reducing the amount of data being communicated, (4) reducing the number of processors to which data must be communicated, and (5) moving communication earlier to hide latency, and to subsume previous communication. We show that the bidirectional problem of eliminating partial redundancies can be decomposed into simpler unidirectional problems even in the context of an array section representation, which makes the analysis procedure more efficient. We present results from a preliminary implementation of this framework, which are extremely encouraging, and demonstrate the effectiveness of this analysis in improving the performance of programs.
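
As a rough illustration of one optimization listed above, eliminating communication that is redundant, the Java sketch below summarizes an array section as inclusive per-dimension [lo, hi] bounds and skips a communication whose section is already covered by an available one. The Section class and isRedundant method are simplifications and assumptions, not the paper's available section descriptor; the data-flow analysis that intersects availability across control-flow paths is omitted.

```java
// Simplified section-containment check used to decide whether a communication
// can be skipped at a single program point. Illustration only.
public class SectionCover {

    static final class Section {
        final int[] lo;  // per-dimension lower bounds (inclusive)
        final int[] hi;  // per-dimension upper bounds (inclusive)
        Section(int[] lo, int[] hi) { this.lo = lo; this.hi = hi; }

        /** True if this section contains every element of 'other'. */
        boolean covers(Section other) {
            for (int d = 0; d < lo.length; d++) {
                if (other.lo[d] < lo[d] || other.hi[d] > hi[d]) return false;
            }
            return true;
        }
    }

    /** The communication fetching 'needed' can be skipped if an available section covers it. */
    static boolean isRedundant(Section needed, Iterable<Section> available) {
        for (Section s : available) {
            if (s.covers(needed)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Section have = new Section(new int[] {0, 0}, new int[] {99, 49});
        Section need = new Section(new int[] {10, 5}, new int[] {20, 40});
        System.out.println(isRedundant(need, java.util.List.of(have)));  // true
    }
}
```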


European Conference on Object-Oriented Programming | 2009

Making Sense of Large Heaps

Nick Mitchell; Edith Schonberg; Gary Sevitsky

It is common for large-scale Java applications to suffer memory problems, whether inefficient designs that impede scalability, or lifetime bugs such as leaks. Making sense of heaps with many millions of objects is difficult given the extensive layering, framework reuse, and shared ownership in current applications. We present Yeti, a tool that summarizes memory usage to uncover the costs of design decisions, rather than of lower-level artifacts as in traditional tools, making it possible to quickly identify and remediate problems. Yeti employs three progressive abstractions and corresponding visualizations: it identifies costly groups of objects that collectively perform a function, recovers a logical data model for each, and summarizes the implementation of each model entity and relationship. Yeti is used by development and service teams within IBM, and has been effective in solving numerous problems. Through case studies we demonstrate how these abstractions help solve common categories of problems.
