Geoff Lowney
Intel
Publications
Featured research published by Geoff Lowney.
Scientific Programming (special issue: Exploring Languages for Expressing Medium to Massive On-Chip Parallelism) | 2010
Zoran Budimlic; Michael G. Burke; Vincent Cavé; Kathleen Knobe; Geoff Lowney; Ryan R. Newton; Jens Palsberg; David M. Peixotto; Vivek Sarkar; Frank Schlimbach; Sagnak Tasirlar
We introduce the Concurrent Collections (CnC) programming model. CnC supports flexible combinations of task and data parallelism while retaining determinism. CnC is implicitly parallel, with the user providing high-level operations along with semantic ordering constraints that together form a CnC graph. We formally describe the execution semantics of CnC and prove that the model guarantees deterministic computation. We evaluate the performance of CnC implementations on several applications and show that CnC offers performance and scalability equivalent to or better than that offered by lower-level parallel programming models.
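The execution model described above can be sketched without the real CnC runtime. The class and scheduler below are illustrative stand-ins, not the CnC API: single-assignment item collections plus retry-until-ready scheduling are enough to show why any order of running ready steps yields the same result.

```python
# Minimal, illustrative sketch (not the real CnC API) of the core CnC
# guarantee: steps communicate only through single-assignment item
# collections, so the scheduling order cannot change the result.

class ItemCollection:
    """Single-assignment tag -> value store, as in a CnC item collection."""
    def __init__(self):
        self._items = {}

    def put(self, tag, value):
        if tag in self._items and self._items[tag] != value:
            raise ValueError(f"item {tag!r} already written (single assignment)")
        self._items[tag] = value

    def get(self, tag):
        return self._items[tag]   # KeyError if not yet produced

def run_graph(steps, items):
    """Run every step whose inputs are available; repeat until done.
    Determinism: any order of executing ready steps gives the same items."""
    pending = list(steps)
    while pending:
        progressed = False
        for step in list(pending):
            try:
                step(items)
            except KeyError:      # an input item is not yet available
                continue
            pending.remove(step)
            progressed = True
        if not progressed:
            raise RuntimeError("deadlock: no step can make progress")

# Example graph: z = square(x) + 1; the order of `steps` is irrelevant.
items = ItemCollection()
steps = [
    lambda it: it.put("z", it.get("y") + 1),
    lambda it: it.put("y", it.get("x") ** 2),
]
items.put("x", 5)
run_graph(steps, items)
print(items.get("z"))  # -> 26
```

Listing the consumer step before the producer is deliberate: the scheduler simply skips it until its input item exists, which is the dynamic analogue of the semantic ordering constraints in a CnC graph.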
workshop on declarative aspects of multicore programming | 2009
Zoran Budimlic; Aparna Chandramowlishwaran; Kathleen Knobe; Geoff Lowney; Vivek Sarkar; Leo Treggiari
Concurrent Collections (CnC) is a declarative parallel language that allows application developers to express a parallel application as a collection of high-level computations, called steps, that communicate via single-assignment data structures called items. A CnC program is specified at two levels. At the bottom level, an existing imperative language implements the computations within the individual computation steps. At the top level, CnC describes the relationships (ordering constraints) among the steps. The memory management mechanism of the existing imperative language manages data whose lifetime is contained within a computation step. A key limitation in the use of CnC for long-running programs is the lack of memory management and garbage collection for data items whose lifetimes extend beyond a single computation step. Although the goal here is the same as that of classical garbage collection, the nature of the problem, and therefore of the solution, is distinct. The focus of this paper is the memory management problem for these data items in CnC. We introduce a new declarative slicing annotation for CnC that can be transformed into a reference-counting procedure for memory management. Preliminary experimental results obtained from a Cholesky example show that our memory management approach can reduce the space required for CnC data items by up to 28x relative to the baseline of standard CnC without memory management.
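The simplest instance of the reference-counting idea can be sketched directly: if the number of consumers ("gets") of each item is known up front, the runtime counts gets and releases the item after its last consumer has read it. The paper's slicing annotation is more general; the get-count scheme below is only an illustration.

```python
# Illustrative get-count reference counting for single-assignment items.
# An item is reclaimed as soon as its declared number of gets is reached,
# instead of living for the whole program as in unmanaged CnC.

class RefCountedItems:
    def __init__(self):
        self._items = {}      # tag -> value
        self._remaining = {}  # tag -> gets left before the item is freed

    def put(self, tag, value, expected_gets):
        assert tag not in self._items, "single assignment violated"
        self._items[tag] = value
        self._remaining[tag] = expected_gets

    def get(self, tag):
        value = self._items[tag]
        self._remaining[tag] -= 1
        if self._remaining[tag] == 0:   # last consumer: reclaim the slot
            del self._items[tag]
            del self._remaining[tag]
        return value

    def live_items(self):
        return len(self._items)

store = RefCountedItems()
store.put("a", [1, 2, 3], expected_gets=2)
store.get("a")
store.get("a")               # second (last) get releases the item
print(store.live_items())    # -> 0
```

On a blocked Cholesky factorization, most intermediate tiles have a small, statically known consumer count, which is why a scheme of this flavor can shrink the live item set so dramatically relative to keeping every item forever.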
acm symposium on parallel algorithms and architectures | 2006
Geoff Lowney
Intel has announced that most of its future processors will contain more than one execution core. This talk will explain the rationale for this decision and discuss why multi-core processors deliver energy-efficient performance.
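The energy-efficiency argument behind that decision can be made concrete with back-of-envelope arithmetic. Dynamic power scales roughly as C·V²·f, and to first order the minimum supply voltage scales with frequency, so power grows roughly cubically with clock rate while single-thread performance grows only linearly. The numbers below are illustrative assumptions, not Intel's figures.

```python
# Why two slower cores can beat one fast core on performance per watt.
# Assumption: voltage scales with frequency, so dynamic power ~ f^3.

def relative_power(freq_scale):
    return freq_scale ** 3   # C * V^2 * f with V proportional to f

one_core_perf = 1.0
one_core_power = relative_power(1.0)          # 1.0x

# Two cores, each clocked at 80% of the single core's frequency,
# running parallelizable work:
two_core_perf = 2 * 0.8                       # 1.6x throughput
two_core_power = 2 * relative_power(0.8)      # 2 * 0.512 = 1.024x power

print(two_core_perf, round(two_core_power, 3))  # -> 1.6 1.024
```

Under these assumptions, 60% more throughput costs only about 2% more power, which is the essence of the "energy-efficient performance" claim for multi-core designs.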
Sigplan Notices | 2009
Zoran Budimlic; Aparna Chandramowlishwaran; Kathleen Knobe; Geoff Lowney; Vivek Sarkar; Leo Treggiari
Concurrent Collections (CnC) is a declarative parallel language that allows the application developer to express their parallel application as a collection of high-level computations called steps that communicate via single-assignment data structures called items. A CnC program is specified in two levels. At the bottom level, an existing imperative language implements the computations within the individual computation steps. At the top level, CnC describes the relationships (ordering constraints) among the steps. The memory management mechanism of the existing imperative language manages data whose lifetime is within a computation step. A key limitation in the use of CnC for long-running programs is the lack of memory management and garbage collection for data items with lifetimes that are longer than a single computation step. Although the goal here is the same as that of classical garbage collection, the nature of problem and therefore nature of the solution is distinct. The focus of this paper is the memory management problem for these data items in CnC. We introduce a new declarative slicing annotation for CnC that can be transformed into a reference counting procedure for memory management. Preliminary experimental results obtained from a Cholesky example show that our memory management approach can result in space reductions for CnC data items of up to 28x relative to the baseline case of standard CnC without memory management.
symposium on code generation and optimization | 2004
Chi-Keung Luk; Robert Muth; Harish Patil; Robert Cohn; Geoff Lowney
Ispike is a post-link optimizer developed for the Intel® Itanium® Processor Family (IPF). The IPF architecture poses both opportunities and challenges for post-link optimization. IPF offers a rich set of performance counters that collect detailed profile information at low cost, which is essential for post-link optimization to be practical. At the same time, the predication and bundling features of IPF make post-link code transformation more challenging than on other architectures. In Ispike, we have implemented optimizations such as code layout, instruction prefetching, data layout, and data prefetching that exploit the advantages of IPF, along with strategies that cope with the IPF-specific challenges. Using the SPEC CINT2000 benchmarks, we show that Ispike improves performance by as much as 40% on the Itanium® 2 processor, with average improvements of 8.5% and 9.9% over executables generated by the Intel® Electron compiler and by GCC, respectively. We also demonstrate that statistical profiles collected via IPF performance counters and complete profiles collected via instrumentation yield equal performance benefit, while the profiling overhead is significantly lower for performance counters.
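One of the optimizations named above, profile-guided code layout, can be sketched in a few lines. Given basic-block edge counts (e.g. from performance counters), place each block's hottest not-yet-placed successor immediately after it, so the hot path falls through with few taken branches. This greedy chaining is only in the spirit of such optimizers; Ispike's actual algorithm is more involved.

```python
# Greedy hot-path basic-block layout from profiled edge counts.
# edges: {block: [(successor, count), ...]}; returns an ordered block list.

def layout(entry, edges):
    order, placed = [], set()
    worklist = [entry]
    while worklist:
        block = worklist.pop()
        # Grow a fall-through chain by always following the hottest
        # not-yet-placed successor.
        while block is not None and block not in placed:
            order.append(block)
            placed.add(block)
            succs = sorted(edges.get(block, []), key=lambda e: -e[1])
            # Queue colder successors to start later (colder) chains.
            worklist.extend(s for s, _ in succs[1:] if s not in placed)
            block = next((s for s, _ in succs if s not in placed), None)
    return order

# Hot loop A -> B -> A with a cold exit A -> C: B is laid out right
# after A so the loop body falls through; C is moved out of line.
cfg = {"A": [("B", 90), ("C", 10)], "B": [("A", 90)]}
print(layout("A", cfg))  # -> ['A', 'B', 'C']
```

The same profile data drives the other layout decisions the abstract mentions, such as separating hot and cold code to improve instruction-cache and TLB behavior.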
international symposium on computer architecture | 2002
Roger Espasa; Federico Ardanaz; Joel S. Emer; Stephen Felix; Julio Gago; Roger Gramunt; Isaac Hernandez; Toni Juan; Geoff Lowney; Matthew Mattina; André Seznec
Archive | 2003
Chi-Keung Luk; Geoff Lowney
Archive | 2009
Roger Espasa; Joel S. Emer; Geoff Lowney; Roger Gramunt; Santiago Galan; Toni Juan; Jesus Corbal; Federico Ardanaz; Isaac Hernandez
Archive | 2009
Harish Patil; Robert Muth; Geoff Lowney