Publication


Featured research published by Francis Patrick O'Connell.


International Conference on Parallel Architectures and Compilation Techniques | 2012

Making data prefetch smarter: adaptive prefetching on POWER7

Víctor Jiménez; Roberto Gioiosa; Francisco J. Cazorla; Alper Buyuktosunoglu; Pradip Bose; Francis Patrick O'Connell

Hardware data prefetch engines are integral parts of many general-purpose server-class microprocessors in the field today. Some prefetch engines allow the user to change some of their parameters. The prefetcher, however, is usually enabled in a default configuration during system bring-up, and dynamic reconfiguration of the prefetch engine is not an autonomic feature of current machines. Conceptually, however, it is easy to infer that commonly used prefetch algorithms, when applied in a fixed mode, will not help performance in many cases. In fact, they may actually degrade performance due to useless bus bandwidth consumption and cache pollution. In this paper, we present an adaptive prefetch scheme that dynamically modifies the prefetch settings in order to adapt to the workload requirements. We implement and evaluate adaptive prefetching in the context of an existing commercial processor, namely the IBM POWER7. Our adaptive prefetch mechanism improves performance with respect to the default prefetch setting by up to 2.7X for single-threaded workloads and by up to 30% for multiprogrammed workloads.
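
The idea of adapting prefetch settings to the workload can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give; it only shows one plausible shape of such a scheme: briefly try each candidate setting, measure performance, and keep the best one. The setting names and the `measure_ipc` callback are hypothetical stand-ins for whatever interface and performance counters a real system exposes.

```python
from typing import Callable, Iterable, List


def pick_prefetch_setting(settings: Iterable[str],
                          measure_ipc: Callable[[str], float]) -> str:
    """Exploration phase: sample each candidate setting and keep the best.

    `measure_ipc` is a hypothetical callback that applies a setting for a
    short interval and returns the observed instructions per cycle.
    """
    best_setting, best_ipc = None, float("-inf")
    for setting in settings:
        ipc = measure_ipc(setting)
        if ipc > best_ipc:  # strict comparison: the first setting wins ties
            best_setting, best_ipc = setting, ipc
    if best_setting is None:
        raise ValueError("no candidate settings supplied")
    return best_setting


def adaptive_loop(settings: Iterable[str],
                  measure_ipc: Callable[[str], float],
                  phases: int) -> List[str]:
    """Re-run the exploration once per phase so the choice can follow
    changes in workload behaviour. Returns the setting chosen per phase."""
    candidates = list(settings)
    return [pick_prefetch_setting(candidates, measure_ipc)
            for _ in range(phases)]
```

In a real deployment the exploration interval would be short relative to the running interval, so that time spent in poor settings stays small; that trade-off is an assumption of this sketch, not a claim taken from the abstract.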


High-Performance Computer Architecture | 2015

Increasing multicore system efficiency through intelligent bandwidth shifting

Víctor Jiménez; Alper Buyuktosunoglu; Pradip Bose; Francis Patrick O'Connell; Francisco J. Cazorla; Mateo Valero

Memory bandwidth is a crucial resource in computing systems. Current CMP/SMT processors have a significant number of cores and can run many threads concurrently. This large thread count puts high pressure on the memory bus, which needs high bandwidth to service memory requests from the cores. Hardware data prefetching is a well-known technique for hiding memory latency. Due to its speculative nature, however, in some situations prefetching does not work effectively, wasting memory bandwidth and polluting the caches. Data prefetching efficiency depends on the prefetching algorithm. It also depends on the characteristics of the applications running on the system. In this paper we propose an online bandwidth shifting mechanism that dynamically assigns bandwidth to applications according to their prefetch efficiency. This mechanism maximizes the utilization of memory bandwidth, thereby improving system performance and/or reducing memory power consumption. To the best of our knowledge, this solution is the first that does not require hardware support. We evaluate the benefits of using our bandwidth shifting mechanism on a real system, the IBM POWER7. We obtain speedups on the order of 10-20% (in one instance, the speedup exceeds 1.6X). Our mechanism does not generate a significant degree of unfairness among the applications. In many cases individual thread performance increases by 10-35%, while virtually no thread experiences a slowdown larger than 5%.
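
To make the notion of assigning bandwidth by prefetch efficiency concrete, here is a small sketch. It is an illustration under stated assumptions, not the mechanism evaluated in the paper: efficiency is taken to be the fraction of issued prefetches that turned out useful, and applications below a hypothetical threshold have their prefetching throttled so that bandwidth is left for the others. The function name, the input format, and the default threshold are all invented for this example.

```python
from typing import Dict, Tuple


def shift_bandwidth(stats: Dict[str, Tuple[int, int]],
                    threshold: float = 0.5) -> Dict[str, str]:
    """Decide, per application, whether to keep or throttle prefetching.

    stats maps an application name to (useful_prefetches, issued_prefetches).
    Both the counters and the threshold are assumptions of this sketch.
    """
    decisions = {}
    for name, (useful, issued) in stats.items():
        # Treat an application that issued no prefetches as efficiency 0;
        # throttling it changes nothing, so the choice is harmless.
        efficiency = useful / issued if issued else 0.0
        decisions[name] = "keep" if efficiency >= threshold else "throttle"
    return decisions
```

For example, an application with 90 useful prefetches out of 100 would keep its prefetch setting, while one with 10 out of 100 would be throttled.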


Journal of Systems Architecture | 1999

Bounds modelling and compiler optimizations for superscalar performance tuning

Pradip Bose; Sunil Kim; Francis Patrick O'Connell; William A. Ciarfella

We consider the floating point microarchitecture support in RISC superscalar processors. We briefly review the fundamental performance trade-offs in the design of such microarchitectures. We propose a simple, yet effective bounds model to deduce the “best-case” loop performance limits for these processors. We compare these bounds to simulated and real performance measurements. From this study, we identify several loop tuning opportunities. In particular, we illustrate the use of this analysis in suggesting loop unrolling and scheduling heuristics. We report our experimental results in the context of a set of application-based loop test cases. These are designed to stress various resource limits in the core (infinite cache) microarchitecture.
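
A best-case loop bound of the kind the abstract mentions can be sketched with a generic resource-limit calculation. The abstract does not spell out the paper's model, so the following is an assumption-laden illustration only: each iteration needs at least as many cycles as its most heavily loaded resource requires, with the issue width acting as one more resource. The operation classes, unit counts, and issue width are hypothetical inputs.

```python
from typing import Dict


def loop_bound_cycles(op_counts: Dict[str, int],
                      units: Dict[str, int],
                      issue_width: int) -> float:
    """Lower bound on cycles per loop iteration, assuming an infinite cache
    and perfect scheduling.

    op_counts: operations per iteration for each class (e.g. "fp", "load").
    units:     number of functional units available for each class.
    """
    # Issue bound: all operations must pass through the issue stage.
    bounds = [sum(op_counts.values()) / issue_width]
    # Per-class bound: operations of a class share that class's units.
    for op_class, count in op_counts.items():
        bounds.append(count / units[op_class])
    return max(bounds)


ops = {"fp": 4, "load": 2, "store": 1}
hw = {"fp": 2, "load": 1, "store": 1}
print(loop_bound_cycles(ops, hw, issue_width=4))  # prints 2.0
```

Here the floating point and load classes each need 2 cycles per iteration, which exceeds the issue bound of 1.75, so they set the limit. Comparing such a bound with measured cycles per iteration indicates how much room unrolling or rescheduling might still have.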


Archive | 2000

Software prefetch system and method for predetermining amount of streamed data

James Allan Kahle; Michael John Mayfield; Francis Patrick O'Connell; David Scott Ray; Edward John Silha; Joel M. Tendler


Archive | 1999

System and method for prefetching data to multiple levels of cache including selectively using a software hint to override a hardware prefetch mechanism

James Allan Kahle; Michael John Mayfield; Francis Patrick O'Connell; David Scott Ray; Edward John Silha; Joel M. Tendler


Archive | 2007

Data stream prefetching in a microprocessor

Eric Fluhr; Bradly G. Frey; John Barry Griswell; Hung Qui Le; Cathy May; Francis Patrick O'Connell; Edward John Silha; Albert Thomas Williams


Archive | 1999

Cache prefetching of L2 and L3

Michael John Mayfield; Francis Patrick O'Connell; David Scott Ray


Archive | 2006

Data Processing System and Method for Reducing Cache Pollution by Write Stream Memory Access Patterns

Ravi Kumar Arimilli; Francis Patrick O'Connell; Hazim Shafi; Derek Edward Williams; Lixin Zhang


Archive | 2002

Method and apparatus for mapping software prefetch instructions to hardware prefetch logic

Michael John Mayfield; Francis Patrick O'Connell; David Scott Ray


Archive | 2004

Fine-grained software-directed data prefetching using integrated high-level and low-level code analysis optimizations

Roch Georges Archambault; Robert James Blainey; Yaoqing Gao; Allan Russell Martin; James Lawrence McInnes; Francis Patrick O'Connell
