Publication


Featured research published by Nigel C. Paver.


International Symposium on Performance Analysis of Systems and Software | 2014

Sources of error in full-system simulation

Anthony Gutierrez; Joseph Pusdesris; Ronald G. Dreslinski; Trevor N. Mudge; Chander Sudanthi; Christopher D. Emmons; Mitchell Hayenga; Nigel C. Paver

In this work we investigate the sources of error in gem5, a state-of-the-art computer simulator, by validating it against a real hardware platform: the ARM Versatile Express TC2 development board. We design a custom gem5 configuration and make several changes to the simulator itself in order to more closely match the Versatile Express TC2 board. With the modifications we make to the simulator, we are able to achieve a mean percentage runtime error of 5% and a mean absolute percentage runtime error of 13% for the SPEC CPU2006 benchmarks. For the PARSEC benchmarks we achieve a mean percentage runtime error of -11% and -12% for single- and dual-core runs, respectively, and a mean absolute percentage runtime error of 16% and 17% for single- and dual-core runs, respectively. While prior work typically considers only runtime accuracy, we extend our investigation to include several key microarchitectural statistics as well, showing that we can achieve accuracy within 20% on average for a majority of them. Much of this error is likely from modeling similar, but not identical, components.
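The abstract above reports both a mean percentage error and a mean absolute percentage error. The two differ because signed errors can cancel while absolute errors cannot. The following is a minimal sketch of that distinction, not the authors' evaluation code: the sign convention (simulated minus measured, relative to measured) and the runtimes are assumptions made for illustration.

```python
def percentage_errors(simulated, measured):
    """Per-benchmark signed error in percent.

    Assumed convention: positive means the simulator overestimates runtime.
    """
    return [100.0 * (s - m) / m for s, m in zip(simulated, measured)]


def mean_percentage_error(simulated, measured):
    """Signed mean: over- and under-estimates cancel each other."""
    errors = percentage_errors(simulated, measured)
    return sum(errors) / len(errors)


def mean_absolute_percentage_error(simulated, measured):
    """Mean of magnitudes: no cancellation, so it is never below |MPE|."""
    errors = percentage_errors(simulated, measured)
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical runtimes in seconds (not data from the paper).
measured = [10.0, 20.0, 40.0]
simulated = [12.0, 18.0, 40.0]

mpe = mean_percentage_error(simulated, measured)
mape = mean_absolute_percentage_error(simulated, measured)
print(f"MPE = {mpe:.2f}%, MAPE = {mape:.2f}%")
```

A small signed mean alongside a larger absolute mean, as in the reported 5% and 13%, indicates that individual benchmarks err in both directions.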


IEEE International Symposium on Workload Characterization | 2013

A structured approach to the simulation, analysis and characterization of smartphone applications

Dam Sunwoo; William Wang; Mrinmoy Ghosh; Chander Sudanthi; Geoffrey Blake; Christopher D. Emmons; Nigel C. Paver

Full-system simulators are invaluable tools for designing new architectures due to their ability to simulate full applications as well as capture operating system behavior, virtual machine or hypervisor behavior, and interference between concurrently running applications. However, the systems under investigation and applications under test have become increasingly complicated, leading to prohibitively long simulation times for a single experiment. This problem is compounded when many permutations of system design parameters and workloads are tested to investigate system sensitivities and full-system effects with confidence. In this paper, we propose a methodology to tractably explore the processor design space and to characterize applications in a full-system simulation environment. We combine SimPoint, Principal Component Analysis and Fractional Factorial experimental designs to substantially reduce the simulation effort needed to characterize and analyze workloads. We also present a non-invasive user-interface automation tool to allow us to study all types of workloads in a simulation environment. While our methodology is generally applicable to many simulators and workloads, we demonstrate the application of our proposed flow on smartphone applications running on the Android operating system within the gem5 simulation environment.
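To illustrate how a fractional factorial design cuts the number of simulation runs, the sketch below builds a two-level half-fraction design: the base factors take every combination of low (-1) and high (+1), and one extra factor is set to the product of the base factors instead of being varied independently. The abstract does not state which design or parameters the authors used, so the factor names here are hypothetical.

```python
from itertools import product


def half_fraction(num_base_factors):
    """Build a 2^(k-1) design with k = num_base_factors + 1 factors.

    Each run is a tuple of -1/+1 levels. The last level is generated as the
    product of the base levels rather than varied on its own.
    """
    runs = []
    for levels in product((-1, 1), repeat=num_base_factors):
        generated = 1
        for level in levels:
            generated *= level
        runs.append(levels + (generated,))
    return runs


# Hypothetical design parameters, chosen only for illustration.
factors = ["l1_size", "l2_size", "issue_width", "rob_entries"]

design = half_fraction(len(factors) - 1)
full_run_count = 2 ** len(factors)

print(f"{len(design)} runs instead of {full_run_count}")
for run in design:
    print(dict(zip(factors, run)))
```

The saving comes at the cost of aliasing: the effect of the generated factor cannot be separated from the three-way interaction of the base factors.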


Memory Performance: Dealing with Applications, Systems and Architecture | 2008

Accurate system-level performance modeling and workload characterization for mobile internet devices

Mitchell Hayenga; Chander Sudanthi; Mrinmoy Ghosh; Prakash Shyamlal Ramrakhyani; Nigel C. Paver

As mobile applications and devices become ubiquitous, consumer demands for performance, power efficiency, and connectivity are increasing. The software framework existing on mobile internet devices is a complex interaction of real-time tasks, non-real-time applications, and operating system management routines. Traditional simulation approaches are poorly suited to modeling the overall performance characteristics of such systems. Additionally, many traditional benchmark suites used in academia and industry for microprocessor benchmarking and design have been found to be unrepresentative of mobile workloads. This paper presents multiple frameworks utilized for accurately modeling system-level performance of embedded systems used for mobile applications. Furthermore, this paper provides an in-depth workload characterization and memory-level analysis of internet and media-centric applications. All workload characterization is performed running full operating systems and software stacks. For comparison purposes, the system-level analysis is performed at three distinct levels: full RTL emulation with device peripherals, software-level emulation, and performance counters on real-world devices. The same mobile workload and operating system are run across all of these platforms.


Signal Processing Systems | 2005

Accelerating Mobile Video: A 64-Bit SIMD Architecture for Handheld Applications

Nigel C. Paver; Moinul H. Khan; Bradley C. Aldrich; Christopher D. Emmons

Providing quality mobile video applications in handheld mobile devices requires increased computational capability. Using Single Instruction Multiple Data (SIMD) techniques to expose and accelerate the data parallelism inherent in video processing increases performance in handheld and wireless systems. The paper introduces a new 64-bit SIMD coprocessor of the Intel® XScale® microarchitecture which is optimized for low-power handheld applications. The architecture blends the SIMD media processing style with the capabilities of the XScale microarchitecture. This paper provides an overview of the architecture, its instruction set, programming model, pipeline organization and functional units. The paper also describes how key features of the architecture improve the performance of video applications as compared to a scalar implementation. The performance and power improvements based upon measured results are analyzed to show how the opportunities for power savings by reducing the frequency and voltage can be realized.
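As a rough illustration of the 64-bit SIMD idea described above, the sketch below treats one 64-bit word as eight independent 8-bit lanes (for example, eight pixels) and adds two such words lane by lane, with no carry crossing a lane boundary. This is a software model written for this page under simplifying assumptions (unsigned lanes, wraparound on overflow); it does not reproduce any specific instruction or the coprocessor's actual behavior.

```python
LANES = 8          # eight lanes per 64-bit word
LANE_BITS = 8      # each lane holds one unsigned 8-bit value
LANE_MASK = 0xFF


def pack(values):
    """Pack eight 8-bit values into one 64-bit integer, lane 0 lowest."""
    word = 0
    for i, value in enumerate(values):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit integer back into its eight 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add_u8(a, b):
    """Add two packed words lane by lane, wrapping each lane modulo 256."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane_sum = ((a >> shift) + (b >> shift)) & LANE_MASK
        result |= lane_sum << shift
    return result


pixels = pack([1, 2, 3, 4, 5, 6, 7, 250])
offset = pack([10] * LANES)
brightened = unpack(packed_add_u8(pixels, offset))
print(brightened)
```

In hardware, all eight lane additions happen in a single operation, whereas a scalar implementation issues one addition per pixel; the Python loop here only models the result, not the speedup.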


International Symposium on Multimedia | 2004

Accelerating Mobile Multimedia with Intel Wireless MMX Technology

Nigel C. Paver; Moinul H. Khan; Bradley C. Aldrich

Demand for mobile video applications is growing today in wireless handheld platforms. Intel® Wireless MMX™ technology has been designed to accelerate mobile multimedia and applications processing in a power efficient manner. Optimizing instruction set architecture is a logical approach towards attaining higher performance in multimedia applications. Wireless MMX technology is a 64-bit single instruction multiple data (SIMD) coprocessor for the Intel® XScale® microarchitecture. This paper provides an overview of Wireless MMX technology and the key features of the architecture that specifically enhance the multimedia performance. Tools and techniques for optimization are also described.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Intel® Wireless MMX™ technology: a 64-bit SIMD architecture for mobile multimedia

Nigel C. Paver; Bradley C. Aldrich; Moinul H. Khan

The growing demand for multimedia rich applications in the wireless mobile domain challenges the capabilities of current wireless handheld devices. Optimizing instruction set architecture is a logical approach towards attaining higher performance in multimedia applications. Intel® Wireless MMX™ technology is a 64-bit single instruction multiple data (SIMD) coprocessor for the Intel® XScale™ microarchitecture. It accelerates multimedia applications in handheld and wireless devices by taking advantage of the inherent parallelism and data types of targeted applications. This paper provides an overview of the Wireless MMX architecture, its instruction set, pipeline organization, and functional units. Initial benchmark results measured on silicon are also presented.


Symposium on Cloud Computing | 2009

Performance analysis of compressed instruction sets on workloads targeted at mobile internet devices

Chander Sudanthi; Mrinmoy Ghosh; Kevin Welton; Nigel C. Paver

This paper describes the performance advantages of a two- and four-byte variable-length instruction set, Thumb2, over a four-byte fixed-length instruction set, ARM. Both instruction sets are found in ARMv7-A ISA compatible processors, such as the ARM Cortex-A8 and Cortex-A9. The code size reduction when using a variable-length instruction set is well understood and can be significant. The focus of this paper is the performance advantage of increased code density. With Thumb2, more instructions are stored in the I-cache, increasing I-cache hit rates and in turn increasing the performance of the processor. To demonstrate the performance advantage of Thumb2, a Mozilla-based web browser built for Thumb2 and ARM on Linux is run in a full-system emulator and in a full-system instruction set simulator with a cache model. Switching from the ARM four-byte fixed-length instruction set to the Thumb2 two- and four-byte variable-length instruction set results in a 1.07x improvement in performance and a 33% improvement in code density.


Multimedia Tools and Applications | 2006

Optimizing mobile multimedia using SIMD techniques

Nigel C. Paver; Moinul H. Khan; Bradley C. Aldrich

Demand for mobile video applications is growing today in wireless handheld platforms. Optimizing instruction set architectures and employing SIMD techniques is a logical approach towards attaining higher performance in mobile multimedia applications. Intel® Wireless MMX™ technology has been designed to accelerate mobile multimedia and applications processing in a power efficient manner. This paper provides an overview of Intel® Wireless MMX™ technology, a 64-bit Single Instruction Multiple Data (SIMD) coprocessor for the Intel® XScale® microarchitecture, and the key features of the architecture that specifically enhance multimedia performance. Tools and techniques for optimization are also described.


Signal Processing Systems | 2003

Accelerating mobile video applications using Intel® Wireless MMX™ technology

Nigel C. Paver; Moinul H. Khan; Bradley C. Aldrich; Christopher D. Emmons

Demand for mobile video applications is growing today in wireless handheld platforms. Intel® Wireless MMX™ technology has been designed to accelerate video applications by using single instruction multiple data (SIMD) techniques to expose and accelerate the data parallelism inherent in video processing. Intel Wireless MMX technology is a 64-bit SIMD coprocessor of the Intel XScale™ microarchitecture which is optimized for low-power handheld applications. The paper provides an overview of the Wireless MMX technology architecture, its instruction set, pipeline organization and functional units. It also provides analysis of the features of the architecture that specifically enhance the video performance. Initial measured performance results are also provided.


IEEE International Symposium on Workload Characterization | 2011

Full-system analysis and characterization of interactive smartphone applications

Anthony Gutierrez; Ronald G. Dreslinski; Thomas F. Wenisch; Trevor N. Mudge; Ali G. Saidi; Christopher D. Emmons; Nigel C. Paver
