Peter J. Wilson
Freescale Semiconductor
Publication
Featured research published by Peter J. Wilson.
Networking, Architecture, and Storage | 2015
Farrukh Hijaz; Brian Kahne; Peter J. Wilson; Omer Khan
Software IP forwarding routers provide flexibility, programmability, and extensibility, while enabling fast deployment. The key question is whether they can keep up with the efficiency of their special-purpose hardware counterparts. Shared memory stands out as a sine qua non for parallel programming of many commercial multicore processors, so it is the paradigm of choice for implementing software routers. For efficiency, shared memory is often implemented with hardware support for cache coherence and data consistency among the cores. Although this enables efficient data access in many common-case scenarios, communication between cores using shared-memory synchronization primitives often limits scalability. In this paper we perform a thorough characterization of a multithreaded packet processing application to quantify the opportunities for exploiting concurrency, as well as to identify scalability bottlenecks in futuristic shared-memory multicores. We propose to retain the shared-memory model but introduce a set of lightweight in-hardware explicit messaging send/receive instructions in the instruction set architecture (ISA). These instructions mitigate the overheads of multi-party communication in shared-memory protocols. Using simulations of a 64-core multicore, we find that the scalability of parallel packet processing is limited by the packet-ordering requirement, which leads to expensive implicit communication under shared memory. With explicit messaging support in the ISA, the communication bottleneck is mitigated and the application scales to a 30× speedup at 64 cores.
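The contrast between implicit shared-memory hand-offs and the proposed explicit send/receive instructions can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: msg_send and msg_recv are hypothetical stand-ins for the in-hardware ISA extensions, and the shared-memory baseline uses standard GCC/Clang atomic builtins.

// Hypothetical intrinsics standing in for the paper's in-hardware
// send/receive ISA extensions; the names and signatures are illustrative.
#include <cstdint>

extern void msg_send(int dest_core, uint64_t payload);  // assumed blocking send
extern uint64_t msg_recv(int src_core);                 // assumed blocking receive

// Shared-memory baseline: the producer publishes a packet descriptor and
// the consumer spins on a flag, paying coherence traffic on every poll.
struct Mailbox { uint64_t desc; int ready; };

void produce_shared(Mailbox* mb, uint64_t packet_desc) {
    mb->desc = packet_desc;
    __atomic_store_n(&mb->ready, 1, __ATOMIC_RELEASE);  // invalidates the consumer's cache line
}

uint64_t consume_shared(Mailbox* mb) {
    while (!__atomic_load_n(&mb->ready, __ATOMIC_ACQUIRE)) { /* spin: implicit communication */ }
    return mb->desc;
}

// Explicit-messaging version: the descriptor travels point to point in one
// instruction each way, with no coherence round trips for the hand-off.
void produce_explicit(int next_core, uint64_t packet_desc) {
    msg_send(next_core, packet_desc);
}

uint64_t consume_explicit(int prev_core) {
    return msg_recv(prev_core);
}

In an in-order packet pipeline this hand-off is exactly where the paper locates the bottleneck: ordering forces cores to communicate on every packet, so replacing the spin-and-invalidate pattern with a single message removes the implicit coherence traffic from the critical path.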
International Parallel and Distributed Processing Symposium | 2017
Halit Dogan; Farrukh Hijaz; Masab Ahmad; Brian Kahne; Peter J. Wilson; Omer Khan
Shared memory stands out as a sine qua non for parallel programming of many commercial and emerging multicore processors. It optimizes patterns of communication that benefit common programming styles. As parallel programming is now mainstream, those common programming styles are challenged by emerging applications that communicate often and involve large amounts of data. Such applications include graph analytics and machine learning, and this paper focuses on these domains. We retain the shared-memory model and introduce a set of lightweight in-hardware explicit messaging instructions in the instruction set architecture (ISA). A set of auxiliary communication models is proposed that utilizes explicit messages to accelerate synchronization primitives and to efficiently move computation toward data. Results on a 256-core simulated multicore demonstrate that the proposed communication models improve performance by an average of 4× and dynamic energy by 42% over traditional shared memory.
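One of the auxiliary communication models, moving computation toward data, can be sketched in the same hypothetical style. In a graph-analytics kernel, instead of every core atomically updating remote vertices (bouncing their cache lines across the chip), each core sends the update to the vertex's home core, which applies it locally. The msg_send/msg_recv_any intrinsics, the home-core mapping, and the message layout below are all assumptions made for illustration.

// Sketch of moving computation to data with explicit messages; the
// intrinsics are hypothetical stand-ins for the ISA extensions.
#include <cstdint>
#include <vector>

extern void msg_send(int dest_core, uint64_t payload);
extern uint64_t msg_recv_any();  // assumed: blocking receive from any sender

constexpr int kNumCores = 256;
inline int home_core(uint32_t vertex) { return vertex % kNumCores; }

// Pack (vertex, delta) into one message word; the layout is illustrative.
inline uint64_t pack(uint32_t vertex, uint32_t delta) {
    return (uint64_t(vertex) << 32) | delta;
}

// Any core: push an update to the owner instead of touching remote memory.
void push_update(uint32_t vertex, uint32_t delta) {
    msg_send(home_core(vertex), pack(vertex, delta));
}

// Home core: drain incoming updates and apply them to locally owned state,
// so the hot vertex data never leaves the local cache.
void service_updates(std::vector<uint64_t>& local_state, int count) {
    for (int i = 0; i < count; ++i) {
        uint64_t m = msg_recv_any();
        uint32_t vertex = uint32_t(m >> 32);
        local_state[vertex / kNumCores] += uint32_t(m);  // dense local index of an owned vertex
    }
}

The design choice this illustrates is the one the abstract names: synchronization and data movement shift from the coherence protocol, with its many implicit transactions, to single explicit messages.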
Microprocessor Test and Verification | 2005
Brian Kahne; Aseem Gupta; Peter J. Wilson; Nikil D. Dutt
The ability to enhance single-thread performance, such as by increasing clock frequency, is reaching a point of diminishing returns: power is becoming a dominating factor and limiting scalability. Adding cores is a scalable way to increase performance, but it requires that system designers have a method for developing multithreaded applications. Plasma (Parallel Language for System Modeling and Analysis) is a parallel language for system modeling and multi-threaded application development, implemented as a superset of C++. Its language extensions are based upon those found in Occam, which in turn is based upon C. A. R. Hoare's CSP (communicating sequential processes). The goal of the Plasma project is to investigate whether a language with the appropriate constructs can ease the task of developing highly multi-threaded software. In addition, through the inclusion of a discrete-event simulation API, we seek to simplify the task of system modeling and to increase productivity through clearer representation and increased compile-time checking of the most difficult-to-get-right aspect of system models: the concurrency. The result is a single language that allows users to develop a parallel application and then model it within the context of a system, allowing for hardware-software partitioning and various other early tradeoff analyses. We believe that this language offers a simpler and more concise syntax than other offerings and can be targeted at a large range of potential architectures, including heterogeneous systems and those without shared memory.
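Plasma's own syntax is not reproduced here, but the CSP-style channel communication it inherits from Occam can be approximated in plain C++17. The one-slot channel below is an analogue intended only to illustrate the programming model the paper describes; it is not Plasma code.

// A plain C++17 analogue of an Occam/CSP-style channel: send and recv
// block until the single slot is free/full, giving a rendezvous-like hand-off.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <thread>

template <typename T>
class Channel {
    std::mutex m;
    std::condition_variable cv;
    std::optional<T> slot;  // at most one value in flight
public:
    void send(T v) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !slot.has_value(); });  // wait for the slot to drain
        slot = std::move(v);
        cv.notify_all();
    }
    T recv() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return slot.has_value(); });   // wait for a value to arrive
        T v = std::move(*slot);
        slot.reset();
        cv.notify_all();
        return v;
    }
};

int main() {
    Channel<int> ch;
    std::thread producer([&] { for (int i = 0; i < 3; ++i) ch.send(i); });
    std::thread consumer([&] { for (int i = 0; i < 3; ++i) std::cout << ch.recv() << '\n'; });
    producer.join();
    consumer.join();
}

Building this pattern into the language, as Plasma does, is what enables the compile-time checking of concurrency the abstract mentions: the compiler, rather than a library convention, knows which data crosses a channel.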
Archive | 2008
William C. Moyer; Peter J. Wilson
Archive | 2001
Bryan D. Marietta; Peter J. Wilson
Archive | 2008
Perry H. Pelley; George P. Hoekstra; Peter J. Wilson
Archive | 2013
Joseph C. Circello; Daniel M. McCarthy; John D. Mitchell; Peter J. Wilson; John J. Vaglica
Archive | 2017
Peter J. Wilson
Archive | 2011
Peter J. Wilson
Archive | 2016
Peter J. Wilson; Brian Kahne