
Publication


Featured research published by Joshua B. Fryman.


High-Performance Computer Architecture | 2013

Runnemede: An architecture for Ubiquitous High-Performance Computing

Nicholas P. Carter; Aditya Agrawal; Shekhar Borkar; Romain Cledat; Howard S. David; Dave Dunning; Joshua B. Fryman; Ivan Ganev; Roger A. Golliver; Rob C. Knauerhase; Richard Lethin; Benoît Meister; Asit K. Mishra; Wilfred R. Pinfold; Justin Teller; Josep Torrellas; Nicolas Vasilache; Ganesh Venkatesh; Jianping Xu

DARPA's Ubiquitous High-Performance Computing (UHPC) program asked researchers to develop computing systems capable of achieving energy efficiencies of 50 GOPS/Watt, assuming 2018-era fabrication technologies. This paper describes Runnemede, the research architecture developed by the Intel-led UHPC team. Runnemede is being developed through a co-design process that considers the hardware, the runtime/OS, and applications simultaneously. Near-threshold voltage operation, fine-grained power and clock management, and separate execution units for runtime and application code are used to reduce energy consumption. Memory energy is minimized through application-managed on-chip memory and direct physical addressing. A hierarchical on-chip network reduces communication energy, and a codelet-based execution model supports extreme parallelism and fine-grained tasks. We present an initial evaluation of Runnemede that shows the design process for our on-chip network, demonstrates 2-4x improvements in memory energy from explicit control of on-chip memory, and illustrates the impact of hardware-software co-design on the energy consumption of a synthetic aperture radar algorithm on our architecture.
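
The codelet model the abstract mentions is easy to picture in software: a codelet is a short, non-preemptive task that becomes runnable only once all of its inputs have arrived. The sketch below is a minimal, hypothetical illustration of that dependence-counting idea (none of these names come from the Runnemede design, and a real runtime would enqueue ready codelets on a scheduler rather than run them inline):

```c
#include <stdio.h>

/* Hypothetical codelet: runs to completion once all inputs arrive. */
typedef struct codelet {
    void (*fn)(struct codelet *self); /* body, non-preemptive     */
    int   deps_remaining;             /* unmet input dependences  */
    void *inputs[4];                  /* satisfied input payloads */
} codelet_t;

/* Satisfy one input; fire the codelet when the count reaches zero. */
static void satisfy(codelet_t *c, int slot, void *payload) {
    c->inputs[slot] = payload;
    if (--c->deps_remaining == 0)
        c->fn(c); /* a real scheduler would enqueue, not call */
}

static void sum_body(codelet_t *self) {
    int a = *(int *)self->inputs[0];
    int b = *(int *)self->inputs[1];
    printf("sum codelet fired: %d\n", a + b);
}

int main(void) {
    int x = 2, y = 3;
    codelet_t sum = { sum_body, 2, { 0 } };
    satisfy(&sum, 0, &x);  /* first input: not yet runnable */
    satisfy(&sum, 1, &y);  /* second input: codelet fires   */
    return 0;
}
```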


High-Performance Computer Architecture | 2006

InfoShield: a security architecture for protecting information usage in memory

Weidong Shi; Joshua B. Fryman; Guofei Gu; Hsien-Hsin S. Lee; Youtao Zhang; Jun Yang

Cyber theft is a serious threat to Internet security and one of the major security concerns of both network service providers and Internet users. Although sensitive information can be encrypted when stored in non-volatile memory such as hard disks, for many e-commerce and network applications it is often stored as plaintext in main memory. Documented and reported exploits make it easy for an adversary to steal sensitive information from an application's memory; these exploits include illegitimate memory scans, information-theft-oriented buffer overflows, invalid pointer manipulation, integer overflows, password-stealing Trojans, and so forth. Today's computing systems and hardware cannot address these exploits effectively in a coherent way. This paper presents a unified and lightweight solution, called InfoShield, that strengthens application protection against theft of sensitive information such as passwords, encryption keys, and other private data, with minimal performance impact. Unlike prior whole-memory encryption and information-flow-based efforts, InfoShield protects the usage of information: it ensures that sensitive data are used only as defined by application semantics, preventing misuse. Compared with prior art, InfoShield handles a broader range of information-theft scenarios in a unified framework with less overhead. Evaluation using popular network client-server applications shows that InfoShield is sound for practical use and incurs little performance loss because it protects only absolutely critical sensitive information; profiling shows that only 0.3% of memory accesses and 0.2% of executed instructions are affected by InfoShield.
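
The core policy, "sensitive data are used only as defined by application semantics", can be pictured as a table of sensitive regions plus the code sites allowed to touch them. InfoShield enforces this in hardware; the sketch below is only a software analogue with invented names, included to make the policy concrete:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical software analogue of InfoShield's policy: a sensitive
 * region may only be read by the code site registered for it. */
typedef struct {
    uintptr_t   base;
    size_t      len;
    const void *allowed_site;  /* the one function permitted access */
} sensitive_region_t;

static sensitive_region_t table[8];
static int n_regions;

static void protect(const void *base, size_t len, const void *site) {
    table[n_regions++] =
        (sensitive_region_t){ (uintptr_t)base, len, site };
}

/* Checked access: deny reads of a protected region from other sites. */
static int checked_read(const void *addr, const void *site, char *out) {
    uintptr_t a = (uintptr_t)addr;
    for (int i = 0; i < n_regions; i++)
        if (a >= table[i].base && a < table[i].base + table[i].len &&
            site != table[i].allowed_site)
            return -1;  /* usage outside the application's semantics */
    *out = *(const char *)addr;
    return 0;
}

static char password[16] = "hunter2";

static void login(void) {
    char c;
    checked_read(password, (const void *)login, &c);  /* permitted */
}

int main(void) {
    protect(password, sizeof password, (const void *)login);
    login();
    char c;
    if (checked_read(password, (const void *)main, &c) < 0)
        puts("blocked: main() may not read the password");
    return 0;
}
```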


IEEE High Performance Extreme Computing Conference | 2016

The Open Community Runtime: A runtime system for extreme scale computing

Timothy G. Mattson; Romain Cledat; Vincent Cavé; Vivek Sarkar; Zoran Budimlic; Sanjay Chatterjee; Joshua B. Fryman; Ivan Ganev; Robin Knauerhase; Min Lee; Benoît Meister; Brian R. Nickerson; Nick Pepperling; Bala Seshasayee; Sagnak Tasirlar; Justin Teller; Nick Vrvilo

The Open Community Runtime (OCR) is a new runtime system designed to meet the needs of extreme-scale computing. While there is growing support for the idea that future execution models will be based on dynamic tasks, there is little agreement on what else should be included. OCR minimally adds events for synchronization and relocatable data-blocks for data management to form a complete system that supports a wide range of higher-level programming models. This paper lays out the fundamental concepts behind OCR and compares OCR performance with that of MPI on two simple benchmarks. OCR has been developed within an open community model, weighing features that support flexible algorithm expression against the expected realities of extreme-scale computing: power-constrained execution, aggressive growth in the number of compute resources, deepening memory hierarchies, and a low mean time between failures.
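
OCR's three core concepts compose simply: a task declares dependence slots, events satisfy those slots, and data-blocks are the payloads that flow through them. The miniature below mimics that wiring with invented names; it is not the real OCR API (the actual ocr*() calls are defined in the OCR specification), just a sketch of the shape:

```c
#include <stdio.h>

/* Hypothetical miniature of OCR's three concepts: tasks, events,
 * and relocatable data-blocks. All names are invented. */
typedef struct { void *ptr; size_t len; } datablock_t;

typedef struct task {
    void (*fn)(datablock_t *deps, int ndeps);
    datablock_t *deps;
    int ndeps, unsatisfied;
} task_t;

typedef struct { task_t *sink; int slot; } event_t; /* one-shot event */

static void event_satisfy(event_t *e, datablock_t db) {
    e->sink->deps[e->slot] = db;      /* deliver the data-block */
    if (--e->sink->unsatisfied == 0)  /* last slot filled: run  */
        e->sink->fn(e->sink->deps, e->sink->ndeps);
}

static void hello_task(datablock_t *deps, int ndeps) {
    (void)ndeps;
    printf("task ran with payload: %s\n", (char *)deps[0].ptr);
}

int main(void) {
    datablock_t slots[1];
    task_t t = { hello_task, slots, 1, 1 };
    event_t ev = { &t, 0 };              /* wire event -> slot 0 */

    static char payload[] = "relocatable data";
    event_satisfy(&ev, (datablock_t){ payload, sizeof payload });
    return 0;
}
```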


International Symposium on Microarchitecture | 2008

POD: A 3D-Integrated Broad-Purpose Acceleration Layer

Dong Hyuk Woo; Hsien-Hsin S. Lee; Joshua B. Fryman; Allan D. Knies; Marsha Eng

To build a future many-core processor, industry must address the challenges of energy consumption and performance scalability. A 3D-integrated broad-purpose accelerator architecture called parallel-on-demand (POD) integrates a specialized SIMD-based die layer on top of a CISC superscalar processor to accelerate a variety of data-parallel applications. It also maintains binary compatibility and facilitates extensibility by virtualizing the acceleration capability.
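
The claim of binary compatibility through virtualized acceleration can be read as: code targets one entry point, and the system transparently routes work to the stacked SIMD layer when it exists, or to the host core when it does not. The sketch below shows only that dispatch shape with invented names; the actual POD mechanism is architectural, not a C library:

```c
#include <stdio.h>

/* Hypothetical dispatch: use the 3D-stacked SIMD layer when present,
 * fall back to the host core otherwise, so one binary runs on both. */
static int pod_present(void) { return 0; /* imagine a probe at boot */ }

/* Stub standing in for offload to the stacked SIMD layer. */
static void vec_add_pod(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++) c[i] = a[i] + b[i]; /* wide SIMD here */
}

static void vec_add_host(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++) c[i] = a[i] + b[i];
}

/* Single entry point: callers never know which path ran. */
void vec_add(const float *a, const float *b, float *c, int n) {
    if (pod_present()) vec_add_pod(a, b, c, n);
    else               vec_add_host(a, b, c, n);
}

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {4, 3, 2, 1}, c[4];
    vec_add(a, b, c, 4);
    printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```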


ACM Transactions on Architecture and Code Optimization | 2010

Chameleon: Virtualizing idle acceleration cores of a heterogeneous multicore processor for caching and prefetching

Dong Hyuk Woo; Joshua B. Fryman; Allan D. Knies; Hsien-Hsin Sean Lee

Heterogeneous multicore processors have emerged as an energy- and area-efficient architectural solution for improving the performance of domain-specific applications such as those with a plethora of data-level parallelism. These processors typically contain a large number of small, compute-centric cores for acceleration while keeping one or two high-performance ILP cores on the die to guarantee single-thread performance. Although a major portion of the transistors is occupied by the acceleration cores, these resources sit idle when running unparallelized legacy code or the sequential part of an application. To address this underutilization, this article introduces Chameleon, a flexible heterogeneous multicore architecture that virtualizes these resources to enhance memory performance when running sequential programs. The Chameleon architecture can dynamically virtualize the idle acceleration cores into a last-level cache, a data prefetcher, or a hybrid of the two. In addition, Chameleon can operate in an adaptive mode that dynamically switches the acceleration cores between the hybrid mode and the prefetch-only mode by monitoring the effectiveness of the Chameleon cache mode. In our evaluation with the SPEC2006 benchmark suite, each mode improved performance to a different degree for different applications. In the adaptive mode, Chameleon improves the performance of SPECint06 and SPECfp06 by 31% and 15% on average, respectively. Considering only memory-intensive applications, Chameleon improves system performance by 50% and 26% for SPECint06 and SPECfp06, respectively.
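
The adaptive mode described above is essentially a feedback loop: periodically measure how much the virtualized cache is helping and fall back to prefetch-only when it is not. A minimal, hypothetical version of that policy (the threshold, counters, and names are invented, not taken from the paper) might look like this:

```c
#include <stdio.h>

/* Hypothetical Chameleon-style adaptive policy: idle accelerator
 * cores act as a cache+prefetcher hybrid or as a prefetcher only,
 * chosen by how useful the virtualized cache was last epoch. */
typedef enum { MODE_HYBRID, MODE_PREFETCH_ONLY } cham_mode_t;

typedef struct { unsigned long hits, lookups; } epoch_stats_t;

static cham_mode_t pick_mode(epoch_stats_t s) {
    double hit_rate = s.lookups ? (double)s.hits / s.lookups : 0.0;
    /* invented threshold: keep the cache only if it earns its keep */
    return (hit_rate < 0.05) ? MODE_PREFETCH_ONLY : MODE_HYBRID;
}

int main(void) {
    epoch_stats_t epochs[] = { {800, 1000}, {20, 1000}, {900, 1000} };
    for (int i = 0; i < 3; i++) {
        cham_mode_t m = pick_mode(epochs[i]);
        printf("epoch %d -> %s\n", i,
               m == MODE_HYBRID ? "hybrid" : "prefetch-only");
    }
    return 0;
}
```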


Parallel Computing | 2017

Traleika Glacier

Vincent Cavé; Romain Cledat; Paul Griffin; Ankit More; Bala Seshasayee; Shekhar Borkar; Sanjay Chatterjee; Dave Dunning; Joshua B. Fryman

Highlights: the Traleika Glacier architecture, targeted at exascale hardware, is proposed; a task-based runtime system, the Open Community Runtime, is presented; and the experience of co-designing hardware and software for exascale is described.

The move from current petascale machines to future exascale machines will require both hardware improvements and software changes. Hardware will need to evolve to focus primarily on features that lower energy consumption: near-threshold voltage operation, fine-grained power and clock management, and heterogeneity. Software will also need to evolve, expressing more parallelism and becoming more dynamic and adaptable in order to operate on much more variable hardware. In this paper, we present Traleika Glacier, an effort that seeks to evaluate radical design changes to meet the constraints, in both power and cost, of exascale computing. The salient features of the hardware design include (a) the use of heterogeneous cores, (b) a redesign of the memory system around hierarchical scratchpads and a global address space, (c) hardware acceleration of certain memory and network operations through specialized engines, and (d) very fine-grained control and monitoring capabilities. On the software side, we describe a task-based runtime system, the Open Community Runtime (OCR), which aims to express a wide range of higher-level programming models with a very limited set of core concepts: event-driven tasks for computation, events for synchronization, and relocatable data-blocks for data management.
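
"Hierarchical scratchpads plus a global address space" implies software-managed data staging: a task explicitly copies its working set from the global space into a nearby scratchpad, computes there, and writes results back. A hypothetical sketch of that pattern follows (all names and sizes invented; on Traleika-style hardware the copies would be offloaded to the specialized copy engines the abstract mentions):

```c
#include <stdio.h>
#include <string.h>

#define SPAD_WORDS 256
/* Stand-ins: "global" memory and a core-local scratchpad. */
static double global_mem[4096];
static double spad[SPAD_WORDS];

/* Stage a tile in, compute on fast local storage, stage results out. */
static void scale_tile(size_t base, size_t n, double k) {
    memcpy(spad, &global_mem[base], n * sizeof(double)); /* stage in  */
    for (size_t i = 0; i < n; i++) spad[i] *= k;         /* compute   */
    memcpy(&global_mem[base], spad, n * sizeof(double)); /* stage out */
}

int main(void) {
    for (size_t i = 0; i < 4096; i++) global_mem[i] = (double)i;
    for (size_t base = 0; base < 4096; base += SPAD_WORDS)
        scale_tile(base, SPAD_WORDS, 2.0);
    printf("global_mem[10] = %g\n", global_mem[10]); /* prints 20 */
    return 0;
}
```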


Archive | 2012

Exposing control of power and clock gating for software

Nicholas P. Carter; Joshua B. Fryman; Robert Knauerhase; Aditya Agrawal; Josep Torrellas


Archive | 2009

Adaptively Handling Remote Atomic Execution

Joshua B. Fryman; Edward T. Grochowski; Toni Juan; Andrew T. Forsyth; John Mejia; Ramacharan Sundararaman; Eric Sprangle; Roger Espasa; Ravi Rajwar


Archive | 2011

Method And Apparatus For Supporting Scalable Coherence On Many-Core Products Through Restricted Exposure

Joshua B. Fryman; Mohan Rajagopalan; Anwar M. Ghuloum


Archive | 2010

Adaptive optimized compare-exchange operation

Joshua B. Fryman; Andrew T. Forsyth; Edward T. Grochowski
