Lesley Shannon | Researchain

Publication

Featured research published by Lesley Shannon.

field-programmable custom computing machines | 2010

Odin II - An Open-Source Verilog HDL Synthesis Tool for CAD Research

Peter Jamieson; Kenneth B. Kent; Farnaz Gharibian; Lesley Shannon

In this work, we present Odin II, a framework for Verilog Hardware Description Language (HDL) synthesis that allows researchers to investigate approaches to, and improvements in, different phases of HDL elaboration in ways that were not previously possible. Odin II's output can be fed into traditional back-end flows for both FPGAs and ASICs so that these improvements can be better quantified. Whereas the original Odin [1] provided an open-source synthesis tool, Odin II's synthesis framework offers significant improvements, such as a unified environment for both front-end parsing and netlist flattening. Odin II also interfaces directly with VPR [2], a common academic FPGA CAD flow, and accepts an architectural description of the target FPGA as input so that design features can be identified and mapped onto the FPGA's custom features. Furthermore, Odin II can read netlists from downstream CAD stages into its netlist data structure to facilitate analysis. Odin II can be used for a wide range of experiments; in this paper, we show three specific examples of how ASIC and FPGA researchers can use Odin II for more than basic synthesis. Odin II is open source and released under the MIT License.
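Netlist flattening, one of the stages Odin II unifies with front-end parsing, replaces each module instance with a copy of that module's contents and prefixes every name with the instance path. The Python sketch below is a generic illustration under that assumption, not Odin II's implementation; `Module` and `flatten` are hypothetical names, and the sketch tracks only gates, ignoring nets and port connections.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical hierarchical module: primitive gates plus child instances."""
    name: str
    gates: list = field(default_factory=list)      # (gate_type, gate_name) pairs
    instances: list = field(default_factory=list)  # (child_module_name, instance_name) pairs


def flatten(modules: dict, top: str, prefix: str = "") -> list:
    """Recursively inline every instance, returning (gate_type, hierarchical_name) pairs."""
    mod = modules[top]
    flat = [(gate_type, prefix + gate_name) for gate_type, gate_name in mod.gates]
    for child, inst in mod.instances:
        # Each child's gates are renamed with the instance path, e.g. "ha0.s".
        flat.extend(flatten(modules, child, prefix + inst + "."))
    return flat


modules = {
    "half_adder": Module("half_adder", gates=[("XOR", "s"), ("AND", "c")]),
    "full_adder": Module("full_adder", gates=[("OR", "cout")],
                         instances=[("half_adder", "ha0"), ("half_adder", "ha1")]),
}
print(flatten(modules, "full_adder"))
# [('OR', 'cout'), ('XOR', 'ha0.s'), ('AND', 'ha0.c'), ('XOR', 'ha1.s'), ('AND', 'ha1.c')]
```

Prefixing with the instance path keeps the two copies of `half_adder` distinct after flattening.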

field programmable gate arrays | 2004

Using reconfigurability to achieve real-time profiling for hardware/software codesign

Lesley Shannon; Paul Chow

Embedded systems combine a processor with dedicated logic to meet design specifications at a reasonable cost. Combining two distinct design environments introduces many problems, one being how to partition a single design across the two platforms to achieve the best performance with the least effort. Since the latest FPGA technology allows soft or hard CPU cores to be integrated with dedicated logic on a single chip, the reconfigurable environment presents new opportunities for addressing hardware/software codesign issues in the FPGA design process. This paper introduces SnoopP, a non-intrusive, real-time profiling tool. The user can obtain a clock-cycle-accurate profile of the real-time performance of a software program running on a soft-core processor instantiated on an FPGA. SnoopP is an essential tool for hardware/software codesign on a reconfigurable platform: it allows the user to quickly obtain accurate profiling information that may greatly influence how the design is partitioned.
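The abstract does not describe SnoopP's internals. As a minimal sketch, assuming a profiler that observes the program counter (PC) on every clock cycle and keeps one counter per address range of interest, cycle-accurate per-region profiling reduces to range checks. The trace format, address ranges and `profile_cycles` are hypothetical.

```python
def profile_cycles(pc_trace, segments):
    """Count the cycles whose PC falls inside each inclusive [lo, hi] address range.

    pc_trace: one PC value per clock cycle (hypothetical trace format).
    segments: dict mapping a label to an inclusive (lo, hi) address range.
    """
    counts = {label: 0 for label in segments}
    for pc in pc_trace:
        for label, (lo, hi) in segments.items():
            # In hardware, each range would need its own pair of comparators and a counter.
            if lo <= pc <= hi:
                counts[label] += 1
    return counts


segments = {"fir_filter": (0x1000, 0x10FF), "main_loop": (0x2000, 0x20FF)}
trace = [0x2000, 0x2004, 0x1000, 0x1004, 0x1008, 0x2008, 0x3000]
print(profile_cycles(trace, segments))  # {'fir_filter': 3, 'main_loop': 3}
```

Because the counters observe the PC rather than instrumenting the program, the profiled software runs unmodified, which matches the non-intrusive goal stated above.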

field-programmable custom computing machines | 2011

FUSE: Front-End User Framework for O/S Abstraction of Hardware Accelerators

Aws Ismail; Lesley Shannon

SoCs can be implemented on a single FPGA, offering designers a unique opportunity for embedded systems. Instead of defining a fixed architecture early in the design process, the reconfigurable platform allows the architecture to be redesigned to meet the system's specific needs. However, the ability to instantiate new modules in the reconfigurable hardware presents a unique set of integration challenges, particularly for the software (SW) designer. Specifically, the Operating System (OS) cannot automatically abstract these platform changes without redesign. In this paper, we present FUSE, a framework for HW accelerator abstraction that provides: 1) transparency to the SW designer at the application level, and 2) OS support for easy HW accelerator integration. We illustrate FUSE as an API for an embedded Linux OS with POSIX threads on Xilinx's MicroBlaze on a Virtex-5. For three different applications and HW accelerators, we achieve performance speedups ranging from 6.4× to 37×.

international symposium on microarchitecture | 2012

Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy

Snehasish Kumar; Hongzhou Zhao; Arrvindh Shriraman; Eric Matthews; Sandhya Dwarkadas; Lesley Shannon

The fixed geometries of current cache designs do not adapt to the working-set requirements of modern applications, causing significant inefficiency. The short block lifetimes and moderate spatial locality exhibited by many applications result in only a few words in each block being touched before eviction. Unused words occupy 17–80% of a 64 KB L1 cache and 1–79% of a 1 MB private LLC. This effectively shrinks the cache, increases the miss rate, and wastes on-chip bandwidth. Because of wire scaling limitations, unused-word transfers account for a large fraction (11%) of on-chip cache hierarchy energy consumption. We propose Amoeba-Cache, a design that supports a variable number of cache blocks, each of a different granularity. Amoeba-Cache employs a novel organization that completely eliminates the tag array, treating the storage array as uniform and morphable between tags and data. This enables the cache to harvest space from unused words in blocks for additional tag storage, thereby supporting a variable number of tags (and, correspondingly, blocks). Amoeba-Cache adjusts individual cache line granularities according to the spatial locality in the application. It adapts to the appropriate granularity both for different data objects in an application and for different phases of access to the same data. Overall, compared to a fixed-granularity cache, Amoeba-Cache reduces the miss rate on average (geometric mean) by 18% at the L1 level and by 18% at the L2 level, and reduces L1–L2 miss bandwidth by ≈46%. Correspondingly, Amoeba-Cache reduces on-chip memory hierarchy energy by as much as 36% (mcf) and improves performance by as much as 50% (art).
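The unused-word waste that motivates Amoeba-Cache can be measured with a small model. The sketch below assumes a tiny direct-mapped cache with fixed 8-word blocks (64 bytes if words are 8 bytes) and a trace of word addresses; `unused_word_fraction` and its parameters are hypothetical. It only quantifies waste in a fixed-granularity cache and does not model Amoeba-Cache's variable-granularity organization.

```python
def unused_word_fraction(word_addrs, num_sets=4, words_per_block=8):
    """Fraction of fetched words never touched before eviction (or the end of the trace)."""
    lines = {}      # set index -> (block number, set of touched word offsets)
    fetched = 0
    untouched = 0
    for addr in word_addrs:
        block, offset = divmod(addr, words_per_block)
        idx = block % num_sets                 # direct-mapped placement
        entry = lines.get(idx)
        if entry is None or entry[0] != block:
            # Miss: retire the old line, then fetch the whole fixed-size block.
            if entry is not None:
                untouched += words_per_block - len(entry[1])
            entry = (block, set())
            lines[idx] = entry
            fetched += words_per_block
        entry[1].add(offset)
    for _, touched in lines.values():          # lines still resident at the end
        untouched += words_per_block - len(touched)
    return untouched / fetched if fetched else 0.0


print(unused_word_fraction(range(64)))         # sequential access: 0.0
print(unused_word_fraction(range(0, 256, 8)))  # one word per block: 0.875
```

A streaming access pattern uses every fetched word, whereas a stride of one block wastes 7 of every 8 words fetched, which is the kind of spatial-locality mismatch a variable-granularity cache targets.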

field-programmable technology | 2004

Maximizing system performance: using reconfigurability to monitor system communications

Lesley Shannon; Paul Chow

Commercial FPGA companies now provide tools that allow users to implement designs comprising soft-core processors and modules of dedicated logic. If a designer chooses to partition a system into multiple processors and hardware modules, tools and techniques for design analysis are necessary to understand system performance. This work introduces WOoDSTOCK, a tool that profiles system performance by adding monitors to the circuit running in real time on the chip. The user can generate a system-specific profiler tailored to monitor the communication links between the different computing elements. This provides a macroscopic picture of system performance that highlights the computing elements causing bottlenecks in the design.
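The abstract does not specify what WOoDSTOCK's monitors measure. One simple scheme, assumed here purely for illustration, is a monitor on each FIFO link that counts cycles in which the FIFO is full (the producer is stalled by the consumer) or empty (the consumer is starved by the producer). `LinkMonitor`, `suspected_bottleneck` and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkMonitor:
    """Hypothetical per-link counters accumulated over a profiling window."""
    src: str
    dst: str
    cycles: int = 0
    full_cycles: int = 0   # FIFO full: src stalls waiting on dst
    empty_cycles: int = 0  # FIFO empty: dst starves waiting on src

    def sample(self, occupancy: int, depth: int) -> None:
        """Record the FIFO occupancy observed in one clock cycle."""
        self.cycles += 1
        if occupancy == depth:
            self.full_cycles += 1
        elif occupancy == 0:
            self.empty_cycles += 1


def suspected_bottleneck(link: LinkMonitor, threshold: float = 0.5) -> Optional[str]:
    """Name the element that appears to throttle this link, if any."""
    if link.cycles == 0:
        return None
    if link.full_cycles / link.cycles > threshold:
        return link.dst    # consumer cannot keep up
    if link.empty_cycles / link.cycles > threshold:
        return link.src    # producer cannot keep up
    return None


link = LinkMonitor("CPU0", "FFT")
for occ in [4, 4, 4, 3, 4]:
    link.sample(occ, depth=4)
print(suspected_bottleneck(link))  # FFT
```

Counting only full and empty cycles keeps each monitor to a few counters, in line with the macroscopic, link-level view the abstract describes.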

field programmable logic and applications | 2012

PolyBlaze: From one to many. Bringing the MicroBlaze into the multicore era with Linux SMP support

Eric Matthews; Lesley Shannon; Alexandra Fedorova

Modern computing systems increasingly consist of multiple processor cores. From cell phones to datacenters, multicore computing has become the standard. At the same time, our understanding of how resource sharing affects performance on these platforms is limited, which prevents these systems from being fully utilized. As the capacity of FPGAs has grown, they have become a viable means of emulating architecture designs, as they offer increased performance and visibility into runtime behaviour compared to simulation. With future systems trending towards asymmetric and heterogeneous designs, further increasing complexity, a framework that enables research in this area is highly desirable. In this work, we present PolyBlaze: a multicore MicroBlaze-based system with Linux Symmetric Multi-Processor (SMP) support on an FPGA. Starting with a single-core, Linux-supported MicroBlaze, we detail the changes to the platform, in both hardware and software, required to bring Linux SMP support to the MicroBlaze. We then outline the series of tests performed on our platform to demonstrate both its stability (e.g. more than two weeks of uptime) and scalability (up to eight cores on an FPGA, with resource usage increasing linearly with the number of cores).

Journal of Bionic Engineering | 2014

Abigaille-III: A Versatile, Bioinspired Hexapod for Scaling Smooth Vertical Surfaces

Michael Henrey; Ausama Ahmed; Paolo Boscariol; Lesley Shannon; Carlo Menon

This paper presents Abigaille-III, a novel legged robot: a hexapod actuated by 24 miniature gear motors. The robot uses dual-layer dry adhesives to climb smooth, vertical surfaces. Because dry adhesives are passive and stick to various surfaces, they have advantages over mechanisms such as suction, claws and magnets. The mechanical design and posture of Abigaille-III were optimized to reduce pitchback forces during vertical climbing. The robot's electronics were designed around a Field Programmable Gate Array, producing a versatile computing architecture. The robot was reconfigured for vertical climbing with both 5 and 6 legs, and with 3 or 4 motors per leg, without changes to the electronic hardware. Abigaille-III demonstrated dexterity by climbing uneven vertical surfaces and by transferring between horizontal and vertical surfaces. In endurance tests, Abigaille-III completed nearly 4 hours of continuous climbing and over 7 hours of loitering, showing that dry adhesive climbing systems can be used for extended missions.

IEEE Transactions on Very Large Scale Integration Systems | 2007

Routability of Network Topologies in FPGAs

Manuel Saldaña; Lesley Shannon; Jia Shuo Yue; Sikang Bian; John Craig; Paul Chow

A fundamental difference between application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs) is that the wires in ASICs are designed to match the requirements of a particular design. Conversely, in an FPGA, the area is fixed and the routing resources exist whether or not they are used. In this paper, we investigate how well several common network topologies map onto a modern FPGA routing fabric. Different multiprocessor network topologies with between 8 and 64 nodes are mapped to a single large FPGA. Except for fully connected networks, the differences in logic resources used and routing overhead among these topologies are insignificant for the systems tested. Fully connected networks of up to about 22 nodes are also feasible on the same FPGA, although their logic and routing utilization grows much faster. We conclude that a modern FPGA fabric is very rich in resources and capable of supporting highly interconnected topologies. For systems with a modest number of nodes implemented on current large FPGAs, it is not necessary to use the connectivity-limited topologies typically used for networks-on-chip; instead, direct point-to-point connections between all communicating nodes can be considered.
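Why fully connected networks consume resources faster than the other topologies follows from link counts. For n nodes, a ring needs n links, a star with one of the nodes acting as hub needs n - 1, a hypercube with n = 2^d nodes needs (n/2)·log2(n), an r × r mesh needs 2r(r - 1), and a fully connected network needs n(n - 1)/2. The sketch below (`link_count` is a hypothetical helper) only counts links; it does not model the FPGA logic or routing utilization measured in the paper.

```python
import math


def link_count(topology: str, n: int) -> int:
    """Number of point-to-point links needed to connect n nodes (n >= 3)."""
    if topology == "ring":
        return n                      # each node links to its successor
    if topology == "star":
        return n - 1                  # every other node links to one hub node
    if topology == "hypercube":
        d = int(math.log2(n))
        assert 2 ** d == n, "hypercube needs a power-of-two node count"
        return d * n // 2             # n nodes of degree d; each link shared by 2 nodes
    if topology == "mesh":
        r = math.isqrt(n)
        assert r * r == n, "this sketch assumes a square 2D mesh"
        return 2 * r * (r - 1)        # r rows and r columns, each with r - 1 links
    if topology == "full":
        return n * (n - 1) // 2       # one link per unordered pair of nodes
    raise ValueError(f"unknown topology {topology!r}")


for n in (16, 64):
    print(n, {t: link_count(t, n) for t in ("ring", "star", "mesh", "hypercube", "full")})
# 16 {'ring': 16, 'star': 15, 'mesh': 24, 'hypercube': 32, 'full': 120}
# 64 {'ring': 64, 'star': 63, 'mesh': 112, 'hypercube': 192, 'full': 2016}
```

The other topologies grow linearly or as n·log n, while the fully connected count grows quadratically, which is consistent with full connectivity being the only topology whose utilization clearly diverges as node counts rise.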

system-level interconnect prediction | 2006

The routability of multiprocessor network topologies in FPGAs

Manuel Saldaña; Lesley Shannon; Paul Chow

A fundamental difference between ASICs and FPGAs is that wires in ASICs are designed to match the requirements of a particular design. Wire parameters such as length, width, layout and the number of wires can be varied to implement a desired circuit. Conversely, in an FPGA, area is fixed and routing resources exist whether or not they are used, so the goal becomes implementing a circuit within the limits of the available resources. The architecture of existing FPGA routing structures has evolved over time to suit the requirements of large, localized digital circuits. However, FPGAs now have the capacity to implement networks of such circuits, and system-level interconnection becomes a key element of the design process. Following a standard design flow and using commercial tools, we investigate how this fundamental difference in resource usage affects the mapping of various network topologies to a modern FPGA routing structure. By exploring the routability of different multiprocessor network topologies with 8, 16 and 32 nodes on a single FPGA, we show that the difference in resource utilization among ring, star, hypercube and mesh topologies is not significant up to 32 nodes. We also show that a fully connected network can be implemented with up to at least 16 nodes, but with 32 nodes it exceeds the routing resources available on the FPGA. Finally, we derive a cost metric that helps estimate the impact of topology selection based on the number of nodes.

field-programmable custom computing machines | 2005

Simplifying the integration of processing elements in computing systems using a programmable controller

Lesley Shannon; Paul Chow

As process feature sizes decrease and die area increases, designers are creating increasingly complex computing systems using FPGAs. To reduce design time for new products, reusing previously designed intellectual property (IP) cores is essential. However, since no universally accepted interface standards exist for IP cores, a certain amount of redesign is often necessary before they can be incorporated into a new system. Furthermore, a core's functionality may need updating to support the requirements of the new application. This paper demonstrates how the SIMPPL system model allows designers to rapidly implement on-chip systems comprising multiple computing elements (CEs). Furthermore, using a controller-based interface to manage inter-CE transfers enables users to easily adapt the control sequence of individual CEs to suit the needs of new applications without redesigning other elements in the system. Two systems using three different hardware modules adapted to CEs are described to illustrate the power and simplicity of the SIMPPL model. Once the individual CEs had been designed, implementing both designs on-chip required a total of six hours.
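The benefit of a controller-based interface is that a CE's datapath stays fixed while its control sequence changes. The Python sketch below assumes a hypothetical three-opcode controller (`RECV`, `EXEC`, `SEND`); `run_controller` and the opcodes are illustrative and are not SIMPPL's actual instruction set, which the abstract does not describe.

```python
from collections import deque


def run_controller(program, ce_fn, inbox, outboxes):
    """Execute a control sequence on behalf of one computing element (CE).

    program : list of (opcode, arg) pairs using hypothetical opcodes.
    ce_fn   : the CE's datapath, modeled as a fixed function of one input word.
    """
    data = None
    for opcode, arg in program:
        if opcode == "RECV":
            data = inbox.popleft()       # take the next word from the input channel
        elif opcode == "EXEC":
            data = ce_fn(data)           # run the unchanged CE logic
        elif opcode == "SEND":
            outboxes[arg].append(data)   # forward the result to a named consumer
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


def square(x):
    return x * x


outs = {"A": [], "B": []}
# Original application: square one word and send it to A.
run_controller([("RECV", None), ("EXEC", None), ("SEND", "A")], square, deque([3]), outs)
# New application: same CE, new sequence that squares twice and sends to both A and B.
run_controller([("RECV", None), ("EXEC", None), ("EXEC", None), ("SEND", "A"), ("SEND", "B")],
               square, deque([2]), outs)
print(outs)  # {'A': [9, 16], 'B': [16]}
```

Only the program list changes between the two applications; `square`, standing in for the CE hardware, is reused as-is.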
