Graham Schelle
University of Colorado Boulder
Publication
Featured research published by Graham Schelle.
high-performance computer architecture | 2006
David A. Penry; Dan Fay; David Hodgdon; Ryan Wells; Graham Schelle; David I. August; Daniel A. Connors
Simulation is an important means of evaluating new microarchitectures. Current trends toward chip multiprocessors (CMPs) test the ability of designers to develop efficient simulators. CMP simulation speed can be improved by exploiting parallelism in the CMP simulation model. This may be done either by running the simulation on multiple processors or by integrating multiple processors into the simulation to replace simulated processors. Doing so usually requires tedious manual parallelization or re-design to encapsulate processors. This paper presents techniques to perform automated simulator parallelization and hardware integration for CMP structural models. We show that automated parallelization can achieve a 7.60× speedup for a 16-processor CMP model on a conventional 4-processor shared-memory multiprocessor. We demonstrate the power of hardware integration by integrating eight hardware PowerPC cores into a CMP model, achieving a speedup of up to 5.82×.
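The idea of parallelizing a CMP model can be sketched as follows: the simulated cores are split across a few host workers, and every worker must finish cycle N before any core starts cycle N+1. This is a minimal, hypothetical sketch (`CoreModel`, `partition`, and `simulate` are illustrative names, not the paper's tooling), assuming cores that do not communicate within a cycle; it does not reproduce the paper's automated parallelization of structural models.

```python
from concurrent.futures import ThreadPoolExecutor


class CoreModel:
    """Toy stand-in for one simulated CMP core (hypothetical, for illustration)."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.retired = 0

    def step(self, cycle):
        # Placeholder behavior: retire 1 instruction on even cycles, 2 on odd ones.
        self.retired += 1 if cycle % 2 == 0 else 2


def partition(cores, n_workers):
    """Round-robin assignment of simulated cores to host worker threads."""
    return [cores[i::n_workers] for i in range(n_workers)]


def step_group(group, cycle):
    # Advance every core owned by one worker by a single simulated cycle.
    for core in group:
        core.step(cycle)


def simulate(n_cores=16, n_workers=4, n_cycles=100):
    cores = [CoreModel(i) for i in range(n_cores)]
    groups = partition(cores, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for cycle in range(n_cycles):
            futures = [pool.submit(step_group, g, cycle) for g in groups]
            # Waiting on every future is the end-of-cycle barrier: no core
            # starts the next cycle until all cores have finished this one.
            for f in futures:
                f.result()
    return [core.retired for core in cores]
```

CPython threads only demonstrate the structure here (the GIL prevents real speedup); a production simulator would use native threads or processes and exchange inter-core signals at the barrier.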
design automation conference | 2004
Chidamber R. Kulkarni; Gordon J. Brebner; Graham Schelle
A domain-specific language (DSL) enables designers to rapidly specify and implement systems for a particular domain, yielding designs that are easy to understand, reason about, re-use and maintain. However, there is usually a significant overhead in the infrastructure required to map such a DSL onto a programmable logic device. In this paper, we present a mapping of an existing DSL for the networking domain onto a platform FPGA by embedding the DSL into an existing language infrastructure. In particular, we show that, using a few basic concepts, we are able to successfully map the DSL onto a platform FPGA and create a re-usable structure that also makes it easy to extend the DSL. Finally, we present some results of mapping the DSL onto a platform FPGA and comment on the resulting overhead.
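Embedding a DSL in a host language means reusing the host's parser and composition mechanisms instead of building new infrastructure. The abstract does not name the host language, so the sketch below uses Python operator overloading purely as an illustration: `>>` chains packet-processing elements into a pipeline. The classes and the `CheckLength`/`DecTTL` elements are hypothetical.

```python
class Element:
    """One packet-processing element; `fn` returns the packet, or None to drop it."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def __rshift__(self, other):
        # `a >> b` composes elements into a pipeline: a runs first, then b.
        return Pipeline([self]) >> other

    def process(self, pkt):
        return self.fn(pkt)


class Pipeline:
    def __init__(self, elements):
        self.elements = list(elements)

    def __rshift__(self, other):
        nxt = other.elements if isinstance(other, Pipeline) else [other]
        return Pipeline(self.elements + nxt)

    def process(self, pkt):
        for element in self.elements:
            pkt = element.process(pkt)
            if pkt is None:  # packet dropped; stop processing
                return None
        return pkt

    def describe(self):
        return " -> ".join(e.name for e in self.elements)


# Hypothetical elements, loosely modeled on common router stages.
def check_length(pkt):
    return pkt if len(pkt["payload"]) <= 1500 else None


def dec_ttl(pkt):
    return None if pkt["ttl"] <= 1 else {**pkt, "ttl": pkt["ttl"] - 1}


router = Element("CheckLength", check_length) >> Element("DecTTL", dec_ttl)
```

Because the pipeline is an ordinary data structure, a back end could, in principle, walk `router.elements` to emit one hardware module per element; extending the DSL is then just defining another `Element`.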
field-programmable logic and applications | 2008
Graham Schelle; Dirk Grunwald
The network on chip (NoC) will become a future general-purpose interconnect for FPGAs, much like today's standard OPB or PLB bus architectures. However, the performance characteristics and reconfigurable logic resource utilization of different NoC architectures vary greatly relative to bus architectures. Current mainstream FPGA parts support only very small NoC topologies, due to the high resource utilization of virtual-channel-based implementations. This is reflected in related research, where only modest 2×2 or 2×3 networks are demonstrated on FPGAs. Naively, it would be assumed that these complex NoC architectures perform better than simplified implementations. We show this assumption to be incorrect under light network loading conditions across three separate application domains. Using statistically based network loading, a synthetic benchmarking application, a cryptographic accelerator, and an 802.11 transmitter are each demonstrated across NoC architectures. These experiments show that NoCs with complex routing and switching functionality are still useful under high network loading conditions. Additionally, for our NoC implementations, a simple solution that uses 4-5× fewer logic resources can provide better network performance under certain conditions.
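The abstract contrasts virtual-channel routers with a simpler NoC but does not say which routing the simple design uses. As an assumption for illustration, the sketch below shows dimension-order (XY) routing, a common deterministic scheme that is deadlock-free on a 2D mesh without virtual channels, which is one reason simple routers can be so much smaller. `xy_route` is a hypothetical helper.

```python
def xy_route(src, dst):
    """Return the (x, y) routers visited from src to dst under XY routing:
    move along the X dimension first, then along Y."""
    x, y = src
    path = [(x, y)]
    dx = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += dx
        path.append((x, y))
    dy = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += dy
        path.append((x, y))
    return path
```

Each router only compares the destination coordinates with its own, so no routing tables or per-channel buffers are required.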
international wireless internet conference | 2008
Aveek Dutta; Jeffrey Fifield; Graham Schelle; Dirk Grunwald; Douglas C. Sicker
In this paper, we present an intelligent physical layer for cognitive mesh networks. It is well recognized that wireless mesh networks suffer from an inherent per-hop delay attributed to store-and-forward routing and channel contention. We show that an intelligent physical layer coupled with efficient traffic engineering and channel allocation mechanisms reduces latency. We discuss the evolution of an OFDM receiver, with sufficient software control to aid reconfigurability, that is capable of receiving and decoding information on different sets of subcarriers and of switching the incoming signals to a different part of the available spectrum on the fly. Equipped with this enhanced receiver, we propose a mechanism for wireless wormhole routing, which employs frequency-domain switching between subchannels, where each subchannel is defined by a set of subcarriers. The OFDM receiver handles three primitives: transmit, receive, and relay, rather than just transmit or receive. Instead of contention-based, store-and-forward routing, a relay-oriented physical layer is proposed to reduce latency. The processing pipeline at an intermediate node no longer involves higher-layer processing: the hardware relays the incoming signal on the fly to a different part of the spectrum, allowing full-duplex transmission, since the transmitter can relay signals while it is receiving on a different subchannel.
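Frequency-domain switching between subchannels can be illustrated with a single OFDM symbol: take the FFT, copy the source subchannel's subcarriers onto the destination subcarriers, and take the IFFT, without demodulating anything. This sketch assumes a 64-point FFT (as in 802.11a) and subchannels of contiguous bins, and it ignores the cyclic prefix, synchronization, and channel equalization that a real receiver needs; `relay_subchannel` is a hypothetical name.

```python
import numpy as np

N_FFT = 64  # assumed FFT size (as in 802.11a); the abstract does not specify one


def relay_subchannel(time_symbol, src_bins, dst_bins):
    """Relay one OFDM symbol from subchannel `src_bins` to `dst_bins`.

    The move happens entirely in the frequency domain: FFT, copy the source
    subcarriers onto the destination subcarriers, IFFT. Nothing is
    demodulated or decoded, so no higher-layer processing is involved.
    """
    freq = np.fft.fft(time_symbol, N_FFT)
    out = np.zeros(N_FFT, dtype=complex)
    out[dst_bins] = freq[src_bins]
    return np.fft.ifft(out)


# Example: QPSK symbols carried on bins 4..11 are relayed onto bins 20..27.
rng = np.random.default_rng(0)
src, dst = np.arange(4, 12), np.arange(20, 28)
freq_in = np.zeros(N_FFT, dtype=complex)
freq_in[src] = (rng.choice([-1, 1], 8) + 1j * rng.choice([-1, 1], 8)) / np.sqrt(2)
relayed = np.fft.fft(relay_subchannel(np.fft.ifft(freq_in), src, dst))
```

After the relay, `relayed[dst]` carries the original symbols and the source bins are empty, leaving them free for the node to keep receiving.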
field-programmable logic and applications | 2007
Graham Schelle; J. Fifield; D. Grunwald
Networks on chip are becoming a common on-chip interconnect for both FPGA and mainstream processor designs. At the same time, software-defined radio (SDR) is a new application field that is gaining much attention. As SDR tasks are mapped onto network-on-chip (NoC) architectures, the typically streaming nature of the samples will stress the NoC itself and possibly hurt the performance of other applications using that NoC. In this paper, we present the results of partitioning and placing an SDR transmitter onto a NoC architecture using an FPGA. We use an 802.11a transmitter example partitioned across a NoC and compare it to a handcrafted design. Additionally, we examine various placement schemes, runtime architecture loads, and NoC access methods to determine the feasibility of this application and architecture combination.
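Placement schemes for a streaming transmitter can be compared with a simple cost: the total number of NoC hops a block of samples travels through the chain. The abstract does not state the metric or mesh size, so this is a sketch under assumptions: a simplified 802.11a transmit chain with one stage per node, a small 3×2 mesh, and Manhattan-distance hop counts that ignore contention from other traffic. All names are hypothetical.

```python
from itertools import permutations

# Simplified 802.11a transmit chain, one stage per NoC node (assumed partitioning).
STAGES = ["scrambler", "conv_encoder", "interleaver", "mapper", "ifft", "cyclic_prefix"]
MESH = [(x, y) for y in range(2) for x in range(3)]  # 3x2 mesh of router nodes


def hops(a, b):
    """Hop count between two mesh nodes under minimal (e.g. XY) routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pipeline_cost(placement):
    """Total hops a block of samples travels through the whole chain."""
    return sum(hops(placement[s], placement[t]) for s, t in zip(STAGES, STAGES[1:]))


def row_major():
    # Fill row 0 left to right, then row 1 left to right.
    return dict(zip(STAGES, MESH))


def snake():
    # Serpentine order: left to right on row 0, right to left on row 1.
    order = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
    return dict(zip(STAGES, order))


def best_placement():
    """Exhaustive search; feasible only because there are 6! = 720 placements."""
    return min((dict(zip(STAGES, p)) for p in permutations(MESH)), key=pipeline_cost)
```

Here the row-major placement costs 7 hops and the serpentine one costs 5, which is optimal since each of the five links needs at least one hop.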
field programmable gate arrays | 2005
Graham Schelle; Dirk Grunwald
For several years now, modern FPGAs have included on-chip network-related hard cores. These cores include Xilinx's RocketIO and Altera's RapidIO serial transceivers. However, using these cores in a complete networking application may be a daunting task for a non-networking expert. In addition to the complicated use of these components, the high performance needs of modern networking applications require designs that are optimized for low latency and a moderately high clock rate. To meet these challenges, we present CUSP (Click Utilizing Speculation and Parallelism) for reconfigurable hardware platforms. Click is a widely accepted software network router framework similar to CUSP, but built specifically for software routers on a Linux platform. CUSP, while also having a modular design of reusable components, additionally provides automated speculation and parallelism to gain better performance on FPGAs. An accompanying scripting language allows quick creation of these routers from a body of existing components. We have implemented an example network application through the CUSP design flow and compare its performance against alternative network design methods.
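The abstract says CUSP automates parallelism but not how. One generic way to parallelize a slow element, shown here as an assumption rather than CUSP's actual mechanism, is to replicate it into several lanes, dispatch packets round-robin with sequence numbers, and restore order with a reorder step before the next element. `parallelize` is a hypothetical helper; the lanes run sequentially in Python but would run concurrently in hardware.

```python
from collections import deque


def parallelize(element_fn, packets, lanes):
    """Run `lanes` replicated copies of one element and preserve packet order.

    Packets are tagged with a sequence number and dispatched round-robin;
    each lane then processes its own queue, and a reorder step restores the
    original order before the next element sees the packets.
    """
    queues = [deque() for _ in range(lanes)]
    for seq, pkt in enumerate(packets):
        queues[seq % lanes].append((seq, pkt))

    results = []
    for queue in queues:  # sequential here; each loop body models one lane
        while queue:
            seq, pkt = queue.popleft()
            results.append((seq, element_fn(pkt)))

    # Reorder buffer: sort completed packets back into sequence order.
    results.sort(key=lambda item: item[0])
    return [out for _, out in results]
```

The design keeps each replicated element unchanged; only the dispatcher and reorder buffer know about the lanes, which is what makes this kind of transformation automatable.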
field-programmable custom computing machines | 2007
Graham Schelle; Dirk Grunwald
Mainstream processor architectures and field-programmable custom computing machines (FCCMs) are converging toward a heterogeneous system-on-chip architecture. This is apparent from Intel and AMD efforts to create new chip architectures with various processing cores focused on DSP, networking, and graphics. In embedded processor research, systems-on-chip connected by networks on chip have enabled scalable architectures with a variety of processing cores connected by an on-chip network. In this paper we examine several scheduling and allocation policies that can be used across network-on-chip architectures regardless of which processing cores are on chip. By abstracting the characteristics of the processing cores with various scheduling data structures, any heterogeneous system on chip can be allocated and scheduled dynamically.
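Abstracting processing cores behind a scheduling data structure might look like the following: each core is reduced to a kind (for example DSP or crypto), and a per-kind min-heap keyed by the time each core becomes free implements an earliest-available allocation policy. This is a hypothetical sketch of one possible policy, not the paper's; it ignores NoC distance, which a placement-aware policy would also weigh.

```python
import heapq


class Allocator:
    """Allocate tasks to heterogeneous cores using one min-heap per core kind.

    Heap entries are (time_core_becomes_free, core_id), so popping yields the
    earliest-available core of the requested kind.
    """

    def __init__(self, cores):
        self.free = {}
        for core_id, kind in cores:
            heapq.heappush(self.free.setdefault(kind, []), (0, core_id))

    def schedule(self, kind, duration, ready_time=0):
        avail, core_id = heapq.heappop(self.free[kind])
        start = max(avail, ready_time)  # wait for both the core and the task
        finish = start + duration
        heapq.heappush(self.free[kind], (finish, core_id))
        return core_id, start, finish
```

Because the scheduler only sees kinds and availability times, new core types can be added without changing the policy itself.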
field-programmable logic and applications | 2004
Graham Schelle; Dirk Grunwald
Speculation and parallel processing can provide performance gains in many diverse applications. Compilers, grid computing, DSP, and bioinformatics are representative of the areas where these concepts are used. In the field of network routers, packet processing can also benefit from such speedups. As packet line rates increase with every new standard (InfiniBand, 10-Gigabit Ethernet), these speedups will become paramount for routers that must perform complicated tasks while still maintaining line rate.
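As one concrete example of speculation in packet processing (an illustration, not necessarily the paper's mechanism), a router can start the route lookup before the IPv4 header checksum has been verified and simply discard the lookup result if the checksum fails. The checksum follows RFC 1071; the forwarding table, port names, and helpers are hypothetical, and Python threads only model the control flow, not hardware concurrency.

```python
import ipaddress
import struct
from concurrent.futures import ThreadPoolExecutor

# Hypothetical forwarding table: prefix -> output port.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "port0",
    ipaddress.ip_network("10.0.0.0/8"): "port1",
    ipaddress.ip_network("10.1.0.0/16"): "port2",
}


def ipv4_checksum(header: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def lookup(header: bytes) -> str:
    """Longest-prefix match on the destination address (bytes 16..19)."""
    dst = ipaddress.ip_address(header[16:20])
    best = max((net for net in ROUTES if dst in net), key=lambda net: net.prefixlen)
    return ROUTES[best]


def forward(header: bytes, pool: ThreadPoolExecutor):
    # Speculate: launch the route lookup before the header is known to be valid.
    route = pool.submit(lookup, header)
    # A header with a correct checksum field checksums to 0 over all 20 bytes.
    if ipv4_checksum(header) != 0:
        return None  # mis-speculation: ignore the lookup result, drop the packet
    return route.result()


def make_header(dst: str) -> bytes:
    """Build a minimal 20-byte IPv4 header with a valid checksum (for testing)."""
    hdr = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 17, 0,
                                ipaddress.ip_address("192.168.0.1").packed,
                                ipaddress.ip_address(dst).packed))
    hdr[10:12] = ipv4_checksum(bytes(hdr)).to_bytes(2, "big")
    return bytes(hdr)
```

In the common case the checksum passes and the lookup latency is hidden behind it; only corrupted packets waste the speculative work.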
hot topics in operating systems | 2003
Marco Gruteser; Graham Schelle; Ashish Jain; Richard Han; Dirk Grunwald
Archive | 2007
Dirk Grunwald; Graham Schelle