Matthew French

University of Southern California

Publication

Featured research published by Matthew French.

field programmable gate arrays | 2011

Torc: towards an open-source tool flow

Neil Steiner; Aaron Wood; Hamid Shojaei; Jacob Couch; Peter M. Athanas; Matthew French

We present and describe Torc - (Tools for Open Reconfigurable Computing) - an open-source infrastructure and tool set, provided entirely as C++ source code and available at http://torc.isi.edu. Torc is suitable for custom research applications, for CAD tool development, and for architecture exploration. The Torc infrastructure can (1) read, write, and manipulate generic netlists - currently EDIF, (2) read, write, and manipulate physical netlists - currently XDL, and indirectly NCD, (3) provide exhaustive wiring and logic information for commercial devices, and (4) read, write, and manipulate bitstream packets (but not configuration frame contents). Torc furthermore provides routing and unpacking tools for full or partial designs, soon to be augmented with BLIF support, and with packing and placing tools. The architectural data for Xilinx devices is generated from non-proprietary XDLRC files, and currently supports 140 devices in 11 families: Virtex, Virtex-E, Virtex-II, Virtex-II Pro, Virtex4, Virtex5, Virtex6, Virtex6L, Spartan3E, Spartan6, and Spartan6L. We believe that Altera architectures and designs could be similarly supported if the necessary data were available, and we have successfully used Torc internally with custom architectures.

international symposium on computer architecture | 2003

A performance analysis of PIM, stream processing, and tiled processing on memory-intensive signal processing kernels

Jinwoo Suh; Eun-Gyu Kim; Stephen P. Crago; Lakshmi Srinivasan; Matthew French

Trends in microprocessors of increasing die size and clock speed and decreasing feature sizes have fueled rapidly increasing performance. However, the limited improvements in DRAM latency and bandwidth and diminishing returns of increasing superscalar ILP and cache sizes have led to the proposal of new microprocessor architectures that implement processor-in-memory, stream processing, and tiled processing. Each architecture is typically evaluated separately and compared to a baseline architecture. In this paper, we evaluate the performance of processors that implement these architectures on a common set of signal processing kernels. The implementation results are compared with the measured performance of a conventional system based on the PowerPC with Altivec. The results show that these new processors show significant improvements over conventional systems and that each architecture has its own strengths and weaknesses.

Eurasip Journal on Embedded Systems | 2006

FPGA dynamic power minimization through placement and routing constraints

Li Wang; Matthew French; Azadeh Davoodi; Deepak Agarwal

Field-programmable gate arrays (FPGAs) are pervasive in embedded systems requiring low-power utilization. A novel power optimization methodology for reducing the dynamic power consumed by the routing of FPGA circuits by modifying the constraints applied to existing commercial tool sets is presented. The power optimization techniques influence commercial FPGA Place and Route (PAR) tools by translating power goals into standard throughput and placement-based constraints. The Low-Power Intelligent Tool Environment (LITE) is presented, which was developed to support the experimentation of power models and power optimization algorithms. The generated constraints seek to implement one of four power optimization approaches: slack minimization, clock tree paring, N-terminal net colocation, and area minimization. In an experimental study, we optimize dynamic power of circuits mapped into 0.12 μm Xilinx Virtex-II FPGAs. Results show that several optimization algorithms can be combined on a single design, and power is reduced by up to 19.4%, with an average power savings of 10.2%.

field programmable custom computing machines | 2008

Autonomous System on a Chip Adaptation through Partial Runtime Reconfiguration

Matthew French; Erik K. Anderson; Dong-In Kang

This paper presents a prototype autonomous signal processing system on a chip. The system is architected such that high performance digital signal processing occurs in the FPGA's configurable logic, while resulting higher level data products are monitored by cognitive algorithms residing on an embedded processor. The cognitive algorithms develop situational awareness about the platform's environment, and use this information to modify and tune signal processing in real-time using active partial reconfiguration. This system was realized on a Xilinx Virtex4 FX 100 device on a pulse parameter measurement application utilizing a Bayesian Network cognitive algorithm. Changes in the RF environment were correctly detected 96.7% of the time and mitigation filters which resulted in at least 3dB interference rejection improvement were instanced 81% of the time. This system realizes a 71.4× reduction in size compared to static implementations and a 26-43× reduction in reaction times compared to human in the loop systems.

field-programmable custom computing machines | 2011

Checkpoint/Restart and Beyond: Resilient High Performance Computing with FPGAs

Andrew G. Schmidt; Bin Huang; Ron Sass; Matthew French

As FPGA resources continue to increase, FPGAs present attractive features to the High Performance Computing community. These include the power-efficient computation and application-specific acceleration benefits, as well as tighter integration between compute and I/O resources. This paper considers the ability of an FPGA to address another, increasingly important, feature -- resiliency. Specifically, a minimally-invasive monitoring infrastructure operating over a sideband network is presented. This includes a multi-chip protocol, IP cores that implement the protocol, and a tool to instrument existing hardware accelerator FPGA designs. To demonstrate the functionality, the system has been implemented on a cluster of FPGA devices running off-the-shelf MPI and Linux. We demonstrate the ability to do integrated software and hardware accelerator check pointing with restart under a variety of injected faults.

Scientific Reports | 2015

Experimental quantum annealing: case study involving the graph isomorphism problem

Kenneth M. Zick; Omar Shehab; Matthew French

Quantum annealing is a proposed combinatorial optimization technique meant to exploit quantum mechanical effects such as tunneling and entanglement. Real-world quantum annealing-based solvers require a combination of annealing and classical pre- and post-processing; at this early stage, little is known about how to partition and optimize the processing. This article presents an experimental case study of quantum annealing and some of the factors involved in real-world solvers, using a 504-qubit D-Wave Two machine and the graph isomorphism problem. To illustrate the role of classical pre-processing, a compact Hamiltonian is presented that enables a reduced Ising model for each problem instance. On random N-vertex graphs, the median number of variables is reduced from N² to fewer than N log₂ N and solvable graph sizes increase from N = 5 to N = 13. Additionally, error correction via classical post-processing majority voting is evaluated. While the solution times are not competitive with classical approaches to graph isomorphism, the enhanced solver ultimately classified correctly every problem that was mapped to the processor and demonstrated clear advantages over the baseline approach. The results shed some light on the nature of real-world quantum annealing and the associated hybrid classical-quantum solvers.

ACM Journal on Emerging Technologies in Computing Systems | 2016

Real-Time Anomaly Detection Framework for Many-Core Router through Machine-Learning Techniques

Amey M. Kulkarni; Youngok Pino; Matthew French; Tinoosh Mohsenin

In this article, we propose a real-time anomaly detection framework for an NoC-based many-core architecture. We assume that processing cores and memories are safe and anomaly is included through a communication medium (i.e., router). The article targets three different attacks, namely, traffic diversion, route looping, and core address spoofing attacks. The attacks are detected by using machine-learning techniques. Comprehensive analysis on machine-learning algorithms suggests that Support Vector Machine (SVM) and K-Nearest Neighbor (K-NN) have better attack detection efficiency. It has been observed that both algorithms have accuracy in the range of 94% to 97%. Additional hardware complexity analysis advocates SVM to be implemented on hardware. To test the framework, we implement a condition-based attack insertion module; attacks are performed intra- and intercluster. The proposed real-time anomaly detection framework is fully placed and routed on Xilinx Virtex-7 FPGA. Post-place-and-route implementation results show that SVM has 12% to 2% area overhead and 3% to 1% power overhead for the quad-core and 16-core implementation, respectively. It is also observed that it takes 25% to 18% of the total execution time to detect an anomaly in transferred packets for quad-core and 16-core, respectively. The proposed framework achieves 65% reduction in area overhead and is 3 times faster compared to previous published work.

field-programmable custom computing machines | 2013

Open-Source Bitstream Generation

Ritesh Soni; Neil Steiner; Matthew French

This work presents an open-source bitstream generation tool for Torc. Bitstream generation has traditionally been the single part of the FPGA design flow that could not be openly reproduced, but our novel approach enables this without reverse-engineering or violating End-User License Agreement terms. We begin by creating a library of “micro-bitstreams” which constitute a collection of primitives at a granularity of our choosing. These primitives can then be combined to create larger designs, or portions thereof, with simple merging operations. Our effort is motivated by a desire to resume earlier work on embedded bitstream generation and autonomous hardware. This is not feasible with Xilinx bitgen because there is no reasonable way to run an x86 binary with complex library and data dependencies on most embedded systems. Initial support is limited to the Virtex5, but we intend to extend this to other Xilinx architectures. We are able to support nearly all routing resources in the device, as well as the most common logic resources.

ieee aerospace conference | 2011

Software fault tolerance methodology and testing for the embedded PowerPC

Mark Bucciero; John Paul Walters; Matthew French

In this paper we describe our software-based fault tolerance strategies for PowerPC devices embedded within Xilinx Virtex 4 FX60 FPGAs. Traditional FPGA fault tolerance techniques, such as scrubbing and TMR, cannot be applied to the embedded PowerPC. Our work targets scientific applications operating on space-based FPGA architectures consisting of an FPGA and a radiation-hardened controller. We use heartbeat monitoring, control flow assertions, and checkpoint/rollback to achieve high performance and low overhead fault tolerance. Our initial results show we are able to add our fault tolerance strategies with only 2% application overhead while recovering from 94% of the faults injected during testing.

reconfigurable computing and fpgas | 2012

Redsharc: a programming model and on-chip network for multi-core systems on a programmable chip

William V. Kritikos; Andrew G. Schmidt; Ron Sass; Erik K. Anderson; Matthew French

The reconfigurable data-stream hardware software architecture (Redsharc) is a programming model and network-on-a-chip solution designed to scale to meet the performance needs of multi-core Systems on a programmable chip (MCSoPC). Redsharc uses an abstract API that allows programmers to develop systems of simultaneously executing kernels, in software and/or hardware, that communicate over a seamless interface. Redsharc incorporates two on-chip networks that directly implement the API to support high-performance systems with numerous hardware kernels. This paper documents the API, describes the common infrastructure, and quantifies the performance of a complete implementation. Furthermore, the overhead, in terms of resource utilization, is reported along with the ability to integrate hard and soft processor cores with purely hardware kernels being demonstrated.
