Benjamin Gojman | Researchain

Publication

Featured researches published by Benjamin Gojman.

2006 1st International Conference on Nano-Networks and Workshops | 2006

3D Nanowire-Based Programmable Logic

Benjamin Gojman; Raphael Rubin; Concetta Pilotto; André DeHon; Tetsufumi Tanamoto

In nanowire-based logic, the semiconducting material (e.g., Si, GaN, SiGe) is grown into individual nanowires rather than being part of the substrate. This offers the opportunity to stack multiple layers of nanowires to create a three-dimensional logic structure that has high-quality semiconductors in all vertical layers. The authors detail a feasible three-dimensional programmable logic architecture that can plausibly be realized from layers of semiconducting nanowires, making only modest assumptions about the control and placement of individual nanowires in the assembly. This shows a natural path for continuing to scale areal logic density once nanowire pitches approach fundamental limits. The authors show that the three-dimensional systems are volumetrically efficient, with the surface area reducing roughly in proportion to the number of vertical layers. The authors further show that, on average, delay is reduced 18% from compact layout in three dimensions. For only a 20% area impact, the authors show how to avoid adding any manufacturing steps to physically isolate portions of nanowire layers.

field-programmable technology | 2009

VMATCH: Using logical variation to counteract physical variation in bottom-up, nanoscale systems

Benjamin Gojman; André DeHon

Nanowire building blocks provide a promising path to small feature size and thus the ability to more densely pack logic. However, the small feature size and novel, bottom-up manufacturing process will exhibit extreme variation and produce many devices that operate outside acceptable operating ranges. One-mapping-fits-all, prefabrication assignment of logical functions to physical transistors that exhibit high threshold variation will not work: combining the wide range of physical variation in transistor threshold voltage with the wide range of fanouts in the design produces an unworkably large composite range of possible delays. Nonetheless, by carefully matching the fanout of each net to the physical threshold voltages of devices after fabrication, it is possible to reduce the net range of path delays sufficiently to achieve high system yield. By adding a modest amount of extra resources, we achieve 100% yield for systems built out of devices with 38% variation, the ITRS prediction for threshold variation in 5nm transistors. Moreover, for these systems, we maintain delay, energy and area close to the variation-free nominal case.
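The matching idea in this abstract can be illustrated with a small sketch. This is not the VMATCH algorithm itself, which the abstract does not spell out; it is a greedy pairing under a toy delay model assumed here for illustration (delay grows with fanout and shrinks with the gate overdrive `vdd - vt`). The names `toy_delay`, `match_by_fanout`, and `worst_delay` are hypothetical.

```python
import random


def toy_delay(fanout, vt, vdd=1.0):
    """Toy delay model (an assumption): load grows with fanout, drive with (vdd - vt)."""
    return fanout / (vdd - vt)


def match_by_fanout(fanouts, vts):
    """Pair the highest-fanout nets with the lowest-threshold (strongest) devices.

    Returns a list `assign` where assign[net] is the index of the device
    chosen to drive that net. Spare devices (len(vts) > len(fanouts)) are
    simply left unused, so the weakest ones are avoided.
    """
    nets = sorted(range(len(fanouts)), key=lambda n: fanouts[n], reverse=True)
    devices = sorted(range(len(vts)), key=lambda d: vts[d])
    assign = [None] * len(fanouts)
    for net, dev in zip(nets, devices):
        assign[net] = dev
    return assign


def worst_delay(fanouts, vts, assign):
    """Largest toy delay over all nets for a given net-to-device assignment."""
    return max(toy_delay(fanouts[n], vts[assign[n]]) for n in range(len(fanouts)))


if __name__ == "__main__":
    rng = random.Random(0)
    fanouts = [rng.randint(1, 8) for _ in range(16)]
    # Hypothetical thresholds: 0.3 V nominal with 38% sigma, clipped so the
    # toy model stays well defined. 20 devices for 16 nets models spare resources.
    vts = [min(max(rng.gauss(0.3, 0.3 * 0.38), 0.05), 0.8) for _ in range(20)]
    oblivious = list(range(len(fanouts)))  # fixed, pre-fabrication assignment
    matched = match_by_fanout(fanouts, vts)
    print("oblivious worst delay:", worst_delay(fanouts, vts, oblivious))
    print("matched worst delay:  ", worst_delay(fanouts, vts, matched))
```

Under this toy model, the matched assignment never has a larger worst-case delay than the fixed one, which mirrors the abstract's point that post-fabrication matching narrows the range of path delays.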

IEEE Computer | 2011

Crystals and Snowflakes: Building Computation from Nanowire Crossbars

André DeHon; Benjamin Gojman

Suitable architectures and paradigm shifts in assembly and usage models will make it possible to exploit the compactness and energy benefits of single-nanometer dimension devices and allow extending these structures into the third dimension without depending on top-down lithography to define the smallest feature sizes in a system.

ACM Transactions on Reconfigurable Technology and Systems | 2015

GROK-LAB: Generating Real On-chip Knowledge for Intra-cluster Delays Using Timing Extraction

Benjamin Gojman; Sirisha Nalmela; Nikil Mehta; Nicholas Howarth; André DeHon

Timing Extraction identifies the delay of fine-grained components within an FPGA. From these computed delays, the delay of any path can be calculated. Moreover, a comparison of the fine-grained delays allows a detailed understanding of the amount and type of process variation that exists in the FPGA. To obtain these delays, Timing Extraction measures, using only resources already available in the FPGA, the delay of a small subset of the total paths in the FPGA. We apply Timing Extraction to the Logic Array Block (LAB) on an Altera Cyclone III FPGA to obtain a view of the delay down to near-individual LUT SRAM cell granularity, characterizing components with delays on the order of tens to a few hundred picoseconds with a resolution of ±3.2ps, matching the expected error bounds. This information reveals that the 65nm process used has, on average, random variation of σ/μ = 4.0% with components having an average maximum spread of 83ps. Timing Extraction also shows that as VDD decreases from 1.2V to 0.9V in a Cyclone IV 60nm FPGA, paths slow down, and variation increases from σ/μ = 4.3% to σ/μ = 5.8%, a clear indication that lowering VDD magnifies the impact of random variation.
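The core idea, that each measured path delay is a sum of component delays and that the components can be recovered from enough path measurements, can be sketched as a small linear solve. This is an illustration only: the abstract does not describe how the paper selects paths or solves for components, and the function names, the square full-rank system, and the example numbers below are all assumptions.

```python
from statistics import mean, pstdev


def solve_component_delays(incidence, path_delays):
    """Solve incidence * x = path_delays for the component delays x (Gauss-Jordan).

    incidence[i][j] is 1 if measured path i passes through component j, else 0.
    Assumes a square, full-rank system; raises ValueError otherwise.
    """
    n = len(incidence)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    aug = [[float(v) for v in row] + [float(b)]
           for row, b in zip(incidence, path_delays)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("measurements do not determine every component")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def relative_variation(delays):
    """Return sigma/mu: population standard deviation divided by the mean."""
    return pstdev(delays) / mean(delays)


if __name__ == "__main__":
    # Hypothetical example: three components a, b, c and three measured
    # paths a+b, b+c, a+c (delays in picoseconds).
    incidence = [[1, 1, 0],
                 [0, 1, 1],
                 [1, 0, 1]]
    measured = [210, 200, 190]
    components = solve_component_delays(incidence, measured)
    print("component delays (ps):", components)
    print("sigma/mu:", relative_variation(components))
```

In this made-up example the solve recovers component delays of 100, 110, and 90 ps. A real measurement campaign would likely have more paths than components and noisy readings, in which case a least-squares fit would be the natural replacement for the exact solve used here.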

field-programmable custom computing machines | 2014

GROK-INT: Generating Real On-Chip Knowledge for Interconnect Delays Using Timing Extraction

Benjamin Gojman; André DeHon

With continued scaling, all transistors are no longer created equal. The delay of a length 4 horizontal routing segment at coordinates (23,17) will differ from one at (12,14) in the same FPGA and from the same segment in another FPGA. The vendor tools give conservative values for these delays, but knowing exactly what these delays are can be invaluable. In this paper, we show how to obtain this information, inexpensively, using only components that already exist on the FPGA (configurable PLLs, registers, logic, and interconnect). The techniques we present are general and can be used to measure the delays of any resource on any FPGA with these components. We provide general algorithms for identifying the set of useful delay components, the set of measurements necessary to compute these delay components, and the calculations necessary to perform the computation. We demonstrate our techniques on the interconnect for an Altera Cyclone III (65nm). As a result, we are able to quantify over a 100 ps spread in delays for nominally identical routing segments on a single FPGA.

field programmable gate arrays | 2016

Pitfalls and Tradeoffs in Simultaneous, On-Chip FPGA Delay Measurement

Timothy A. Linscott; Benjamin Gojman; Raphael Rubin; André DeHon

Recent work shows how to use on-chip structures to measure the fabricated delays of fine-grained resources on modern FPGAs. We show that simultaneous measurement of multiple, disjoint paths will result in different measured delays from isolated configurations that measure a single path. On the Cyclone III, we show differences as large as +/-33ps on 2ns-long paths, even if the simultaneously configured logic is not active. This is over 20x the measurement precision used on these devices and over 50% of the observed delay spread in prior work. We characterize the magnitude of the impact of simultaneous measurements and identify strategies and cases that can reduce the difference. Furthermore, we provide a potential explanation for our observations in terms of self-heating and the configurable clock network architecture. These experiments point to phenomena that must be characterized to better formulate on-chip FPGA delay measurements and to properly interpret their results.

Low-Power Variation-Tolerant Design in Nanometer Silicon | 2011

Component-Specific Mapping for Low-Power Operation in the Presence of Variation and Aging

Benjamin Gojman; Nikil Mehta; Raphael Rubin; André DeHon

Traditional solutions to variation and aging cost energy. Adding static margins to tolerate high device variance and potential device degradation prevents aggressive voltage scaling to reduce energy. Post-fabrication configuration, as we have in FPGAs, provides an opportunity to avoid the high costs of static margins. Rather than assuming worst-case device characteristics, we can deploy devices based on their fabricated or aged characteristics. This allows us to place the high-speed/leaky devices as needed on critical paths and slower/less-leaky devices on non-critical paths. As a result, it becomes possible to meet system timing requirements at lower voltages than conservative margins would allow. To exploit this post-fabrication configurability, we must customize the assignment of logical functions to resources based on the resource characteristics of a particular component after it has been fabricated and the resource characteristics have been determined; that is, we must perform component-specific mapping. When we perform this component-specific mapping, we can accommodate extremely high defect rates (e.g., 10%), high variation (e.g., \(\sigma_{V_{t}}=38\%\)), as well as lifetime aging effects with low overhead. As the magnitude of aging effects increases, the mapping of functions to resources becomes an adaptive process that is continually refined in-system, throughout the lifetime of the component.

field programmable custom computing machines | 2016

Continuous Online Self-Monitoring Introspection Circuitry for Timing Repair by Incremental Partial-Reconfiguration (COSMIC TRIP)

Hans Giesen; Benjamin Gojman; Raphael Rubin; Ji Kim; André DeHon

We show that continuously monitoring on-chip delays at the LUT-to-LUT link level during operation allows an FPGA to detect and self-adapt to aging and environmental effects on timing. Using a lightweight (

field programmable gate arrays | 2017

Quality-Time Tradeoffs in Component-Specific Mapping: How to Train Your Dynamically Reconfigurable Array of Gates with Outrageous Network-delays

Hans Giesen; Raphael Rubin; Benjamin Gojman; André DeHon

How should we perform component-specific adaptation for FPGAs? Prior work has demonstrated that the negative effects of variation can be largely mitigated using complete knowledge of device characteristics and a full per-FPGA CAD flow. However, the cost of per-FPGA characterization and mapping could be prohibitively expensive. We explore lightweight options for per-FPGA mapping that avoid the need for a priori device characterization and perform less expensive per-FPGA customization work. We characterize the tradeoff between Quality-of-Results (energy, delay) and per-device mapping costs for 7 design points ranging from complete mapping based on knowledge to no per-device mapping. We show that it is possible to get 48-77% of the component-specific mapping delay benefit or 57% of the energy benefit with a mapping that takes less than 20 seconds per FPGA. An incremental solution can start execution after a 21 ms bitstream load and converge to 77% delay benefit after 18 seconds of runtime.

IEEE Design & Test of Computers | 2017

Self-Adaptive Timing Repair

Hans Giesen; Raphael Rubin; Benjamin Gojman; André DeHon

Editor’s note: This article describes a method to continuously monitor path delays in an operational FPGA design and to improve slow paths by incremental partial reconfiguration. Since online delay measurement is more accurate than design-time estimation, this approach allows delays to be balanced, which can be used to improve performance or reduce power consumption. In addition, it counteracts aging effects and prolongs the system’s useful lifetime. (Axel Jantsch, TU Wien)
