Joshua M. Levine | Researchain

Publication

Featured research published by Joshua M. Levine.

Field-Programmable Custom Computing Machines | 2012

Online Measurement of Timing in Circuits: For Health Monitoring and Dynamic Voltage & Frequency Scaling

Joshua M. Levine; Edward A. Stott; George A. Constantinides; Peter Y. K. Cheung

Reliability, power consumption and timing performance are key considerations for the utilisation of field-programmable gate arrays. Online measurement techniques can determine the timing characteristics of an FPGA application while it is operating, and facilitate a range of benefits. Degradation can be monitored by tracking changes in timing performance, while power consumption can be reduced through dynamic voltage scaling (DVS) of the power supply to exploit any spare timing headroom. If higher performance is the objective, dynamic frequency scaling (DFS) can be used to maximise operating frequency. In both cases, online timing measurement of the application circuit is used to exploit favourable operating conditions. This work demonstrates a method of online measurement, achieved by sweeping the phase of a secondary clock signal that drives additional shadow registers strategically added to the application design. The measurement technique and initial voltage and frequency scaling experiments are demonstrated on an Altera Cyclone III FPGA. Timing performance can be measured with a best-case resolution of 96 ps. The additional circuitry results in minimal overhead in terms of area and performance. Power savings of 23% dynamic and 13% static in an example circuit are achieved through DVS, or performance improvements of 21% through DFS, when compared with operating at nominal core voltage or at the timing model FMax, respectively.
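As a rough illustration of the phase-sweep idea in the abstract above, the following Python sketch models a single register-to-register path. It is a toy behavioural model, not the authors' circuit: the function names, the path delay and the clock period are assumptions chosen for illustration, and only the 96 ps step is taken from the abstract.

```python
STEP_PS = 96  # phase step; the abstract quotes 96 ps as the best-case resolution


def shadow_matches(path_delay_ps, period_ps, lead_ps):
    """Return True if a shadow register clocked lead_ps early still sees settled data.

    Assumed model: data launched at time 0 settles after path_delay_ps, the main
    register samples at period_ps and the shadow register at period_ps - lead_ps.
    """
    return path_delay_ps <= period_ps - lead_ps


def measure_slack(path_delay_ps, period_ps, step_ps=STEP_PS):
    """Sweep the shadow clock earlier in fixed steps until the shadow register
    would disagree with the main register; return the largest safe lead.

    The result underestimates the true slack by less than one step.
    """
    lead = 0
    while (lead + step_ps <= period_ps
           and shadow_matches(path_delay_ps, period_ps, lead + step_ps)):
        lead += step_ps
    return lead


if __name__ == "__main__":
    # Hypothetical path: 7300 ps delay at a 10000 ps (100 MHz) clock period.
    print(measure_slack(7300, 10000))
```

In this model the measured value is quantised to the phase step, which is why the achievable resolution matters for how much headroom can be reclaimed.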

Field Programmable Gate Arrays | 2014

Dynamic voltage & frequency scaling with online slack measurement

Joshua M. Levine; Edward A. Stott; Peter Y. K. Cheung

Timing margins in FPGAs are already significant and as process scaling continues they will have to grow to guarantee operation under increased variation. Margins enforce worst-case operation even in typical conditions and result in devices operating more slowly and consuming more energy than necessary. This paper presents a method of dynamic voltage and frequency scaling that uses online slack measurement to determine timing headroom in a circuit while it is operating and scale the voltage and/or frequency in response. Doing so can significantly reduce power consumption or increase throughput with a minimal overhead. The method is demonstrated on a number of benchmark circuits under a range of operating conditions, constraints and optimisation targets.
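The closed-loop behaviour described in this abstract can be sketched as a simple controller that lowers the supply voltage while the measured slack stays above a guard band. Everything numeric here is hypothetical, and the plant is a textbook alpha-power-law delay model swapped in for a real device; the paper's actual control policy is not given in the abstract.

```python
def delay_ps(v_mv, d_nom_ps=7300.0, v_nom_mv=1200, vt_mv=400, alpha=1.3):
    """Toy alpha-power-law model: delay is proportional to V / (V - Vt)**alpha.

    All parameters are illustrative assumptions, not measured device data.
    """
    v, v_nom, vt = v_mv / 1000.0, v_nom_mv / 1000.0, vt_mv / 1000.0
    return d_nom_ps * (v / v_nom) * ((v_nom - vt) / (v - vt)) ** alpha


def lower_voltage(measure_slack, guard_ps, v_start_mv=1200, v_min_mv=900, step_mv=10):
    """Step the voltage down while the slack reported at the trial voltage
    stays at or above guard_ps; return the last voltage that was safe."""
    v = v_start_mv
    while v - step_mv >= v_min_mv:
        trial = v - step_mv
        if measure_slack(trial) < guard_ps:
            break  # trial voltage eats into the guard band; keep the previous one
        v = trial
    return v


PERIOD_PS = 10000  # hypothetical 100 MHz clock


def slack_at(v_mv):
    """Stand-in for the online slack measurement at a given supply voltage."""
    return PERIOD_PS - delay_ps(v_mv)


if __name__ == "__main__":
    v = lower_voltage(slack_at, guard_ps=500)
    print(v, round(slack_at(v)))
```

The guard band trades energy savings against robustness: a larger `guard_ps` stops the descent earlier. A frequency-scaling variant would hold the voltage fixed and shorten the period under the same stopping rule.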

Power and Timing Modeling, Optimization and Simulation | 2015

Adaptive energy minimization of embedded heterogeneous systems using regression-based learning

Sheng Yang; Rishad Ahmed Shafik; Edward A. Stott; Joshua M. Levine; James J. Davis; Bashir M. Al-Hashimi

Modern embedded systems consist of heterogeneous computing resources with diverse energy and performance trade-offs. This is because these resources exercise the application tasks differently, generating varying workloads and energy consumption. As a result, minimizing energy consumption in these systems is challenging, as continuous adaptation between application task mapping (i.e. allocating tasks among the computing resources) and dynamic voltage/frequency scaling (DVFS) is required. Existing approaches are limited by a lack of such adaptation with practical validation (Table I). This paper addresses this limitation and proposes a novel adaptive energy minimization approach for embedded heterogeneous systems. Fundamental to this approach is a runtime model, generated through regression-based learning of energy/performance trade-offs between different computing resources in the system. Using this model, an application task is suitably mapped to a computing resource during runtime, ensuring minimum energy consumption for a given application performance requirement. Such mapping is also coupled with DVFS control to adapt to performance and workload variations. The proposed approach is designed, engineered and validated on a Zynq-ZC702 platform, consisting of CPU, DSP and FPGA cores. Using several image processing applications as case studies, it is demonstrated that the proposed approach can achieve significant energy savings (>70%) when compared to existing approaches.
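A minimal sketch of regression-driven task mapping, under stated assumptions: each resource gets a one-variable least-squares fit of execution time and energy against workload size, and a task is mapped to the resource with the lowest predicted energy that still meets its deadline. The profiling samples, the resource names as dictionary keys and the linear model form are all hypothetical; the abstract does not specify the regression features used.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def learn_models(samples):
    """samples maps resource -> list of (workload, time_ms, energy_mj) tuples."""
    models = {}
    for name, rows in samples.items():
        w = [r[0] for r in rows]
        models[name] = {
            "time": fit_line(w, [r[1] for r in rows]),
            "energy": fit_line(w, [r[2] for r in rows]),
        }
    return models


def choose_resource(models, workload, deadline_ms):
    """Pick the resource with minimum predicted energy that meets the deadline."""
    best, best_energy = None, float("inf")
    for name, m in models.items():
        t = m["time"][0] * workload + m["time"][1]
        e = m["energy"][0] * workload + m["energy"][1]
        if t <= deadline_ms and e < best_energy:
            best, best_energy = name, e
    return best  # None when no resource can meet the deadline


# Hypothetical profiling data: (workload, time in ms, energy in mJ).
SAMPLES = {
    "cpu": [(w, 2.0 * w + 1, 3.0 * w + 2) for w in (2, 4, 8, 16)],
    "dsp": [(w, 1.0 * w + 2, 1.5 * w + 4) for w in (2, 4, 8, 16)],
    "fpga": [(w, 0.5 * w + 5, 1.0 * w + 10) for w in (2, 4, 8, 16)],
}

if __name__ == "__main__":
    models = learn_models(SAMPLES)
    print(choose_resource(models, workload=10, deadline_ms=15))
```

Refitting the models as new samples arrive is what would make such a scheme adaptive; coupling it with DVFS would add voltage/frequency as a further dimension of the search.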

Field-Programmable Logic and Applications | 2013

SMI: Slack Measurement Insertion for online timing monitoring in FPGAs

Joshua M. Levine; Edward A. Stott; George A. Constantinides; Peter Y. K. Cheung

Shadow registers, driven by a variable-phase clock, can be used to extract useful timing information from a circuit during operation. This paper presents Slack Measurement Insertion (SMI), an automated tool flow for inserting shadow registers into an FPGA design to enable measurement of timing slack. The flow provides a parameterised level of circuit coverage and results in minimal timing and area overheads. We demonstrate the process through its application to three complex benchmark designs.

IEEE Design & Test of Computers | 2013

Variation and Reliability in FPGAs

Edward A. Stott; Zhenyu Guan; Joshua M. Levine; Justin S. J. Wong; Peter Y. K. Cheung

This paper focuses on variability and reliability issues for FPGAs. The paper shows how these issues can be effectively addressed using one of the most powerful features of FPGAs: their ability and flexibility to be reconfigured. The paper also presents techniques for characterizing variability and degradation in these systems.

Field-Programmable Custom Computing Machines | 2014

Timing Fault Detection in FPGA-Based Circuits

Edward A. Stott; Joshua M. Levine; Peter Y. K. Cheung; Nachiket Kapre

The operation of FPGA systems, like most VLSI technology, is traditionally governed by static timing analysis, whereby safety margins for operating and manufacturing uncertainty are factored in at design-time. If we operate FPGA designs beyond these conservative margins we can obtain substantial energy and performance improvements. However, doing this carelessly would cause unacceptable impacts to reliability, lifespan and yield - issues which are growing more severe with continuing process scaling. Fortunately, the flexibility of FPGA architecture allows us to monitor and control reliability problems with a variety of runtime instrumentation and adaptation techniques. In this paper we develop a system for detecting timing faults in arbitrary FPGA circuits based on Razor-like shadow register insertion. Through a combination of calibration, timing constraint and adaptation of the CAD flow, we deliver low-overhead, trustworthy fault detection for FPGA-based circuits.

ACM Transactions on Reconfigurable Technology and Systems | 2018

KAPow: High-Accuracy, Low-Overhead Online Per-Module Power Estimation for FPGA Designs

James J. Davis; Eddie Hung; Joshua M. Levine; Edward A. Stott; Peter Y. K. Cheung; George A. Constantinides

In an FPGA system-on-chip design, it is often insufficient to merely assess the power consumption of the entire circuit by compile-time estimation or runtime power measurement. Instead, to make better decisions, one must understand the power consumed by each module in the system. In this work, we combine measurements of register-level switching activity and system-level power to build an adaptive online model that produces live breakdowns of power consumption within the design. Online model refinement avoids time-consuming characterization while also allowing the model to track long-term operating condition changes. Central to our method is an automated flow that selects signals predicted to be indicative of high power consumption, instrumenting them for monitoring. We named this technique KAPow, for ‘K’ounting Activity for Power estimation, which we show to be accurate and to have low overheads across a range of representative benchmarks. We also propose a strategy allowing for the identification and subsequent elimination of counters found to be of low significance at runtime, reducing algorithmic complexity without sacrificing significant accuracy. Finally, we demonstrate an application example in which a module-level power breakdown can be used to determine an efficient mapping of tasks to modules and reduce system-wide power consumption by up to 7%.
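To make the idea of an adaptive online power model concrete, the sketch below fits per-module weights from activity counts and a single system-level power reading, then uses those weights to split power between modules. It uses a normalised LMS update purely as an assumption; the abstract does not name the fitting algorithm, and the weights, module count and noiseless measurements are hypothetical.

```python
import random

# Hypothetical ground truth: static power (W), then W per unit activity for 3 modules.
TRUE_WEIGHTS = [0.20, 0.50, 0.10, 0.30]


def nlms_update(weights, x, measured_w, mu=0.5, eps=1e-9):
    """One normalised-LMS step: nudge weights so that dot(weights, x)
    moves towards the measured total power."""
    predicted = sum(w * xi for w, xi in zip(weights, x))
    err = measured_w - predicted
    norm = eps + sum(xi * xi for xi in x)
    return [w + mu * err * xi / norm for w, xi in zip(weights, x)]


def train(steps=5000, seed=0):
    """Feed random activity vectors and the matching (noiseless) total power."""
    rng = random.Random(seed)
    weights = [0.0] * len(TRUE_WEIGHTS)
    for _ in range(steps):
        x = [1.0] + [rng.random() for _ in range(len(TRUE_WEIGHTS) - 1)]
        measured = sum(w * xi for w, xi in zip(TRUE_WEIGHTS, x))
        weights = nlms_update(weights, x, measured)
    return weights


def breakdown(weights, activity):
    """Estimated dynamic power per module for one activity vector."""
    return [w * a for w, a in zip(weights[1:], activity)]


if __name__ == "__main__":
    w = train()
    print([round(v, 3) for v in w])
    print(breakdown(w, [1.0, 0.0, 0.5]))
```

Because the update runs continuously, the weights can follow slow drifts in operating conditions, and a weight that stays near zero marks a counter that contributes little to the estimate.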

IEEE Design & Test of Computers | 2017

KOCL: Power Self-Awareness for Arbitrary FPGA-SoC-Accelerated OpenCL Applications

James J. Davis; Joshua M. Levine; Edward A. Stott; Eddie Hung; Peter Y. K. Cheung; George A. Constantinides

Editor’s note: Being aware of its own power consumption is essential for any system under power constraints, i.e. all systems with moderate or high complexity. This paper describes a tool that provides this power awareness for applications written in OpenCL and implemented on FPGAs. (Axel Jantsch, TU Wien)

IEEE Computer | 2017

Voltage, Throughput, Power, Reliability, and Multicore Scaling

Fei Xia; Ashur Rafiev; Ali Aalsaud; Mohammed A. N. Al-Hayanni; James J. Davis; Joshua M. Levine; Andrey Mokhov; Alexander B. Romanovsky; Rishad Ahmed Shafik; Alex Yakovlev; Sheng Yang

This article studies the interplay between the performance, energy, and reliability (PER) of parallel-computing systems. It describes methods supporting the meaningful cross-platform analysis of this interplay. These methods lead to the PER software tool, which helps designers analyze, compare, and explore these properties. The web extra at https://youtu.be/aijVMM3Klfc illustrates the PER tool, expanding on the main engineering principles described in the article.

Field Programmable Gate Arrays | 2015

Delay-Bounded Routing for Shadow Registers

Eddie Hung; Joshua M. Levine; Edward A. Stott; George A. Constantinides; Wayne Luk

The on-chip timing behaviour of synchronous circuits can be quantified at run-time by adding shadow registers, which allow designers to sample the most critical paths of a circuit at a different point in time than the user register normally would. In order to sample these paths precisely, the path skew between the user and the shadow register must be tightly controlled and consistent across all paths that are shadowed. Unlike custom ICs, FPGAs contain prefabricated resources from which composing an arbitrary routing delay is not trivial. This paper presents a method for inserting shadow registers with a minimum skew bound, whilst also reducing the maximum skew. To preserve circuit timing, we apply this to FPGA circuits post place-and-route, using only the spare resources left behind. We find that our techniques can achieve an average STA-reported delay bound of ±200 ps on a Xilinx device despite incomplete timing information, and achieve <1 ps accuracy against our own delay model.
