Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jacob Kornerup is active.

Publications


Featured research published by Jacob Kornerup.


compiler construction | 2009

Scheduling Tasks to Maximize Usage of Aggregate Variables in Place

Samah Abu-Mahmeed; Cheryl McCosh; Zoran Budimlic; Ken Kennedy; Kaushik Ravindran; Kevin Hogan; Paul F. Austin; Steve Rogers; Jacob Kornerup

Single-assignment languages with copy semantics have a very simple and approachable programming model. A naive implementation of copy semantics, which copies the result of every computation to a new location, can result in poor performance, whereas an implementation that keeps results in the same location, when possible, can achieve much higher performance. In this paper, we present a greedy algorithm for in-place computation of aggregate (array and structure) variables. Our algorithm greedily picks the most profitable opportunities for in-place computation, then updates the scheduling and in-place constraints in the program graph. The algorithm runs in O(T log T + E_W V + V^2) time, where T is the number of in-placeness opportunities, E_W is the number of edges, and V is the number of computational nodes in the program graph. We evaluate the performance of the code generated by the LabVIEW™ compiler using our algorithm against code that performs no in-place computation at all, resulting in significant application performance improvements. We also compare the performance of the code generated by our algorithm against the commercial LabVIEW compiler, which uses an ad hoc in-placeness strategy. The results show that our algorithm matches the performance of the current LabVIEW strategy in most cases, while in some cases outperforming it significantly.
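
As an illustration of the greedy selection described above, here is a minimal Python sketch. The profit values and the conflict relation between opportunities are hypothetical stand-ins; the paper's algorithm derives and updates them from the scheduling and in-placeness constraints of the program graph.

```python
import heapq

def greedy_inplace(profits, conflicts):
    """Pick in-place computation opportunities greedily by profit.

    profits:   dict opportunity -> profit (e.g. bytes of copying avoided)
    conflicts: dict opportunity -> set of opportunities it invalidates
    Both inputs are illustrative stand-ins for constraints the paper
    derives from the program graph.
    """
    heap = [(-p, op) for op, p in profits.items()]  # max-heap by profit
    heapq.heapify(heap)                             # O(T) build, O(log T) pops
    chosen, invalid = [], set()
    while heap:
        _, op = heapq.heappop(heap)
        if op in invalid:
            continue                  # ruled out by an earlier, better pick
        chosen.append(op)
        invalid |= conflicts.get(op, set())  # update remaining constraints
    return chosen

# Two mutually exclusive opportunities; the more profitable one (a) wins.
print(greedy_inplace({"a": 64, "b": 16}, {"a": {"b"}, "b": {"a"}}))  # ['a']
```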


international conference on embedded computer systems: architectures, modeling, and simulation | 2010

Efficient static buffering to guarantee throughput-optimal FPGA implementation of synchronous dataflow graphs

Hojin Kee; Shuvra S. Bhattacharyya; Jacob Kornerup

When designing DSP applications for implementation on field programmable gate arrays (FPGAs), it is often important to minimize consumption of limited FPGA resources while satisfying real-time performance constraints. In this paper, we develop efficient techniques to determine dataflow graph buffer sizes that guarantee throughput-optimal execution when mapping synchronous dataflow (SDF) representations of DSP applications onto FPGAs. Our techniques are based on a novel two-actor SDF graph Model (TASM), which efficiently captures the behavior and costs associated with SDF graph edges (flow-graph connections). With our proposed techniques, designers can automatically generate upper bounds on SDF graph buffer distributions that realize maximum achievable throughput performance for the corresponding applications. Furthermore, our proposed technique is characterized by low polynomial time complexity, which is useful for rapid prototyping in DSP system design.
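
For intuition, a toy Python sketch of single-edge SDF buffer sizing. The p + c − gcd(p, c) bound below is a classical deadlock-freedom baseline, not the paper's TASM analysis, which additionally guarantees maximum achievable throughput.

```python
from math import gcd

def repetitions(p, c):
    """Firing counts (producer, consumer) that return a single SDF edge
    with rates p (produce) and c (consume) to its initial token count."""
    g = gcd(p, c)
    return c // g, p // g

def min_buffer_bound(p, c):
    """Classical sufficient buffer size for deadlock-free execution of a
    single SDF edge: p + c - gcd(p, c) tokens. Illustrative baseline only;
    the paper's two-actor model (TASM) computes buffer sizes that also
    sustain maximum throughput in the FPGA implementation."""
    return p + c - gcd(p, c)

# A 3->2 rate edge: one iteration fires the producer twice and the
# consumer three times; 4 tokens of buffering avoid deadlock.
print(repetitions(3, 2), min_buffer_bound(3, 2))  # (2, 3) 4
```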


signal processing systems | 2012

Mapping Parameterized Cyclo-static Dataflow Graphs onto Configurable Hardware

Hojin Kee; Chung-Ching Shen; Shuvra S. Bhattacharyya; Ian C. Wong; Yong Rao; Jacob Kornerup

In recent years, parameterized dataflow has evolved as a useful framework for modeling synchronous and cyclo-static graphs in which arbitrary parameters can be changed dynamically. Parameterized dataflow has proven to have significant expressive power for managing dynamics of DSP applications in important ways. However, efficient hardware synthesis techniques for parameterized dataflow representations are lacking. This paper addresses this void; specifically, the paper investigates efficient field programmable gate array (FPGA)-based implementation of parameterized cyclo-static dataflow (PCSDF) graphs. We develop a scheduling technique for throughput-constrained minimization of dataflow buffering requirements when mapping PCSDF representations of DSP applications onto FPGAs. The proposed scheduling technique is integrated with an existing formal schedule model, called the generalized schedule tree, to reduce schedule cost. To demonstrate our new, hardware-oriented PCSDF scheduling technique, we have designed a real-time base station emulator prototype based on a subset of long-term evolution (LTE), which is a key cellular standard.
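
A small sketch of the generalized schedule tree idea referenced above, assuming a simplified representation with fixed loop counts; in PCSDF the counts can be symbolic parameters reconfigured at run time.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Fire:
    actor: str                      # leaf node: one firing of an actor

@dataclass
class Loop:
    count: int                      # internal node: repeat body `count` times
    body: List[Union["Loop", Fire]] = field(default_factory=list)

def expand(node):
    """Flatten a schedule tree into its firing sequence. Shown only for
    illustration; in hardware synthesis the tree itself drives compact
    loop-based control logic instead of an unrolled sequence."""
    if isinstance(node, Fire):
        return [node.actor]
    seq = []
    for _ in range(node.count):
        for child in node.body:
            seq.extend(expand(child))
    return seq

# The looped schedule (2 A)(3 B) for a 3->2 rate edge between A and B.
sched = Loop(1, [Loop(2, [Fire("A")]), Loop(3, [Fire("B")])])
print(expand(sched))  # ['A', 'A', 'B', 'B', 'B']
```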


international conference on distributed smart cameras | 2009

Resource-efficient acceleration of 2-dimensional Fast Fourier Transform computations on FPGAs

Hojin Kee; Shuvra S. Bhattacharyya; Newton G. Petersen; Jacob Kornerup

The 2-dimensional (2D) Fast Fourier Transform (FFT) is a fundamental, computationally intensive function that is of broad relevance to distributed smart camera systems. In this paper, we develop a systematic method for improving the throughput of 2D-FFT implementations on field-programmable gate arrays (FPGAs). Our method is based on a novel loop unrolling technique for FFT implementation, which is extended from our recent work on FPGA architectures for 1D-FFT implementation [1]. This unrolling technique deploys multiple processing units within a single 1D-FFT core to achieve efficient configurations of data parallelism while minimizing memory space requirements and FPGA slice consumption. Furthermore, using our techniques for parallel processing within individual 1D-FFT cores, the number of input/output (I/O) ports within a given 1D-FFT core is limited to one input port and one output port. In contrast, previous 2D-FFT design approaches require multiple I/O pairs with multiple FFT cores. This streamlining of 1D-FFT interfaces makes it possible to avoid complex interconnection networks and associated scheduling logic for connecting multiple I/O ports from 1D-FFT cores to the I/O channel of external memory devices. Hence, our proposed unrolling technique maximizes the ratio of the achieved throughput to the consumed FPGA resources under pre-defined constraints on I/O channel bandwidth. To provide generality, our framework for 2D-FFT implementation can be efficiently parameterized in terms of key design parameters such as the transform size and I/O data word length.
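
The row-column structure underlying the approach can be stated in a few lines. This NumPy sketch is only a functional reference for the decomposition the paper maps onto streaming 1D-FFT cores, not a model of the FPGA architecture.

```python
import numpy as np

def fft2_row_column(x):
    """Row-column 2D FFT: 1D FFTs across rows, then across columns."""
    rows = np.fft.fft(x, axis=1)      # pass 1: transform every row
    return np.fft.fft(rows, axis=0)   # pass 2: transform every column

x = np.random.rand(8, 8)
assert np.allclose(fft2_row_column(x), np.fft.fft2(x))
```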


international workshop on the ACL2 theorem prover | 2009

Formal verification of LabVIEW programs using the ACL2 Theorem Prover

Matt Kaufmann; Jacob Kornerup; Mark Reitblatt

The LabVIEW™ system is based on a graphical dataflow language and is widely used for data acquisition, instrument control, and industrial automation. This paper presents a methodology for annotating LabVIEW programs with their specifications, translating those annotated programs into ACL2, and proving the translated specifications with ACL2. Our system supports verification of inductive invariants of bounded loops as well as assertions about straight-line code. Our verification methodology supports the user by generating a highly structured set of proof obligations, many or all of which are discharged automatically. This methodology makes extensive use of hints to support scalability, including careful theory control as well as functional instantiation that avoids explicit use of induction. We describe the design, applicability and limitations of the framework. We also present several examples demonstrating our approach.
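
To illustrate the kind of proof obligation involved, here is a hypothetical bounded loop with its inductive invariant expressed as run-time assertions in Python; the paper discharges such obligations statically in ACL2 rather than by execution.

```python
def sum_first_n(n):
    """Bounded loop whose inductive invariant (acc == i*(i-1)//2 on entry
    to each iteration) is checked with run-time assertions. These asserts
    only mirror the base-case, preservation, and postcondition obligations
    that an ACL2 proof would establish once and for all."""
    acc = 0
    for i in range(n + 1):
        assert acc == i * (i - 1) // 2   # invariant: holds on loop entry
        acc += i                         # loop body preserves it
    assert acc == n * (n + 1) // 2       # postcondition follows
    return acc

print(sum_first_n(10))  # 55
```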


software and compilers for embedded systems | 2018

MASES: Mobility And Slack Enhanced Scheduling For Latency-Optimized Pipelined Dataflow Graphs

Wenxiao Yu; Jacob Kornerup; Andreas Gerstlauer

Dataflow and task graph descriptions are widely used for mapping and scheduling of real-time streaming applications onto heterogeneous processing platforms. Such applications are often characterized by the need to process large-volume data streams with not only high throughput, but also low latency. Mapping such application descriptions into tightly constrained implementations requires optimization of pipelined scheduling of tasks on different processing elements. This poses the problem of finding an optimal solution across a latency-throughput objective space. In this paper, we present a novel list-scheduling based heuristic called MASES for pipelined dataflow scheduling to minimize latency under throughput and heterogeneous resource constraints. MASES explores the flexibility provided by mobility and slack of actors in a partial schedule. It can find a valid schedule if one exists even under tight throughput and resource constraints. Furthermore, MASES can improve runtime by up to 4x while achieving similar results as other latency-oriented heuristics for problems they can solve.
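
A simplified sketch of the list-scheduling skeleton that such heuristics build on, here prioritizing ready tasks by mobility (ALAP minus ASAP start) on homogeneous processing elements. MASES itself targets heterogeneous resources and pipelined (overlapped-iteration) schedules under a throughput constraint, which this sketch omits.

```python
def list_schedule(tasks, deps, num_pes):
    """Greedy list scheduling of a task graph onto `num_pes` identical PEs,
    choosing among ready tasks by mobility (ALAP start minus ASAP start).

    tasks: dict task -> duration; deps: dict task -> set of predecessors.
    Returns dict task -> (start_time, pe_index).
    """
    succs = {t: [u for u in tasks if t in deps.get(u, ())] for t in tasks}

    def chain(t, memo, nexts):
        # Longest duration chain reachable from t via `nexts` (excluding t).
        if t not in memo:
            memo[t] = max((chain(n, memo, nexts) + tasks[n]
                           for n in nexts(t)), default=0)
        return memo[t]

    asap, tail = {}, {}
    for t in tasks:
        chain(t, asap, lambda u: deps.get(u, ()))   # earliest start times
        chain(t, tail, lambda u: succs[u])          # work after t finishes
    span = max(asap[t] + tasks[t] + tail[t] for t in tasks)  # critical path
    mobility = {t: span - tail[t] - tasks[t] - asap[t] for t in tasks}

    pe_free = [0] * num_pes   # next free time on each PE
    finish, sched = {}, {}
    while len(sched) < len(tasks):
        ready = [t for t in tasks if t not in sched
                 and all(p in finish for p in deps.get(t, ()))]
        t = min(ready, key=lambda r: mobility[r])   # least mobile first
        pe = min(range(num_pes), key=lambda i: pe_free[i])
        start = max([pe_free[pe]] + [finish[p] for p in deps.get(t, ())])
        sched[t] = (start, pe)
        finish[t] = start + tasks[t]
        pe_free[pe] = finish[t]
    return sched

# Diamond graph A -> {B, C} -> D on two PEs; makespan equals the
# critical-path length A-B-D = 7.
tasks = {"A": 2, "B": 3, "C": 1, "D": 2}
deps = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(list_schedule(tasks, deps, num_pes=2))
```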


Embedded Systems Development, From Functional Models to Implementations | 2014

Modeling, Analysis, and Implementation of Streaming Applications for Hardware Targets

Kaushik Ravindran; Arkadeb Ghosal; Rhishikesh Limaye; Douglas Kim; Hugo A. Andrade; Jeff Correll; Jacob Kornerup; Ian C. Wong; Gerald Wang; Guang Yang; Amal Ekbal; Mike Trimborn; Ankita Prasad; Trung N. Tran

Application advances in the signal processing and communications domains are marked by an increasing demand for better performance and faster time to market. This has motivated model-based approaches to design and deploy such applications productively across diverse target platforms. Dataflow models are effective in capturing these applications that are real-time, multi-rate, and streaming in nature. These models facilitate static analysis of key execution properties like buffer sizes and throughput. There are established tools to generate implementations of these models in software for processor targets. However, prototyping and deployment on hardware targets, in particular reconfigurable hardware such as FPGAs, are critical to the development of new applications. FPGAs are increasingly used in computing platforms for high performance streaming applications. They also facilitate integration with real physical I/O by providing tight timing control and allow the flexibility to adapt to new interface standards. Existing tools for hardware implementation from dataflow models are limited in their ability to combine efficient synthesis and I/O integration and deliver realistic system deployments. To close this gap, we present the LabVIEW DSP Design Module from National Instruments, a framework to specify, analyze, and implement streaming applications on hardware targets. DSP Design Module encourages a model-based design approach starting from streaming dataflow models. The back-end supports static analysis of execution properties and generates implementations for FPGAs. It also includes an extensive library of hardware actors and eases third-party IP integration. Overall, DSP Design Module is a unified design-to-deployment framework that translates high-level algorithmic specifications to efficient hardware, enables design space exploration, and generates realistic system deployments. In this chapter, we illustrate the modeling, analysis, and implementation capabilities of DSP Design Module. We then present a case study to show its viability as a model-based design framework for next generation signal processing and communications systems.


Archive | 2005

Synchronizing execution of graphical programs executing on different computer systems

Aljosa Vrancic; Jacob Kornerup



Archive | 2004

Graphical program which executes a timed loop

Biren Shah; Jacob Kornerup; Aljosa Vrancic; Jeffrey L. Kodosky; Michael L. Santori



Archive | 2004

Visualization tool for viewing timing information for a graphical program

Jacob Kornerup; Biren Shah; Aljosa Vrancic; Bob Preis

Collaboration


Dive into Jacob Kornerup's collaboration.
