Jonathan Parri | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jonathan Parri is active.

Explore More

Publication

Featured researches published by Jonathan Parri.

international conference on hardware/software codesign and system synthesis | 2013

pvFPGA: accessing an FPGA-based hardware accelerator in a paravirtualized environment

Wei Wang; Miodrag Bolic; Jonathan Parri

In this paper we present pvFPGA, the first system design solution for virtualizing an FPGA-based hardware accelerator on the x86 platform. Our design adopts the Xen virtual machine monitor (VMM) to build a paravirtualized environment, and a Xilinx Virtex-6 as an FPGA accelerator. The accelerator communicates with the x86 server via PCI Express (PCIe). In comparison to the recent accelerator virtualization solutions which primarily intercept and redirect API calls to the hosted or privileged domains user space, pvFPGA virtualizes an FPGA accelerator directly at the lower device driver level. This gives rise to higher efficiency and lower overhead. In pvFPGA, each unprivileged domain allocates a shared data pool for both user-kernel and inter-domain data transfer. In addition, we propose a new component, the coprovisor, which enables multiple domains to simultaneously access an FPGA accelerator. The experimental results have shown that 1) pvFPGA achieves close-to-zero overhead compared to accessing the FPGA accelerator without the VMM layer, 2) the FPGA accelerator is successfully shared by multiple domains, and 3) distributing different maximum data transfer bandwidths to different domains is achieved by regulating the size of the shared data pool at the split driver loading time.

ACM Queue | 2011

Returning control to the programmer: SIMD intrinsics for virtual machines

Jonathan Parri; Daniel Shapiro; Miodrag Bolic; Voicu Groza

Exposing SIMD units within interpreted languages could simplify programs and unleash floods of untapped processor power.

symposium on applied computational intelligence and informatics | 2011

ASIPs for artificial neural networks

Daniel Shapiro; Jonathan Parri; John-Marc Desmarais; Voicu Groza; Miodrag Bolic

Customized application-specific processors called ASIPs are becoming commonplace in contemporary embedded system designs. Neural networks are an interesting application for which an ASIP can be tailored to increase performance, lower power consumption and/or increase throughput. Here, both the bidirectional associative memory and hopfield auto-associative memory networks are run through an automated instruction-set identification algorithm to identify and select custom instruction candidates suitable for neural network applications. Clusters of neural networks are highly parallel, and therefore it is interesting to consider a homogeneous multiprocessor composed of ASIPs. The two legacy neural network applications showed a 18–120% improvement with the automatic hardware/software partitioning for a uniprocessor ASIP. However, due to pointers and function calling which did not resolve to hardware, the acceleration was concentrated in the network initialization part of the code.

symposium on applied computational intelligence and informatics | 2011

Custom instruction hardware integration within a SoC hybrid environment

Jonathan Parri; Miodrag Bolic; Voicu Groza

Traditionally, common processor augmentation solutions have involved the addition of coprocessors or the datapath integration of custom instructions within extensible processors as Instruction Set Extensions (ISE). Rarely is the hybrid option of using both techniques explored. Much research already exists concerning the identification and selection of custom hardware blocks from hardware/software partitioning techniques, but the question of how to best use this hardware within a user system where both coprocessors and datapath augmentations are possible remains. This paper looks to extend existing ISE algorithms which provide custom hardware as dataflow graphs (DFG) and place them appropriately within a hybrid System-on-Chip (SoC) using standard combinatorial optimization techniques. A combinatorial model is presented to address this placement issue and is applied to two well known kernel programs. We further show that such standard techniques can execute within a reasonable time frame alleviating the need for heuristics.

canadian conference on electrical and computer engineering | 2010

Design of a custom vector operation API exploiting SIMD intrinsics within Java

Jonathan Parri; John-Marc Desmarais; Daniel Shapiro; Miodrag Bolic; Voicu Groza

The use of Single Instruction Multiple Data (SIMD) operations can be instrumental in meeting the needs of high performance computations. Most languages, including C/C++, give a user the power to directly exploit this hardware and inherent parallelism. We have created a retargetable native SIMD library which Java programmers are now able to use to directly access SIMD intrinsics including MMX, SSE1, SSE2 and SSE3 through prescribed Java methods in an API. This API gives users direct control over their high-performance computations instead of solely relying on the SIMD optimizations of the Java Virtual Machine (JVM), or relying on a GPU which must send and receive the data from the CPU. Through the use of this Java API and the included backing library, substantial performance gains can be achieved on large and complex vector operations. We show an example for which the API obtains a 2x to 3x speedup for both small and large data sets as compared to solely relying on the SIMD optimizations in the JVM.

canadian conference on electrical and computer engineering | 2014

Performance and energy consumption analysis of java code utilizing embedded GPU

Iype P. Joseph; Jonathan Parri; Yu Wang; Miodrag Bolic; Amir Rajabzadeh; Voicu Groza

GPUs and multicore CPUs are becoming common in todays embedded world of tablets and smartphones. With CPUs and GPUs getting more complex, maximizing hardware utilization and minimizing energy consumption are becoming problematic. The challenges faced in GPGPU computing on embedded platforms are different from their desktop counterparts due to the memory and computational limitations. This study evaluates the advantages of offloading Java applications to an embedded GPU. By employing two approaches namely, Java Native Interface (JNI-OpenCL) and Java bindings for OpenCL (JOCL) we allowed programmers to program an embedded GPU from Java. Experiments were conducted on a Freescale i.MX6Q SabreLite board which contains a quad-core ARM Cortex A9 CPU and a Vivante GC 2000 GPU that supports the OpenCL 1.1 Embedded Profile. The results show up to an eight times increase in performance efficiency by consuming only one-third the energy compared to the CPU-only version of the Java program. This paper demonstrates the performance and energy benefits achieved by offloading Java programs onto an embedded GPU. To the best of our knowledge, this is the first work involving Java acceleration on embedded GPUs.

symposium on applied computational intelligence and informatics | 2011

Accelerating N-queens problem using OpenMP

A. Ayala; H. Osman; Daniel Shapiro; John-Marc Desmarais; Jonathan Parri; Miodrag Bolic; Voicu Groza

Backtracking algorithms are used to methodically and exhaustively search a solution space for an optimal solution to a given problem. A classic example of a backtracking algorithm is illustrated by finding all solutions to the problem of placing N-queens on an N × N chess board such that no two queens attack each other. This paper demonstrates a methodology for rewriting this backtracking algorithm to take advantage of multi-core computing resources. We accelerated a sequential version of the N-queens problem on ×86 and PPC64 architectures. Using problem sizes between 13 and 17, we observed an average speedup of 3.24 for ×86 and 9.24 for the PPC64.

International Journal of High Performance Computing and Networking | 2017

pvFPGA: paravirtualising an FPGA-based hardware accelerator towards general purpose computing

Wei Wang; Miodrag Bolic; Jonathan Parri

This paper presents an ameliorated design of pvFPGA, which is a novel system design solution for virtualising an FPGA-based hardware accelerator by a virtual machine monitor (VMM). The accelerator design on the FPGA can be used for accelerating various applications, regardless of the application computation latencies. In the implementation, we adopt the Xen VMM to build a paravirtualised environment, and a Xilinx Virtex-6 as an FPGA accelerator. The data transferred between the x86 server and the FPGA accelerator through direct memory access (DMA), and a streaming pipeline technique is adopted to improve the efficiency of data transfer. Several solutions to solve streaming pipeline hazards are discussed in this paper. In addition, we propose a technique, hyper-requesting, which enables portions of two requests bidding to different accelerator applications to be processed on the FPGA accelerator simultaneously through DMA context switches, to achieve request level parallelism. The experimental results show that hyper-requesting reduces request turnaround time by up to 80%.

Archive | 2012

A Case Study on Hardware/Software Codesign in Embedded Artificial Neural Networks

Jonathan Parri; John-Marc Desmarais; Daniel Shapiro; Miodrag Bolic; Voicu Groza

Software/hardware codesign is a complex research problem that has been slowly making headway into industry-ready system design products. Recent advances have shown viability to this direction within the design space exploration scope, especially with regards to rapid development cycles. Here, we exploit the hardware/software codesign landscape in the artificial neural network problem space. Automated tools requiring minimal technical expertise from Altera and Tensilica are examined along with newer advances solely within hardware/software codesign research domain. The design space exploration options discussed here look to achieve better software/hardware partitions using instruction-set extensions and coprocessors. As neural networks continue to find usage in embedded systems, it has become imperative to efficiently optimize their implementation within a short development cycle. Modest speedups can be easily achieved with these automated hardware/software codesign tools on the benchmarks examined.

Archive | 2013