Junneng Zhang

University of Science and Technology of China

Publication

Featured research published by Junneng Zhang.

IEEE International Conference on Services Computing | 2011

SOMP: Service-Oriented Multi Processors

Chao Wang; Junneng Zhang; Xuehai Zhou; Xiaojing Feng; Xiaoning Nie

Multi-processor system on chip (MPSoC) has been widely applied in embedded systems design. However, designing and implementing prototype chips for diverse applications remains challenging because of differing instruction set architectures (ISAs), programming interfaces and software tool chains. To address this problem, we introduce SOA into MPSoC design, because it can provide flexibility and extensibility for MPSoC chip design at lower cost by adopting reusable, self-contained modules in the design process. In this paper, we propose a service-oriented multi-processor, SOMP, which integrates embedded processors and hardware IP cores as computing servants on a single chip. SOMP provides users with unified programming interfaces over diverse computing resources. To demonstrate the performance of SOMP, we implemented it on a Digilent Virtex5LX110T FPGA board and designed several sample test applications for verification. The experimental results show that SOMP can greatly improve parallelism and achieves 95.7% of the theoretical speedup on average.

ACM Transactions on Architecture and Code Optimization | 2013

MP-Tomasulo: A Dependency-Aware Automatic Parallel Execution Engine for Sequential Programs

Chao Wang; Xi Li; Junneng Zhang; Xuehai Zhou; Xiaoning Nie

This article presents MP-Tomasulo, a dependency-aware automatic parallel task execution engine for sequential programs. Applying the instruction-level Tomasulo algorithm to MPSoC environments, MP-Tomasulo detects and eliminates Write-After-Write (WAW) and Write-After-Read (WAR) inter-task dependencies in the dataflow execution, thereby enabling out-of-order task execution on heterogeneous units. We implemented the prototype system within a single FPGA. Experimental results on EEMBC applications demonstrate that MP-Tomasulo can execute tasks out of order and achieve 93.6% to 97.6% of ideal peak speedup. A comparative study against a state-of-the-art dataflow execution scheme is illustrated with a classic JPEG application. The promising results show that MP-Tomasulo enables programmers to uncover more task-level parallelism on heterogeneous systems and eases the programming burden.
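
As a rough illustration of how renaming removes WAW and WAR dependencies at task level, here is a minimal Python sketch. It is not the MP-Tomasulo implementation: the task format `(name, reads, writes)` and the helper names `rename_tasks` and `raw_dependencies` are assumptions made for this example. Every write receives a fresh version of its variable, so only true Read-After-Write (RAW) dependencies remain.

```python
def rename_tasks(tasks):
    """Give every write a fresh version so only true (RAW) dependencies remain.

    tasks: list of (name, reads, writes) tuples in program order
    (a hypothetical task format chosen for this sketch).
    """
    version = {}   # variable -> number of writes seen so far
    renamed = []
    for name, reads, writes in tasks:
        # A read refers to the latest version (version 0 is the initial value).
        new_reads = [f"{v}.{version.get(v, 0)}" for v in reads]
        new_writes = []
        for v in writes:
            version[v] = version.get(v, 0) + 1
            new_writes.append(f"{v}.{version[v]}")
        renamed.append((name, new_reads, new_writes))
    return renamed


def raw_dependencies(renamed):
    """Map each task to the earlier tasks that produce the versions it reads."""
    producer = {}  # versioned variable -> name of the task that writes it
    deps = {}
    for name, reads, writes in renamed:
        deps[name] = sorted({producer[r] for r in reads if r in producer})
        for w in writes:
            producer[w] = name
    return deps


tasks = [
    ("T1", ["a"], ["b"]),
    ("T2", ["b"], ["c"]),  # RAW on T1
    ("T3", ["d"], ["b"]),  # WAW with T1 and WAR with T2 before renaming
    ("T4", ["b"], ["e"]),  # RAW on T3
]
deps = raw_dependencies(rename_tasks(tasks))
print(deps)
```

In this example T3 no longer depends on T1 or T2 after renaming, so it could start immediately on a free unit, while T2 and T4 still wait for their producers.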

IEEE Transactions on Computers | 2015

Architecture Support for Task Out-of-Order Execution in MPSoCs

Chao Wang; Xi Li; Junneng Zhang; Peng Chen; Yunji Chen; Xuehai Zhou; Ray C. C. Cheung

Multi-processor system on chip (MPSoC) has been widely applied in embedded systems in the past decades. However, efficiently designing and implementing a rapid prototype for diverse applications remains challenging because of heterogeneous instruction set architectures (ISAs), programming interfaces and software tool chains. To address this problem, this paper proposes a novel high-level architecture support for automatic out-of-order (OoO) task execution on FPGA-based heterogeneous MPSoCs. The architecture support is composed of a hierarchical middleware with an automatic task-level OoO parallel execution engine. Incorporating a hierarchical OoO layer model, the middleware is able to identify the parallel regions and generate the source code automatically. In addition, a runtime middleware, Task-Scoreboarding, analyzes the inter-task data dependencies and automatically schedules and dispatches the tasks with parameter renaming techniques. The middleware has been verified by a prototype built on an FPGA platform. Examples and a JPEG case study demonstrate that our model can largely ease the burden on programmers as well as uncover task-level parallelism.

IEEE International Conference on Services Computing | 2012

Regarding Processors and Reconfigurable IP Cores as Services

Chao Wang; Xi Li; Peng Chen; Junneng Zhang; Xiaojing Feng; Xuehai Zhou

This paper proposes a service-oriented reconfigurable co-processing architecture. The novelty of the architecture is to apply service-oriented concepts to system on chip (SoC) design paradigms and to utilize each processor and IP core as a function unit. Regarded as abstract instructions, tasks can be scheduled to IP cores for parallel execution automatically. A uniform IP reconfiguration interface is provided to allow function units to be replaced at run-time. Neither the applications nor the tool chains need to be redesigned after hardware reconfiguration. To evaluate the SOA concepts, we implemented a prototype on a state-of-the-art Virtex5 FPGA board with IP cores implemented from EEMBC DENBench. The prototype and experimental results demonstrate that it can support a range of hardware accelerators in an efficient manner. Furthermore, the results show that the architecture requires moderate silicon area and affordable power consumption. We believe the SOA approach opens a new direction for combining SOA concepts with reconfigurable computing hardware architectures.

International Parallel and Distributed Processing Symposium | 2012

FPM: A Flexible Programming Model for MPSoC on FPGA

Chao Wang; Xi Li; Junneng Zhang; Peng Chen; Xiaojing Feng; Xuehai Zhou

This paper proposes a flexible programming model (FPM), which addresses automatic parallel execution of functional tasks on heterogeneous multiprocessors. Guided by simply annotated source code, a front-end source-to-source compiler identifies the parallel regions and generates the source code. A runtime middleware analyzes the inter-task data dependencies and schedules the tasks with renaming techniques automatically. FPM has been verified by a prototype built on a state-of-the-art FPGA. Examples demonstrate that our model can largely ease the burden on programmers as well as uncover task-level parallelism.

International Symposium on Parallel and Distributed Processing and Applications | 2011

A Flexible High Speed Star Network Based on Peer to Peer Links on FPGA

Chao Wang; Junneng Zhang; Xuehai Zhou; Xiaojing Feng; Aili Wang

Multi-Processor System on Chip (MPSoC) platforms play a vital role in parallel processor architecture design. However, designing a flexible high-speed network is a great challenge given the growing number of processors. This paper proposes a star network based on peer-to-peer links on FPGA. For demonstration, the star network uses fast simplex links (FSL) to connect the scheduler and the processing elements, including processors and hardware IP cores. Blocking and non-blocking application interfaces are provided to users for programming. We built a prototype system on FPGA to evaluate the transfer time and hardware costs of the star network architecture. Experimental results show that the average transfer time for each word can be reduced to as few as 7 cycles. Moreover, the star network architecture costs only 1.2% of the flip-flops and 2.45% of the LUTs of the whole prototype MPSoC system.

IEEE Transactions on Parallel and Distributed Systems | 2016

Hardware Implementation on FPGA for Task-Level Parallel Dataflow Execution Engine

Chao Wang; Junneng Zhang; Xi Li; Aili Wang; Xuehai Zhou

Heterogeneous multicore platforms have been widely used in various areas to achieve both power efficiency and high performance. However, uncovering more coarse-grained task-level parallelism poses significant challenges to researchers. To support automatic task parallel execution, this paper proposes an FPGA implementation of a hardware out-of-order scheduler on a heterogeneous multicore platform. The scheduler is capable of exploring potential inter-task dependencies, leading to a significant acceleration of dependence-aware applications. With the help of a renaming scheme, task dependencies are detected automatically during execution, and task-level Write-After-Write (WAW) and Write-After-Read (WAR) dependencies can then be eliminated dynamically. We extended instruction-level renaming techniques to perform task-level out-of-order execution, and implemented a prototype on a state-of-the-art Xilinx Virtex-5 FPGA device. Given the reconfigurable nature of FPGAs, our scheduler supports changing accelerators at runtime to improve flexibility. Experimental results demonstrate that our scheduler is efficient in both performance and resource usage.

International Parallel and Distributed Processing Symposium | 2012

Detecting Data Hazards in Multi-Processor System-on-Chips on FPGA

Chao Wang; Xi Li; Peng Chen; Xiaojing Feng; Junneng Zhang; Xuehai Zhou

This paper presents a novel data hazard detection engine, task scoreboarding, which applies the instruction-level scoreboarding algorithm to reconfigurable MPSoCs on FPGA for out-of-order task execution. Task scoreboarding can detect inter-task data dependencies and then assign tasks to different processors or IP cores automatically. When computing resources are sufficient and there are no data dependences, task scoreboarding allows tasks to execute out of order. We implemented the prototype system on a state-of-the-art Virtex5 FPGA board. Experimental results on sample applications demonstrate that task scoreboarding can achieve more than 97% of the theoretical speedup, which shows that it can largely uncover task-level parallelism.
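
To make the idea of inter-task hazard detection concrete, the following Python sketch classifies the hazards between two tasks and applies a deliberately simplified issue rule. It is an illustration only, not the paper's engine: the `(reads, writes)` task format and the functions `hazards` and `can_issue` are hypothetical, and the rule treats every hazard as blocking at issue time, whereas a classic scoreboard resolves WAR hazards at write-back.

```python
def hazards(earlier, later):
    """Classify the data hazards of `later` against `earlier`.

    Each task is a (reads, writes) pair of variable-name collections
    (a hypothetical format chosen for this sketch).
    """
    e_reads, e_writes = set(earlier[0]), set(earlier[1])
    l_reads, l_writes = set(later[0]), set(later[1])
    found = set()
    if e_writes & l_reads:
        found.add("RAW")  # later reads what earlier writes
    if e_reads & l_writes:
        found.add("WAR")  # later overwrites what earlier still reads
    if e_writes & l_writes:
        found.add("WAW")  # both write the same variable
    return found


def can_issue(task, in_flight, free_units):
    """Simplified rule: a free unit exists and no hazard with any running task."""
    if free_units <= 0:
        return False
    return all(not hazards(running, task) for running in in_flight)


running = [(["a"], ["b"])]                         # one task in flight: reads a, writes b
print(can_issue((["x"], ["y"]), running, 1))       # independent task
print(can_issue((["b"], ["y"]), running, 1))       # reads b, which is still being written
```

Under this rule the independent task may start out of order, while the task that reads `b` must wait until the running task completes.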

Modeling, Analysis, and Simulation on Computer and Telecommunication Systems | 2012

Analyzing Parallelization and Program Performance in Heterogeneous MPSoCs

Chao Wang; Xi Li; Junneng Zhang; Gangyong Jia; Peng Chen; Xuehai Zhou

In this paper we extend and analyze Amdahl's law for the general heterogeneous MPSoC era, to find out how the speedup is affected by parameters including the number and speedup of microprocessors and accelerators, as well as the task partition characteristics. We also analyze the theoretical results on how the extended Amdahl's law can be applied to load balancing of a heterogeneous MPSoC without the abstract limitation of base core equivalents (BCEs). A prototype on FPGA is constructed with Microblaze processors and JPEG hardware accelerators. The experimental results demonstrate that our extended model reinforces state-of-the-art performance evaluation methods for hybrid MPSoC architectures and also provides credible new insights for the heterogeneous research communities, in particular for scalable FPGA-based reconfigurable MPSoCs.
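
For readers unfamiliar with Amdahl's law, the Python sketch below shows one simple way to account for accelerators. It is not the extended model from the paper: it assumes a serial fraction that runs on a single baseline processor and a parallel fraction that is perfectly load-balanced over `n_cpu` baseline processors plus `n_acc` accelerators, each `acc_speedup` times faster than a baseline processor. The function name and parameters are hypothetical.

```python
def hetero_speedup(f, n_cpu, n_acc=0, acc_speedup=1.0):
    """Amdahl-style speedup over one baseline processor.

    f           : fraction of the work that can run in parallel (0..1)
    n_cpu       : number of baseline processors (>= 1)
    n_acc       : number of accelerators
    acc_speedup : speed of one accelerator relative to one baseline processor

    Assumes perfect load balance across all units (a simplification).
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if n_cpu < 1 or n_acc < 0 or acc_speedup <= 0:
        raise ValueError("invalid resource description")
    throughput = n_cpu + n_acc * acc_speedup  # aggregate parallel throughput
    return 1.0 / ((1.0 - f) + f / throughput)


# With 90% parallel work: four processors alone, then with two 8x accelerators.
print(round(hetero_speedup(0.9, 4), 2))
print(round(hetero_speedup(0.9, 4, n_acc=2, acc_speedup=8.0), 2))
```

With `n_acc=0` the expression reduces to the classic form 1 / ((1 - f) + f / n), and the serial fraction still bounds the achievable speedup at 1 / (1 - f) however many accelerators are added.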

Field Programmable Logic and Applications | 2012

CaaS: Core as a Service Realizing Hardware Services on Reconfigurable MPSoCs

Chao Wang; Xi Li; Junneng Zhang; Peng Chen; Xuehai Zhou

Service-oriented architecture (SOA) has proven to be an efficient approach for high-level programming paradigms. This paper brings services into reconfigurable MPSoCs to form CaaS, a core-as-a-service framework that implements hardware services on a state-of-the-art reconfigurable multi-processor system-on-chip (MPSoC) platform for high-level parallelization. The integration of SOA concepts can provide structured programming models that ease the burden of high-level programming. For demonstration, a prototype with a JPEG application has been built on an FPGA, treating embedded processors and IP cores as computing servants. The experimental results demonstrate that CaaS can achieve high flexibility with dynamic reconfiguration techniques.