Yanzhou Liu

University of Maryland, College Park

Publication

Featured research published by Yanzhou Liu.

software and compilers for embedded systems | 2016

A Design Framework for Mapping Vectorized Synchronous Dataflow Graphs onto CPU-GPU Platforms

Shuoxin Lin; Yanzhou Liu; William Plishker; Shuvra S. Bhattacharyya

Heterogeneous computing platforms with multicore central processing units (CPUs) and graphics processing units (GPUs) are of increasing interest to designers of embedded signal processing systems since they offer the potential for a significant performance boost while maintaining the flexibility of software-based design flows. Developing optimized implementations for CPU-GPU platforms is challenging due to complex, inter-related design issues, including task scheduling, interprocessor communication, memory management, and modeling and exploitation of different forms of parallelism. In this paper, we present an automated, dataflow-based design framework called DIF-GPU for application mapping and software synthesis on heterogeneous CPU-GPU platforms. DIF-GPU is based on novel extensions to the dataflow interchange format (DIF) package, which is a software environment for developing and experimenting with dataflow-based design methods and synthesis techniques for embedded signal processing systems. DIF-GPU exploits multiple forms of parallelism by deeply incorporating efficient vectorization and scheduling techniques for synchronous dataflow specifications, and incorporating techniques for streamlining interprocessor communication. DIF-GPU also provides software synthesis capabilities to help accelerate the process of moving from high-level application models to optimized implementations.
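Scheduling a synchronous dataflow (SDF) graph usually starts from its repetitions vector, the smallest positive firing counts that balance production and consumption on every edge. The following is a minimal sketch of that standard computation, not DIF-GPU's implementation; the function name, the edge tuple layout, and the example rates are assumptions made for illustration, and the graph is assumed to be connected.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    edges is a list of (src, prod, dst, cons) tuples (hypothetical layout).
    Assumes the graph is connected; raises ValueError if rates are inconsistent.
    """
    # Fix the first actor's rate at 1 and propagate rational rates along edges.
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, prod, dst, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    # Every edge must balance, otherwise no periodic schedule exists.
    for src, prod, dst, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent sample rates: no valid schedule")
    # Scale the rational rates to the smallest positive integers.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {actor: int(r * scale) for actor, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {actor: c // common for actor, c in counts.items()}


# Example: A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
print(repetitions_vector(["A", "B", "C"], [("A", 2, "B", 3), ("B", 1, "C", 2)]))
```

In this example one schedule iteration fires A three times, B twice, and C once.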

IEEE Aerospace and Electronic Systems Magazine | 2017

Dynamic, data-driven processing of multispectral video streams

Honglei Li; Kishan Sudusinghe; Yanzhou Liu; Jinsung Yoon; Mihaela van der Schaar; Erik Blasch; Shuvra S. Bhattacharyya

Video analytics plays an important role in a wide variety of defense-, monitoring- and surveillance-related systems for air and ground environments. In this context, multispectral video processing has attracted increased interest in recent years, due in part to technological advances in video capture. Compared with monochromatic video, multispectral video offers better spectral resolution, and different bands of multispectral video streams can enhance video analytics capabilities in different ways. For example, the infrared bands can provide better separation of shadows from objects, and improved spatial resolution in scenes that are impaired by fog or haze [16].

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2018

Reproducible Evaluation of System Efficiency With a Model of Architecture: From Theory to Practice

Maxime Pelcat; Alexandre Mercat; Karol Desnos; Luca Maggiani; Yanzhou Liu; Julien Heulot; Jean-François Nezan; Wassim Hamidouche; Daniel Menard; Shuvra S. Bhattacharyya

Current trends in high performance and embedded computing include design of increasingly complex hardware architectures with high parallelism, heterogeneous processing elements, and nonuniform communication resources. In order to take hardware and software design decisions, early evaluations of the system nonfunctional properties are needed. These evaluations of system efficiency require electronic system-level information on both algorithms and architecture. Contrary to algorithm models for which a major body of work has been conducted on defining formal models of computation (MoCs), architecture models from the literature are mostly empirical models from which reproducible experimentation requires the accompanying software. In this paper, a precise definition of a model of architecture (MoA) is proposed that focuses on reproducibility and abstraction and removes the overlap previously existing between the notions of MoA and MoC. A first MoA, called the linear system-level architecture model (LSLA), is presented. To demonstrate the generic nature of the proposed new architecture modeling concepts, we show that the LSLA model can be integrated flexibly with different MoCs. LSLA is then used to model the energy consumption of a state-of-the-art multiprocessor system-on-chip (MPSoC) when running an application described using the synchronous dataflow MoC. A method to automatically learn LSLA model parameters from platform measurements is introduced. Despite the high complexity of the underlying hardware and software, a simple LSLA model is demonstrated to estimate the energy consumption of the MPSoC with a fidelity of 86%.

signal processing systems | 2017

Data Flow Algorithms for Processors with Vector Extensions

Lee A. Barford; Shuvra S. Bhattacharyya; Yanzhou Liu

Full use of the parallel computation capabilities of present and expected CPUs and GPUs requires use of vector extensions. Yet many actors in data flow systems for digital signal processing have internal state (or, equivalently, an edge that loops from the actor back to itself) that imposes serial dependencies between actor invocations that make vectorizing across actor invocations impossible. Ideally, issues of inter-thread coordination required by serial data dependencies should be handled by code written by parallel programming experts that is separate from code specifying signal processing operations. The purpose of this paper is to present one approach for doing so in the case of actors that maintain state. We propose a methodology for using the parallel scan (also known as prefix sum) pattern to create algorithms for multiple simultaneous invocations of such an actor that results in vectorizable code. Two examples of applying this methodology are given: (1) infinite impulse response filters and (2) finite state machines. The correctness and performance of the resulting IIR filters and one class of FSMs are studied.
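To illustrate how a scan can remove the serial dependency of a stateful actor, here is a minimal sketch for a first-order IIR filter y[n] = a*y[n-1] + x[n]. It is not the algorithm from the paper: the function name is hypothetical, the filter order is an assumption, and NumPy array operations stand in for vector instructions. Each sample is treated as an affine map y -> A*y + B, and because composing such maps is associative, they can be combined with a Hillis-Steele style inclusive scan in about log2(n) vector steps.

```python
import numpy as np


def iir_first_order_scan(x, a, y0=0.0):
    """Compute y[n] = a*y[n-1] + x[n] (with y[-1] = y0) via an inclusive scan."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    A = np.full(n, a, dtype=float)  # multiplicative part of each affine map
    B = x.copy()                    # additive part of each affine map
    shift = 1
    while shift < n:
        # Compose map i with the map ending at i - shift (earlier map applied first):
        # A_i*(A_prev*y + B_prev) + B_i. B must be updated before A is overwritten.
        B[shift:] = A[shift:] * B[:-shift] + B[shift:]
        A[shift:] = A[shift:] * A[:-shift]
        shift *= 2
    # Element i now holds the composition of steps 0..i; apply it to the initial state.
    return A * y0 + B


print(iir_first_order_scan([1.0, 2.0, 3.0, 4.0], a=0.5))
```

The scan does more arithmetic in total than the sequential loop, but every step inside it is an elementwise operation over whole arrays, which is the property that makes it a candidate for vectorization.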

Archive | 2017

The DSPCAD Framework for Modeling and Synthesis of Signal Processing Systems

Shuoxin Lin; Yanzhou Liu; Kyunghun Lee; Lin Li; William Plishker; Shuvra S. Bhattacharyya

With domain-specific models of computation and widely-used hardware acceleration techniques, hardware/software codesign has the potential of being as agile as traditional software design, while approaching the performance of custom hardware. However, due to increasing use of system heterogeneity, multicore processors, and hardware accelerators, along with traditional software development challenges, codesign processes for complex systems are often slow and error prone. The purpose of this chapter is to discuss a Computer-Aided Design (CAD) framework, called the DSPCAD Framework, that addresses some of these key development issues for the broad domain of digital signal processing (DSP) systems. The emphasis in the DSPCAD Framework on supporting cross-platform, domain-specific approaches enables designers to rapidly arrive at initial implementations ...

All authors are with the Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, USA. Shuvra S. Bhattacharyya is also with the Department of Pervasive Computing, Tampere University of Technology, Finland.

This paper has been accepted for publication in: S. Ha and J. Teich, editors, Handbook of Hardware/Software Codesign, Springer, 2017. The official/final version of the paper is published on http://link.springer.com/.

instrumentation and measurement technology conference | 2015

Constant-rate clock recovery and jitter measurement on deep memory waveforms using dataflow

Yanzhou Liu; Lee A. Barford; Shuvra S. Bhattacharyya

The measurement of jitter is key when verifying the design or performing manufacturing test of ever more complex digital communications circuitry or equipment. As the requirements for bit error rates (BER) become more stringent and data volumes increase, it becomes increasingly important and interesting to measure timing jitter in long, or even temporally unbounded, waveforms. Previous methods for constant-rate clock recovery and jitter measurement required storing and computing on all samples of the waveform at once. As the waveform grows, and especially if the waveform is unbounded, storing it in its entirety becomes impractical. We demonstrate the transformation of the previous method to a dataflow method where the entire waveform need never be stored. The new method has been tested on actual measured data. Through its incorporation of dataflow principles, the new method is suitable for efficient mapping to a variety of platforms, including multicore and field programmable gate array platforms for high performance signal processing. Intermediate measurement results converge toward those obtained in the original method. The final measurement result, the jitter standard deviation, agrees with the original method to within well under one percent. Thus, a small amount of additional measurement error is added in order to remove the restriction that the entire waveform fit into memory.

ieee global conference on signal and information processing | 2014

Data flow algorithms for processors with vector extensions: Handling actors with internal state

Lee A. Barford; Shuvra S. Bhattacharyya; Yanzhou Liu

Full use of the parallel computation capabilities of present and expected CPUs and GPUs requires use of vector extensions. Yet many actors in data flow systems for digital signal processing have internal state (or, equivalently, an edge that loops from the actor back to itself) that imposes serial dependencies between actor invocations that make vectorizing across actor invocations impossible. Ideally, issues of inter-thread coordination required by serial data dependencies should be handled by code written by parallel programming experts that is separate from code specifying signal processing operations. The purpose of this paper is to present one approach for doing so in the case of actors that maintain state. We propose a methodology for using the parallel scan (also known as prefix sum) pattern to create algorithms for multiple simultaneous invocations of such an actor that results in vectorizable code. Two examples of applying this methodology are given: (1) infinite impulse response filters and (2) finite state machines. The correctness and performance of the resulting IIR filters are studied.

application-specific systems, architectures, and processors | 2017

Design and implementation of adaptive signal processing systems using Markov decision processes

Lin Li; Adrian E. Sapio; Jiahao Wu; Yanzhou Liu; Kyunghun Lee; Marilyn Wolf; Shuvra S. Bhattacharyya

In this paper, we propose a novel framework, called Hierarchical MDP framework for Compact System-level Modeling (HMCSM), for design and implementation of adaptive embedded signal processing systems. The HMCSM framework applies Markov decision processes (MDPs) to enable autonomous adaptation of embedded signal processing under multidimensional constraints and optimization objectives. The framework integrates automated, MDP-based generation of optimal reconfiguration policies, dataflow-based application modeling, and implementation of embedded control software that carries out the generated reconfiguration policies. HMCSM systematically decomposes a complex, monolithic MDP into a set of separate MDPs that are connected hierarchically, and that operate more efficiently through such a modularized structure. We demonstrate the effectiveness of our new MDP-based system design framework through experiments with an adaptive wireless communications receiver.
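For readers unfamiliar with how an optimal policy is generated from an MDP, the following is a minimal sketch of textbook value iteration on a flat MDP. It is not the HMCSM framework or its hierarchical decomposition; the function name, the array layout `P[a][s][t]` and `R[s][a]`, and the two-state toy problem are all assumptions made for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-9):
    """Return (values, policy) for a finite MDP.

    P[a][s][t] is the probability of moving from state s to t under action a.
    R[s][a] is the immediate reward for taking action a in state s.
    """
    n_states = len(R)
    n_actions = len(R[0])
    V = [0.0] * n_states
    while True:
        # Bellman backup: expected return of each action in each state.
        Q = [[R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(q) for q in Q]
        if max(abs(v - w) for v, w in zip(V, V_new)) < tol:
            # Greedy policy with respect to the converged action values.
            return V_new, [q.index(max(q)) for q in Q]
        V = V_new


# Toy problem (hypothetical): action 0 keeps the current configuration,
# action 1 switches to the other one; only configuration 1 earns reward.
P = [[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
     [[0.0, 1.0], [1.0, 0.0]]]   # action 1: switch
R = [[0.0, 0.0], [1.0, 1.0]]
values, policy = value_iteration(P, R)
print(values, policy)
```

In the toy problem the resulting policy switches out of configuration 0 and stays in configuration 1, which is the kind of reconfiguration rule that generated control software would then carry out.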

signal processing systems | 2016

Models of Architecture: Reproducible Efficiency Evaluation for Signal Processing Systems

Maxime Pelcat; Karol Desnos; Luca Maggiani; Yanzhou Liu; Julien Heulot; Jean-François Nezan; Shuvra S. Bhattacharyya

The current trend in high performance and embedded signal processing consists of designing increasingly complex heterogeneous hardware architectures with non-uniform communication resources. In order to take hardware and software design decisions, early evaluations of the system non-functional properties are needed. These evaluations of system efficiency require high-level information on both the algorithms and the architecture. In this paper, we define the notion of Model of Architecture (MoA) and study the combination of a Model of Computation (MoC) and an MoA to provide a design space exploration environment for the study of the algorithmic and architectural choices. A cost is computed from the mapping of an application, represented by a model conforming to an MoC, onto an architecture, represented by a model conforming to an MoA. The cost is composed of a processing-related part and a communication-related part. It is an abstract scalar value to be minimized and can represent any non-functional requirement of a system such as memory, energy, throughput or latency.
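The idea of a scalar cost with a processing part and a communication part can be sketched as follows. This is only an illustration under simple additive assumptions, not the MoA definition from the paper; the function name, the cost tables, the per-token communication cost, and all numbers are hypothetical.

```python
def mapping_cost(proc_cost, comm_cost_per_token, mapping, edges):
    """Abstract scalar cost of mapping actors onto processing elements (PEs).

    proc_cost[(actor, pe)] is the cost of running an actor on a PE.
    edges is a list of (src, dst, tokens); communication is charged only
    when the two actors are mapped to different PEs (an assumption).
    """
    processing = sum(proc_cost[(actor, pe)] for actor, pe in mapping.items())
    communication = sum(comm_cost_per_token * tokens
                        for src, dst, tokens in edges
                        if mapping[src] != mapping[dst])
    return processing + communication


# Hypothetical two-actor application on a CPU-GPU platform.
proc_cost = {("A", "cpu"): 4.0, ("A", "gpu"): 1.0,
             ("B", "cpu"): 2.0, ("B", "gpu"): 3.0}
edges = [("A", "B", 10)]
for mapping in ({"A": "cpu", "B": "cpu"}, {"A": "gpu", "B": "cpu"}):
    print(mapping, mapping_cost(proc_cost, 0.5, mapping, edges))
```

With these made-up numbers, moving A to the GPU lowers the processing part but adds enough communication cost that keeping both actors on the CPU is cheaper overall, which is the kind of trade-off a design space exploration would weigh.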

instrumentation and measurement technology conference | 2016

Jitter measurement on deep waveforms with constant memory

Yanzhou Liu; Lee A. Barford; Shuvra S. Bhattacharyya

The time required for jitter measurement in digital communications waveforms can be dominated by computation time which increases with waveform depth. Previous work on decreasing this computation time includes the use of parallel resources on microprocessors and graphics processing units. However, the waveform depth and computation speed were limited by the need to have the entire waveform and intermediate results derived from it in memory all at once. We present a new dataflow-based method for clock recovery and time interval error (TIE) and TIE standard deviation computation. Memory usage does not grow with waveform depth, so the latter is not limited by memory size. We describe an implementation in LIDE-OCL, a tool for simplifying implementation of dataflow signal processing using multicore processors and GPUs. The resulting measurement accuracy is compared on actual measured waveforms with prior methods.
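One ingredient of a constant-memory measurement is a standard deviation that is updated sample by sample instead of being computed over a stored array. The sketch below uses Welford's running-variance update for this; it is an assumption that something of this kind is applicable, it is not the LIDE-OCL implementation, and it does not cover clock recovery. The class name and the sample TIE values are hypothetical.

```python
import math


class RunningStd:
    """Population standard deviation over a stream, using O(1) memory."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        # Welford's update: numerically stable, no stored history needed.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n else 0.0


# Hypothetical stream of time interval error (TIE) values.
stats = RunningStd()
for tie in [1.0, 2.0, 3.0, 4.0]:
    stats.update(tie)
print(stats.std())
```

Only three numbers are kept regardless of how many samples arrive, so memory use does not depend on the waveform depth.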
