Publication


Featured research published by Dong Ping Zhang.


International Symposium on High-Performance Parallel and Distributed Computing (HPDC) | 2014

TOP-PIM: throughput-oriented programmable processing in memory

Dong Ping Zhang; Nuwan Jayasena; Alexander Lyashevsky; Joseph L. Greathouse; Lifan Xu; Michael Ignatowski

As computation becomes increasingly limited by data movement and energy consumption, exploiting locality throughout the memory hierarchy becomes critical to continued performance scaling. Moving computation closer to memory presents an opportunity to reduce both energy and data movement overheads. We explore the use of 3D die stacking to move memory-intensive computations closer to memory. This approach to processing in memory addresses some drawbacks of prior research on in-memory computing and is commercially viable in the foreseeable future. Because 3D stacking provides increased bandwidth, we study throughput-oriented computing using programmable GPU compute units across a broad range of benchmarks, including graph and HPC applications. We also introduce a methodology for rapid design space exploration by analytically predicting performance and energy of in-memory processors based on metrics obtained from execution on today's GPU hardware. Our results show that, on average, viable PIM configurations show moderate performance losses (27%) in return for significant energy efficiency improvements (76% reduction in EDP) relative to a representative mainstream GPU at 22 nm technology. At 16 nm technology, on average, viable PIM configurations are performance competitive with a representative mainstream GPU (7% speedup) and provide even greater energy efficiency improvements (85% reduction in EDP).
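To make the energy-delay-product (EDP) arithmetic above concrete, here is a minimal sketch in plain Python; the energy ratio is derived here from the quoted 22 nm figures, not taken from the paper's analytical model:

```python
# EDP sanity check for the 22 nm numbers quoted above. Ratios are
# relative to the baseline GPU; the per-run energy ratio is derived,
# not taken from the paper.

perf_ratio = 1.0 - 0.27          # PIM runs at 73% of baseline performance
delay_ratio = 1.0 / perf_ratio   # so each run takes ~1.37x as long

edp_ratio_target = 1.0 - 0.76    # 76% EDP reduction -> 0.24x baseline EDP

# EDP = energy * delay, so the implied energy per run is:
energy_ratio = edp_ratio_target / delay_ratio
print(f"implied PIM energy per run: {energy_ratio:.2f}x baseline")  # ~0.18x

# Cross-check: recombining recovers the quoted EDP reduction.
edp_ratio = energy_ratio * delay_ratio
print(f"EDP reduction: {1.0 - edp_ratio:.0%}")  # 76%
```

The point of the worked numbers: a modest slowdown is more than paid for if energy per run drops steeply, since EDP multiplies the two.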


Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness | 2013

A new perspective on processing-in-memory architecture design

Dong Ping Zhang; Nuwan Jayasena; Alexander Lyashevsky; Joseph L. Greathouse; Mitesh R. Meswani; Mark Nutter; Mike Ignatowski

As computation becomes increasingly limited by data movement and energy consumption, exploiting locality throughout the memory hierarchy becomes critical for maintaining the performance scaling that many have come to expect from the computing industry. Moving computation closer to main memory presents an opportunity to reduce the overheads associated with data movement. We explore the potential of using 3D die stacking to move memory-intensive computations closer to memory. This approach to processing-in-memory addresses some drawbacks of prior research on in-memory computing and appears commercially viable in the foreseeable future. We show promising early results from this approach and identify areas that are in need of research to unlock its full potential.


SAI Intelligent Systems Conference | 2016

HADM: Hybrid Analysis for Detection of Malware

Lifan Xu; Dong Ping Zhang; Nuwan Jayasena; John Cavazos

Android is the most popular mobile operating system, with a market share of over 80% [1]. Due to its popularity and its open-source nature, Android is now the platform most targeted by malware, creating an urgent need for effective defense mechanisms to protect Android-enabled devices.


International Conference on Cyber Security and Cloud Computing | 2016

Dynamic Android Malware Classification Using Graph-Based Representations

Lifan Xu; Dong Ping Zhang; Marco A. Alvarez; Jose Andre Morales; Xudong Ma; John Cavazos

Malware classification for the Android ecosystem can be performed using a range of techniques. One major technique that has been gaining ground recently is dynamic analysis based on system call invocations recorded during the executions of Android applications. Dynamic analysis has traditionally been based on converting system calls into flat feature vectors and feeding the vectors into machine learning algorithms for classification. In this paper, we implement three traditional feature-vector-based representations for Android system calls. For each feature vector representation, we also propose a novel graph-based representation. We then use graph kernels to compute pairwise similarities and feed these similarity measures into a Support Vector Machine (SVM) for classification. To speed up the graph kernel computation, we compress the graphs using the Compressed Row Storage format, and then we apply OpenMP to parallelize the computation. Experiments show that the graph-based representations improve classification accuracy over the corresponding feature-vector-based representations derived from the same input. Finally, we show that different representations can be combined to further improve classification accuracy.
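For illustration only, here is a minimal sketch of the graph-based idea (not the paper's actual feature pipeline; the system-call trace is invented): build a transition graph over a recorded call sequence, then store it in compressed-row form:

```python
from collections import defaultdict

# Hypothetical system-call trace recorded from one app execution.
trace = ["open", "read", "read", "mmap", "read", "close", "open", "read"]

# Weighted transition graph: edge (a, b) counts how often call b
# immediately follows call a in the trace.
calls = sorted(set(trace))
index = {c: i for i, c in enumerate(calls)}
edges = defaultdict(int)
for a, b in zip(trace, trace[1:]):
    edges[(index[a], index[b])] += 1

# Compress the adjacency structure into row-pointer / column-index /
# weight arrays, the kind of compressed-row layout the paper uses to
# speed up graph kernel computation.
n = len(calls)
row_ptr, col_idx, weights = [0], [], []
for u in range(n):
    for (a, b), w in sorted(edges.items()):
        if a == u:
            col_idx.append(b)
            weights.append(w)
    row_ptr.append(len(col_idx))

print(calls)                       # node labels
print(row_ptr, col_idx, weights)  # CSR arrays
```

A graph kernel would then compare two such graphs structurally, rather than comparing flat count vectors that discard call ordering.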


International Parallel and Distributed Processing Symposium | 2016

Fine-Grained Task Migration for Graph Algorithms Using Processing in Memory

Paula Aguilera; Dong Ping Zhang; Nam Sung Kim; Nuwan Jayasena

Graphs are used in a wide variety of application domains, from social science to machine learning. Graph algorithms present large numbers of irregular accesses with little data reuse to amortize the high cost of memory accesses, requiring high memory bandwidth. Processing in memory (PIM) implemented through 3D die stacking can deliver this high memory bandwidth. In a system with multiple PIM-capable memory modules, the in-memory compute logic has low-latency, high-bandwidth access to its local memory, while accesses to remote memory introduce high latency and energy consumption. Ideally, in such a system, computation and data are partitioned among the PIM devices to maximize data locality. But the irregular memory access patterns present in graph applications make it difficult to guarantee that the computation in each PIM device will only access its local data. A large number of remote memory accesses can negate the benefits of using PIM. In this paper, we examine the feasibility and potential of fine-grained work migration to reduce remote data accesses in systems with multiple PIM devices. First, we propose a data-driven implementation of the algorithms we study: breadth-first search (BFS), single-source shortest path (SSSP), and betweenness centrality (BC). Each PIM device maintains a queue holding the vertices it needs to process; a newly discovered vertex is enqueued at the PIM device co-located with the memory that stores it. Second, we propose hardware support that takes advantage of PIM to implement highly efficient queues, improving the performance of the queuing framework by up to 16.7%. Third, we develop a timing model for the queuing framework to explore the benefits of work migration versus remote memory accesses. Finally, our analysis using the above framework shows that naïve task migration can lead to performance degradation, and it identifies trade-offs among data locality, redundant computation, and load balance across PIM devices that must be taken into account to realize the potential benefits of fine-grained task migration.
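The queuing idea can be sketched in a few lines. The following is a purely illustrative, level-synchronous BFS over a toy partitioned graph, where each hypothetical PIM device owns a contiguous vertex range and a work queue, and discovered vertices are always enqueued at their owning device (it models the data placement, not the hardware):

```python
from collections import deque

# Toy graph as an adjacency list; vertices 0-5.
adj = {0: [1, 4], 1: [2], 2: [3], 3: [], 4: [5], 5: [3]}

NUM_PIMS = 2
# Block-partition vertices: 0-2 live on device 0, 3-5 on device 1.
owner = lambda v: v * NUM_PIMS // len(adj)

# One work queue per PIM device; a vertex is always enqueued at the
# device that owns (i.e., locally stores) it.
queues = [deque() for _ in range(NUM_PIMS)]
dist = {v: None for v in adj}

root = 0
dist[root] = 0
queues[owner(root)].append(root)

# Round-based, data-driven BFS: in each round, every device drains its
# snapshot of the frontier and pushes newly discovered vertices to
# their owners' queues for the next round.
while any(queues):
    snapshots, queues = queues, [deque() for _ in range(NUM_PIMS)]
    for pim in range(NUM_PIMS):
        for u in snapshots[pim]:
            for v in adj[u]:
                if dist[v] is None:
                    dist[v] = dist[u] + 1
                    queues[owner(v)].append(v)

print(dist)  # BFS levels from vertex 0
```

Each device only ever dereferences vertices it owns; cross-device edges turn into enqueue operations instead of remote memory reads, which is the trade-off the paper quantifies.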


Heterogeneous Computing with OpenCL 2.0 (Third Edition) | 2015

Dissecting OpenCL on a heterogeneous system

David R. Kaeli; Perhaad Mistry; Dana Schaa; Dong Ping Zhang

This chapter shows how OpenCL maps to a system with an x86-based FX-8350 central processing unit and a discrete R9 290X graphics processing unit. It also discusses memory performance considerations for global and local memory.


Heterogeneous Computing with OpenCL 2.0 (Third Edition) | 2015

Chapter 4 – Examples

David R. Kaeli; Perhaad Mistry; Dana Schaa; Dong Ping Zhang

This chapter provides four complete OpenCL examples (both host code and kernels). The examples include OpenCL implementations of histogram, image rotation, and convolution, as well as a producer-consumer design executing on multiple devices. These algorithms illustrate features such as communication via local memory, atomic operations, images, samplers, and pipes.
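To give a flavor of the kind of example the chapter walks through, here is a minimal 256-bin histogram sketch (not the book's code) using pyopencl with an embedded OpenCL C kernel; it illustrates communication via local memory and atomic operations, and assumes a working OpenCL runtime is available:

```python
import numpy as np
import pyopencl as cl

KERNEL = r"""
__kernel void hist256(__global const uchar *data, uint n,
                      __global uint *bins, __local uint *lbins) {
    uint lid = get_local_id(0);
    // Zero this work-group's private copy of the histogram.
    for (uint i = lid; i < 256; i += get_local_size(0))
        lbins[i] = 0;
    barrier(CLK_LOCAL_MEM_FENCE);

    // Accumulate into local memory using cheap local atomics.
    for (uint i = get_global_id(0); i < n; i += get_global_size(0))
        atomic_inc(&lbins[data[i]]);
    barrier(CLK_LOCAL_MEM_FENCE);

    // Merge the local histogram into the global one.
    for (uint i = lid; i < 256; i += get_local_size(0))
        atomic_add(&bins[i], lbins[i]);
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

data = np.random.randint(0, 256, size=1 << 20).astype(np.uint8)
bins = np.zeros(256, dtype=np.uint32)

mf = cl.mem_flags
d_data = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=data)
d_bins = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=bins)

prg = cl.Program(ctx, KERNEL).build()
prg.hist256(queue, (8192,), (64,), d_data, np.uint32(data.size),
            d_bins, cl.LocalMemory(256 * 4))
cl.enqueue_copy(queue, bins, d_bins)

assert bins.sum() == data.size  # every element counted exactly once
```

Accumulating in local memory first keeps most atomic traffic within a work-group, so only one global atomic per bin per work-group is needed at the end.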


Heterogeneous Computing with OpenCL 2.0 (Third Edition) | 2015

OpenCL host-side memory model

David R. Kaeli; Perhaad Mistry; Dana Schaa; Dong Ping Zhang

This chapter presents OpenCL’s host-side memory model, relating to the allocation and management of memory objects. It also provides an introduction to OpenCL’s support for shared virtual memory.
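As a small sketch of host-side memory-object management (assuming pyopencl as the host API; this is not the book's code), buffer allocation and explicit data movement look like this:

```python
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

host_src = np.arange(16, dtype=np.float32)
host_dst = np.empty_like(host_src)

# Allocate a device-side memory object, initialized from host memory.
buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=host_src)

# Explicitly copy the buffer's contents back to the host.
cl.enqueue_copy(queue, host_dst, buf)
queue.finish()
assert (host_dst == host_src).all()
```

With OpenCL 2.0's shared virtual memory, which the chapter introduces, such explicit copies can be replaced by pointers valid on both host and device.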


Heterogeneous Computing with OpenCL 2.0 (Third Edition) | 2015

OpenCL device-side memory model

David R. Kaeli; Perhaad Mistry; Dana Schaa; Dong Ping Zhang

This chapter presents OpenCL’s device-side memory model, including memory spaces, memory objects, and consistency models.


Heterogeneous Computing with OpenCL 2.0 (Third Edition) | 2015

Case study: Image clustering

David R. Kaeli; Perhaad Mistry; Dana Schaa; Dong Ping Zhang

The bag-of-words (BoW) model is one of the most popular approaches to image classification and forms an important component of image search systems. The BoW model treats an image's features as words and represents the image as a vector of occurrence counts of those visual words. This chapter discusses the OpenCL implementation of an important component of the BoW model, namely the histogram builder. We discuss the OpenCL kernel and study the performance impact of various source code optimizations.
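As a minimal sketch of the histogram-building step (invented data and a plain NumPy nearest-centroid assignment, not the chapter's OpenCL kernel):

```python
# BoW histogram builder sketch: assign each image feature to its
# nearest "visual word" (cluster centroid), then count occurrences.
# Dimensions and data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 64))   # 500 descriptors from one image
centroids = rng.normal(size=(100, 64))  # 100-word visual vocabulary

# Nearest centroid per feature (squared Euclidean distance).
d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
words = d2.argmin(axis=1)

# The image's BoW representation: occurrence counts per visual word.
bow = np.bincount(words, minlength=len(centroids))
print(bow.shape, bow.sum())  # (100,) 500
```

The counting step is exactly a histogram, which is why the GPU optimizations studied in the chapter center on efficient parallel binning.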

Collaboration


Dive into Dong Ping Zhang's collaborations.

Top Co-Authors

Dana Schaa

Northeastern University

Lifan Xu

University of Delaware
