Pingfan Meng

University of California, San Diego

Publication

Featured research published by Pingfan Meng.

Journal of Laboratory Automation | 2011

Strategies for Implementing Hardware-Assisted High-Throughput Cellular Image Analysis

Henry Tat Kwong Tse; Pingfan Meng; Daniel R. Gossett; Ali Irturk; Ryan Kastner; Dino Di Carlo

Recent advances in imaging technology for biomedicine, including high-speed microscopy, automated microscopy, and imaging flow cytometry, are poised to have a large impact on clinical diagnostics, drug discovery, and biological research. Enhanced acquisition speed, resolution, and automation of sample handling are enabling researchers to probe biological phenomena at an increasing rate and achieve intuitive image-based results. However, the rich image sets produced by these tools are massive, possessing potentially millions of frames with tremendous depth and complexity. As a result, the tools introduce immense computational requirements and, more importantly, image analysis operates at a much lower speed than image acquisition, which limits its ability to play a role in critical tasks in biomedicine such as real-time decision making. In this work, we present our strategy for high-throughput image analysis on a graphics processing unit (GPU) platform. We scrutinized our original algorithm for detecting, tracking, and analyzing cell morphology in high-speed images and identified inefficiencies in image filtering and potential shortcut routines in the morphological analysis stage. Using our “grid method” for image enhancements resulted in an 8.54× reduction in total run time, whereas origin centering allowed the use of a lookup table for coordinate transformation, which reduced the total run time by 55.64×. Optimization of parallelization and implementation of specialized image processing hardware will ultimately enable real-time analysis of high-throughput image streams and bring wider adoption of assays based on new imaging technologies.
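The abstract does not say which coordinate transformation is tabulated, so the following is only a minimal sketch of the general idea, assuming a Cartesian-to-polar conversion around a cell's center. Once the origin is moved to the center, pixel offsets fall in a small fixed range, so radius and angle can be precomputed once and then looked up per pixel. The names `build_polar_lut` and `to_polar` are hypothetical and do not come from the paper.

```python
import math

def build_polar_lut(half_size):
    """Precompute (radius, angle) for every integer offset in [-half_size, half_size]."""
    lut = {}
    for dy in range(-half_size, half_size + 1):
        for dx in range(-half_size, half_size + 1):
            lut[(dx, dy)] = (math.hypot(dx, dy), math.atan2(dy, dx))
    return lut

def to_polar(points, center, lut):
    """Convert pixel coordinates to polar form by table lookup (assumed transform)."""
    cx, cy = center
    # Centering on the origin turns each pixel into a small offset that indexes the table.
    return [lut[(x - cx, y - cy)] for x, y in points]

lut = build_polar_lut(4)
polar = to_polar([(5, 3), (2, 7)], center=(2, 3), lut=lut)
```

The table is built once and reused for every frame, which replaces per-pixel square-root and arctangent calls with a lookup.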

field-programmable technology | 2012

FPGA-GPU-CPU heterogenous architecture for real-time cardiac physiological optical mapping

Pingfan Meng; Matthew Jacobsen; Ryan Kastner

Real-time optical mapping is a technique used in cardiac disease study and treatment technology development to obtain accurate and comprehensive electrical activity over the entire heart. It provides dense spatial electrophysiology data: each pixel essentially plays the role of a probe at that location on the heart. However, the high-throughput nature of the computation causes significant challenges in implementing a real-time optical mapping algorithm. This is exacerbated by the high frame rate video required for many medical applications (on the order of 1000 fps). Accelerating optical mapping using multiple CPU cores yields modest improvements, but still only performs at 3.66 frames per second (fps). A highly tuned GPU implementation achieves 578 fps. An FPGA-only implementation is infeasible due to the resource requirements for processing intermediate data arrays generated by the algorithm. We present an FPGA-GPU-CPU architecture that is a real-time implementation of the optical mapping algorithm running at 1024 fps. This represents a 273× speedup over a multi-core CPU implementation.

field-programmable technology | 2012

Designing a hardware in the loop wireless digital channel emulator for software defined radio

Janarbek Matai; Pingfan Meng; Lingjuan Wu; Brad T. Weals; Ryan Kastner

The testing, verification, and evaluation of wireless systems is an important but challenging endeavor. The most realistic method to test a wireless system is a field deployment. Unfortunately, this is not only expensive but also time-consuming. In this paper, we present the design and implementation of a digital wireless channel emulator, which connects directly to a number of radios and mimics the wireless channels between them, across a range of scenarios, in real time. We use high-level synthesis tools to design the emulator while performing design space exploration. We describe the optimizations and tradeoffs that were necessary to reach the target throughput and area requirements.

field-programmable logic and applications | 2013

A hardware accelerated approach for imaging flow cytometry

Dajung Lee; Pingfan Meng; Matthew Jacobsen; Henry Tse; Dino Di Carlo; Ryan Kastner

Imaging flow cytometry uses high-speed flows and a camera to capture morphological features of hundreds to thousands of cells per second. These morphological features can be useful to isolate sub-populations of cells for life science research and diagnostics. Our experimental setup utilizes a high-speed 208×32 resolution CMOS camera, operating at over 140,000 frames per second (FPS). In each frame, the analysis routine detects the presence of an object and performs morphology measurements. Real-time cell sorting requires a latency under 10 ms in addition to a throughput of 140,000 FPS. In this paper, we describe GPU and FPGA accelerated implementations of the image analysis necessary for an automated cell sorting system. Our FPGA design results in a 38× speedup over software, providing 2,262 FPS with 11.9 ms of latency. Our GPU implementation shows a 22× speedup, supporting 1,318 FPS with 152 ms of latency.

design, automation, and test in europe | 2016

Adaptive Threshold Non-Pareto Elimination: Re-thinking machine learning for system level design space exploration on FPGAs

Pingfan Meng; Alric Althoff; Q Gautier; Ryan Kastner

One major bottleneck of system-level OpenCL-to-FPGA design tools is their extremely time-consuming synthesis process (including place and route). The design space for a typical OpenCL application contains thousands of possible designs even when considering a small number of design space parameters. It costs months of compute time to synthesize all these possible designs into end-to-end FPGA implementations. Thus, brute-force design space exploration (DSE) is impractical for these design tools. Machine learning is one solution that identifies the valuable Pareto designs by sampling only a small portion of the entire design space. However, most of the existing machine learning frameworks focus on improving the design objective regression accuracy, which is not necessarily suitable for the FPGA DSE task. To address this issue, we propose a novel strategy, Adaptive Threshold Non-Pareto Elimination (ATNE). Instead of focusing on regression accuracy improvement, ATNE focuses on understanding and estimating the inaccuracy. ATNE provides a Pareto identification threshold that adapts to the estimated inaccuracy of the regressor. This adaptive threshold results in a more efficient DSE. For the same prediction quality, ATNE reduces the synthesis complexity by 1.6× to 2.89× (hundreds of synthesis hours) against the other state-of-the-art frameworks for FPGA DSE. In addition, ATNE is capable of identifying the Pareto designs for certain difficult design spaces which the other existing frameworks are incapable of exploring effectively.
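The sketch below is not the ATNE algorithm, which the abstract does not specify in detail. It only illustrates, under stated assumptions, what a threshold on Pareto elimination means: a design is discarded only when some other design beats its predicted objectives by more than a margin on every objective, so a larger margin (a less trusted regressor) eliminates fewer designs. The function names, the fixed `margin` parameter, and the sample values are all hypothetical; lower objective values are assumed to be better.

```python
def dominates(a, b, margin=0.0):
    """True if design a beats design b by more than `margin` on every objective."""
    return all(x + margin < y for x, y in zip(a, b))

def eliminate_non_pareto(predicted, margin):
    """Return indices of designs that cannot be confidently ruled out as non-Pareto."""
    keep = []
    for i, p in enumerate(predicted):
        others = (q for j, q in enumerate(predicted) if j != i)
        if not any(dominates(q, p, margin) for q in others):
            keep.append(i)
    return keep

# Hypothetical predicted (latency, area) pairs for four designs.
predicted = [(1.0, 5.0), (2.0, 2.0), (2.2, 2.3), (4.0, 4.0)]
strict = eliminate_non_pareto(predicted, margin=0.0)
relaxed = eliminate_non_pareto(predicted, margin=0.5)
```

With a zero margin only designs 0 and 1 survive; with a margin of 0.5, design 2 is also kept because its predicted gap to design 1 is smaller than the assumed prediction error.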

field programmable logic and applications | 2014

Hardware accelerated novel optical de novo assembly for large-scale genomes

Pingfan Meng; Matthew Jacobsen; Motoki Kimura; Vladimir Dergachev; Thomas Anantharaman; Michael Requa; Ryan Kastner

De novo assembly is a widely used methodology in bioinformatics. However, the conventional short-read-based de novo assembly is incapable of reliably reconstructing the large-scale structures of human genomes. Recently, a novel optical label-based technology has enabled reliable large-scale de novo assembly. Despite its advantage in large-scale genome analysis, this new technology requires a more computationally intensive alignment algorithm than its conventional counterpart. For example, the run time of reconstructing a human genome is on the order of 10,000 hours on a sequential CPU. Therefore, in order to practically apply this new technology in genome research, accelerated approaches are desirable. In this paper, we present three different accelerated approaches: multi-core CPU, GPU, and FPGA. Against the sequential software baseline, our multi-core CPU design achieved an 8.4× speedup, while the GPU and FPGA designs achieved 13.6× and 115× speedups, respectively. We also share insights from the design space exploration of this new assembly algorithm on these three different devices by comparing the results.

field-programmable technology | 2016

Spector: An OpenCL FPGA benchmark suite

Q Gautier; Alric Althoff; Pingfan Meng; Ryan Kastner

High-level synthesis tools allow programmers to use OpenCL to create FPGA designs. Unfortunately, these tools have a complex compilation process that can take several hours to synthesize a single design. This creates a significant barrier for design optimization, since even experts typically need to test many designs due to the non-obvious interactions between the different optimizations. Thus, understanding the design space and guiding the optimization process is a crucial requirement for enabling the widespread adoption of these high-level synthesis tools. However, this requires a significant amount of design space data that is currently unavailable or difficult to generate. To solve this problem, we present an OpenCL FPGA benchmark suite. We outfitted each benchmark with a range of optimization parameters (or knobs), compiled over 8300 unique designs using the Altera OpenCL SDK, executed them on a Terasic DE5 board, and recorded their corresponding performance and utilization characteristics. We describe the resulting design spaces and perform a statistical analysis of the optimization configurations, which provides valuable architecture insights to FPGA developers. We make the benchmarks and results completely open-source to give opportunities for the community to perform additional analyses and provide a repository of well-documented designs for follow-on research.

ACM Transactions on Reconfigurable Technology and Systems | 2016

Hardware Accelerated Alignment Algorithm for Optical Labeled Genomes

Pingfan Meng; Matthew Jacobsen; Motoki Kimura; Vladimir Dergachev; Thomas Anantharaman; Michael Requa; Ryan Kastner

De novo assembly is a widely used methodology in bioinformatics. However, the conventional short-read-based de novo assembly is incapable of reliably reconstructing the large-scale structures of human genomes. Recently, a novel optical label-based technology has enabled reliable large-scale de novo assembly. Despite its advantage in large-scale genome analysis, this new technology requires a more computationally intensive alignment algorithm than its conventional counterpart. For example, the runtime of reconstructing a human genome is on the order of 10,000 hours on a sequential CPU. Therefore, in order to practically apply this new technology in genome research, accelerated approaches are desirable. In this article, we present three different accelerated approaches, multicore CPU, GPU, and FPGA. Against the sequential software baseline, our multicore CPU design achieved an 8.4× speedup, while the GPU and FPGA designs achieved 13.6× and 115× speedups, respectively. We also discuss the details of the design space exploration of this new assembly algorithm on these three different devices. Finally, we compare these devices in performance, optimization techniques, prices, and design efforts.

field-programmable technology | 2014

Real-time 3D reconstruction for FPGAs: A case study for evaluating the performance, area, and programmability trade-offs of the Altera OpenCL SDK

Q Gautier; Alexandria Shearer; Janarbek Matai; Dustin Richmond; Pingfan Meng; Ryan Kastner

Embedding real-time 3D reconstruction of a scene from a low-cost depth sensor can improve the development of technologies in the domains of augmented reality, mobile robotics, and more. However, current implementations require a computer with a powerful GPU, which limits their use in applications with low-power requirements. To implement low-power 3D reconstruction, we embedded two prominent algorithms of 3D reconstruction (Iterative Closest Point and Volumetric Integration) on an Altera Stratix V FPGA by using the OpenCL language and the Altera OpenCL SDK. In this paper, we present our application and evaluation of the Altera tool in terms of performance, area, and programmability trade-offs. We have verified that OpenCL can be a viable method for developing FPGA applications by modifying an open-source version of the Microsoft KinectFusion project to run partially on an FPGA.

international conference of the ieee engineering in medicine and biology society | 2012

GPU acceleration of optical mapping algorithm for cardiac electrophysiology

Pingfan Meng; Ali Irturk; Ryan Kastner; Andrew D. McCulloch; Jeffrey H. Omens; Adam Wright

Optical mapping is an increasingly popular tool for experimentally analyzing the electrical activity in the heart. The optical mapping algorithm is computationally intense and consumes a considerable amount of time even with a highly optimized program running on a state-of-the-art multi-core microprocessor. For example, one second of data requires approximately 5 minutes of computation time (3.66 FPS) with a C++ program parallelized by OpenMP running on a 3.4 GHz quad-core CPU. This article presents a GPU implementation of the optical mapping algorithm. Our results indicate that the GPU implementation is capable of processing the optical mapping video at 578 FPS, which is a 157.92× speedup over the OpenMP-optimized CPU implementation.
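The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The 3.66 FPS and 578 FPS rates come from the abstract; the recording rate of 1000 frames per second is an assumption, since this abstract does not state the camera rate.

```python
cpu_fps = 3.66           # OpenMP CPU throughput, from the abstract
gpu_fps = 578.0          # GPU throughput, from the abstract
recording_fps = 1000.0   # ASSUMED camera rate, not stated in this abstract

# Speedup is the ratio of processing rates.
speedup = round(gpu_fps / cpu_fps, 2)

# Time for the CPU to process one second of recorded video, in minutes.
cpu_minutes_per_second = round(recording_fps / cpu_fps / 60.0, 1)

print(speedup, cpu_minutes_per_second)
```

Under that assumed recording rate, the CPU needs about 4.6 minutes per second of data, which is in line with the "approximately 5 minutes" stated in the abstract.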
