Cheng-Yang Fu | Researchain



Publication

Featured research published by Cheng-Yang Fu.

european conference on computer vision | 2016

SSD: Single Shot MultiBox Detector

Wei Liu; Dragomir Anguelov; Dumitru Erhan; Christian Szegedy; Scott E. Reed; Cheng-Yang Fu; Alexander C. Berg

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and the subsequent pixel or feature resampling stage and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single-stage methods, SSD has much better accuracy, even with a smaller input image size. For 300×300 input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on an Nvidia Titan X, and for 500×500 input, SSD achieves 75.1% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Code is available at this https URL.

embedded software | 2009

An effective synchronization approach for fast and accurate multi-core instruction-set simulation

Meng-Huan Wu; Cheng-Yang Fu; Peng-Chih Wang; Ren-Song Tsay

This paper proposes a synchronization approach for fast and accurate Multi-Core Instruction-Set Simulation (MCISS). An ideal MCISS should run accurately in a real-time fashion. To achieve accurate MCISS results, a lock-step approach, which synchronizes every cycle, is commonly used. However, this approach introduces immense overhead and lowers the simulation speed. Instead of synchronizing every cycle, our approach synchronizes the MCISS based on the data dependency among the simulated programs. Therefore, the synchronization overhead can be greatly reduced while accurate simulation results are ensured. With the proposed approach applied, the simulation speed of the MCISS generally reaches 40 to 1,000 million instructions per second (MIPS).

international conference on 3d vision | 2016

Fast Single Shot Detection and Pose Estimation

Patrick Poirson; Phil Ammirato; Cheng-Yang Fu; Wei Liu; Jana Kosecka; Alexander C. Berg

For applications in navigation and robotics, estimating the 3D pose of objects is as important as detection. Many approaches to pose estimation rely on detecting or tracking parts or keypoints [11, 21]. In this paper we build on a recent state-of-the-art convolutional network for sliding-window detection [10] to provide detection and rough pose estimation in a single shot, without intermediate stages of detecting parts or initial bounding boxes. While not the first system to treat pose estimation as a categorization problem, this is the first attempt to combine detection and pose estimation at the same level using a deep learning approach. The key to the architecture is a deep convolutional network where scores for the presence of an object category, the offset for its location, and the approximate pose are all estimated on a regular grid of locations in the image. The resulting system is as accurate as recent work on pose estimation (42.4% 8 View mAVP on Pascal 3D+ [21]) and significantly faster (46 frames per second (FPS) on a TITAN X GPU). This approach to detection and rough pose estimation is fast and accurate enough to be widely applied as a pre-processing step for tasks including high-accuracy pose estimation, object tracking and localization, and vSLAM.

design automation conference | 2011

A high-parallelism distributed scheduling mechanism for multi-core instruction-set simulation

Meng-Huan Wu; Peng-Chih Wang; Cheng-Yang Fu; Ren-Song Tsay

Ideally, multi-core instruction-set simulation should run in parallel to improve simulation performance. However, the conventional low-parallelism centralized scheduler greatly constrains simulation performance. To resolve this issue, we propose a high-parallelism distributed scheduling mechanism. The experimental results show that our proposed approach accelerates simulation by 6 to 20 times, depending on the number of cores.

ACM Transactions on Embedded Computing Systems | 2013

A distributed timing synchronization technique for parallel multi-core instruction-set simulation

Meng-Huan Wu; Cheng-Yang Fu; Peng-Chih Wang; Ren-Song Tsay

As multi-core architecture has become mainstream, the corresponding multi-core instruction-set simulation (MCISS) is also needed to aid system development. Ideally, we may run an MCISS in parallel to enhance the simulation speed. However, the conventional centralized timing synchronization mechanism greatly constrains the parallelism of an MCISS, so the simulation speed is bounded. To resolve this issue, we propose a new distributed timing synchronization technique that allows higher parallelism for an MCISS. Compared with the centralized synchronization approach, it accelerates simulation by 9 to 20 times as the number of cores increases.

design, automation, and test in europe | 2011

A shared-variable-based synchronization approach to efficient cache coherence simulation for multi-core systems

Cheng-Yang Fu; Meng-Huan Wu; Ren-Song Tsay

This paper proposes a shared-variable-based approach for fast and accurate multi-core cache coherence simulation. While the intuitive, conventional approach (synchronizing at either every cycle or every memory access) gives accurate simulation results, it has poor performance due to huge simulation overheads. We observe that, to maintain accuracy, timing synchronization is only needed before shared-variable accesses; the proposed shared-variable-based approach exploits this to improve efficiency. The experimental results show that our approach performs 6 to 8 times faster than the memory-access-based approach and 18 to 44 times faster than the cycle-based approach while maintaining accuracy.

ACM Transactions on Design Automation of Electronic Systems | 2012

An Extended SystemC Framework for Efficient HW/SW Co-Simulation

Meng-Huan Wu; Peng-Chih Wang; Cheng-Yang Fu; Ren-Song Tsay

arXiv: Computer Vision and Pattern Recognition | 2017

DSSD: Deconvolutional Single Shot Detector

Cheng-Yang Fu; Wei Liu; Ananth Ranga; Ambrish Tyagi; Alexander C. Berg

real time technology and applications symposium | 2016

Attacking the One-Out-Of-m Multicore Problem by Combining Hardware Management with Mixed-Criticality Provisioning

Namhoon Kim; Bryan C. Ward; Micaiah Chisholm; Cheng-Yang Fu; James H. Anderson; F. Donelson Smith

Archive | 2009

Method and device for multi-core instruction-set simulation

Meng-Huan Wu; Cheng-Yang Fu; Peng-Chih Wang; Ren-Song Tsay
