Cheng-Yang Fu
National Tsing Hua University
Publication
Featured research published by Cheng-Yang Fu.
european conference on computer vision | 2016
Wei Liu; Dragomir Anguelov; Dumitru Erhan; Christian Szegedy; Scott E. Reed; Cheng-Yang Fu; Alexander C. Berg
We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and the subsequent pixel or feature resampling stage, and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single-stage methods, SSD has much better accuracy, even with a smaller input image size. For 300 × 300 input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on an Nvidia Titan X, and for 500 × 500 input, SSD achieves 75.1% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Code is available at this https URL.
embedded software | 2009
Meng-Huan Wu; Cheng-Yang Fu; Peng-Chih Wang; Ren-Song Tsay
international conference on 3d vision | 2016
Patrick Poirson; Phil Ammirato; Cheng-Yang Fu; Wei Liu; Jana Kosecka; Alexander C. Berg
design automation conference | 2011
Meng-Huan Wu; Peng-Chih Wang; Cheng-Yang Fu; Ren-Song Tsay
ACM Transactions on Embedded Computing Systems | 2013
Meng-Huan Wu; Cheng-Yang Fu; Peng-Chih Wang; Ren-Song Tsay
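The default-box discretization described in the SSD abstract (Liu et al., ECCV 2016) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name default_boxes, the normalized (cx, cy, w, h) layout, and the square-root aspect-ratio parameterization are assumptions chosen for the example, and the abstract itself does not specify them.

```python
import math
from itertools import product


def default_boxes(fmap_size, scale, aspect_ratios):
    """Return (cx, cy, w, h) boxes in normalized [0, 1] image coordinates.

    Hypothetical helper: one box per aspect ratio at every cell of a
    square fmap_size x fmap_size feature map.
    """
    boxes = []
    for row, col in product(range(fmap_size), repeat=2):
        # Center each box on the middle of its feature-map cell.
        cx = (col + 0.5) / fmap_size
        cy = (row + 0.5) / fmap_size
        for ar in aspect_ratios:
            # Assumed parameterization: area stays scale**2, shape follows ar.
            w = scale * math.sqrt(ar)
            h = scale / math.sqrt(ar)
            boxes.append((cx, cy, w, h))
    return boxes


boxes = default_boxes(fmap_size=4, scale=0.3, aspect_ratios=(1.0, 2.0, 0.5))
print(len(boxes))  # 4 * 4 locations * 3 aspect ratios = 48
```

A detector in this style would predict, for every such box, one score per object category plus four box adjustments; repeating the construction on feature maps of several resolutions with different scales is what lets it cover objects of different sizes.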
design, automation, and test in europe | 2011
Cheng-Yang Fu; Meng-Huan Wu; Ren-Song Tsay
This paper proposes a synchronization approach for fast and accurate Multi-Core Instruction-Set Simulation (MCISS). An ideal MCISS should run accurately in a real-time fashion. To achieve accurate simulation results, a lock-step approach, which synchronizes every cycle, is commonly used. However, this approach introduces immense overhead and lowers the simulation speed. Instead of synchronizing every cycle, our approach synchronizes the MCISS based on the data dependency among the simulated programs. Therefore, the synchronization overhead can be greatly reduced while accurate simulation results are ensured. With the proposed approach applied, the simulation speed of MCISS generally ranges from 40 to 1,000 million instructions per second (MIPS).
ACM Transactions on Design Automation of Electronic Systems | 2012
Meng-Huan Wu; Peng-Chih Wang; Cheng-Yang Fu; Ren-Song Tsay
For applications in navigation and robotics, estimating the 3D pose of objects is as important as detection. Many approaches to pose estimation rely on detecting or tracking parts or keypoints [11, 21]. In this paper, we build on a recent state-of-the-art convolutional network for sliding-window detection [10] to provide detection and rough pose estimation in a single shot, without intermediate stages of detecting parts or initial bounding boxes. While not the first system to treat pose estimation as a categorization problem, this is the first attempt to combine detection and pose estimation at the same level using a deep learning approach. The key to the architecture is a deep convolutional network where scores for the presence of an object category, the offset for its location, and the approximate pose are all estimated on a regular grid of locations in the image. The resulting system is as accurate as recent work on pose estimation (42.4% 8 View mAVP on Pascal 3D+ [21]) and significantly faster (46 frames per second (FPS) on a TITAN X GPU). This approach to detection and rough pose estimation is fast and accurate enough to be widely applied as a pre-processing step for tasks including high-accuracy pose estimation, object tracking and localization, and vSLAM.
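Treating pose estimation as a categorization problem amounts to replacing a continuous viewing angle with a discrete view label. The sketch below shows one way to do that for azimuth with 8 views; the function name pose_bin and the choice to center bin 0 on 0 degrees are assumptions made for illustration, since the abstract does not state the binning convention.

```python
def pose_bin(azimuth_deg, num_bins=8):
    """Map an azimuth angle in degrees to a discrete view index.

    Hypothetical convention: bins have equal width and bin 0 is
    centered on 0 degrees, so it spans [-22.5, 22.5) for 8 bins.
    """
    width = 360.0 / num_bins
    # Shift by half a bin so that bin 0 straddles 0 degrees, then wrap.
    return int(((azimuth_deg % 360.0) + width / 2) // width) % num_bins


for angle in (0, 44, 90, 350, -10):
    print(angle, pose_bin(angle))
```

With labels of this kind, a network can score each view the same way it scores each object category, which is what allows detection and rough pose to be predicted together at every grid location.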
arXiv: Computer Vision and Pattern Recognition | 2017
Cheng-Yang Fu; Wei Liu; Ananth Ranga; Ambrish Tyagi; Alexander C. Berg
Ideally, multi-core instruction-set simulation should run in parallel to improve simulation performance. However, the conventional low-parallelism centralized scheduler greatly constrains simulation performance. To resolve this issue, we propose a high-parallelism distributed scheduling mechanism. The experimental results show that our proposed approach accelerates simulation by 6 to 20 times, depending on the number of cores.
real time technology and applications symposium | 2016
Namhoon Kim; Bryan C. Ward; Micaiah Chisholm; Cheng-Yang Fu; James H. Anderson; F. Donelson Smith
As multi-core architectures have become mainstream, multi-core instruction-set simulation (MCISS) is needed to aid system development. Ideally, an MCISS should run in parallel to increase simulation speed. However, the conventional centralized timing synchronization mechanism greatly constrains the parallelism of an MCISS, so the simulation speed is bounded. To resolve this issue, we propose a new distributed timing synchronization technique that allows higher parallelism for an MCISS. Compared with the centralized synchronization approach, it speeds up simulation by 9 to 20 times as the number of cores increases.
Archive | 2009
Meng-Huan Wu; Cheng-Yang Fu; Peng-Chih Wang; Ren-Song Tsay
This paper proposes a shared-variable-based approach for fast and accurate multi-core cache coherence simulation. While the intuitive, conventional approach (synchronizing at either every cycle or every memory access) gives accurate simulation results, it performs poorly due to huge simulation overheads. We observe that, to maintain accuracy, timing synchronization is needed only before shared-variable accesses, and the proposed shared-variable-based approach exploits this to improve efficiency. The experimental results show that our approach performs 6 to 8 times faster than the memory-access-based approach and 18 to 44 times faster than the cycle-based approach while maintaining accuracy.
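To make the difference between the three synchronization policies concrete, the following sketch counts how many synchronization points each would need on a toy instruction trace. It is only an illustration under simplifying assumptions: the Instr record, the count_syncs helper, and the trace are hypothetical, and a real simulator would also have to decide which addresses are shared.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical trace entry: cycle cost and optional memory address."""
    cycles: int
    addr: Optional[int] = None


def count_syncs(trace, shared_addrs):
    """Count synchronization points under three policies.

    Returns (cycle_based, access_based, shared_based).
    """
    # Cycle-based: synchronize once per simulated cycle.
    cycle_based = sum(i.cycles for i in trace)
    # Memory-access-based: synchronize before every memory access.
    access_based = sum(1 for i in trace if i.addr is not None)
    # Shared-variable-based: synchronize only before shared accesses.
    shared_based = sum(1 for i in trace if i.addr in shared_addrs)
    return cycle_based, access_based, shared_based


trace = [
    Instr(1),
    Instr(2, addr=0x10),  # private data
    Instr(1, addr=0x20),  # shared variable
    Instr(3),
    Instr(2, addr=0x20),  # shared variable
]
print(count_syncs(trace, shared_addrs={0x20}))  # (9, 3, 2)
```

Even on this tiny trace, the count drops from one synchronization per cycle to one per shared access, which is the source of the speedups the abstract reports; the actual factors depend on the workload.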