Yurong Chen | Researchain


Publication

Featured research published by Yurong Chen.

computer vision and pattern recognition | 2016

HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection

Tao Kong; Anbang Yao; Yurong Chen; Fuchun Sun

Almost all of the current top-performing object detection networks employ region proposals to guide the search for object instances. State-of-the-art region proposal methods usually need several thousand proposals to get high recall, thus hurting the detection efficiency. Although the latest Region Proposal Network method gets promising detection accuracy with several hundred proposals, it still struggles in small-size object detection and precise localization (e.g., large IoU thresholds), mainly due to the coarseness of its feature maps. In this paper, we present a deep hierarchical network, namely HyperNet, for handling region proposal generation and object detection jointly. Our HyperNet is primarily based on an elaborately designed Hyper Feature which aggregates hierarchical feature maps first and then compresses them into a uniform space. The Hyper Features well incorporate deep but highly semantic, intermediate but really complementary, and shallow but naturally high-resolution features of the image, thus enabling us to construct HyperNet by sharing them both in generating proposals and detecting objects via an end-to-end joint training strategy. For the deep VGG16 model, our method achieves completely leading recall and state-of-the-art object detection accuracy on PASCAL VOC 2007 and 2012 using only 100 proposals per image. It runs with a speed of 5 fps (including all steps) on a GPU, thus having the potential for real-time processing.
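The "large IoU thresholds" mentioned in the abstract refer to the intersection-over-union overlap between a predicted box and a ground-truth box. As a minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates (a convention chosen here for illustration, not taken from the paper), the measure can be computed as:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2; this layout is
    an assumption made for the example.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A proposal counts as a correct localization at threshold t when iou(proposal, ground_truth) >= t, so a higher t demands tighter boxes.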

international parallel and distributed processing symposium | 2008

SIFT implementation and optimization for multi-core systems

Qi Zhang; Yurong Chen; Yimin Zhang; Yinlong Xu

Scale invariant feature transform (SIFT) is an approach for extracting distinctive invariant features from images, and it has been successfully applied to many computer vision problems (e.g. face recognition and object detection). However, the SIFT feature extraction is compute-intensive, and a real-time or even super-real-time processing capability is required in many emerging scenarios. Nowadays, with the multi-core processor becoming mainstream, SIFT can be accelerated by fully utilizing the computing power of available multi-core processors. In this paper, we propose two parallel SIFT algorithms and present some optimization techniques to improve the implementation's performance on multi-core systems. The result shows our improved parallel SIFT implementation can process general video images in super-real-time on a dual-socket, quad-core system, and the speed is much faster than the implementation on GPUs. We also conduct a detailed scalability and memory performance analysis on the 8-core system and on a 32-core chip multiprocessor (CMP) simulator. The analysis helps us identify possible causes of bottlenecks, and we suggest avenues for scalability improvement to make this application more powerful on future large-scale multi-core systems.

computer vision and pattern recognition | 2010

Bundled depth-map merging for multi-view stereo

Jianguo Li; Eric Q. Li; Yurong Chen; Lin Xu; Yimin Zhang

Depth-map merging is one typical technique category for multi-view stereo (MVS) reconstruction. To guarantee accuracy, existing algorithms usually require either sub-pixel level stereo matching precision or continuous depth-map estimation. The merging of inaccurate depth-maps remains a challenging problem. This paper introduces a bundle optimization method for robust and accurate depth-map merging. In the method, depth-maps are generated using the DAISY feature, followed by two stages of bundle optimization. The first stage optimizes the track of connected stereo matches to generate initial 3D points. The second stage optimizes the positions and normals of the 3D points. The resulting high-quality point cloud is then meshed into geometric models. The proposed method can be easily parallelized on multi-core processors. Middlebury evaluation shows that it is one of the most efficient methods among non-GPU algorithms, yet still keeps very high accuracy. We also demonstrate the effectiveness of the proposed algorithm on various real-world, high-resolution, self-calibrated data sets including objects with complex details, objects with large highlight areas, and objects with non-Lambertian surfaces.

ieee international symposium on workload characterization | 2008

Parallelization and characterization of SIFT on multi-core systems

Hao Feng; Eric Q. Li; Yurong Chen; Yimin Zhang

This paper parallelizes and characterizes an important computer vision application, Scale Invariant Feature Transform (SIFT), on both a Symmetric Multiprocessor (SMP) platform and a large-scale Chip Multiprocessor (CMP) simulator. SIFT is an approach for extracting distinctive invariant features from images and has been widely applied. In many computer vision problems, a real-time or even super-real-time processing capability of SIFT is required. To meet the computation demand, we optimize and parallelize SIFT to accelerate its execution on multi-core systems. Our study shows that SIFT can achieve a 9.7x to 11x speedup on a 16-core SMP system. Furthermore, Single Instruction Multiple Data (SIMD) and cache-conscious optimization bring another 85% performance gain at most. But it is still three times slower than the real-time requirement for High-Definition Television (HDTV) images. We then study the performance of SIFT on a 64-core CMP simulator. The results show that for HDTV images, SIFT can achieve an excellent speedup of 52x and finally run in real time. Besides the parallelization and optimization work, we also conduct a detailed performance analysis of SIFT on those two platforms. We find that load imbalance significantly limits the scalability and that SIFT suffers from an intensive burst memory bandwidth requirement on the 16-core SMP system. However, on the 64-core CMP simulator the memory pressure is not high due to the shared last-level cache (LLC), which accommodates the tremendous read-write sharing in SIFT. Thus it does not affect the scaling performance. In short, understanding the characterization of SIFT can help identify the program bottlenecks and give us further insights into designing better systems.
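To put the reported speedups in perspective, the sketch below computes parallel efficiency (speedup divided by core count) and the Karp-Flatt experimentally determined serial fraction. Both are standard metrics applied here for illustration, not figures the abstract itself reports, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def karp_flatt(speedup, cores):
    """Karp-Flatt serial fraction e = (1/S - 1/p) / (1 - 1/p), for p > 1."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# Speedups quoted in the abstract: 9.7x to 11x on 16 cores, 52x on 64 cores.
for speedup, cores in [(9.7, 16), (11.0, 16), (52.0, 64)]:
    print(cores, speedup,
          round(parallel_efficiency(speedup, cores), 3),
          round(karp_flatt(speedup, cores), 4))
```

A lower Karp-Flatt value on the 64-core simulator than on the 16-core SMP would be consistent with the abstract's remark that memory pressure and load imbalance limit scaling mainly on the SMP system.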

computer vision and pattern recognition | 2017

RON: Reverse Connection with Objectness Prior Networks for Object Detection

Tao Kong; Fuchun Sun; Anbang Yao; Huaping Liu; Ming Lu; Yurong Chen

We present RON, an efficient and effective framework for generic object detection. Our motivation is to smartly associate the best of the region-based (e.g., Faster R-CNN) and region-free (e.g., SSD) methodologies. Under fully convolutional architecture, RON mainly focuses on two fundamental problems: (a) multi-scale object localization and (b) negative sample mining. To address (a), we design the reverse connection, which enables the network to detect objects on multi-levels of CNNs. To deal with (b), we propose the objectness prior to significantly reduce the searching space of objects. We optimize the reverse connection, objectness prior and object detector jointly by a multi-task loss function, thus RON can directly predict final detection results from all locations of various feature maps. Extensive experiments on the challenging PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO benchmarks demonstrate the competitive performance of RON. Specifically, with VGG-16 and low resolution 384×384 input size, the network gets 81.3% mAP on PASCAL VOC 2007, 80.7% mAP on PASCAL VOC 2012 datasets. Its superiority increases when datasets become larger and more difficult, as demonstrated by the results on the MS COCO dataset. With 1.5G GPU memory at test phase, the speed of the network is 15 FPS, 3 times faster than the Faster R-CNN counterpart. Code will be made publicly available.

international conference on image processing | 2010

A general texture mapping framework for image-based 3D modeling

Lin Xu; Eric Q. Li; Jianguo Li; Yurong Chen; Yimin Zhang

This paper presents a general texture mapping framework for image-based 3D modeling. It aims to generate a seamless texture map for 3D models created from real-world photos taken under uncontrolled environments. Our proposed method addresses two challenging problems: 1) texture discontinuity due to system error in 3D modeling from self-calibration; 2) color/lighting differences among images due to real-world uncontrolled environments. The general framework contains two stages to resolve these problems. The first stage globally optimizes the registration of texture patches and triangle faces with a Markov Random Field (MRF) to optimize the texture mosaic. The second stage performs local radiometric correction to adjust color differences between texture patches and then blends texture boundaries to improve color continuity. The proposed method is evaluated on several 3D models built by image-based 3D modeling, and demonstrates promising results.

international conference on parallel processing | 2008

Parallelization and Characterization of Probabilistic Latent Semantic Analysis

Chuntao Hong; Wenguang Chen; Weimin Zheng; Jiulong Shan; Yurong Chen; Yimin Zhang

Probabilistic Latent Semantic Analysis (PLSA) is one of the most popular statistical techniques for the analysis of two-mode and co-occurrence data. It has applications in information retrieval and filtering, natural language processing, machine learning from text, and other related areas. However, PLSA is rarely applied to large datasets due to its high computational complexity. This paper presents an optimized and parallelized implementation of PLSA which is capable of processing datasets with 10000 documents in seconds. Compared to the baseline program, our parallelized program can achieve a speedup of more than six on an eight-processor machine. The characterization of the parallel program is also presented. The performance analysis of the parallel program indicates that this program is memory intensive and that the limited memory bandwidth is the bottleneck for better speedup.
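For orientation, the EM iteration at the core of PLSA can be sketched as follows. This is a plain, serial, textbook-style version using the asymmetric parameterisation P(w|d) = sum_z P(w|z) P(z|d); it is not the optimized or parallelized implementation the paper describes, and the function name plsa and its parameters are chosen here for illustration.

```python
import math
import random


def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def plsa(counts, n_topics, n_iters=50, seed=0):
    """Fit P(w|z) and P(z|d) to a document-word count matrix with EM.

    Assumes every document (row of counts) contains at least one word.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation; each row is a probability distribution.
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    log_likelihoods = []

    for _ in range(n_iters):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: unnormalised posterior over topics for this (d, w).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                p_w_d = sum(joint)
                log_lik += n * math.log(p_w_d)
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    expected = n * joint[z] / p_w_d
                    new_w_z[z][w] += expected
                    new_z_d[d][z] += expected
        # M-step: renormalise expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
        # Log-likelihood of the parameters used in this iteration's E-step.
        log_likelihoods.append(log_lik)

    return p_w_z, p_z_d, log_likelihoods
```

Each iteration visits every nonzero count once and does little arithmetic per visit, which gives some intuition for why the abstract identifies memory bandwidth, not computation, as the limit on speedup.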

international symposium on microarchitecture | 2008

Accelerating Video-Mining Applications Using Many Small, General-Purpose Cores

Eric Q. Li; Wenlong Li; Xiaofeng Tong; Jianguo Li; Yurong Chen; Tao Wang; Patricia P. Wang; Wei Hu; Yangzhou Du; Yimin Zhang; Yen-Kuang Chen

Emerging video-mining applications such as image and video retrieval and indexing will require real-time processing capabilities. A many-core architecture with 64 small, in-order, general-purpose cores as the accelerator can help meet the necessary performance goals and requirements. The key video-mining modules can achieve parallel speedups of 19× to 62× from 64 cores and get an extra 2.3× speedup from 128-bit SIMD vectorization on the proposed architecture.

international conference on parallel processing | 2007

Parallelization and Performance Analysis of Video Feature Extractions on Multi-Core Based Systems

Qi Zhang; Yurong Chen; Jianguo Li; Yimin Zhang; Yinlong Xu

Content-based video information retrieval (CBVIR) has become one of the best solutions for retrieving useful information from today's video information explosion. With the rapid development of modern technologies, CBVIR is emerging as a mass-market desktop application. There is evidence that visual feature extraction is the most time-consuming part of a CBVIR system. In this paper, we implement three video visual feature extractions in parallel by exploring different kinds of thread-level parallelism. We also conduct detailed scalability and memory performance analysis on two multi-core based systems, in order to gain more insights into video-analysis related applications on future multi-core systems. From our analysis we identify the likely causes of bottlenecks in these kinds of applications and suggest ways to improve scalability.

annual computer security applications conference | 2008

Parallelization of spectral clustering algorithm on multi-core processors and GPGPU

Jing Zheng; Wenguang Chen; Yurong Chen; Yimin Zhang; Ying Zhao; Weimin Zheng

Spectral clustering is a widely used algorithm in information retrieval, data mining, machine learning and many other fields. It can cluster a large amount of data into several categories without requiring any additional information about the dataset or the categories, so that people can easily find information by category. In this paper, we parallelize the algorithm proposed by Andrew Y. Ng, Michael I. Jordan and Yair Weiss. We provide two implementations: one is parallelized with OpenMP; the other is programmed in NVIDIA CUDA (Compute Unified Device Architecture), the environment provided by NVIDIA for programming its CUDA-enabled GPGPUs (general-purpose graphics processing units). We achieve about a three times speedup with OpenMP and around a ten times speedup using CUDA in our experiments.
