Zhilu Chen

Worcester Polytechnic Institute

Publication

Featured research published by Zhilu Chen.

International Symposium on Circuits and Systems | 2014

A fast deep learning system using GPU

Zhilu Chen; Jing Wang; Haibo He; Xinming Huang

The invention of the deep belief network (DBN) provides a powerful tool for data modeling. The key advantage of a DBN is that it is driven by training data only, which relieves researchers of the routine of devising explicit models or features for data with complicated distributions. However, as the dimensionality and quantity of data increase, the computing load of training a DBN increases rapidly. The remarkable computing power of modern GPU devices can reduce DBN training time significantly, and the growing availability of highly efficient computational libraries provides additional support for GPU-based parallel computing. Moreover, a GPU server is more affordable and accessible than a computer cluster or supercomputer. In this paper, we implement a variant of the DBN, called folded-DBN, on NVIDIA's Tesla K20 GPU. In our simulations, two datasets are used to train the folded-DBNs on both CPU and GPU platforms. Comparing the execution time of the fine-tuning process, the GPU implementation achieves a 7 to 11 times speedup over the CPU platform.

International Symposium on Circuits and Systems | 2014

Accelerating leveled fully homomorphic encryption using GPU

Wei Wang; Zhilu Chen; Xinming Huang

Gentry introduced the first plausible fully homomorphic encryption (FHE) scheme, which was considered a major breakthrough in cryptography. Since then, several FHE schemes have been proposed to make FHE more efficient for practical applications. The leveled fully homomorphic scheme is among the best known. In the leveled FHE scheme, large-number matrix-vector multiplication is a crucial part of the encryption algorithm. In this paper, the Chinese Remainder Theorem (CRT) is employed to reduce the computational complexity of the large-number element-by-element modular multiplication. The first step is called decomposition, in which each large-number element in the matrix and vector is decomposed into many small words. The next step is the vector operation, which performs the modular multiplications and additions of the decomposed small words. Finally, the matrix-vector multiplication results are obtained through reconstruction. We compare the CRT-based method with the Number Theory Library (NTL), showing that the proposed method is about 7.8 times faster when executing on a CPU. In addition, it is observed that the vector operation takes up to 99.6% of the total computation time and the reconstruction takes only 0.4%. Therefore, GPU acceleration is employed to speed up the vector operations. Experimental results show that the GPU implementation of the CRT-based method is 35.2 times faster than the same method implemented on a CPU and 273.6 times faster than the NTL library on a CPU.
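The three steps named in this abstract (decomposition, vector operation, reconstruction) can be illustrated with a minimal Python sketch of one row-times-vector product. This is not the authors' implementation: the function names, the small pairwise-coprime moduli, and the modulus `q` are assumptions chosen only to keep the example readable.

```python
from math import prod

def decompose(x, moduli):
    """Decomposition: represent a large integer by its residues (small words)."""
    return [x % m for m in moduli]

def reconstruct(residues, moduli):
    """Reconstruction: recombine residues with the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M

def crt_dot(row, vec, q, moduli):
    """Compute sum(row[i] * vec[i]) mod q, assuming 0 <= entries < q."""
    # The product of the moduli must exceed the largest possible exact result.
    if prod(moduli) <= len(row) * (q - 1) ** 2:
        raise ValueError("moduli product too small for the exact dot product")
    acc = [0] * len(moduli)
    for a, b in zip(row, vec):
        a_res = decompose(a, moduli)
        b_res = decompose(b, moduli)
        # Vector operation: independent small-word multiply-accumulate per modulus.
        acc = [(s + x * y) % m for s, x, y, m in zip(acc, a_res, b_res, moduli)]
    return reconstruct(acc, moduli) % q

# Hypothetical parameters: pairwise-coprime small moduli and a 16-bit prime q.
MODULI = [251, 253, 255, 256, 257]
Q = 65521
print(crt_dot([12345, 65520, 7], [54321, 65520, 9], Q, MODULI))
```

Each residue channel in the inner loop is independent of the others, which is the property that makes the vector operation a natural target for GPU parallelism.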

IEEE Intelligent Vehicles Symposium | 2015

Road marking detection and classification using machine learning algorithms

Tairui Chen; Zhilu Chen; Quan Shi; Xinming Huang

This paper presents a novel approach for road marking detection and classification based on machine learning algorithms. Road marking recognition is an important feature of an intelligent transportation system (ITS). Previous works are mostly developed using image processing, and decisions are often made using empirical functions, which makes them difficult to generalize. Here, we propose a general framework for object detection and classification, aimed at video-based intelligent transportation applications. It is a two-step approach. Detection is carried out using the binarized normed gradient (BING) method. A PCA network (PCANet) is employed for object classification. Both BING and PCANet are among the latest algorithms in the field of machine learning. In practice, the proposed method is applied to a road marking dataset with 1,443 road images. We randomly choose 60% of the images for training and use the remaining 40% for testing. After training, the system can detect 9 classes of road markings with an accuracy better than 96.8%. The proposed approach is readily applicable to other ITS applications.

2014 IEEE Symposium on Computational Intelligence in Vehicles and Transportation Systems (CIVTS) | 2014

A GPU-based real-time traffic sign detection and recognition system

Zhilu Chen; Xinming Huang; Zhen Ni; Haibo He

This paper presents a GPU-based system for real-time traffic sign detection and recognition that can classify 48 different traffic signs included in the library. The proposed design has three stages: pre-processing, feature extraction, and classification. For high-speed processing, we propose a window-based histogram of gradient algorithm that is highly optimized for parallel processing on a GPU. To detect signs of various sizes, the processing is applied at 32 scale levels. For more accurate recognition, multiple levels of support vector machines are employed to classify the traffic signs. The proposed system can process video at 27.9 frames per second with an active pixel resolution of 1,628 × 1,236. Evaluated on the BelgiumTS dataset, the experimental results show a detection rate of about 91.69% with 3.39 × 10^-5 false positives per window and a recognition rate of about 93.77%.

IEEE Intelligent Vehicles Symposium | 2015

Automatic detection of traffic lights using support vector machine

Zhilu Chen; Quan Shi; Xinming Huang

Many traffic accidents that occur at intersections are caused by drivers who miss or ignore the traffic signals. In this paper, we present a new method for automatic detection of traffic lights that integrates both image processing and support vector machine techniques. An experimental dataset with 21,299 samples is built from original videos captured while driving on the streets. Compared with traditional object detection and existing methods, the proposed system provides significantly better performance, with 96.97% precision and 99.43% recall. The system framework is extensible, so users can introduce additional parameters to further improve the detection performance.
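For readers unfamiliar with the two metrics quoted in this abstract, the following Python sketch shows how precision and recall are computed from detection counts. The counts in the usage line are hypothetical and are not taken from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts; a metric with an empty denominator is 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts for illustration only.
p, r = precision_recall(tp=96, fp=4, fn=1)
print(f"precision={p:.4f} recall={r:.4f}")
```

Precision measures how many reported traffic lights are real; recall measures how many real traffic lights are reported.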

IEEE Intelligent Vehicles Symposium | 2017

End-to-end learning for lane keeping of self-driving cars

Zhilu Chen; Xinming Huang

Lane keeping is an important feature for self-driving cars. This paper presents an end-to-end learning approach to obtain the proper steering angle to keep the car in the lane. The convolutional neural network (CNN) model takes raw image frames as input and outputs the corresponding steering angles. The model is trained and evaluated using the comma.ai dataset, which contains front-view image frames and steering angle data captured while driving on the road. Unlike the traditional approach, which manually decomposes the autonomous driving problem into technical components such as lane detection, path planning, and steering control, the end-to-end model can directly steer the vehicle from the front-view camera data after training. It learns how to stay in the lane from human driving data. Further discussion of this end-to-end approach and its limitations is also provided.

International Symposium on Circuits and Systems | 2015

A pipeline architecture for traffic sign classification on an FPGA

Yuteng Zhou; Zhilu Chen; Xinming Huang

This paper presents an efficient FPGA design that can classify 48 different traffic signs in real time. The method is based on histogram of oriented gradients (HOG) feature extraction and a support vector machine (SVM) for classification. A fully pipelined, resource-efficient architecture is presented with a detailed design of each block. The FPGA implementation has a system clock of 241.7 MHz with an absolute response time of 6.5 s. Taking streaming pixel input, the system throughput is about 106 times faster than the same algorithm executed on a general-purpose processor.
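The HOG-plus-SVM idea mentioned in this abstract can be sketched in a few lines of Python. This is a simplified software illustration, not the paper's pipelined hardware design: the cell size, the 9 unsigned orientation bins, the central-difference gradient, and the linear scoring function with its weights are all assumptions.

```python
import math

def cell_histogram(cell, bins=9):
    """Gradient-orientation histogram for one cell (list of rows of gray values).
    Orientations are unsigned (0 to 180 degrees) and weighted by magnitude."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):          # skip the border: central differences
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

def linear_svm_score(features, weights, bias):
    """Decision value of a linear SVM; the sign gives the predicted class."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# A cell containing a vertical edge: all gradient energy lands in bin 0.
cell = [[0, 0, 0, 0, 10, 10, 10, 10] for _ in range(8)]
hist = cell_histogram(cell)
print(hist)
# Hypothetical weights that respond to horizontal gradients (bin 0).
print(linear_svm_score(hist, [1.0] + [0.0] * 8, -50.0) > 0)
```

Because each pixel's gradient and bin update depend only on its immediate neighbours, the computation maps well onto a streaming pipeline of the kind the abstract describes.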

IEEE High Performance Extreme Computing Conference | 2014

A GPU accelerated virtual scanning confocal microscope

James L. Kingsley; Zhilu Chen; Jeffrey P. Bibeau; Luis Vidali; Xinming Huang; Erkan Tüzel

Fluorescence Recovery After Photobleaching (FRAP) is a commonly used technique for quantifying the movement of small biological systems. To aid in the evaluation of experimentally produced data, we used the parallel processing power offered by Graphics Processing Units (GPUs) to accelerate a computational simulation of the process. We find that the parallel process is significantly faster when implemented on the GPU, and that further speed increases can be accomplished via various optimizations, bringing the speed increase up to a factor of one hundred in some cases.

Biophysical Journal | 2018

Characterization of Cell Boundary and Confocal Effects Improves Quantitative FRAP Analysis

James L. Kingsley; Jeffrey P. Bibeau; S. Iman Mousavi; Cem Unsal; Zhilu Chen; Xinming Huang; Luis Vidali; Erkan Tüzel

Fluorescence recovery after photobleaching (FRAP) is an important tool used by cell biologists to study the diffusion and binding kinetics of vesicles, proteins, and other molecules in the cytoplasm, nucleus, or cell membrane. Although many FRAP models have been developed over the past decades, the influence of the complex boundaries of 3D cellular geometries on the recovery curves, in conjunction with regions of interest and optical effects (imaging, photobleaching, photoswitching, and scanning), has not been well studied. Here, we developed a 3D computational model of the FRAP process that incorporates particle diffusion, cell boundary effects, and the optical properties of the scanning confocal microscope, and validated this model using the tip-growing cells of Physcomitrella patens. We then show how these cell boundary and optical effects confound the interpretation of FRAP recovery curves, including the number of dynamic states of a given fluorophore, in a wide range of cellular geometries (both in two and three dimensions), namely nuclei, filopodia, and lamellipodia of mammalian cells, and in cell types such as the budding yeast, Saccharomyces pombe, and tip-growing plant cells. We explored the performance of existing analytical and algorithmic FRAP models in these various cellular geometries, and determined that the VCell VirtualFRAP tool provides the best accuracy to measure diffusion coefficients. Our computational model is not limited to these cell types, but can easily be extended to other cellular geometries via the graphical Java-based application we also provide. This particle-based simulation, called the Digital Confocal Microscopy Suite (DCMS), can also perform fluorescence dynamics assays, such as number and brightness, fluorescence correlation spectroscopy, and raster image correlation spectroscopy, and could help shape the way these techniques are interpreted.

International Symposium on Circuits and Systems | 2016

A system-on-chip FPGA design for real-time traffic signal recognition system

Yuteng Zhou; Zhilu Chen; Xinming Huang

Traffic signal detection has long been an important function in an advanced driver assistance system (ADAS). This paper presents a complete system design based on the techniques of blob detection, histogram of oriented gradients (HOG), and support vector machine (SVM). Blob detection is applied to detect potential candidates, and then HOG and SVM are used for feature extraction and classification. A novel hardware/software co-design architecture is developed for traffic light recognition in real time. With a well-balanced workload on the FPGA fabric and the on-chip ARM processor, the entire system-on-chip can achieve a processing rate of 60 fps for XGA 1024-by-768 video. The system can achieve an accuracy rate of over 90% on both red lights and green lights. The proposed system can be improved by replacing HOG with a more advanced feature algorithm to obtain higher accuracy.
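The candidate-finding stage described in this abstract can be illustrated with a generic connected-component pass over a binary mask. The abstract does not specify which blob detection algorithm the design uses, so the following Python sketch assumes a 4-connected flood fill over an already thresholded image; the function name and the `min_area` filter are hypothetical.

```python
from collections import deque

def find_blobs(mask, min_area=1):
    """Return (left, top, right, bottom, area) for each 4-connected blob
    of truthy pixels in a binary mask given as a list of rows."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            area = 0
            top, left, bottom, right = y0, x0, y0, x0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                blobs.append((left, top, right, bottom, area))
    return blobs

mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(find_blobs(mask))  # [(1, 0, 2, 1, 4), (4, 2, 4, 3, 2)]
```

In a pipeline like the one the abstract outlines, each bounding box would then be cropped and passed to the HOG and SVM stages for classification.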
