ZongYuan Ge
Queensland University of Technology
Publications
Featured research published by ZongYuan Ge.
Sensors | 2016
Inkyu Sa; ZongYuan Ge; Feras Dayoub; Ben Upcroft; Tristan Perez; Christopher McCool
This paper presents a novel approach to fruit detection using deep convolutional neural networks. The aim is to build an accurate, fast and reliable fruit detection system, which is a vital element of an autonomous agricultural robotic platform; it is a key element for fruit yield estimation and automated harvesting. Recent work in deep neural networks has led to the development of a state-of-the-art object detector termed Faster Region-based CNN (Faster R-CNN). We adapt this model, through transfer learning, for the task of fruit detection using imagery obtained from two modalities: colour (RGB) and Near-Infrared (NIR). Early and late fusion methods are explored for combining the multi-modal (RGB and NIR) information. This leads to a novel multi-modal Faster R-CNN model, which achieves state-of-the-art results compared to prior work, improving the F1 score (which takes into account both precision and recall) from 0.807 to 0.838 for the detection of sweet pepper. In addition to improved accuracy, this approach is also much quicker to deploy for new fruits, as it requires bounding box annotation rather than pixel-level annotation; annotating bounding boxes is approximately an order of magnitude quicker to perform. The model is retrained to detect seven fruits, with the entire process of annotating and training the new model taking four hours per fruit.
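The two fusion strategies can be illustrated with a minimal sketch (PyTorch, with a toy stand-in for the Faster R-CNN backbone; layer sizes and class counts are illustrative assumptions, not the authors' released code): early fusion stacks the RGB and NIR channels into one four-channel input, while late fusion averages the per-class scores of two single-modality networks.

```python
# Illustrative sketch only: toy networks standing in for detection backbones.
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in for a detector backbone; in_ch = 3 (single modality) or 4 (early fusion)."""
    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.classifier(f)  # per-region class scores in a full detector

rgb = torch.randn(2, 3, 224, 224)   # colour modality
nir = torch.randn(2, 1, 224, 224)   # near-infrared modality

# Early fusion: concatenate modalities along the channel axis before the network.
early_net = TinyBackbone(in_ch=4, num_classes=8)
early_scores = early_net(torch.cat([rgb, nir], dim=1))

# Late fusion: run one network per modality and combine their softmax outputs.
rgb_net = TinyBackbone(in_ch=3, num_classes=8)
nir_net = TinyBackbone(in_ch=1, num_classes=8)
late_scores = 0.5 * (rgb_net(rgb).softmax(-1) + nir_net(nir).softmax(-1))
```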
international conference on image processing | 2016
Alex Bewley; ZongYuan Ge; Lionel Ott; Fabio Ramos; Ben Upcroft
This paper explores a pragmatic approach to multiple object tracking where the main focus is to associate objects efficiently for online and real-time applications. To this end, detection quality is identified as a key factor influencing tracking performance, where changing the detector can improve tracking by up to 18.9%. Despite only using a rudimentary combination of familiar techniques such as the Kalman filter and the Hungarian algorithm for the tracking components, this approach achieves an accuracy comparable to state-of-the-art online trackers. Furthermore, due to the simplicity of our tracking method, the tracker updates at a rate of 260 Hz, which is over 20 times faster than other state-of-the-art trackers.
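A minimal sketch of the association step described above (NumPy/SciPy; the box format and IoU threshold are illustrative assumptions): predicted track boxes are matched to new detections by solving a 1 - IoU cost matrix with the Hungarian algorithm.

```python
# Frame-to-frame association: IoU cost matrix + Hungarian assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_threshold=0.3):
    """Match predicted track boxes to new detections; return index pairs."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)  # minimises total (1 - IoU)
    return [(r, c) for r, c in zip(rows, cols)
            if 1.0 - cost[r, c] >= iou_threshold]

tracks = [[10, 10, 50, 50], [100, 100, 150, 160]]
detections = [[12, 11, 52, 49], [400, 400, 440, 440]]
print(associate(tracks, detections))  # -> [(0, 0)]
```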
international conference on image processing | 2015
ZongYuan Ge; Christopher McCool; Conrad Sanderson; Peter Corke
We propose a local modelling approach using deep convolutional neural networks (CNNs) for fine-grained image classification. Recently, deep CNNs trained from large datasets have considerably improved the performance of object recognition. However, to date there has been limited work using these deep CNNs as local feature extractors. This partly stems from the internal representations of CNNs being high-dimensional, which makes them difficult to model using stochastic models. To overcome this issue, we propose to reduce the dimensionality of one of the internal fully connected layers, in conjunction with layer-restricted retraining to avoid retraining the entire network. The distribution of low-dimensional features obtained from the modified layer is then modelled using a Gaussian mixture model. Comparative experiments show that considerable performance improvements can be achieved on the challenging Fish and UEC FOOD-100 datasets.
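A minimal sketch of the modelling stage (scikit-learn; the feature dimensionality, mixture size and placeholder features are assumptions, with the reduced CNN layer taken as given): per-class Gaussian mixture models are fitted on low-dimensional features, and a sample is assigned to the class whose GMM gives the highest likelihood.

```python
# Per-class GMMs over low-dimensional deep features; maximum-likelihood classification.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
num_classes, dim = 3, 64   # dim plays the role of the reduced FC layer size

# Placeholder features; in the paper these come from the modified CNN layer.
train = {c: rng.normal(loc=c, size=(200, dim)) for c in range(num_classes)}

gmms = {}
for c, feats in train.items():
    gmms[c] = GaussianMixture(n_components=4, covariance_type='diag',
                              random_state=0).fit(feats)

def classify(x):
    """Assign x (n_patches x dim) to the class whose GMM explains it best."""
    scores = {c: g.score(x) for c, g in gmms.items()}  # mean log-likelihood
    return max(scores, key=scores.get)

print(classify(rng.normal(loc=2, size=(10, dim))))  # -> 2
```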
computer vision and pattern recognition | 2015
ZongYuan Ge; Christopher McCool; Conrad Sanderson; Peter Corke
Fine-grained categorisation has been a challenging problem due to small inter-class variation, large intra-class variation and a small number of training images. We propose a learning system which first clusters visually similar classes and then learns deep convolutional neural network features specific to each subset. Experiments on the popular fine-grained Caltech-UCSD bird dataset show that the proposed method outperforms recent fine-grained categorisation methods under the most difficult setting: no bounding boxes are presented at test time. It achieves a mean accuracy of 77.5%, compared to the previous best performance of 73.2%. We also show that progressive transfer learning allows us to first learn domain-generic features (for bird classification) which can then be adapted to a specific set of bird classes, yielding improvements in accuracy.
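The class-grouping step might be sketched as follows (scikit-learn; clustering class-mean features with k-means is an illustrative choice, and all sizes are placeholders): visually similar classes are clustered into subsets, after which one subset-specific network would be trained per cluster.

```python
# Cluster classes by the similarity of their mean deep features.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
num_classes, dim, k_subsets = 200, 512, 5

# One mean feature vector per class (in practice, averaged CNN features).
class_means = rng.normal(size=(num_classes, dim))

subset_of = KMeans(n_clusters=k_subsets, n_init=10,
                   random_state=0).fit_predict(class_means)

for k in range(k_subsets):
    members = np.where(subset_of == k)[0]
    # Next step (not shown): fine-tune one network on only these classes.
    print(f"subset {k}: {len(members)} classes")
```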
workshop on applications of computer vision | 2016
ZongYuan Ge; Alex Bewley; Christopher McCool; Peter Corke; Ben Upcroft; Conrad Sanderson
We present a novel deep convolutional neural network (DCNN) system for fine-grained image classification, called a mixture of DCNNs (MixDCNN). The fine-grained image classification problem is characterised by large intra-class variations and small inter-class variations. To overcome these problems, our proposed MixDCNN system partitions images into K subsets of similar images and learns an expert DCNN for each subset. The output from each of the K DCNNs is combined to form a single classification decision. In contrast to previous techniques, we provide a formulation to perform joint end-to-end training of the K DCNNs simultaneously. Extensive experiments, on three datasets using two network structures (AlexNet and GoogLeNet), show that the proposed MixDCNN system consistently outperforms other methods. It provides a relative improvement of 12.7% and achieves state-of-the-art results on two datasets.
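A minimal sketch of the MixDCNN combination (PyTorch; tiny linear "experts" and the exact gating form are illustrative assumptions): each of the K experts is weighted by an occupation probability derived from its most confident output, and the weighted predictions are summed into a single, jointly trainable decision.

```python
# Mixture of expert networks with a differentiable occupation weighting.
import torch
import torch.nn as nn

K, num_classes, dim = 3, 10, 128
experts = nn.ModuleList([nn.Linear(dim, num_classes) for _ in range(K)])

def mix_forward(x):
    logits = torch.stack([e(x) for e in experts], dim=1)       # (B, K, C)
    best = logits.max(dim=2).values                            # (B, K)
    occupation = best.softmax(dim=1).unsqueeze(-1)             # (B, K, 1)
    # Weighted sum of expert predictions; the whole mixture trains end-to-end.
    return (occupation * logits.softmax(dim=2)).sum(dim=1)     # (B, C)

x = torch.randn(4, dim)
probs = mix_forward(x)
print(probs.shape, probs.sum(dim=1))  # (4, 10); each row sums to 1
```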
workshop on applications of computer vision | 2014
Kaneswaran Anantharajah; ZongYuan Ge; Christopher McCool; Simon Denman; Clinton Fookes; Peter Corke; Dian Tjondronegoro; Sridha Sridharan
Object classification is plagued by the issue of session variation. Session variation describes any variation that makes one instance of an object look different to another, for instance due to pose or illumination changes. Recent work in the challenging task of face verification has shown that session variability modelling provides a mechanism to overcome some of these limitations. However, for computer vision purposes, it has only been applied in the limited setting of face verification. In this paper we propose a local region-based inter-session variability (ISV) modelling approach, termed Local ISV, so that local session variations can be modelled, and apply it to challenging real-world data. We demonstrate the efficacy of this technique on a challenging real-world fish image database which includes images taken underwater, providing significant real-world session variations. The Local ISV approach provides a relative performance improvement of, on average, 23% on the challenging MOBIO, Multi-PIE and SCface face databases. It also provides a relative performance improvement of 35% on our challenging fish image dataset.
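The session-compensation intuition behind ISV can be sketched very roughly (NumPy; this is a heavily simplified least-squares stand-in, whereas the actual method operates on GMM supervectors with a probabilistic estimate): session variation is modelled as a low-rank offset U x added to a sample's feature vector, which is estimated and removed.

```python
# Simplified sketch: remove a low-rank session offset from a feature vector.
import numpy as np

rng = np.random.default_rng(0)
D, r = 256, 8                 # supervector size, session-subspace rank

U = rng.normal(size=(D, r))   # learnt session subspace (placeholder here)
m = rng.normal(size=D)        # class/model mean supervector

# An observed sample = mean + session offset + noise.
x_true = rng.normal(size=r)
s = m + U @ x_true + 0.01 * rng.normal(size=D)

# Point estimate of the session factor, then compensation.
x_hat, *_ = np.linalg.lstsq(U, s - m, rcond=None)
s_compensated = s - U @ x_hat

print(np.linalg.norm(s - m), "->", np.linalg.norm(s_compensated - m))
```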
international conference on image processing | 2015
ZongYuan Ge; Christopher McCool; Conrad Sanderson; Alex Bewley; Peter Corke
We propose a novel method to improve fine-grained bird species classification based on hierarchical subset learning. We first form a similarity tree in which classes with strong visual correlations are grouped into subsets. An expert local classifier with strong discriminative power to distinguish visually similar classes is then learnt for each subset. On the challenging Caltech-UCSD Birds-200-2011 dataset we show that using the hierarchical approach with features derived from a deep convolutional neural network improves the average accuracy from 64.5% to 72.7%, a relative improvement of 12.7%.
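Forming the similarity tree might look roughly as follows (SciPy; cosine-linkage clustering of class-mean features and the number of subsets are illustrative assumptions): classes are clustered hierarchically and the tree is cut into subsets, each of which would receive its own expert classifier.

```python
# Build a similarity tree over classes and cut it into subsets.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
num_classes, dim = 200, 256
class_means = rng.normal(size=(num_classes, dim))  # mean CNN feature per class

tree = linkage(class_means, method='average', metric='cosine')
subset_of = fcluster(tree, t=6, criterion='maxclust')  # cut into 6 subsets

for k in sorted(set(subset_of)):
    print(f"subset {k}: {(subset_of == k).sum()} visually similar classes")
```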
intelligent robots and systems | 2015
Stephanie M. Lowry; Adam Jacobson; ZongYuan Ge; Michael Milford
The recent focus on performing visual navigation and place recognition in changing environments has resulted in a large number of heterogeneous techniques, each utilizing their own learnt or hand-crafted visual features. This paper presents a generally applicable method for learning the appropriate distance metric by which to compare feature responses from any of these techniques in order to perform place recognition under changing environmental conditions. We implement an approach which learns to cluster images captured at spatially proximal locations under different conditions, while separating them from images captured at different places. The formulation is a convex optimization, guaranteeing the existence of a global solution. We evaluate the general applicability of our method on two benchmark change datasets using three typical image pre-processing and feature types: GIST, Principal Component Analysis and learnt Convolutional Neural Network features. The results demonstrate that the distance metric learning approach uniformly improves single-image-based visual place recognition performance across all feature types. Furthermore, we demonstrate that this performance improvement is maintained when the sequence-based algorithm SeqSLAM is applied to the single-image place recognition results, leading to state-of-the-art performance.
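A toy sketch of the metric learning idea (NumPy; the objective, trace normalisation and synthetic data are illustrative simplifications of the paper's convex formulation): a positive semi-definite matrix M is learnt so that same-place pairs under different conditions are close while different-place pairs stay far apart.

```python
# Toy Mahalanobis-style metric learning with projection onto the PSD cone.
import numpy as np

rng = np.random.default_rng(0)
dim, lr = 16, 0.01
M = np.eye(dim)

# Synthetic data: the same place under two conditions vs. different places.
places = rng.normal(size=(20, dim))
shift = 0.5 * rng.normal(size=dim)           # a global condition change
same_pairs = [(p, p + shift + 0.1 * rng.normal(size=dim)) for p in places]
diff_pairs = [(places[i], places[j]) for i in range(20) for j in range(i + 1, 20)]

def grad(pairs, sign):
    g = np.zeros((dim, dim))
    for a, b in pairs:
        d = a - b
        g += sign * np.outer(d, d)           # d/dM of (a-b)^T M (a-b)
    return g / len(pairs)

for _ in range(100):
    M -= lr * (grad(same_pairs, +1) + grad(diff_pairs, -1))
    w, V = np.linalg.eigh(M)                 # project back onto the PSD cone
    M = (V * np.clip(w, 0.0, None)) @ V.T
    M *= dim / np.trace(M)                   # fix the scale so it stays bounded

d_same = same_pairs[0][0] - same_pairs[0][1]
d_diff = diff_pairs[0][0] - diff_pairs[0][1]
print(d_same @ M @ d_same, "<", d_diff @ M @ d_diff)
```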
digital image computing techniques and applications | 2016
ZongYuan Ge; Christopher McCool; Conrad Sanderson; Peng Wang; Lingqiao Liu; Ian D. Reid; Peter Corke
Fine-grained classification is a relatively new field that has concentrated on using information from a single image, while ignoring the enormous potential of using video data to improve classification. In this work we present the novel task of video-based fine-grained object classification, propose a corresponding new video dataset, and perform a systematic study of several recent deep convolutional neural network (DCNN) based approaches, which we specifically adapt to the task. We evaluate three-dimensional DCNNs, two-stream DCNNs, and bilinear DCNNs. Two forms of the two-stream approach are used, where spatial and temporal data from two independent DCNNs are fused either via early fusion (combination of the fully-connected layers) or via late fusion (concatenation of the softmax outputs of the DCNNs). For bilinear DCNNs, information from the convolutional layers of the spatial and temporal DCNNs is combined via local co-occurrences. We then fuse the bilinear DCNN and early fusion of the two-stream approach to combine the spatial and temporal information at the local and global level (Spatio-Temporal Co-occurrence). Using the new and challenging video dataset of birds, classification performance is improved from 23.1% (using single images) to 41.1% when using the Spatio-Temporal Co-occurrence system. Incorporating automatically detected bounding box location further improves the classification accuracy to 53.6%.
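The bilinear (local co-occurrence) combination can be sketched compactly (PyTorch; random tensors stand in for the spatial and temporal DCNN activations, and the signed-sqrt/L2 normalisation follows standard bilinear-CNN practice rather than this paper's exact pipeline):

```python
# Bilinear pooling of spatial and temporal conv feature maps.
import torch

B, C1, C2, H, W = 2, 64, 64, 7, 7
spatial = torch.randn(B, C1, H, W)    # appearance (RGB) stream features
temporal = torch.randn(B, C2, H, W)   # motion (optical flow) stream features

# Outer product at every location, averaged over the H*W grid -> (B, C1, C2).
bilinear = torch.einsum('bchw,bdhw->bcd', spatial, temporal) / (H * W)

x = bilinear.flatten(1)                             # (B, C1*C2)
x = torch.sign(x) * torch.sqrt(x.abs() + 1e-12)     # signed square root
x = torch.nn.functional.normalize(x, dim=1)         # L2 normalisation
print(x.shape)  # torch.Size([2, 4096])
```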
CLEF (Working Notes) | 2015
ZongYuan Ge; Chris McCool; Conrad Sanderson; Peter Corke