Network


Yi-Zhe Song's latest external collaborations, summarized at the country level.

Hotspot


The research topics in which Yi-Zhe Song is active.

Publication


Featured research published by Yi-Zhe Song.


Neurocomputing | 2013

Text extraction from natural scene image: A survey

Honggang Zhang; Kaili Zhao; Yi-Zhe Song; Jun Guo

With the increasing popularity of portable camera devices and embedded visual processing, text extraction from natural scene images has become a key problem, poised to change our everyday lives via novel applications such as augmented reality. Algorithms for text extraction from natural scene images are generally composed of three stages: (i) detection and localization, (ii) text enhancement and segmentation, and (iii) optical character recognition (OCR). The problem is challenging in nature due to variations in font size and color, text alignment, illumination changes and reflections. This paper aims to classify and assess the latest algorithms. More specifically, we draw attention to studies on the first two steps of the extraction process, since OCR is a well-studied area where powerful algorithms already exist. The paper also provides researchers with links to public image databases for assessing text extraction algorithms.
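
As a rough illustration of this three-stage pipeline, the following Python sketch chains off-the-shelf components: OpenCV's MSER detector for stage (i), Otsu binarisation for stage (ii), and Tesseract (via pytesseract, assumed installed) for stage (iii). It is a toy baseline for orientation, not one of the surveyed algorithms.

```python
import cv2
import pytesseract  # assumes the Tesseract binary is installed

def extract_text(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Stage (i): detection and localization of candidate text regions.
    mser = cv2.MSER_create()
    _, boxes = mser.detectRegions(gray)

    results = []
    for (x, y, w, h) in boxes:
        if w < 10 or h < 10:          # discard tiny regions
            continue
        crop = gray[y:y + h, x:x + w]

        # Stage (ii): text enhancement and segmentation (binarisation here).
        _, binary = cv2.threshold(crop, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)

        # Stage (iii): off-the-shelf OCR.
        text = pytesseract.image_to_string(binary).strip()
        if text:
            results.append(((x, y, w, h), text))
    return results
```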


British Machine Vision Conference | 2015

Sketch-a-Net that Beats Humans

Qian Yu; Yongxin Yang; Yi-Zhe Song; Tao Xiang; Timothy M. Hospedales

We propose a multi-scale multi-channel deep neural network framework that, for the first time, yields sketch recognition performance surpassing that of humans. Our superior performance is a result of explicitly embedding the unique characteristics of sketches in our model: (i) a network architecture designed for sketch rather than natural photo statistics, (ii) a multi-channel generalisation that encodes sequential ordering in the sketching process, and (iii) a multi-scale network ensemble with joint Bayesian fusion that accounts for the different levels of abstraction exhibited in free-hand sketches. We show that state-of-the-art deep networks specifically engineered for photos of natural objects fail to perform well on sketch recognition, regardless of whether they are trained on photos or sketches. Our network, on the other hand, not only delivers the best performance on the largest human sketch dataset to date, but is also small enough to be trained efficiently using CPUs alone.
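
A minimal PyTorch sketch of the kind of sketch-specific architecture described in point (i), with extra input channels for stroke ordering as in point (ii). The filter counts, sizes and input resolution below are illustrative assumptions rather than the paper's exact configuration, and the multi-scale ensemble with joint Bayesian fusion is omitted.

```python
import torch
import torch.nn as nn

class SketchNet(nn.Module):
    def __init__(self, in_channels=6, num_classes=250):
        super().__init__()
        self.features = nn.Sequential(
            # Oversized 15x15 receptive field: sketch strokes are sparse,
            # so small photo-tuned kernels mostly see empty space.
            nn.Conv2d(in_channels, 64, kernel_size=15, stride=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(64, 128, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(512),   # infers the flattened feature size
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):  # x: (B, in_channels, 225, 225)
        return self.classifier(self.features(x))
```

The extra input channels would hold the sketch rendered at successive stages of its stroke sequence, which is how temporal ordering can be fed to a purely convolutional model.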


Computer Vision and Pattern Recognition | 2016

Sketch Me That Shoe

Qian Yu; Feng Liu; Yi-Zhe Song; Tao Xiang; Timothy M. Hospedales; Chen Change Loy

We investigate the problem of fine-grained sketch-based image retrieval (SBIR), where free-hand human sketches are used as queries to perform instance-level retrieval of images. This is an extremely challenging task because (i) visual comparisons not only need to be fine-grained but also executed cross-domain, (ii) free-hand (finger) sketches are highly abstract, making fine-grained matching harder, and most importantly (iii) annotated cross-domain sketch-photo datasets required for training are scarce, challenging many state-of-the-art machine learning techniques. In this paper, for the first time, we address all these challenges, providing a step towards the capabilities that would underpin a commercial sketch-based image retrieval application. We introduce a new database of 1,432 sketch-photo pairs from two categories with 32,000 fine-grained triplet ranking annotations. We then develop a deep triplet-ranking model for instance-level SBIR with a novel data augmentation and staged pre-training strategy to alleviate the issue of insufficient fine-grained training data. Extensive experiments are carried out to contribute a variety of insights into the challenges of data sufficiency and over-fitting avoidance when training deep networks for fine-grained cross-domain ranking tasks.
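
The core of a deep triplet-ranking objective can be sketched in a few lines of PyTorch. The `backbone` network, the margin value and the L2-normalised embeddings below are illustrative assumptions; the paper's data augmentation and staged pre-training are omitted.

```python
import torch
import torch.nn.functional as F

def triplet_ranking_loss(backbone, sketch, photo_pos, photo_neg, margin=0.3):
    a = F.normalize(backbone(sketch), dim=1)      # anchor: the query sketch
    p = F.normalize(backbone(photo_pos), dim=1)   # positive: matching photo
    n = F.normalize(backbone(photo_neg), dim=1)   # negative: mismatched photo
    d_pos = (a - p).pow(2).sum(dim=1)
    d_neg = (a - n).pow(2).sum(dim=1)
    # Hinge on the ranking violation: the positive photo must be
    # closer to the sketch than the negative by at least `margin`.
    return F.relu(d_pos - d_neg + margin).mean()
```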


British Machine Vision Conference | 2013

Sketch Recognition by Ensemble Matching of Structured Features

Yi Li; Yi-Zhe Song; Shaogang Gong

Sketch recognition aims to automatically classify human hand sketches of objects into known categories. This has become an increasingly desirable capability due to recent advances in human-computer interaction on portable devices. The problem is non-trivial because of the sparse and abstract nature of hand drawings compared to photographic images of objects, compounded by a highly variable degree of detail in human sketches. To this end, we present a method for the representation and matching of sketches that exploits not only local features but also the global structure of sketches, through a star-graph-based ensemble matching strategy. Different local feature representations were evaluated using the star graph model to demonstrate the effectiveness of the ensemble matching of structured features. We further show that by encapsulating holistic structure matching and learned bag-of-features models in a single framework, notable recognition performance improvement over the state-of-the-art can be observed. Extensive comparative experiments were carried out using the largest sketch dataset to date, released by Eitz et al. [15], with over 20,000 sketches of 250 object categories collected via Amazon Mechanical Turk (AMT) crowd-sourcing.
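
The following NumPy sketch illustrates the star-graph idea under simplifying assumptions (the sketch centroid as the star centre, a fixed weighted sum of appearance and structure terms): each local feature match is penalised when its offset from the centroid disagrees, so global structure constrains local matching. This is an illustration, not the authors' exact formulation.

```python
import numpy as np

def star_graph_cost(desc_a, pts_a, desc_b, pts_b, lam=0.5):
    """desc_*: (N, D) local descriptors; pts_*: (N, 2) keypoint locations."""
    off_a = pts_a - pts_a.mean(axis=0)   # offsets from the star centre
    off_b = pts_b - pts_b.mean(axis=0)
    total = 0.0
    for d, o in zip(desc_a, off_a):
        app = np.linalg.norm(desc_b - d, axis=1)     # appearance distance
        struct = np.linalg.norm(off_b - o, axis=1)   # structural disagreement
        total += np.min(app + lam * struct)          # best ensemble match
    return total / len(desc_a)
```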


Computer Vision and Image Understanding | 2015

Free-hand sketch recognition by multi-kernel feature learning

Yi Li; Timothy M. Hospedales; Yi-Zhe Song; Shaogang Gong

Free-hand sketch recognition has become increasingly popular due to the recent expansion of portable touchscreen devices. However, the problem is non-trivial due to the complexity of internal structures, which leads to intra-class variations, coupled with the sparsity of visual cues, which results in inter-class ambiguities. To address the structural complexity, a novel structured representation for sketches is proposed to capture the holistic structure of a sketch. Moreover, to overcome the visual cue sparsity problem and thereby achieve state-of-the-art recognition performance, we propose a Multiple Kernel Learning (MKL) framework for sketch recognition, fusing several features common to sketches. We evaluate the performance of all the proposed techniques on the most diverse sketch dataset to date (Eitz et al., 2012), and offer detailed and systematic analyses of the performance of different features and representations, including a breakdown by sketch super-category. Finally, we investigate the use of attributes as a high-level feature for sketches and show how this complements low-level features to improve recognition performance under the MKL framework, and consequently explore novel applications such as attribute-based retrieval.
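
The fusion step can be sketched as a weighted sum of per-feature kernels fed to a precomputed-kernel SVM. True MKL learns the weights jointly with the classifier; fixing them in `w` below, and the example feature types in the comments, are simplifying assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def fused_kernel(feature_sets_a, feature_sets_b, weights):
    """feature_sets_*: lists of (N, D_i) arrays, one array per feature type."""
    return sum(w * rbf_kernel(Xa, Xb)
               for w, Xa, Xb in zip(weights, feature_sets_a, feature_sets_b))

# Usage sketch: train an SVM on the precomputed fused kernel.
# w = [0.5, 0.3, 0.2]   # hypothetical weights, e.g. HOG / shape / attributes
# K_train = fused_kernel(train_feats, train_feats, w)
# clf = SVC(kernel='precomputed').fit(K_train, y_train)
# K_test = fused_kernel(test_feats, train_feats, w)
# y_pred = clf.predict(K_test)
```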


International Conference on Computer Graphics and Interactive Techniques | 2011

Modeling and generating moving trees from video

Chuan Li; Oliver Deussen; Yi-Zhe Song; Philip J. Willis; Peter M. Hall

We present a probabilistic approach for the automatic production of tree models with convincing 3D appearance and motion. The only input is a video of a moving tree, which provides us with an initial dynamic tree model that is used to generate new individual trees of the same type. Our approach combines global and local constraints to construct a dynamic 3D tree model from a 2D skeleton. Our modeling takes into account factors such as the shape of branches, the overall shape of the tree, and physically plausible motion. Furthermore, we provide a generative model that creates multiple trees in 3D, given a single example model. This means that users no longer have to model each tree individually or specify rules to make new trees. Results for different species are presented and compared to both the reference input data and state-of-the-art alternatives.
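
As a toy illustration of the generative idea ("one example in, many plausible variants out"), the sketch below samples new branch structures from statistics that would be measured on the example tree. The paper's model is far richer, covering 3D shape and physically plausible motion; the `stats` dictionary here is entirely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def grow(depth, length, angle, stats, branches):
    """Recursively sample branches from example-derived distributions."""
    if depth == 0:
        return
    branches.append((length, angle))
    for _ in range(rng.poisson(stats["children"])):
        new_len = length * rng.normal(stats["shrink"], 0.05)
        new_ang = angle + rng.normal(0.0, stats["spread"])
        grow(depth - 1, new_len, new_ang, stats, branches)

# Hypothetical statistics measured from the input video's 2D skeleton:
# mean child count, branch length shrinkage, and angular spread.
stats = {"children": 2.0, "shrink": 0.7, "spread": 0.4}
tree = []
grow(depth=5, length=1.0, angle=0.0, stats=stats, branches=tree)
```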


International Conference on Image Processing | 2016

Sketch-based image retrieval via Siamese convolutional neural network

Yonggang Qi; Yi-Zhe Song; Honggang Zhang; Jun Liu

Sketch-based image retrieval (SBIR) is a challenging task due to the ambiguity inherent in sketches when compared with photos. In this paper, we propose a novel convolutional neural network based on a Siamese architecture for SBIR. The main idea is to pull output feature vectors closer for input sketch-image pairs that are labeled as similar, and push them apart if they are irrelevant. This is achieved by jointly tuning two convolutional neural networks linked by a single loss function. Experimental results on Flickr15K demonstrate that the proposed method offers better performance than several state-of-the-art approaches.
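
A hedged PyTorch sketch of the Siamese objective just described: two branches joined by a contrastive loss that pulls relevant sketch-photo pairs together and pushes irrelevant ones beyond a margin. The margin value and the weight sharing between branches are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(net, sketch, photo, same, margin=1.0):
    """same: float tensor, 1 for a relevant sketch-photo pair, 0 otherwise."""
    f1 = net(sketch)   # both branches share the same network weights here
    f2 = net(photo)
    d = F.pairwise_distance(f1, f2)
    pull = same * d.pow(2)                          # similar pairs: shrink d
    push = (1 - same) * F.relu(margin - d).pow(2)   # dissimilar: d >= margin
    return (pull + push).mean()
```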


Computer Vision and Pattern Recognition | 2015

Making better use of edges via perceptual grouping

Yonggang Qi; Yi-Zhe Song; Tao Xiang; Honggang Zhang; Timothy M. Hospedales; Yi Li; Jun Guo

We propose a perceptual grouping framework that organizes image edges into meaningful structures, and we demonstrate its usefulness on various computer vision tasks. Our grouper formulates edge grouping as a graph partition problem, where a learning-to-rank method is developed to encode the grouping probabilities of candidate edge pairs. In particular, RankSVM is employed for the first time to combine multiple Gestalt principles as cues for edge grouping. An edge-grouping-based object proposal measure is then introduced that yields proposals comparable to state-of-the-art alternatives. We further show how human-like sketches can be generated from edge groupings and consequently used to deliver state-of-the-art sketch-based image retrieval performance. Last but not least, we tackle the problem of free-hand human sketch segmentation by utilizing the proposed grouper to cluster strokes into semantic object parts.
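
Schematically, the grouping step reduces to scoring candidate fragment pairs and partitioning the resulting graph. In the sketch below a fixed linear model `w` stands in for the RankSVM the paper trains, and groups are read off as connected components; both are simplifying assumptions for illustration.

```python
import numpy as np
import networkx as nx

def group_edges(edge_fragments, gestalt_features, w, threshold=0.0):
    """edge_fragments: list of fragment ids;
    gestalt_features: dict mapping (i, j) -> cue vector for that pair
    (e.g. proximity, good continuation, parallelism)."""
    g = nx.Graph()
    g.add_nodes_from(edge_fragments)
    for (i, j), phi in gestalt_features.items():
        if np.dot(w, phi) > threshold:   # learned pairwise affinity
            g.add_edge(i, j)
    # Each connected component is one perceptual group of edge fragments.
    return list(nx.connected_components(g))
```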


Structural Health Monitoring: An International Journal | 2014

Virtual visual sensors and their application in structural health monitoring

Yi-Zhe Song; Chris R. Bowen; Alicia H. Kim; Aydin Nassehi; Julian Padget; Nicholas Gathercole

Wireless sensor networks are increasingly accepted as an effective tool for structural health monitoring. The ability to deploy a wireless array of sensors efficiently and effectively is a key factor in structural health monitoring, yet sensor installation and management can be difficult in practice for a variety of reasons: a hostile environment, high labour costs and bandwidth limitations. We present and evaluate a proof-of-concept application of virtual visual sensors to the well-known engineering problem of the cantilever beam, as a convenient physical-sensor substitute for certain problems and environments. We demonstrate the effectiveness of virtual visual sensors as a means of non-destructive evaluation. Their major benefits are their non-invasive nature, ease of installation and cost-effectiveness. The novelty of virtual visual sensors lies in the combination of marker extraction with visual tracking realised by modern computer vision algorithms. We demonstrate that by deploying a collection of virtual visual sensors on an oscillating structure, its modal shapes and frequencies can be readily extracted from a sequence of video images. Subsequently, we perform damage detection and localisation by means of a wavelet-based analysis. The contributions of this article are as follows: (1) the use of a sub-pixel-accuracy marker extraction algorithm to construct virtual sensors in the spatial domain, (2) the embedding of dynamic marker linking within a tracking-by-correspondence paradigm, which offers benefits in computational efficiency and registration accuracy over traditional tracking-by-searching systems, and (3) the validation of virtual visual sensors in the context of a structural health monitoring application.
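
A minimal sketch of a virtual visual sensor under simplifying assumptions: plain template matching stands in for the paper's sub-pixel marker extraction and tracking-by-correspondence, and the dominant oscillation frequency is read from an FFT of the tracked displacement. The wavelet-based damage analysis is omitted.

```python
import cv2
import numpy as np

def modal_frequency(video_path, template, fps):
    """template: a small grayscale image of the marker; fps: video frame rate."""
    cap = cv2.VideoCapture(video_path)
    ys = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Locate the marker by template matching (a crude tracker).
        res = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
        _, _, _, max_loc = cv2.minMaxLoc(res)
        ys.append(max_loc[1])     # vertical marker position, in pixels
    cap.release()

    y = np.asarray(ys, dtype=float)
    y -= y.mean()                 # remove the static (DC) offset
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fps)
    return freqs[spectrum.argmax()]   # dominant oscillation frequency, Hz
```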


European Conference on Computer Vision | 2010

Finding semantic structures in image hierarchies using Laplacian graph energy

Yi-Zhe Song; Pablo Andrés Arbeláez; Peter M. Hall; Chuan Li; Anupriya Balikai

Many segmentation algorithms describe images in terms of a hierarchy of regions. Although such hierarchies can produce state-of-the-art segmentations and have many applications, they often contain more data than is required for an efficient description. This paper shows that Laplacian graph energy is a generic measure that can be used to identify semantic structures within hierarchies, independently of the algorithm that produces them. Quantitative experimental validation using hierarchies from two state-of-the-art algorithms shows that we can reduce the number of levels and regions in a hierarchy by an order of magnitude, with little or no loss in performance when compared against human-produced ground truth. We provide a tracking application that illustrates the value of reduced hierarchies.
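
For reference, the standard definition of Laplacian graph energy, the measure applied here: for a graph G with n vertices, m edges, and Laplacian eigenvalues mu_1, ..., mu_n,

```latex
\[
  \mathrm{LE}(G) \;=\; \sum_{i=1}^{n} \left|\, \mu_i - \frac{2m}{n} \,\right|
\]
```

Since 2m/n is the average vertex degree, LE(G) quantifies how far a graph's Laplacian spectrum deviates from that of a regular graph, which is what makes it usable as a generic, algorithm-independent structure score.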

Collaboration


Yi-Zhe Song's top collaborators and their affiliations.

Top Co-Authors

Tao Xiang, Queen Mary University of London
Honggang Zhang, Beijing University of Posts and Telecommunications
Jun Guo, Beijing University of Posts and Telecommunications
Yonggang Qi, Beijing University of Posts and Telecommunications
Zhanyu Ma, Beijing University of Posts and Telecommunications
Kaiyue Pang, Queen Mary University of London
Yi Li, Queen Mary University of London
Liang Wang, Chinese Academy of Sciences