Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Sareh Shirazi is active.

Publication


Featured research published by Sareh Shirazi.


Computer Vision and Pattern Recognition | 2011

Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching

Mehrtash Tafazzoli Harandi; Conrad Sanderson; Sareh Shirazi; Brian C. Lovell

A convenient way of dealing with image sets is to represent them as points on Grassmannian manifolds. While several recent studies explored the applicability of discriminant analysis on such manifolds, the conventional formalism of discriminant analysis suffers from not considering the local structure of the data. We propose a discriminant analysis approach on Grassmannian manifolds, based on a graph-embedding framework. We show that by introducing within-class and between-class similarity graphs to characterise intra-class compactness and inter-class separability, the geometrical structure of data can be exploited. Experiments on several image datasets (PIE, BANCA, MoBo, ETH-80) show that the proposed algorithm obtains considerable improvements in discrimination accuracy, in comparison to three recent methods: Grassmann Discriminant Analysis (GDA), Kernel GDA, and the kernel version of Affine Hull Image Set Distance. We further propose a Grassmannian kernel, based on canonical correlation between subspaces, which can increase discrimination accuracy when used in combination with previous Grassmannian kernels.
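As a rough illustration of the representation this paper builds on, the sketch below maps an image set to a point on a Grassmann manifold and evaluates two Grassmannian kernels, including one based on canonical correlations as proposed above. The helper names are hypothetical and the full graph-embedding discriminant analysis is not reproduced here.

```python
import numpy as np

def image_set_to_subspace(images, dim=10):
    """Represent an image set as an orthonormal basis (a Grassmann point):
    stack vectorised images column-wise and keep the leading left
    singular vectors. Assumes at least `dim` images in the set."""
    X = np.stack([im.ravel() for im in images], axis=1).astype(float)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim]

def projection_kernel(A, B):
    """Projection kernel k(A, B) = ||A^T B||_F^2, i.e. the sum of
    squared cosines of the principal angles between the subspaces."""
    return np.linalg.norm(A.T @ B) ** 2

def cca_kernel(A, B):
    """Kernel from canonical correlations: the singular values of A^T B
    are the cosines of the principal angles; take the largest."""
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return float(np.clip(s, 0.0, 1.0).max())
```

Within-class and between-class similarity graphs would then be built from such kernel values to drive the discriminant embedding.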


Intelligent Robots and Systems | 2015

On the performance of ConvNet features for place recognition

Niko Sünderhauf; Sareh Shirazi; Feras Dayoub; Ben Upcroft; Michael Milford

After the incredible success of deep learning in the computer vision domain, there has been much interest in applying Convolutional Network (ConvNet) features in robotic fields such as visual navigation and SLAM. Unfortunately, there are fundamental differences and challenges involved. Computer vision datasets are very different in character to robotic camera data, real-time performance is essential, and performance priorities can be different. This paper comprehensively evaluates and compares the utility of three state-of-the-art ConvNets on the problems of particular relevance to navigation for robots: viewpoint-invariance and condition-invariance. It also, for the first time, enables real-time place recognition performance using ConvNets with large maps by integrating a variety of existing (locality-sensitive hashing) and novel (semantic search space partitioning) optimization techniques. We present extensive experiments on four real-world datasets created to evaluate each of the specific challenges in place recognition. The results demonstrate that speed-ups of two orders of magnitude can be achieved with minimal accuracy degradation, enabling real-time performance. We confirm that networks trained for semantic place categorization also perform better at (specific) place recognition when faced with severe appearance changes, and provide a reference for which networks and layers are optimal for different aspects of the place recognition problem.
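A minimal sketch of one of the optimisations named above, random-hyperplane locality-sensitive hashing over holistic ConvNet descriptors. Feature extraction and all names here are assumptions, not the paper's code.

```python
import numpy as np

def make_hasher(dim, n_bits=256, seed=0):
    """Random-hyperplane LSH: keep the sign pattern of projections onto
    random directions, so Hamming distance between bit signatures
    approximates angular (cosine) distance between descriptors."""
    planes = np.random.default_rng(seed).standard_normal((dim, n_bits))
    def sign_hash(feats):
        return feats @ planes > 0        # (N, dim) -> (N, n_bits) bits
    return sign_hash

# Usage: hash the descriptor database once, then match a query by
# nearest Hamming distance instead of exhaustive cosine search.
hasher = make_hasher(dim=512)
db_bits = hasher(np.random.randn(1000, 512))   # placeholder descriptors
query_bits = hasher(np.random.randn(1, 512))
best_match = int((query_bits != db_bits).sum(axis=1).argmin())
```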


Robotics: Science and Systems | 2015

Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free

Niko Suenderhauf; Sareh Shirazi; Adam Jacobson; Feras Dayoub; Edward Pepperell; Ben Upcroft; Michael Milford

Place recognition has long been an incompletely solved problem in that all approaches involve significant compromises. Current methods address many but never all of the critical challenges of place recognition – viewpoint-invariance, condition-invariance and minimizing training requirements. Here we present an approach that adapts state-of-the-art object proposal techniques to identify potential landmarks within an image for place recognition. We use the astonishing power of convolutional neural network features to identify matching landmark proposals between images to perform place recognition over extreme appearance and viewpoint variations. Our system does not require any form of training; all components are generic enough to be used off-the-shelf. We present a range of challenging experiments in varied viewpoint and environmental conditions. We demonstrate superior performance to current state-of-the-art techniques. Furthermore, by building on existing and widely used recognition frameworks, this approach provides a highly compatible place recognition system with the potential for easy integration of other techniques such as object detection and semantic scene interpretation.
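The sketch below illustrates the matching step in spirit: score a pair of images by mutually-best cosine matches between their landmark-proposal features. The matching rule and all names are illustrative assumptions; proposal generation and ConvNet feature extraction are assumed to exist upstream.

```python
import numpy as np

def image_similarity(feats_a, feats_b):
    """feats_a, feats_b: (n_proposals, dim) ConvNet features of landmark
    proposals from two images. Score the pair by mutual nearest-
    neighbour cosine matches between proposals."""
    A = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    B = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    S = A @ B.T
    best_ab = S.argmax(axis=1)       # best match in B for each row of A
    best_ba = S.argmax(axis=0)       # best match in A for each row of B
    mutual = [(i, j) for i, j in enumerate(best_ab) if best_ba[j] == i]
    score = sum(S[i, j] for i, j in mutual)
    return score, mutual
```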


International Conference on Image Processing | 2012

Clustering on Grassmann manifolds via kernel embedding with application to action analysis

Sareh Shirazi; Mehrtash Tafazzoli Harandi; Conrad Sanderson; Azadeh Alavi; Brian C. Lovell

With the aim of improving the clustering of data (such as image sequences) lying on Grassmann manifolds, we propose to embed the manifolds into Reproducing Kernel Hilbert Spaces. To this end, we define a measure of cluster distortion and embed the manifolds such that the distortion is minimised. We show that the optimal solution is a generalised eigenvalue problem that can be solved very efficiently. Experiments on several clustering tasks (including human action clustering) show that in comparison to the recent intrinsic Grassmann k-means algorithm, the proposed approach obtains notable improvements in clustering accuracy, while also being several orders of magnitude faster.
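The paper solves a generalised eigenvalue problem for the distortion-minimising embedding. As a simpler stand-in that shows clustering through the same Gram matrix, here is kernel k-means over the projection kernel; all names are hypothetical.

```python
import numpy as np

def grassmann_kernel_matrix(subspaces):
    """Projection-kernel Gram matrix between orthonormal bases."""
    n = len(subspaces)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.linalg.norm(subspaces[i].T @ subspaces[j]) ** 2
    return K

def kernel_kmeans(K, n_clusters, n_iter=50, seed=0):
    """Lloyd-style kernel k-means using only the Gram matrix K."""
    n = K.shape[0]
    labels = np.random.default_rng(seed).integers(n_clusters, size=n)
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.empty((n, n_clusters))
        for c in range(n_clusters):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:
                dist[:, c] = np.inf
                continue
            # ||phi(x) - mu_c||^2 expanded through kernel evaluations
            dist[:, c] = (diag
                          - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```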


Computer Vision and Pattern Recognition | 2015

Sequence searching with deep-learnt depth for condition- and viewpoint-invariant route-based place recognition

Michael Milford; Stephanie M. Lowry; Niko Sünderhauf; Sareh Shirazi; Edward Pepperell; Ben Upcroft; Chunhua Shen; Guosheng Lin; Fayao Liu; Cesar Cadena; Ian D. Reid

Vision-based localization on robots and vehicles remains unsolved when extreme appearance change and viewpoint change are present simultaneously. Current state-of-the-art approaches to this challenge either deal with only one of these two problems, for example FAB-MAP (viewpoint invariance) or SeqSLAM (appearance invariance), or use extensive training within the test environment, an impractical requirement in many application scenarios. In this paper we significantly improve the viewpoint invariance of the SeqSLAM algorithm by using state-of-the-art deep learning techniques to generate synthetic viewpoints. Our approach is different to other deep learning approaches in that it does not rely on the ability of the CNN to learn invariant features, but only on its ability to produce “good enough” depth images from day-time imagery. We evaluate the system on a new multi-lane day-night car dataset specifically gathered to simultaneously test both appearance and viewpoint change. Results demonstrate that the use of synthetic viewpoints improves the maximum recall achieved at 100% precision by a factor of 2.2 and the maximum recall by a factor of 2.7, enabling correct place recognition across multiple road lanes and significantly reducing the time between correct localizations.
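A toy version of the SeqSLAM-style sequence search the paper builds on (the deep-learnt depth synthesis is not reproduced): sum descriptor differences along constant-velocity trajectories through the difference matrix. The distance choice and names are assumptions.

```python
import numpy as np

def difference_matrix(query_feats, db_feats):
    """Pairwise descriptor distances between a query route (rows)
    and the stored map (columns)."""
    return np.linalg.norm(query_feats[:, None, :] - db_feats[None, :, :],
                          axis=2)

def best_offset(D, seq_len=10, v=1.0):
    """Sum D along straight lines of slope v (constant relative speed),
    as in SeqSLAM, for the query segment starting at row 0; return the
    database offset with the lowest accumulated difference."""
    assert seq_len <= D.shape[0]
    rows = np.arange(seq_len)
    cols = np.round(rows * v).astype(int)
    scores = [D[rows, start + cols].sum()
              for start in range(D.shape[1] - cols[-1])]
    return int(np.argmin(scores))
```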


Workshop on Applications of Computer Vision | 2014

Object tracking via non-Euclidean geometry: A Grassmann approach

Sareh Shirazi; Mehrtash Tafazzoli Harandi; Brian C. Lovell; Conrad Sanderson

A robust visual tracking system requires an object appearance model that is able to handle occlusion, pose, and illumination variations in the video stream. This can be difficult to accomplish when the model is trained using only a single image. In this paper, we first propose a tracking approach based on affine subspaces (constructed from several images) which are able to accommodate the above-mentioned variations. We use affine subspaces not only to represent the object, but also the candidate areas that the object may occupy. We furthermore propose a novel approach to measure affine subspace-to-subspace distance via the use of non-Euclidean geometry of Grassmann manifolds. The tracking problem is then considered as an inference task in a Markov Chain Monte Carlo framework via particle filtering. Quantitative evaluation on challenging video sequences indicates that the proposed approach obtains considerably better performance than several recent state-of-the-art methods such as Tracking-Learning-Detection and MILtrack.
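The subspace-to-subspace distance at the heart of this tracker reduces to principal angles. A minimal sketch follows, with illustrative names and linear rather than affine subspaces for brevity.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles between subspaces with orthonormal bases A, B:
    the arccos of the singular values of A^T B."""
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def geodesic_distance(A, B):
    """Grassmann geodesic distance: the 2-norm of the principal angles.
    In the tracker, each particle's candidate region would be scored
    by such a distance to the object's appearance subspace."""
    return float(np.linalg.norm(principal_angles(A, B)))
```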


International Symposium on Communications, Control and Signal Processing | 2008

A new approach in super resolution based on an adaptive regularization parameter

Sareh Shirazi; Mehran Yazdi

Super-resolution image reconstruction has been one of the most important research areas in recent years; its goal is to obtain a high resolution (HR) image from several low resolution (LR) blurred, noisy, undersampled and displaced images. The relation between the HR image and the LR images can be modeled by a linear system using a transformation matrix and additive noise. However, a unique solution may not be available because of the singularity of the transformation matrix. To overcome this ill-posed problem, stochastic methods such as ML and MAP have been introduced. However, their performance is limited because the effect of noise energy has been ignored. In this paper, we propose an adaptive regularization approach based on the fact that the regularization parameter should be a linear function of the noise variance. The performance of the proposed approach has been tested on several images and the obtained results demonstrate the superiority of our approach compared with existing methods.
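A hedged sketch of the core idea, a regularization weight linear in the noise variance, using a generic Tikhonov-regularized solve. The paper's exact operators and constant are not given here; `alpha`, `H` and `C` are assumptions.

```python
import numpy as np

def regularised_sr(H, y, C, noise_var, alpha=1.0):
    """Tikhonov-regularised reconstruction with the regularisation
    parameter set as a linear function of the noise variance,
    i.e. lambda = alpha * sigma^2 (the abstract's key idea).
    H: stacked LR observation model, y: stacked LR pixel values,
    C: high-pass smoothness operator (e.g. a discrete Laplacian)."""
    lam = alpha * noise_var
    A = H.T @ H + lam * (C.T @ C)   # regularised normal equations
    return np.linalg.solve(A, H.T @ y)
```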


Digital Image Computing: Techniques and Applications | 2015

Bags of Affine Subspaces for Robust Object Tracking

Sareh Shirazi; Conrad Sanderson; Christopher McCool; Mehrtash Tafazzoli Harandi

We propose an adaptive tracking algorithm where the object is modelled as a continuously updated bag of affine subspaces, with each subspace constructed from the object's appearance over several consecutive frames. In contrast to linear subspaces, affine subspaces explicitly model the origin of subspaces. Furthermore, instead of using a brittle point-to-subspace distance during the search for the object in a new frame, we propose to use a subspace-to-subspace distance by representing candidate image areas also as affine subspaces. Distances between subspaces are then obtained by exploiting the non-Euclidean geometry of Grassmann manifolds. Experiments on challenging videos (containing object occlusions, deformations, as well as variations in pose and illumination) indicate that the proposed method achieves higher tracking accuracy than several recent discriminative trackers.
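A minimal sketch of building one affine subspace from consecutive frames and comparing two of them. The blend of the origin term with the principal-angle term is an assumption for illustration, not the paper's exact distance.

```python
import numpy as np

def affine_subspace(patches, dim=3):
    """Affine subspace from consecutive appearance patches: the mean
    (the explicitly modelled origin) plus leading principal directions."""
    X = np.stack([p.ravel() for p in patches], axis=1).astype(float)
    mu = X.mean(axis=1)
    U, _, _ = np.linalg.svd(X - mu[:, None], full_matrices=False)
    return mu, U[:, :dim]

def affine_distance(sub1, sub2, w=0.5):
    """Blend a normalised origin offset with a Grassmann term between
    the bases; the weighting w is a hypothetical choice."""
    (m1, U1), (m2, U2) = sub1, sub2
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    grass = np.linalg.norm(np.arccos(np.clip(s, -1.0, 1.0)))
    origin = np.linalg.norm(m1 - m2) / (
        np.linalg.norm(m1) + np.linalg.norm(m2) + 1e-12)
    return w * origin + (1 - w) * grass
```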


Archive | 2013

Graph-Embedding Discriminant Analysis on Riemannian Manifolds for Visual Recognition

Sareh Shirazi; Azadeh Alavi; Mehrtash Tafazzoli Harandi; Brian C. Lovell

Recently, several studies have utilised non-Euclidean geometry to address several computer vision problems including object tracking [17], characterising the diffusion of water molecules as in diffusion tensor imaging [24], face recognition [23, 31], human re-identification [4], texture classification [16], pedestrian detection [39] and action recognition [22, 43].


Computer Vision and Pattern Recognition | 2017

Unsupervised Human Action Detection by Action Matching

Basura Fernando; Sareh Shirazi; Stephen Gould

We propose a new task of unsupervised action detection by action matching. Given two long videos, the objective is to temporally detect all pairs of matching video segments. A pair of video segments are matched if they share the same human action. The task is category independent—it does not matter what action is being performed—and no supervision is used to discover such video segments. Unsupervised action detection by action matching allows us to align videos in a meaningful manner. As such, it can be used to discover new action categories or as an action proposal technique within, say, an action detection pipeline. Moreover, it is a useful pre-processing step for generating video highlights, e.g., from sports videos. We present an effective and efficient method for unsupervised action detection. We use an unsupervised temporal encoding method and exploit the temporal consistency in human actions to obtain candidate action segments. We evaluate our method on this challenging task using three activity recognition benchmarks, namely, the MPII Cooking Activities dataset, the THUMOS15 action detection benchmark and a new dataset called the IKEA dataset. On the MPII Cooking dataset we detect action segments with a precision of 21.6% and recall of 11.7% over 946 long video pairs and over 5000 ground truth action segments. Similarly, on the THUMOS dataset we obtain 18.4% precision and 25.1% recall over 5094 ground truth action segment pairs.
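As a rough illustration of the matching machinery, the sketch below pairs temporal segments from two videos by descriptor similarity. Mean pooling stands in for the paper's unsupervised temporal encoding, and all names and thresholds are hypothetical.

```python
import numpy as np

def segment_descriptors(frame_feats, win=32, stride=16):
    """Mean-pool per-frame features over sliding temporal windows;
    a stand-in for the paper's unsupervised temporal encoding."""
    descs, spans = [], []
    for s in range(0, len(frame_feats) - win + 1, stride):
        descs.append(frame_feats[s:s + win].mean(axis=0))
        spans.append((s, s + win))
    return np.array(descs), spans

def match_segments(desc_a, desc_b, thresh=0.9):
    """Return index pairs of segments whose cosine similarity clears
    the threshold, i.e. candidate matching action segments."""
    A = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    B = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    S = A @ B.T
    return list(zip(*np.nonzero(S > thresh)))
```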

Collaboration


Dive into Sareh Shirazi's collaborations.

Top Co-Authors

Ben Upcroft, Queensland University of Technology
Michael Milford, Queensland University of Technology
Fahimeh Rezazadegan, Queensland University of Technology
Niko Sünderhauf, Queensland University of Technology
Feras Dayoub, Queensland University of Technology
Frederic D. Maire, Queensland University of Technology
Edward Pepperell, Queensland University of Technology