Edward Johns
Imperial College London
Publications
Featured research published by Edward Johns.
International Conference on Robotics and Automation | 2013
Edward Johns; Guang-Zhong Yang
In this paper we present a new method, Feature Co-occurrence Maps, for appearance-based localisation over the course of a day. We show that by quantising local features in both feature and image space, discriminative statistics can be learned on the co-occurrences of features at different times of the day. This allows for matching at any time, without requiring individual images to be stored for each time of day, and matching is performed efficiently by comparing against the entire database simultaneously. We further show how matching along image sequences can be incorporated into the system, adapting existing methods to allow for non-zero acceleration. Results on a 20 km outdoor dataset show improved precision-recall performance over the state of the art.
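To make the co-occurrence idea concrete, here is a minimal sketch, assuming descriptors are quantised against a fixed visual vocabulary and each place accumulates counts of which words appear at which time-of-day bins; the names (CooccurrenceMap, quantise, localise) are illustrative, not the paper's implementation:

```python
import numpy as np

def quantise(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word."""
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    return dists.argmin(axis=1)

class CooccurrenceMap:
    """Per-place statistics of word occurrences across time-of-day bins."""
    def __init__(self, num_words, num_time_bins):
        self.counts = np.ones((num_words, num_time_bins))  # Laplace smoothing

    def add_observation(self, words, time_bin):
        np.add.at(self.counts[:, time_bin], words, 1)

    def log_score(self, words):
        # Evaluate the query under every time bin and keep the best, so a
        # query taken at any time of day can be matched without storing
        # separate images per time.
        probs = self.counts / self.counts.sum(axis=0, keepdims=True)  # P(word | time)
        return np.log(probs[words, :]).sum(axis=0).max()

def localise(query_descriptors, vocabulary, place_maps):
    """Score the query against every place's map simultaneously."""
    words = quantise(query_descriptors, vocabulary)
    return int(np.argmax([m.log_score(words) for m in place_maps]))
```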
Wearable and Implantable Body Sensor Networks | 2012
Jindong Liu; Edward Johns; Louis Atallah; Claire Pettitt; Benny Lo; Gary Frost; Guang-Zhong Yang
The prevalence of obesity worldwide presents a great challenge to existing healthcare systems. There is a general need for pervasive monitoring of the dietary behaviour of those who are at risk of co-morbidities. Currently, however, there is no accurate method of assessing the nutritional intake of people in their home environment. Traditional methods require subjects to manually respond to questionnaires for analysis, which is subjective, prone to errors, and makes consistency and compliance difficult to ensure. In this paper, we present a wearable sensor platform that autonomously provides detailed information regarding a subject's dietary habits. The sensor consists of a microphone and a camera and is worn discreetly on the ear. Sound features are extracted in real-time and, if a chewing activity is classified, the camera captures a video sequence for further analysis. From this sequence, a number of key frames are extracted to represent important episodes during the course of a meal. Results show a high classification rate for chewing activities, and the visual log provides a detailed overview of the subject's food intake that is difficult to quantify from manually-acquired food records.
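A rough sketch of the sensing loop this describes, with hypothetical stand-ins for the hardware interfaces (read_audio_window, read_video) and a pre-trained chewing classifier: audio is processed continuously, and video is only captured once chewing is detected.

```python
import numpy as np

def sound_features(window, n_bands=16):
    """Crude log-energy features over frequency bands of one audio window."""
    spectrum = np.abs(np.fft.rfft(window))
    bands = np.array_split(spectrum, n_bands)
    return np.log1p(np.array([b.sum() for b in bands]))

def select_key_frames(frames, k=5, min_diff=20.0):
    """Greedily keep frames that differ enough from the last kept frame."""
    keep = [frames[0]]
    for f in frames[1:]:
        if np.mean(np.abs(f.astype(float) - keep[-1].astype(float))) > min_diff:
            keep.append(f)
        if len(keep) == k:
            break
    return keep

def monitor(read_audio_window, read_video, chew_classifier):
    """Audio runs continuously; the camera is triggered only on chewing."""
    while True:
        feats = sound_features(read_audio_window())
        if chew_classifier(feats):            # chewing detected from sound alone
            frames = read_video(seconds=10)   # hypothetical capture interface
            yield select_key_frames(frames)
```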
Computer Vision and Pattern Recognition | 2016
Edward Johns; Stefan Leutenegger; Andrew J. Davison
A multi-view image sequence provides a much richer capacity for object recognition than a single image. However, most existing solutions to multi-view recognition typically adopt hand-crafted, model-based geometric methods, which do not readily embrace recent trends in deep learning. We propose to bring Convolutional Neural Networks to generic multi-view recognition, by decomposing an image sequence into a set of image pairs, classifying each pair independently, and then learning an object classifier by weighting the contribution of each pair. This allows for recognition over arbitrary camera trajectories, without requiring explicit training over the potentially infinite number of camera paths and lengths. Building these pairwise relationships then naturally extends to the next-best-view problem in an active recognition framework. To achieve this, we train a second Convolutional Neural Network to map directly from an observed image to the next viewpoint. Finally, we incorporate this into a trajectory optimisation task, whereby the best recognition confidence is sought for a given trajectory length. We present state-of-the-art results in both guided and unguided multi-view recognition on the ModelNet dataset, and show how our method can be used with depth images, greyscale images, or both.
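A minimal, framework-free sketch of the pairwise decomposition, where pair_classifier and pair_weight stand in for the two trained networks (assumptions, not the paper's exact models):

```python
import itertools
import numpy as np

def classify_sequence(images, pair_classifier, pair_weight, num_classes):
    """Combine per-pair predictions over an arbitrary-length camera trajectory."""
    total = np.zeros(num_classes)
    for a, b in itertools.combinations(range(len(images)), 2):
        scores = pair_classifier(images[a], images[b])  # (num_classes,) scores
        weight = pair_weight(images[a], images[b])      # scalar contribution
        total += weight * np.asarray(scores)
    return int(np.argmax(total))
```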
Intelligent Robots and Systems | 2016
Edward Johns; Stefan Leutenegger; Andrew J. Davison
This paper presents a new method for parallel-jaw grasping of isolated objects from depth images, under large gripper pose uncertainty. Whilst most approaches aim to predict the single best grasp pose from an image, our method first predicts a score for every possible grasp pose, which we denote the grasp function. With this, it is possible to achieve grasping robust to the gripper's pose uncertainty, by smoothing the grasp function with the pose uncertainty function. Therefore, if the single best pose is adjacent to a region of poor grasp quality, that pose will no longer be chosen; instead, a pose will be chosen which is surrounded by a region of high grasp quality. To learn this function, we train a Convolutional Neural Network which takes as input a single depth image of an object, and outputs a score for each grasp pose across the image. Training data is generated using physics simulation and depth image simulation with 3D object meshes, enabling the acquisition of sufficient data without requiring exhaustive real-world experiments. We evaluate with both synthetic and real experiments, and show that the learned grasp score is more robust to gripper pose uncertainty than when this uncertainty is not accounted for.
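A minimal sketch of the robustness step, assuming the grasp function is a dense score map over discretised (x, y, angle) poses and the gripper's pose uncertainty is roughly Gaussian; convolving the map with the uncertainty kernel means the selected pose sits in a neighbourhood of high grasp quality rather than on an isolated peak:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def robust_grasp(grasp_scores, sigma_xy=3.0, sigma_angle=1.0):
    """grasp_scores: (H, W, A) array, one score per discretised grasp pose."""
    # Smooth over position and angle; the angle axis wraps around ('wrap').
    smoothed = gaussian_filter(
        grasp_scores,
        sigma=(sigma_xy, sigma_xy, sigma_angle),
        mode=("nearest", "nearest", "wrap"))
    # The argmax of the smoothed map is the pose whose neighbourhood is safe.
    return np.unravel_index(np.argmax(smoothed), smoothed.shape)
```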
International Journal of Computer Vision | 2014
Edward Johns; Guang-Zhong Yang
This paper proposes a new framework for visual place recognition that incrementally learns models of each place and offers adaptability to dynamic elements in the scene. Traditional Bag-Of-Words (BOW) image-retrieval approaches to place recognition typically treat images in a holistic manner and are not capable of dealing with sub-scene dynamics, such as structural changes to a building façade or seasonal effects on foliage. However, by treating local features as observations of real-world landmarks in a scene that is observed repeatedly over a period of time, such dynamics can be modelled at a local level, and the spatio-temporal properties of each landmark can be updated independently and incrementally. The proposed method models each place as a set of such landmarks and their geometric relationships. A new BOW filtering stage and geometric verification scheme are introduced to compute a similarity score between a query image and each scene model. As further training images are acquired for each place, the landmark properties are updated over time, and in the long term the model can adapt to dynamic behaviour in the scene. Results on an outdoor dataset of images captured along a 7 km path, over a period of 5 months, show an improvement in recognition performance compared to state-of-the-art image retrieval approaches to place recognition.
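An illustrative sketch of the per-landmark bookkeeping, under the assumption that each landmark tracks how often it is re-detected when its place is revisited; the names and update rules here are hypothetical simplifications of the paper's model:

```python
class Landmark:
    """One physical scene point, with statistics updated at each revisit."""
    def __init__(self, descriptor):
        self.descriptor = descriptor
        self.detections = 1
        self.visits = 1

    def update(self, matched, descriptor=None, alpha=0.1):
        self.visits += 1
        if matched:
            self.detections += 1
            # Drift the stored appearance towards the latest observation.
            self.descriptor = (1 - alpha) * self.descriptor + alpha * descriptor

    @property
    def persistence(self):
        # Re-detection rate: dynamic elements (foliage, signage) score low.
        return self.detections / self.visits

def place_similarity(place_landmarks, matched):
    """Weight each matched landmark by how reliably it persists over time."""
    return sum(lm.persistence for lm in place_landmarks if lm in matched)
```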
International Conference on Computer Vision | 2011
Edward Johns; Guang-Zhong Yang
The recognition of a place depicted in an image typically adopts methods from image retrieval in large-scale databases. First, a query image is described as a "bag-of-features" and compared to every image in the database. Second, the most similar images are passed to a geometric verification stage. However, this is an inefficient approach when considering that some database images may be almost identical, and many image features may not occur repeatedly. We address this issue by clustering similar database images to represent distinct scenes, and tracking local features that are consistently detected to form a set of real-world landmarks. Query images are then matched to landmarks rather than features, and a probabilistic model of landmark properties is learned from the cluster to appropriately verify or reject putative feature matches. Based on this concept, we present novelties in both the bag-of-features retrieval and geometric verification stages. Results on a database of 200K images of popular tourist destinations show improvements in both recognition performance and efficiency compared to traditional image retrieval methods.
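A small sketch of landmark formation, assuming database images have already been clustered into scenes and features carry a track id linking detections of the same physical point across images; only consistently detected features are promoted to landmarks, which keeps the index compact:

```python
from collections import Counter

def build_landmarks(cluster_images, min_support=3):
    """cluster_images: one list of (track_id, descriptor) pairs per image."""
    support = Counter(tid for image in cluster_images for tid, _ in image)
    landmarks = {}
    for image in cluster_images:
        for tid, desc in image:
            if support[tid] >= min_support and tid not in landmarks:
                landmarks[tid] = desc   # one representative per landmark
    return landmarks
```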
International Conference on Robotics and Automation | 2013
Edward Johns; Guang-Zhong Yang
In this paper we present a new appearance-based localisation system that is able to deal with dynamic elements in the scene. By independently modelling the properties of local features observed in a scene over long periods of time, we show that feature appearances and geometric relationships can be learned more accurately than when representing a location by a single image. We also present a new dataset consisting of a 6 km outdoor path traversed once per month for a period of 5 months, which contains several challenges including short-term and long-term dynamic behaviour, lateral deviations in the path, repetitive scene appearances and strong illumination changes. We show superior performance of the dynamic mapping system compared to state-of-the-art techniques on our dataset.
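A hedged sketch of the per-feature modelling idea, using a running mean and variance of a feature's image position across traversals (Welford's online algorithm); the paper models richer appearance and geometric properties, so treat this purely as an illustration:

```python
import numpy as np

class FeatureGeometry:
    """Running mean/variance of a feature's image position over traversals."""
    def __init__(self, position):
        self.n = 1
        self.mean = np.asarray(position, dtype=float)
        self.m2 = np.zeros_like(self.mean)    # sum of squared deviations

    def update(self, position):
        pos = np.asarray(position, dtype=float)
        self.n += 1
        delta = pos - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (pos - self.mean)  # Welford's online update

    @property
    def variance(self):
        # High variance flags features on dynamic scene elements.
        return self.m2 / max(self.n - 1, 1)
```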
European Conference on Computer Vision | 2014
Edward Johns; Guang-Zhong Yang
Place recognition currently suffers from a lack of scalability due to the need for strong geometric constraints, which to date have typically been limited to RANSAC implementations. In this paper, we present a method that achieves state-of-the-art performance, in both recognition accuracy and speed, without the need for RANSAC. We propose to discretise each feature pair in an image, in both appearance and 2D geometry, to create a triplet of words: one each for the appearance of the two features, and one for their pairwise geometry. This triplet is then passed through an inverted index to find examples of such pairwise configurations in the database. Finally, a global geometry constraint is enforced by considering the maximum clique in an adjacency graph of pairwise correspondences. The discrete nature of the problem allows tractable probabilistic scores to be assigned to each correspondence, and the least informative feature pairs can be eliminated from the database for memory and time efficiency. We demonstrate the performance of our method on several large-scale datasets, and show improvements over several baselines.
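A compact sketch of the pairwise-word pipeline, with the discretisation choices (distance and orientation bins, vocabulary) assumed rather than taken from the paper, and a greedy heuristic standing in for the maximum-clique step:

```python
import itertools
import math
from collections import defaultdict

def geometry_word(pa, pb, dist_bins=8, angle_bins=8, max_dist=500.0):
    """Discretise the 2D geometry of a feature pair into a single word."""
    d = math.dist(pa, pb)
    theta = math.atan2(pb[1] - pa[1], pb[0] - pa[0]) % (2 * math.pi)
    return (min(int(d / max_dist * dist_bins), dist_bins - 1),
            min(int(theta / (2 * math.pi) * angle_bins), angle_bins - 1))

def index_image(image_id, features, inverted_index):
    """features: list of (visual_word, (x, y)); index every pair's triplet."""
    for (wa, pa), (wb, pb) in itertools.combinations(features, 2):
        inverted_index[(wa, wb, geometry_word(pa, pb))].append(image_id)

def greedy_max_clique(adjacency):
    """Greedy stand-in for the maximum-clique check on the correspondence
    graph; adjacency maps each correspondence to its consistent neighbours."""
    clique = []
    for node in sorted(adjacency, key=lambda n: -len(adjacency[n])):
        if all(node in adjacency[c] for c in clique):
            clique.append(node)
    return clique

inverted_index = defaultdict(list)  # (word_a, word_b, geometry_word) -> image ids
```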
International Conference on Robotics and Automation | 2011
Edward Johns; Guang-Zhong Yang
Vision-based topological maps for mobile robot localization traditionally consist of a set of images captured along a path, with a query image then compared to every individual map image. This paper introduces a new approach to topological mapping, whereby the map consists of a set of landmarks that are detected across multiple images, spanning the continuous space between nodal images. Matches are then made to landmarks, rather than to individual images, enabling a topological map of far greater density than traditionally possible, without sacrificing computational speed. Furthermore, by treating each landmark independently, a probabilistic approach to localization can be employed, taking into account the learned discriminative properties of each landmark. An optimization stage is then used to adjust the map according to speed and localization accuracy requirements. Results for global localization show a higher rate of correct location identification than the traditional topological map, together with a greater localization resolution from the denser map, without requiring a decrease in frame rate.
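An illustrative sketch, assuming each landmark stores the span of map positions over which it was tracked and a learned discriminability weight; a query then votes over positions, giving localization resolution finer than the spacing of the original nodal images:

```python
import numpy as np

def localise(matched_landmarks, num_positions):
    """matched_landmarks: (first_pos, last_pos, weight) per matched landmark,
    where the span covers the map positions over which it was tracked."""
    votes = np.zeros(num_positions)
    for first, last, weight in matched_landmarks:
        votes[first:last + 1] += weight   # vote over the landmark's span
    return int(np.argmax(votes))
```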
Medical Image Computing and Computer-Assisted Intervention | 2014
Menglong Ye; Edward Johns; Stamatia Giannarou; Guang-Zhong Yang
Endoscopic surveillance is a widely used method for monitoring abnormal changes in the gastrointestinal tract, such as Barrett's esophagus. Direct visual assessment, however, is both time-consuming and error-prone, as it involves manual labelling of abnormalities on a large set of images. To assist surveillance, this paper proposes an online scene association scheme to summarise an endoscopic video into scenes, on-the-fly. This provides scene clustering based on visual content, and also facilitates topological localisation during navigation. The proposed method is based on tracking and detection of visual landmarks on the tissue surface. A generative model is proposed for online learning of pairwise geometric relationships between landmarks. This enables robust detection of landmarks and scene association under tissue deformation. Detailed experimental comparison and validation have been conducted on in vivo endoscopic videos to demonstrate the practical value of our approach.
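A schematic sketch of online scene association, with the matching score and threshold left as assumptions: each incoming frame either joins the best-matching existing scene or starts a new one, so the video is summarised on-the-fly:

```python
def associate(frame_landmarks, scenes, match_score, threshold=0.5):
    """scenes: list of landmark-id sets; match_score: similarity in [0, 1].
    Returns the index of the scene the frame was associated with."""
    frame_landmarks = set(frame_landmarks)
    if scenes:
        scores = [match_score(frame_landmarks, s) for s in scenes]
        best = max(range(len(scenes)), key=scores.__getitem__)
        if scores[best] >= threshold:
            scenes[best] |= frame_landmarks   # grow the existing scene
            return best
    scenes.append(frame_landmarks)            # unseen tissue: start a new scene
    return len(scenes) - 1
```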