Suman Saha
Oxford Brookes University
Publications
Featured research published by Suman Saha.
British Machine Vision Conference | 2016
Suman Saha; Gurkirt Singh; Michael Sapienza; Philip H. S. Torr; Fabio Cuzzolin
In this work, we propose an approach to the spatiotemporal localisation (detection) and classification of multiple concurrent actions within temporally untrimmed videos. Our framework is composed of three stages. In stage 1, appearance and motion detection networks are employed to localise and score actions from colour images and optical flow. In stage 2, the appearance network detections are boosted by combining them with the motion detection scores, in proportion to their respective spatial overlap. In stage 3, sequences of detection boxes most likely to be associated with a single action instance, called action tubes, are constructed by solving two energy maximisation problems via dynamic programming. In the first pass, action paths spanning the whole video are built by linking detection boxes over time using their class-specific scores and their spatial overlap; in the second pass, temporal trimming is performed by enforcing label consistency across all constituent detection boxes. We demonstrate the performance of our algorithm on the challenging UCF-101, J-HMDB-21 and LIRIS-HARL datasets, achieving new state-of-the-art results across the board and significantly increasing detection speed at test time. Compared to the previous state of the art, we report gains of 20% and 11% in mAP (mean average precision) on UCF-101 and J-HMDB-21 respectively.
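To make the fusion and linking steps concrete, the following is a minimal Python sketch of stage 2 (overlap-weighted score fusion) and the first dynamic-programming pass (building an action path). It is a simplified illustration under stated assumptions, not the authors' released code: the names `fuse_scores`, `link_paths` and the overlap weight `lam` are hypothetical, and the exact fusion weighting in the paper may differ.

```python
# Sketch of overlap-weighted score fusion and DP-based path linking.
# Illustrative only; names and weighting details are assumptions.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def fuse_scores(app_boxes, app_scores, mot_boxes, mot_scores):
    """Boost each appearance detection with overlapping motion scores,
    weighted by spatial overlap (the stage-2 idea)."""
    fused = list(app_scores)
    for i, box in enumerate(app_boxes):
        for j, mbox in enumerate(mot_boxes):
            fused[i] += iou(box, mbox) * mot_scores[j]
    return fused

def link_paths(frames, lam=1.0):
    """First-pass dynamic programming: link per-frame detections into an
    action path maximising class scores plus lam * spatial-overlap terms.
    `frames` is a list of (boxes, scores) pairs, one entry per frame."""
    boxes0, scores0 = frames[0]
    best = list(scores0)           # best path score ending at each box
    back = [[-1] * len(boxes0)]    # backpointers for path recovery
    prev_boxes = boxes0
    for boxes, scores in frames[1:]:
        cur, ptr = [], []
        for b, s in zip(boxes, scores):
            cand = [best[k] + s + lam * iou(pb, b)
                    for k, pb in enumerate(prev_boxes)]
            k = int(np.argmax(cand))
            cur.append(cand[k]); ptr.append(k)
        best, prev_boxes = cur, boxes
        back.append(ptr)
    # Trace the highest-scoring path back through the video.
    idx, path = int(np.argmax(best)), []
    for t in range(len(frames) - 1, -1, -1):
        path.append(frames[t][0][idx])
        idx = back[t][idx]
    return path[::-1]
```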
International Conference on Computer Vision | 2017
Gurkirt Singh; Suman Saha; Michael Sapienza; Philip H. S. Torr; Fabio Cuzzolin
We present a deep-learning framework for real-time multiple spatio-temporal (S/T) action localisation and classification. Current state-of-the-art approaches work offline, and are too slow to be useful in real-world settings. To overcome their limitations we introduce two major developments. Firstly, we adopt real-time SSD (Single Shot MultiBox Detector) CNNs to regress and classify detection boxes in each video frame potentially containing an action of interest. Secondly, we design an original and efficient online algorithm to incrementally construct and label ‘action tubes’ from the SSD frame-level detections. As a result, our system is not only capable of performing S/T detection in real time, but can also perform early action prediction in an online fashion. We achieve new state-of-the-art results in both S/T action localisation and early action prediction on the challenging UCF101-24 and J-HMDB-21 benchmarks, even when compared to the top offline competitors. To the best of our knowledge, ours is the first real-time (up to 40 fps) system able to perform online S/T action localisation on the untrimmed videos of UCF101-24.
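The online tube-building step can be illustrated with a short greedy sketch: detections arrive one frame at a time and each live tube is extended with its best-overlapping new detection. This is a hedged simplification, not the paper's exact algorithm; the `Tube` class, the `match_thr` threshold, and the injected `iou_fn` (e.g. the `iou` helper from the earlier sketch) are all illustrative assumptions.

```python
# Simplified online tube construction from per-frame detections.
# Greedy matching by IoU; thresholds and scoring are assumptions.
from dataclasses import dataclass, field

@dataclass
class Tube:
    boxes: list = field(default_factory=list)   # one box per frame
    scores: list = field(default_factory=list)  # one class score per frame

    @property
    def mean_score(self):
        return sum(self.scores) / len(self.scores)

def update_tubes(tubes, detections, iou_fn, match_thr=0.3):
    """One online step: extend each live tube with its best-overlapping
    new detection; unmatched detections start new tubes.
    `detections` is a list of (box, score) pairs for the current frame."""
    used = set()
    for tube in tubes:
        last = tube.boxes[-1]
        best, best_iou = None, match_thr
        for i, (box, score) in enumerate(detections):
            if i in used:
                continue
            o = iou_fn(last, box)
            if o > best_iou:
                best, best_iou = i, o
        if best is not None:
            used.add(best)
            box, score = detections[best]
            tube.boxes.append(box)
            tube.scores.append(score)
    for i, (box, score) in enumerate(detections):
        if i not in used:
            tubes.append(Tube(boxes=[box], scores=[score]))
    return tubes
```

Because each step touches only the current frame's detections, the tubes (and hence early action predictions) are available incrementally as the video streams in, which is what makes the online setting possible.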
International Conference on Computer Vision | 2017
Suman Saha; Gurkirt Singh; Fabio Cuzzolin
Dominant approaches to action detection can only provide sub-optimal solutions to the problem, as they rely on seeking frame-level detections, which are later composed into ‘action tubes’ in a post-processing step. With this paper we radically depart from current practice, and take a first step towards the design and implementation of a deep network architecture able to classify and regress whole video subsets, thus providing a truly optimal solution to the action detection problem. In this work, in particular, we propose a novel deep network framework able to regress and classify 3D region proposals spanning two successive video frames, whose core is an evolution of classical region proposal networks (RPNs). As such, our 3D-RPN net is able to effectively encode the temporal aspect of actions by exploiting appearance alone, as opposed to methods which rely heavily on expensive flow maps. The proposed model is end-to-end trainable and can be jointly optimised for action localisation and classification in a single step. At test time the network predicts ‘micro-tubes’ encompassing two successive frames, which are linked up into complete action tubes via a new algorithm which exploits the temporal encoding learned by the network and cuts computation time by 50%. Promising results on the J-HMDB-21 and UCF-101 action detection datasets show that our model outperforms the state of the art when relying purely on appearance.
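The intuition behind the 50% saving is that each micro-tube already spans a frame pair, so linking only needs to run over every second frame boundary rather than every frame. The greedy sketch below illustrates that chaining idea under stated assumptions; `link_microtubes` and its threshold are hypothetical names, and the paper's actual linking algorithm exploits the learned temporal encoding rather than raw IoU alone.

```python
# Illustrative chaining of two-frame micro-tubes into full action tubes.
# Greedy and simplified: the real algorithm uses the network's learned
# temporal encoding; here plain IoU stands in for that matching signal.
def link_microtubes(microtubes, iou_fn, thr=0.5):
    """Chain micro-tubes over time. `microtubes[t]` is a list of
    (box_t, box_t1, score) triples, one list per linking step; each
    triple covers two successive frames, so steps advance two frames
    at a time (the source of the ~50% reduction in linking work)."""
    tubes = [[mt] for mt in microtubes[0]]
    for step in microtubes[1:]:
        for tube in tubes:
            _, tail_box, _ = tube[-1]
            # Match the tube's trailing box against each candidate's
            # leading box at the next step; keep the best overlap.
            cand = max(step, key=lambda mt: iou_fn(tail_box, mt[0]),
                       default=None)
            if cand is not None and iou_fn(tail_box, cand[0]) >= thr:
                tube.append(cand)
    return tubes
```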
Gait & Posture | 2017
Fabio Cuzzolin; Michael Sapienza; Patrick Esser; Suman Saha; Miss Marloes Franssen; Johnny Collett; Helen Dawes
Diagnosis of people with mild Parkinson's symptoms is difficult. Nevertheless, variations in gait pattern can be exploited for this purpose when measured via Inertial Measurement Units (IMUs). Human gait, however, possesses a high degree of variability across individuals, and is subject to numerous nuisance factors. Therefore, off-the-shelf machine learning techniques may fail to classify it with the accuracy required in clinical trials. In this paper we propose a novel framework in which IMU gait measurement sequences sampled during a 10 m walk are first encoded as hidden Markov models (HMMs) to extract their dynamics and provide a fixed-length representation. Given sufficient training samples, the distance between HMMs which optimises classification performance is learned and employed in a classical nearest-neighbour classifier. Our tests demonstrate that this technique achieves an accuracy of 85.51% over 156 people with Parkinson's covering a representative range of severity and 424 typically developed adults, the top performance achieved so far over a cohort of this size based on single measurement outcomes. The method shows potential for further improvement and wider application to distinguishing other conditions.
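The shape of this pipeline (sequence-to-HMM encoding, a distance between models, nearest-neighbour classification) can be sketched as below. Note the paper learns the HMM distance discriminatively; the symmetrised cross-likelihood distance here is a simple stand-in, `hmmlearn` is an assumed dependency, and all function names are illustrative.

```python
# Minimal sketch: encode IMU gait sequences as Gaussian HMMs and
# classify with 1-NN. The distance below is a stand-in for the
# learned distance used in the paper.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_hmm(seq, n_states=3):
    """Encode one gait sequence (a T x D array) as a Gaussian HMM,
    giving a fixed-length model-based representation."""
    model = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=50)
    model.fit(seq)
    return model

def hmm_distance(m_a, seq_a, m_b, seq_b):
    """Symmetrised cross-likelihood distance: how much worse each model
    explains the other's sequence than its own (a KL-like proxy)."""
    return 0.5 * ((m_a.score(seq_a) - m_b.score(seq_a)) +
                  (m_b.score(seq_b) - m_a.score(seq_b)))

def nearest_neighbour(query_seq, train_seqs, train_labels, n_states=3):
    """1-NN classification over the HMM distance."""
    q_model = fit_hmm(query_seq, n_states)
    models = [fit_hmm(s, n_states) for s in train_seqs]
    dists = [hmm_distance(q_model, query_seq, m, s)
             for m, s in zip(models, train_seqs)]
    return train_labels[int(np.argmin(dists))]
```

In practice the training HMMs would be fitted once and cached, and the fixed distance above would be replaced by the learned, classification-optimised distance the abstract describes.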
Archive | 2016
Gurkirt Singh; Suman Saha; Fabio Cuzzolin
British Machine Vision Conference | 2017
Harkirat Singh Behl; Michael Sapienza; Gurkirt Singh; Suman Saha; Fabio Cuzzolin; Philip H. S. Torr
arXiv: Computer Vision and Pattern Recognition | 2018
Valentina Fontana; Gurkirt Singh; Stephen Akrigg; Manuele Di Maio; Suman Saha; Fabio Cuzzolin
arXiv: Computer Vision and Pattern Recognition | 2018
Gurkirt Singh; Suman Saha; Fabio Cuzzolin
arXiv: Computer Vision and Pattern Recognition | 2018
Suman Saha; Rajitha Navarathna; Leonhard Helminger; Romann Weber
arXiv: Computer Vision and Pattern Recognition | 2017
Suman Saha; Gurkirt Singh; Michael Sapienza; Philip H. S. Torr; Fabio Cuzzolin