Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Colin Lea is active.

Publication


Featured research published by Colin Lea.


Computer Vision and Pattern Recognition | 2017

Temporal Convolutional Networks for Action Segmentation and Detection

Colin Lea; Michael D. Flynn; René Vidal; Austin Reiter; Gregory D. Hager

The ability to identify and temporally segment fine-grained human actions throughout a video is crucial for robotics, surveillance, education, and beyond. Typical approaches decouple this problem by first extracting local spatiotemporal features from video frames and then feeding them into a temporal classifier that captures high-level temporal patterns. We describe a class of temporal models, which we call Temporal Convolutional Networks (TCNs), that use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection. Our Encoder-Decoder TCN uses pooling and upsampling to efficiently capture long-range temporal patterns, whereas our Dilated TCN uses dilated convolutions. We show that TCNs are capable of capturing action compositions, segment durations, and long-range dependencies, and are over an order of magnitude faster to train than competing LSTM-based Recurrent Neural Networks. We apply these models to three challenging fine-grained datasets and show large improvements over the state of the art.
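For orientation, the dilated variant can be sketched as a stack of 1D convolutions whose dilation doubles at each layer. The following is a minimal illustration in PyTorch; the layer sizes and channel counts are made-up assumptions, not the authors' released configuration:

```python
# Hypothetical dilated TCN for frame-wise action labeling (illustrative sizes only).
import torch
import torch.nn as nn

class DilatedTCN(nn.Module):
    def __init__(self, in_dim, num_classes, channels=64, num_layers=4):
        super().__init__()
        layers, dim = [], in_dim
        for i in range(num_layers):
            d = 2 ** i  # dilation doubles per layer, growing the temporal receptive field
            layers += [nn.Conv1d(dim, channels, kernel_size=3, dilation=d, padding=d),
                       nn.ReLU()]
            dim = channels
        self.tcn = nn.Sequential(*layers)
        self.classifier = nn.Conv1d(channels, num_classes, kernel_size=1)

    def forward(self, x):                        # x: (batch, features, time)
        return self.classifier(self.tcn(x))      # per-frame scores: (batch, classes, time)

scores = DilatedTCN(in_dim=128, num_classes=10)(torch.randn(1, 128, 250))
```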


Intelligent Robots and Systems | 2011

Comparative evaluation of range sensing technologies for underground void modeling

Uland Wong; Aaron Morris; Colin Lea; James Lee; Chuck Whittaker; Ben Garney; Red Whittaker

This paper compares a broad cross-section of range sensing technologies for underground void modeling. In this family of applications, a tunnel environment is incrementally mapped with range sensors from a mobile robot to recover scene geometry. Distinguishing contributions of this work include an unprecedented number of configurations evaluated utilizing common methodology and metrics as well as a significant in situ environmental component lacking in prior characterization work. Sensors are experimentally compared against both an ideal geometric target and in example void environments such as a mine and underground tunnel. Three natural groupings of sensors were identified from these results and performances were found to be strongly cost-correlated. While the results presented are specific to the experimental configurations tested, the generality of tunnel environments and the metrics of reconstruction are extensible to a spectrum of outdoor and surface applications.


Workshop on Applications of Computer Vision | 2015

An Improved Model for Segmentation and Recognition of Fine-Grained Activities with Application to Surgical Training Tasks

Colin Lea; Gregory D. Hager; René Vidal

Automated segmentation and recognition of fine-grained activities is important for enabling new applications in industrial automation, human-robot collaboration, and surgical training. Many existing approaches to activity recognition assume that a video has already been segmented and perform classification using an abstract representation based on spatio-temporal features. While some approaches perform joint activity segmentation and recognition, they typically suffer from a poor modeling of the transitions between actions and a representation that does not incorporate contextual information about the scene. In this paper, we propose a model for action segmentation and recognition that improves upon existing work in two directions. First, we develop a variation of the Skip-Chain Conditional Random Field that captures long-range state transitions between actions by using higher-order temporal relationships. Second, we argue that in constrained environments, where the relevant set of objects is known, it is better to develop features using high-level object relationships that have semantic meaning instead of relying on abstract features. We apply our approach to a set of tasks common for training in robotic surgery: suturing, knot tying, and needle passing, and show that our method increases micro and macro accuracy by 18.46% and 44.13% relative to the state of the art on a widely used robotic surgery dataset.
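To make the skip-chain idea concrete, the sketch below shows how such a model can score a candidate labeling: unary per-frame terms plus pairwise terms between labels that are a fixed skip length apart (an ordinary chain is the special case d = 1). The NumPy arrays are random stand-ins for learned parameters, not the paper's model:

```python
# Hypothetical skip-chain CRF scoring function (simplified; parameters are random stand-ins).
import numpy as np

def skip_chain_score(unary, pairwise, labels, d):
    """unary: (T, C) per-frame label scores; pairwise: (C, C) transition scores;
    labels: (T,) integer label sequence; d: skip length (d = 1 is an ordinary chain)."""
    T = len(labels)
    score = unary[np.arange(T), labels].sum()
    score += sum(pairwise[labels[t - d], labels[t]] for t in range(d, T))
    return score

rng = np.random.default_rng(0)
print(skip_chain_score(rng.normal(size=(100, 5)), rng.normal(size=(5, 5)),
                       rng.integers(0, 5, size=100), d=10))
```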


IEEE Transactions on Biomedical Engineering | 2017

A Dataset and Benchmarks for Segmentation and Recognition of Gestures in Robotic Surgery

Narges Ahmidi; Lingling Tao; Shahin Sefati; Yixin Gao; Colin Lea; Benjamin Bejar Haro; Luca Zappella; Sanjeev Khudanpur; René Vidal; Gregory D. Hager

Objective: State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging. Methods: In this paper, we address two major problems for surgical data analysis: first, the lack of uniformly shared datasets and benchmarks, and second, the lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification: the hidden Markov model (HMM), sparse HMM, Markov semi-Markov conditional random field, and skip-chain conditional random field; and two feature-based approaches that classify fixed segments: bag of spatiotemporal features and linear dynamical systems. Results: Most methods recognize gesture activities with approximately 80% overall accuracy under both leave-one-super-trial-out and leave-one-user-out cross-validation settings. Conclusion: Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons. Significance: The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.
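As an illustration of the evaluation protocol, a leave-one-user-out split can be generated as sketched below; the trial fields and user IDs are hypothetical placeholders, not the JIGSAWS file format:

```python
# Hypothetical leave-one-user-out cross-validation split over trials tagged with a surgeon ID.
def leave_one_user_out(trials):
    """trials: list of dicts like {"user": "B", "kinematics": ..., "labels": ...}."""
    users = sorted({t["user"] for t in trials})
    for held_out in users:
        train = [t for t in trials if t["user"] != held_out]
        test = [t for t in trials if t["user"] == held_out]
        yield held_out, train, test

trials = [{"user": u, "kinematics": None, "labels": None} for u in "BBCCDD"]
for user, train, test in leave_one_user_out(trials):
    print(user, len(train), len(test))
```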


European Conference on Computer Vision | 2016

Segmental Spatiotemporal CNNs for Fine-Grained Action Segmentation

Colin Lea; Austin Reiter; René Vidal; Gregory D. Hager

Joint segmentation and classification of fine-grained actions is important for applications of human-robot interaction, video surveillance, and human skill evaluation. However, despite substantial recent progress in large-scale action classification, the performance of state-of-the-art fine-grained action recognition approaches remains low. We propose a model for action segmentation which combines low-level spatiotemporal features with a high-level segmental classifier. Our spatiotemporal CNN is composed of a spatial component that represents relationships between objects and a temporal component that uses large 1D convolutional filters to capture how object relationships change across time. These features are used in tandem with a semi-Markov model that captures transitions from one action to another. We introduce an efficient constrained segmental inference algorithm for this model that is orders of magnitude faster than the current approach. We highlight the effectiveness of our Segmental Spatiotemporal CNN on cooking and surgical action datasets for which we observe substantially improved performance relative to recent baseline methods.
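The segmental (semi-Markov) decoding step can be sketched as a dynamic program over segment labels and durations. The version below is a simplified NumPy illustration with a hard maximum segment length; it is not the constrained inference algorithm from the paper, and it does not forbid consecutive segments from sharing a label:

```python
# Hypothetical semi-Markov (segmental) Viterbi sketch: jointly choose segment boundaries,
# labels, and durations, with a maximum segment duration to bound the search.
import numpy as np

def segmental_viterbi(frame_scores, trans, max_dur):
    """frame_scores: (T, C) per-frame, per-label scores; trans: (C, C) action-to-action
    scores. Returns the best total score over segmentations with segments <= max_dur."""
    T, C = frame_scores.shape
    cum = np.vstack([np.zeros((1, C)), np.cumsum(frame_scores, axis=0)])  # prefix sums
    best = np.full((T + 1, C), -np.inf)   # best[t, c]: best score with a c-segment ending at t
    best[0] = 0.0
    for t in range(1, T + 1):
        for d in range(1, min(max_dur, t) + 1):
            seg = cum[t] - cum[t - d]      # score of a length-d segment ending at frame t
            if t - d == 0:
                best[t] = np.maximum(best[t], seg)          # first segment: no transition term
            else:
                prev = best[t - d]
                best[t] = np.maximum(best[t], (prev[:, None] + trans).max(axis=0) + seg)
    return best[T].max()

rng = np.random.default_rng(1)
print(segmental_viterbi(rng.normal(size=(50, 4)), rng.normal(size=(4, 4)), max_dur=15))
```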


The International Journal of Robotics Research | 2017

Transition State Clustering: Unsupervised Surgical Trajectory Segmentation for Robot Learning

Sanjay Krishnan; Animesh Garg; Sachin Patil; Colin Lea; Gregory D. Hager; Pieter Abbeel; Ken Goldberg

A large and growing corpus of synchronized kinematic and video recordings of robot-assisted surgery has the potential to facilitate training and subtask automation. One of the challenges in segmenting such multi-modal trajectories is that demonstrations vary spatially and temporally, and contain random noise and loops (repetition until achieving the desired result). Segments of task trajectories are often less complex, less variable, and allow for easier detection of outliers. As manual segmentation can be tedious and error-prone, we propose a new segmentation method that combines hybrid dynamical systems theory and Bayesian non-parametric statistics to automatically segment demonstrations. Transition State Clustering (TSC) models demonstrations as noisy realizations of a switched linear dynamical system, and learns spatially and temporally consistent transition events across demonstrations. TSC uses a hierarchical Dirichlet Process Gaussian Mixture Model to avoid having to select the number of segments a priori. After a series of merging and pruning steps, the algorithm adaptively optimizes the number of segments. In a synthetic case study with two linear dynamical regimes, where demonstrations are corrupted with noise and temporal variations, TSC finds up to a 20% more accurate segmentation than GMM-based alternatives. On 67 recordings of surgical needle passing and suturing tasks from the JIGSAWS surgical training dataset [7], supplemented with manually annotated visual features, TSC finds 83% of needle passing segments and 73% of the suturing segments found by human experts. Qualitatively, TSC also identifies transitions overlooked by human annotators.
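A rough sketch of the clustering idea is given below, using scikit-learn's Dirichlet Process Gaussian mixture on hypothetical candidate transition states; the candidate-detection heuristic, thresholds, and synthetic trajectory are illustrative, not TSC's actual procedure:

```python
# Hypothetical sketch: candidate transition states (here, simply points of large change in
# velocity) are grouped with a Dirichlet Process GMM so the number of clusters is not fixed
# in advance. Uses scikit-learn; all thresholds are illustrative.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=(500, 3)), axis=0)         # one synthetic 3D trajectory
vel = np.diff(traj, axis=0)
jump = np.linalg.norm(np.diff(vel, axis=0), axis=1)          # change in velocity per frame
candidates = traj[1:-1][jump > np.percentile(jump, 95)]      # candidate transition states

dpgmm = BayesianGaussianMixture(n_components=10, covariance_type="full",
                                weight_concentration_prior_type="dirichlet_process",
                                random_state=0).fit(candidates)
print("transition clusters used:", len(np.unique(dpgmm.predict(candidates))))
```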


Medical Image Computing and Computer Assisted Intervention | 2016

Recognizing Surgical Activities with Recurrent Neural Networks

Robert S. DiPietro; Colin Lea; Anand Malpani; Narges Ahmidi; S. Swaroop Vedula; Gyusung I. Lee; Mija R. Lee; Gregory D. Hager

We apply recurrent neural networks to the task of recognizing surgical activities from robot kinematics. Prior work in this area focuses on recognizing short, low-level activities, or gestures, and has been based on variants of hidden Markov models and conditional random fields. In contrast, we work on recognizing both gestures and longer, higher-level activities, or maneuvers, and we model the mapping from kinematics to gestures/maneuvers with recurrent neural networks. To our knowledge, we are the first to apply recurrent neural networks to this task. Using a single model and a single set of hyperparameters, we match state-of-the-art performance for gesture recognition and advance state-of-the-art performance for maneuver recognition, in terms of both accuracy and edit distance. Code is available at this https URL.
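A minimal per-frame sequence labeler of this kind can be sketched in PyTorch as below; the kinematic dimensionality, hidden size, gesture count, and bidirectional choice are illustrative assumptions, not the authors' configuration:

```python
# Hypothetical per-frame gesture classifier over kinematic sequences using an LSTM.
import torch
import torch.nn as nn

class GestureRNN(nn.Module):
    def __init__(self, kin_dim=76, hidden=64, num_gestures=10):
        super().__init__()
        self.lstm = nn.LSTM(kin_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_gestures)

    def forward(self, x):                 # x: (batch, time, kin_dim)
        h, _ = self.lstm(x)
        return self.out(h)                # per-frame gesture logits: (batch, time, classes)

logits = GestureRNN()(torch.randn(2, 300, 76))
```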


International Conference on Robotics and Automation | 2015

A framework for end-user instruction of a robot assistant for manufacturing

Kelleher Guerin; Colin Lea; Chris Paxton; Gregory D. Hager

Small Manufacturing Entities (SMEs) have not incorporated robotic automation as readily as large companies due to rapidly changing product lines, complex and dexterous tasks, and the high cost of start-up. While recent low-cost robots such as the Universal Robots UR5 and Rethink Robotics Baxter are more economical and feature improved programming interfaces, based on our discussions with manufacturers, further incorporation of robots into the manufacturing workflow is limited by the ability of these systems to generalize across tasks and handle environmental variation. Our goal is to create a system designed for small manufacturers that contains a set of capabilities useful for a wide range of tasks, is both powerful and easy to use, allows for perceptually grounded actions, and is able to accumulate, abstract, and reuse plans that have been taught. We present an extension to Behavior Trees that allows for representing the system capabilities of a robot as a set of generalizable operations that are exposed to an end-user for creating task plans. We implement this framework in CoSTAR, the Collaborative System for Task Automation and Recognition, and demonstrate its effectiveness with two case studies. We first perform a complex tool-based object manipulation task in a laboratory setting. We then show the deployment of our system in an SME where we automate a machine tending task that was not possible with current off-the-shelf robots.
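To illustrate the flavor of such task plans, a toy Behavior Tree with a sequence node and leaf actions is sketched below; CoSTAR's actual operation set, user interface, and perception hooks are not reproduced here, and the node names are hypothetical:

```python
# Hypothetical minimal Behavior Tree: leaf actions composed under a sequence node,
# of the kind an end-user might assemble into a task plan.
class Node:
    def tick(self):
        raise NotImplementedError

class Action(Node):
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self):
        return "SUCCESS" if self.fn() else "FAILURE"

class Sequence(Node):
    """Runs children in order; fails as soon as one child fails."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            if child.tick() != "SUCCESS":
                return "FAILURE"
        return "SUCCESS"

plan = Sequence(Action("detect_part", lambda: True),
                Action("grasp_part", lambda: True),
                Action("place_in_machine", lambda: True))
print(plan.tick())
```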


European Conference on Computer Vision | 2016

Temporal Convolutional Networks: A Unified Approach to Action Segmentation

Colin Lea; René Vidal; Austin Reiter; Gregory D. Hager

The dominant paradigm for video-based action segmentation is composed of two steps: first, compute low-level features for each frame using Dense Trajectories or a Convolutional Neural Network to encode local spatiotemporal information, and second, input these features into a classifier such as a Recurrent Neural Network (RNN) that captures high-level temporal relationships. While often effective, this decoupling requires specifying two separate models, each with its own complexities, and prevents capturing more nuanced long-range spatiotemporal relationships. We propose a unified approach, as demonstrated by our Temporal Convolutional Network (TCN), that hierarchically captures relationships at low-, intermediate-, and high-level time-scales. Our model achieves superior or competitive performance using video or sensor data on three public action segmentation datasets and can be trained in a fraction of the time it takes to train an RNN.
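Complementing the dilated sketch above, the encoder-decoder form (temporal pooling followed by upsampling) might look roughly as follows; the filter widths and channel counts are illustrative assumptions, not the published architecture:

```python
# Hypothetical encoder-decoder TCN: temporal pooling halves the sequence in the encoder,
# upsampling restores full resolution in the decoder (illustrative sizes only).
import torch
import torch.nn as nn

class EncoderDecoderTCN(nn.Module):
    def __init__(self, in_dim, num_classes, channels=(64, 96)):
        super().__init__()
        enc, dim = [], in_dim
        for ch in channels:
            enc += [nn.Conv1d(dim, ch, kernel_size=25, padding=12), nn.ReLU(),
                    nn.MaxPool1d(2)]
            dim = ch
        dec = []
        for ch in reversed(channels):
            dec += [nn.Upsample(scale_factor=2),
                    nn.Conv1d(dim, ch, kernel_size=25, padding=12), nn.ReLU()]
            dim = ch
        self.net = nn.Sequential(*enc, *dec, nn.Conv1d(dim, num_classes, kernel_size=1))

    def forward(self, x):                 # x: (batch, features, time), time divisible by 4
        return self.net(x)                # per-frame scores: (batch, classes, time)

out = EncoderDecoderTCN(128, 10)(torch.randn(1, 128, 256))
```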


International Conference on Robotics and Automation | 2016

Learning convolutional action primitives for fine-grained action recognition

Colin Lea; René Vidal; Gregory D. Hager

Fine-grained action recognition is important for many applications of human-robot interaction, automated skill assessment, and surveillance. The goal is to segment and classify all actions occurring in a time series sequence. While recent recognition methods have shown strong performance in robotics applications, they often require hand-crafted features, use large amounts of domain knowledge, or employ overly simplistic representations of how objects change throughout an action. In this paper we present the Latent Convolutional Skip Chain Conditional Random Field (LC-SC-CRF). This time series model learns a set of interpretable and composable action primitives from sensor data. We apply our model to cooking tasks using accelerometer data from the University of Dundee 50 Salads dataset and to robotic surgery training tasks using robot kinematic data from the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our performance on 50 Salads and JIGSAWS is 18.0% and 5.3% higher than the state of the art, respectively. This model performs well without requiring hand-crafted features or intricate domain knowledge. The code and features have been made public.
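The notion of a convolutional action primitive can be sketched as a per-class temporal filter cross-correlated with the feature sequence to score that action at every frame. The example below uses random filters as stand-ins for learned parameters and is not the LC-SC-CRF inference procedure:

```python
# Hypothetical convolutional action primitives: each class has a temporal filter that is
# cross-correlated with the sensor features to produce per-frame class scores.
import numpy as np

def primitive_scores(features, filters):
    """features: (T, F) sensor features; filters: (C, W, F) one width-W filter per class.
    Returns per-frame class scores of shape (T, C)."""
    T, F = features.shape
    C, W, _ = filters.shape
    padded = np.pad(features, ((W // 2, W - 1 - W // 2), (0, 0)))   # keep output length T
    scores = np.zeros((T, C))
    for c in range(C):
        for t in range(T):
            scores[t, c] = np.sum(padded[t:t + W] * filters[c])
    return scores

rng = np.random.default_rng(2)
print(primitive_scores(rng.normal(size=(100, 12)), rng.normal(size=(6, 15, 12))).shape)
```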

Collaboration


Dive into Colin Lea's collaborations.

Top Co-Authors

René Vidal (Johns Hopkins University)
Austin Reiter (Johns Hopkins University)
Anand Malpani (Johns Hopkins University)
Narges Ahmidi (Johns Hopkins University)
Animesh Garg (University of California)
Ken Goldberg (University of California)
Pieter Abbeel (University of California)
Sachin Patil (University of California)