Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Du Tran is active.

Publication


Featured research published by Du Tran.


International Conference on Computer Vision | 2015

Learning Spatiotemporal Features with 3D Convolutional Networks

Du Tran; Lubomir D. Bourdev; Rob Fergus; Lorenzo Torresani; Manohar Paluri

We propose a simple yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large-scale supervised video dataset. Our findings are three-fold: 1) 3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets; 2) a homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best-performing architectures for 3D ConvNets; and 3) our learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with the current best methods on the other 2 benchmarks. In addition, the features are compact, achieving 52.8% accuracy on the UCF101 dataset with only 10 dimensions, and very efficient to compute thanks to the fast inference of ConvNets. Finally, they are conceptually simple and easy to train and use.
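The core operation the abstract describes is a convolution whose kernel spans time as well as space. As a minimal illustration (not the authors' C3D code), the sketch below applies a single 3x3x3 kernel to a single-channel 16-frame 112x112 clip, the input size C3D uses; the function name `conv3d_valid` is hypothetical:

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Valid 3D convolution (cross-correlation, as in ConvNets) of a
    single-channel T x H x W video clip with a small spatiotemporal kernel."""
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # each output voxel aggregates a 3x3x3 space-time neighbourhood
                out[i, j, k] = np.sum(clip[i:i+t, j:j+h, k:k+w] * kernel)
    return out

clip = np.random.rand(16, 112, 112)   # 16-frame 112x112 input, as in C3D
kernel = np.random.rand(3, 3, 3)      # homogeneous 3x3x3 kernel
feat = conv3d_valid(clip, kernel)
print(feat.shape)  # (14, 110, 110)
```

Unlike a 2D convolution applied frame by frame, the 3x3x3 kernel mixes information across neighbouring frames, which is what lets stacks of such layers learn motion as well as appearance.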


European Conference on Computer Vision | 2008

Human Activity Recognition with Metric Learning

Du Tran; Alexander Sorokin

This paper proposes a metric-learning-based approach for human activity recognition with two main objectives: (1) reject unfamiliar activities and (2) learn with few examples. We show that our approach outperforms all state-of-the-art methods on numerous standard datasets for the traditional action classification problem. Furthermore, we demonstrate that our method can not only accurately label activities but also reject unseen activities and learn from few examples with high accuracy. We finally show that our approach works well on noisy YouTube videos.
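The two objectives above, rejection and few-shot labelling, both fall out of nearest-neighbour classification under a learned metric: a query far from every training example is declared unfamiliar. The sketch below is a toy stand-in, not the paper's method; the linear transform `L`, the threshold `tau`, and the function name are all assumptions:

```python
import numpy as np

def nn_classify_with_reject(x, train_X, train_y, L, tau):
    """Nearest-neighbour classification in a learned metric space:
    distances are Euclidean after projecting by L, i.e.
    d(x, y)^2 = (x - y)^T L^T L (x - y). Queries farther than tau
    from every training example are rejected as unfamiliar."""
    d = np.linalg.norm(train_X @ L.T - L @ x, axis=1)
    i = np.argmin(d)
    return train_y[i] if d[i] <= tau else "unfamiliar"

# toy example: two labelled activity descriptors, identity metric
train_X = np.array([[0.0, 0.0], [10.0, 10.0]])
train_y = ["walk", "run"]
L = np.eye(2)
print(nn_classify_with_reject(np.array([0.2, 0.1]), train_X, train_y, L, tau=2.0))  # walk
```

Because classification is just a distance comparison, a single labelled example per activity is enough to start labelling, which is what makes the few-example regime workable.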


Computer Vision and Pattern Recognition | 2011

Optimal spatio-temporal path discovery for video event detection

Du Tran; Junsong Yuan

We propose a novel algorithm that casts video event detection and localization as an optimal path discovery problem in spatio-temporal video space. By finding the optimal spatio-temporal path, our method not only detects the starting and ending points of the event, but also accurately locates it in each video frame. Moreover, our method is robust to the scale and intra-class variations of the event, as well as to false and missed local detections, thereby improving the overall detection and localization accuracy. The proposed search algorithm obtains the globally optimal solution with proven lowest computational complexity. Experiments on realistic video datasets demonstrate that our proposed method can be applied to different types of event detection tasks, such as abnormal event detection and walking pedestrian detection.


Computer Vision and Pattern Recognition | 2016

Deep End2End Voxel2Voxel Prediction

Du Tran; Lubomir D. Bourdev; Rob Fergus; Lorenzo Torresani; Manohar Paluri

Over the last few years deep learning methods have emerged as one of the most prominent approaches for video analysis. However, so far their most successful applications have been in the area of video classification and detection, i.e., problems involving the prediction of a single class label or a handful of output variables per video. Furthermore, while deep networks are commonly recognized as the best models to use in these domains, there is a widespread perception that in order to yield successful results they often require time-consuming architecture search, manual tweaking of parameters and computationally intensive preprocessing or post-processing methods. In this paper we challenge these views by presenting a deep 3D convolutional architecture trained end to end to perform voxel-level prediction, i.e., to output a variable at every voxel of the video. Most importantly, we show that the same exact architecture can be used to achieve competitive results on three widely different voxel-prediction tasks: video semantic segmentation, optical flow estimation, and video coloring. The three networks learned on these problems are trained from raw video without any form of preprocessing and their outputs do not require post-processing to achieve outstanding performance. Thus, they offer an efficient alternative to traditional and much more computationally expensive methods in these video domains.
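A voxel-level network must end by restoring the spatial and temporal resolution its pooling layers discarded, so that there is one output per input voxel. The sketch below is a toy stand-in for that final stage, using nearest-neighbour upsampling in place of the learned upconvolution layers such an architecture would actually train; the function name and sizes are assumptions:

```python
import numpy as np

def upsample_to_voxels(feat, factor):
    """Nearest-neighbour upsampling of a coarse T x H x W feature volume
    back to the full voxel grid, so the network emits one prediction per
    input voxel (a crude stand-in for learned deconvolution layers)."""
    for axis in range(3):
        feat = np.repeat(feat, factor, axis=axis)
    return feat

coarse = np.random.rand(4, 28, 28)       # predictions on a 4x-downsampled grid
dense = upsample_to_voxels(coarse, 4)
print(dense.shape)  # (16, 112, 112): one value per voxel of a 16-frame clip
```

The same dense-output head is what lets one architecture serve segmentation, optical flow, and coloring: only the number and meaning of the per-voxel output channels change.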


International Journal of Computer Vision | 2016

EXMOVES: Mid-level Features for Efficient Action Recognition and Video Analysis

Du Tran; Lorenzo Torresani

In this paper we present EXMOVES—learned exemplar-based features for efficient recognition and analysis of actions in videos. The entries in our descriptor are produced by evaluating a set of movement classifiers over spatial-temporal volumes of the input video sequences. Each movement classifier is a simple exemplar-SVM trained on low-level features, i.e., an SVM learned using a single annotated positive space-time volume and a large number of unannotated videos. Our representation offers several advantages. First, since our mid-level features are learned from individual video exemplars, they require a minimal amount of supervision. Second, we show that simple linear classification models trained on our global video descriptor yield action recognition accuracy approaching the state-of-the-art but at orders of magnitude lower cost, since at test-time no sliding window is necessary and linear models are efficient to train and test. This enables scalable action recognition, i.e., efficient classification of a large number of actions even in massive video databases. Third, we show the generality of our approach by training our mid-level descriptors from different low-level features and testing them on two distinct video analysis tasks: human activity recognition as well as action similarity labeling. Experiments on large-scale benchmarks demonstrate the accuracy and efficiency of our proposed method on both these tasks.
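Since each exemplar classifier is a linear SVM, building the global descriptor reduces to a matrix product followed by pooling over the video's space-time volumes. The sketch below shows that shape of the computation only; the function name, the max-pooling choice, and the array layout are assumptions, not the paper's exact pipeline:

```python
import numpy as np

def exmoves_descriptor(volume_feats, exemplar_ws, exemplar_bs):
    """EXMOVES-style mid-level descriptor (illustrative sketch): score
    every space-time volume of a video with every linear exemplar
    classifier (w, b), then pool each classifier's responses over the
    volumes into a single descriptor entry."""
    # (n_volumes, dim) @ (dim, n_exemplars) -> one score per volume/exemplar
    scores = volume_feats @ exemplar_ws.T + exemplar_bs
    return scores.max(axis=0)  # one entry per exemplar classifier

volume_feats = np.array([[1.0, 0.0], [0.0, 1.0]])  # low-level features of 2 volumes
exemplar_ws = np.array([[1.0, 0.0]])               # a single exemplar-SVM weight
exemplar_bs = np.array([0.0])
print(exmoves_descriptor(volume_feats, exemplar_ws, exemplar_bs))  # [1.]
```

Because the descriptor is fixed-length regardless of video length, a cheap linear classifier on top of it suffices at test time, which is the source of the efficiency the abstract claims.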


Archive | 2014

C3D: Generic Features for Video Analysis.

Du Tran; Lubomir D. Bourdev; Rob Fergus; Lorenzo Torresani; Manohar Paluri


Neural Information Processing Systems | 2012

Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

Du Tran; Junsong Yuan


arXiv: Computer Vision and Pattern Recognition | 2017

ConvNet Architecture Search for Spatiotemporal Feature Learning.

Du Tran; Jamie Ray; Zheng Shou; Shih-Fu Chang; Manohar Paluri


Computer Vision and Pattern Recognition | 2018

A Closer Look at Spatiotemporal Convolutions for Action Recognition

Du Tran; Heng Wang; Lorenzo Torresani; Jamie Ray; Yann LeCun; Manohar Paluri


Computer Vision and Pattern Recognition | 2018

Detect-and-Track: Efficient Pose Estimation in Videos

Rohit Girdhar; Georgia Gkioxari; Lorenzo Torresani; Manohar Paluri; Du Tran

Collaboration


Dive into Du Tran's collaborations.

Top Co-Authors

Rohit Girdhar

Carnegie Mellon University
