Tsung-Yu Lin
University of Massachusetts Amherst
Publications
Featured research published by Tsung-Yu Lin.
International Conference on Computer Vision | 2015
Tsung-Yu Lin; Aruni RoyChowdhury; Subhransu Maji
We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using the outer product at each location of the image and pooled to obtain an image descriptor. This architecture can model local pairwise feature interactions in a translationally invariant manner, which is particularly useful for fine-grained categorization. It also generalizes various orderless texture descriptors such as the Fisher vector, VLAD and O2P. We present experiments with bilinear models where the feature extractors are based on convolutional neural networks. The bilinear form simplifies gradient computation and allows end-to-end training of both networks using image labels only. Using networks initialized from the ImageNet dataset followed by domain-specific fine-tuning, we obtain 84.1% accuracy on the CUB-200-2011 dataset, requiring only category labels at training time. We present experiments and visualizations that analyze the effects of fine-tuning and the choice of the two networks on the speed and accuracy of the models. Results show that the architecture compares favorably to the existing state of the art on a number of fine-grained datasets while being substantially simpler and easier to train. Moreover, our most accurate model is fairly efficient, running at 8 frames/sec on an NVIDIA Tesla K40 GPU. The source code for the complete system will be made available at http://vis-www.cs.umass.edu/bcnn.
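A minimal PyTorch sketch of the bilinear pooling described in the abstract is shown below. The class name, the use of two truncated ImageNet VGG-16 streams, and the normalization constants are illustrative assumptions for exposition, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class BilinearCNN(nn.Module):
    """Sketch of a bilinear CNN: two feature extractors whose local features
    are combined with an outer product at each location and pooled."""
    def __init__(self, num_classes):
        super().__init__()
        # Two streams; here both are truncated VGG-16 conv stacks (an assumption;
        # the paper also evaluates asymmetric pairs of networks).
        self.stream_a = models.vgg16(weights="IMAGENET1K_V1").features[:-1]
        self.stream_b = models.vgg16(weights="IMAGENET1K_V1").features[:-1]
        self.fc = nn.Linear(512 * 512, num_classes)

    def forward(self, x):
        fa = self.stream_a(x)                          # (N, 512, H, W)
        fb = self.stream_b(x)                          # (N, 512, H, W)
        n, ca, h, w = fa.shape
        cb = fb.shape[1]
        fa = fa.reshape(n, ca, h * w)
        fb = fb.reshape(n, cb, h * w)
        # Outer product at each location, averaged over locations.
        phi = torch.bmm(fa, fb.transpose(1, 2)) / (h * w)   # (N, ca, cb)
        phi = phi.reshape(n, ca * cb)
        # Signed square-root and L2 normalization of the pooled descriptor.
        phi = torch.sign(phi) * torch.sqrt(torch.abs(phi) + 1e-10)
        phi = nn.functional.normalize(phi)
        return self.fc(phi)
```

Because every operation above is differentiable, the whole model can be trained end-to-end from image labels alone, which is the property the abstract highlights.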
Computer Vision and Pattern Recognition | 2016
Tsung-Yu Lin; Subhransu Maji
A number of recent approaches have used deep convolutional neural networks (CNNs) to build texture representations. Nevertheless, it is still unclear how these models represent texture and what invariances to categorical variations they capture. This work conducts a systematic evaluation of recent CNN-based texture descriptors for recognition and attempts to understand the nature of invariances captured by these representations. First, we show that the recently proposed bilinear CNN model [25] is an excellent general-purpose texture descriptor and compares favorably to other CNN-based descriptors on various texture and scene recognition benchmarks. The model is translationally invariant and, without requiring spatial jittering of the data, obtains better accuracy on the ImageNet dataset than corresponding models trained with spatial jittering. Based on recent work [13, 28], we propose a technique to visualize pre-images, providing a means for understanding the categorical properties that are captured by these representations. Finally, we show preliminary results on how a unified parametric model of texture analysis and synthesis can be used for attribute-based image manipulation, e.g., to make an image more swirly, honeycombed, or knitted. The source code and additional visualizations are available at http://vis-www.cs.umass.edu/texture.
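As a rough illustration of pre-image visualization, the sketch below optimizes an image so that its bilinear statistics match those of a target image. It uses a single-stream, single-layer simplification with a plain pixel parameterization; the paper's actual procedure differs (and the function names here are hypothetical).

```python
import torch

def bilinear_features(img, net):
    """Gram-style bilinear statistics of conv features for one image
    (a single-stream simplification of the symmetric B-CNN descriptor)."""
    f = net(img)                                   # (1, C, H, W)
    _, c, h, w = f.shape
    f = f.reshape(c, h * w)
    return (f @ f.t()) / (h * w)                   # (C, C)

def invert(target_img, net, steps=500, lr=0.05):
    """Find a pre-image whose bilinear statistics match the target's."""
    with torch.no_grad():
        target = bilinear_features(target_img, net)
    x = torch.rand_like(target_img, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(bilinear_features(x, net), target)
        loss.backward()
        opt.step()
        x.data.clamp_(0, 1)                        # keep pixels in a valid range
    return x.detach()

# Illustrative usage: net could be a truncated conv stack, e.g. the first
# blocks of torchvision's VGG-16 features, applied to a (1, 3, H, W) tensor.
```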
Workshop on Applications of Computer Vision | 2016
Aruni Roy Chowdhury; Tsung-Yu Lin; Subhransu Maji; Erik G. Learned-Miller
The recent explosive growth in convolutional neural network (CNN) research has produced a variety of new architectures for deep learning. One intriguing new architecture is the bilinear CNN (B-CNN), which has shown dramatic performance gains on certain fine-grained recognition problems [15]. We apply this new CNN to the challenging new face recognition benchmark, the IARPA Janus Benchmark A (IJB-A) [12]. It features faces from a large number of identities in challenging real-world conditions. Because the face images were not identified automatically using a computerized face detection system, it does not have the bias inherent in such a database. We demonstrate the performance of the B-CNN model beginning from an AlexNet-style network pre-trained on ImageNet. We then show results for fine-tuning using a moderate-sized and public external database, FaceScrub [17]. We also present results with additional fine-tuning on the limited training data provided by the protocol. In each case, the fine-tuned bilinear model shows substantial improvements over the standard CNN. Finally, we demonstrate how a standard CNN pre-trained on a large face database, the recently released VGG-Face model [20], can be converted into a B-CNN without any additional feature training. This B-CNN improves upon the CNN performance on the IJB-A benchmark, achieving 89.5% rank-1 recall.
Conference on Multimedia Modeling | 2011
Tung-Ying Lee; Tsung-Yu Lin; Szu-Hao Huang; Shang-Hong Lai; Shang-Chih Hung
In a network of cameras, people localization is an important issue. Traditional methods rely on camera calibration and combine background subtraction results from different views to locate people in three-dimensional space. Previous methods usually solve the localization problem iteratively based on background subtraction results, neglecting high-level image information. In order to fully exploit the image information, we propose incorporating human detection into multi-camera video surveillance. We develop a novel method that combines human detection and background subtraction for multi-camera human localization using convex optimization. This convex optimization problem is independent of the image size; in fact, the problem size depends only on the number of locations of interest on the ground plane. Experimental results show that this combination performs better than background subtraction-based methods and demonstrate the advantage of combining these two complementary types of information.
International Conference on Image Processing | 2014
Tsung-Yu Lin; Tyng-Luh Liu
Recent advances in tackling large-scale computer vision problems have encouraged the use of extremely high-dimensional descriptors to encode image data. Under such a setting, we focus on how to efficiently carry out similarity search using binary codes. We observe that most popular high-dimensional descriptors induce feature vectors with an implicit 2-D structure, and we exploit this property to reduce the computational cost and complexity. Specifically, our method generalizes the Iterative Quantization (ITQ) framework to handle extremely high-dimensional data in two steps. First, we restrict the dimensionality-reduction projection to a block-diagonal form and determine it by independently solving several moderate-size PCA sub-problems. Second, we replace the full rotation in ITQ with a bilinear rotation to improve efficiency in both training and testing. Our experimental results on a large-scale dataset and comparisons with a state-of-the-art technique are promising.
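A NumPy sketch of the bilinear-rotation idea follows, under the assumption that the alternating updates mirror standard ITQ (binarize, then solve two orthogonal Procrustes problems, one per rotation); the function name and defaults are illustrative, and the block-diagonal PCA projection is assumed to have been applied to the input already.

```python
import numpy as np

def bilinear_itq_codes(X, d1, d2, iters=50, seed=0):
    """Binary codes via a bilinear rotation: each (d1*d2)-dim descriptor is
    reshaped into a d1 x d2 matrix and rotated by two small orthogonal
    matrices R1, R2 instead of one full (d1*d2) x (d1*d2) rotation.
    X is assumed to be already projected (e.g. by block-diagonal PCA)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    V = X.reshape(n, d1, d2)
    # Random orthogonal initializations for the two small rotations.
    R1 = np.linalg.qr(rng.standard_normal((d1, d1)))[0]
    R2 = np.linalg.qr(rng.standard_normal((d2, d2)))[0]
    for _ in range(iters):
        B = np.sign(R1.T @ V @ R2)                       # current binary codes
        # Update R1 with R2 fixed: orthogonal Procrustes on sum_i (V_i R2) B_i^T.
        U, _, Vt = np.linalg.svd(np.einsum('nij,nkj->ik', V @ R2, B))
        R1 = U @ Vt
        # Update R2 with R1 fixed: orthogonal Procrustes on sum_i (R1^T V_i)^T B_i.
        U, _, Vt = np.linalg.svd(np.einsum('nji,njk->ik', R1.T @ V, B))
        R2 = U @ Vt
    return (R1.T @ V @ R2).reshape(n, d1 * d2) > 0       # boolean codes
```

The efficiency gain is that the learned parameters are two small rotations of size d1 x d1 and d2 x d2 rather than a single dense rotation over the full descriptor dimension.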
British Machine Vision Conference | 2017
Tsung-Yu Lin; Subhransu Maji
Archive | 2015
Aruni RoyChowdhury; Tsung-Yu Lin; Subhransu Maji; Erik G. Learned-Miller
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018
Tsung-Yu Lin; Aruni RoyChowdhury; Subhransu Maji
arXiv: Computer Vision and Pattern Recognition | 2015
Tsung-Yu Lin; Aruni RoyChowdhury; Subhransu Maji
arXiv: Computer Vision and Pattern Recognition | 2018
Tsung-Yu Lin; Subhransu Maji; Piotr Koniusz