Publication


Featured research published by Zuxuan Wu.


ACM Multimedia | 2015

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

Zuxuan Wu; Xi Wang; Yu-Gang Jiang; Hao Ye; Xiangyang Xue

Classifying videos according to content semantics is an important problem with a wide range of applications. In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos. Specifically, the spatial and the short-term motion features are extracted separately by two Convolutional Neural Networks (CNN). These two types of CNN-based features are then combined in a regularized feature fusion network for classification, which is able to learn and utilize feature relationships for improved performance. In addition, Long Short Term Memory (LSTM) networks are applied on top of the two features to further model longer-term temporal clues. The main contribution of this work is the hybrid learning framework that can model several important aspects of the video data. We also show that (1) combining the spatial and the short-term motion features in the regularized fusion network is better than direct classification and fusion using the CNN with a softmax layer, and (2) the sequence-based LSTM is highly complementary to the traditional classification strategy without considering the temporal frame orders. Extensive experiments are conducted on two popular and challenging benchmarks, the UCF-101 Human Actions and the Columbia Consumer Videos (CCV). On both benchmarks, our framework achieves very competitive performance: 91.3% on the UCF-101 and 83.5% on the CCV.
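The abstract above outlines the architecture rather than giving code; below is a minimal, illustrative PyTorch sketch of the general idea: video-level spatial and motion CNN features fused by a small network, plus an LSTM over per-frame features, with the two predictions late-fused. Layer sizes and names are assumptions, not the authors' released implementation.

```python
# Minimal sketch of a hybrid spatial/motion/temporal pipeline in the spirit of
# the paper; dimensions and names are illustrative, not the published code.
import torch
import torch.nn as nn

class FusionLSTMClassifier(nn.Module):
    def __init__(self, spatial_dim=2048, motion_dim=2048, hidden=512, num_classes=101):
        super().__init__()
        # Feature-fusion branch: combines video-level spatial and short-term
        # motion CNN features for classification.
        self.fusion = nn.Sequential(
            nn.Linear(spatial_dim + motion_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )
        # Sequence branch: an LSTM over per-frame CNN features models
        # longer-term temporal order.
        self.lstm = nn.LSTM(spatial_dim, hidden, batch_first=True)
        self.seq_head = nn.Linear(hidden, num_classes)

    def forward(self, spatial_feat, motion_feat, frame_feats):
        # spatial_feat, motion_feat: (B, D) video-level CNN features
        # frame_feats: (B, T, D) per-frame CNN features
        fused_logits = self.fusion(torch.cat([spatial_feat, motion_feat], dim=1))
        _, (h, _) = self.lstm(frame_feats)
        seq_logits = self.seq_head(h[-1])
        # Late fusion of the two complementary predictions.
        return 0.5 * fused_logits + 0.5 * seq_logits
```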


ACM Multimedia | 2014

Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification

Zuxuan Wu; Yu-Gang Jiang; Jun Wang; Jian Pu; Xiangyang Xue

Videos contain very rich semantics and are intrinsically multimodal. In this paper, we study the challenging task of classifying videos according to their high-level semantics such as human actions or complex events. Although extensive efforts have been devoted to studying this problem, most existing works combined multiple features using simple fusion strategies and neglected the exploration of inter-class semantic relationships. To this end, we propose a novel unified framework that jointly learns feature relationships and exploits the class relationships for improved video classification performance. Specifically, these two types of relationships are learned and utilized by rigorously imposing regularizations in a deep neural network (DNN). Such a regularized DNN can be efficiently trained using a GPU implementation with an affordable cost. By arming the DNN with a better capability of exploring both the inter-feature and the inter-class relationships, the proposed regularized DNN is more suitable for identifying video semantics. With extensive experimental evaluations, we demonstrate that the proposed framework exhibits superior performance over several state-of-the-art approaches. On the well-known Hollywood2 and Columbia Consumer Video benchmarks, we obtain the best results reported to date: 65.7% and 70.6%, respectively, in terms of mean average precision.
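As a concrete illustration of regularizing a network with class relationships, one of the two relationship types described above, the sketch below adds a graph-Laplacian penalty on the final-layer weights to the usual cross-entropy loss. This is a simplified reading, not the paper's exact formulation; `model`, `final_layer`, and the weight `lam` are placeholders.

```python
# Illustrative class-relationship regularizer: encourage related classes to
# have similar classifier weights, on top of standard cross-entropy training.
import torch
import torch.nn.functional as F

def class_relationship_penalty(weight, laplacian):
    # weight: (C, H) final-layer weights; laplacian: (C, C) graph Laplacian
    # built from a class-similarity matrix. tr(W^T L W) is small when
    # correlated classes share similar weight vectors.
    return torch.trace(weight.t() @ laplacian @ weight)

def training_loss(model, final_layer, x, y, laplacian, lam=1e-3):
    logits = model(x)
    loss = F.cross_entropy(logits, y)
    return loss + lam * class_relationship_penalty(final_layer.weight, laplacian)
```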


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2018

Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

Yu-Gang Jiang; Zuxuan Wu; Jun Wang; Xiangyang Xue; Shih-Fu Chang

In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event. Although extensive efforts have been devoted in recent years, most existing works combined multiple video features using simple fusion strategies and neglected the utilization of inter-class semantic relationships. This paper proposes a novel unified framework that jointly exploits the feature relationships and the class relationships for improved categorization performance. Specifically, these two types of relationships are estimated and utilized by imposing regularizations in the learning process of a deep neural network (DNN). By arming the DNN with a better capability of harnessing both the feature and the class relationships, the proposed regularized DNN (rDNN) is more suitable for modeling video semantics. We show that rDNN produces better performance than several state-of-the-art approaches. Competitive results are reported on the well-known Hollywood2 and Columbia Consumer Video benchmarks. In addition, to stimulate future research on large-scale video categorization, we collect and release a new benchmark dataset, called FCVID, which contains 91,223 Internet videos and 239 manually annotated categories.
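Such a regularizer needs an estimate of how classes relate to each other. One simple way to obtain it, sketched below under the assumption of multi-label training annotations, is to build a class co-occurrence similarity matrix and its graph Laplacian; the paper's actual estimator may differ.

```python
# Hedged sketch: derive a class-relationship Laplacian from label co-occurrence
# on the training set (illustrative, not the exact estimator used in the paper).
import torch

def class_laplacian(labels):
    # labels: (N, C) binary multi-label matrix over the training videos
    labels = labels.float()
    co = labels.t() @ labels                          # (C, C) co-occurrence counts
    norm = torch.sqrt(torch.diag(co)).clamp(min=1.0)
    sim = co / (norm[:, None] * norm[None, :])        # cosine-style similarity
    degree = torch.diag(sim.sum(dim=1))
    return degree - sim                               # graph Laplacian L = D - S
```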


Computer Vision and Pattern Recognition | 2016

Harnessing Object and Scene Semantics for Large-Scale Video Understanding

Zuxuan Wu; Yanwei Fu; Yu-Gang Jiang; Leonid Sigal

Large-scale action recognition and video categorization are important problems in computer vision. To address these problems, we propose a novel object- and scene-based semantic fusion network and representation. Our semantic fusion network combines three streams of information using a three-layer neural network: (i) frame-based low-level CNN features, (ii) object features from a state-of-the-art large-scale CNN object-detector trained to recognize 20K classes, and (iii) scene features from a state-of-the-art CNN scene-detector trained to recognize 205 scenes. The trained network achieves improvements in supervised activity and video categorization on two complex large-scale datasets, ActivityNet and FCVID, respectively. Further, by examining and back-propagating information through the fusion network, semantic relationships (correlations) between video classes and objects/scenes can be discovered. These video class-object/video class-scene relationships can in turn be used as a semantic representation for the video classes themselves. We illustrate the effectiveness of this semantic representation through experiments on zero-shot action/video classification and clustering.
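A rough sketch of the three-stream semantic fusion idea follows: frame-level CNN features, object-detector scores, and scene-classifier scores are concatenated and fed to a small fully connected network, and a gradient probe back-propagates a class score to the object inputs to surface class-object relationships. Dimensions and names are illustrative assumptions, not the published model.

```python
# Minimal sketch of object/scene semantic fusion; sizes are placeholders.
import torch
import torch.nn as nn

class SemanticFusionNet(nn.Module):
    def __init__(self, cnn_dim=4096, obj_dim=20000, scene_dim=205,
                 hidden=1024, num_classes=239):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cnn_dim + obj_dim + scene_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, cnn_feat, obj_scores, scene_scores):
        return self.net(torch.cat([cnn_feat, obj_scores, scene_scores], dim=1))

    def class_object_saliency(self, cnn_feat, obj_scores, scene_scores, class_idx):
        # Back-propagate a class score to the object-score input to read off
        # which objects the class relies on (the relationship-mining idea).
        obj_scores = obj_scores.clone().requires_grad_(True)
        logits = self.forward(cnn_feat, obj_scores, scene_scores)
        logits[:, class_idx].sum().backward()
        return obj_scores.grad
```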


ACM Multimedia | 2016

Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification

Zuxuan Wu; Yu-Gang Jiang; Xi Wang; Hao Ye; Xiangyang Xue

This paper studies deep network architectures to address the problem of video classification. A multi-stream framework is proposed to fully utilize the rich multimodal information in videos. Specifically, we first train three Convolutional Neural Networks to model spatial, short-term motion and audio clues, respectively. Long Short Term Memory networks are then adopted to explore long-term temporal dynamics. With the outputs of the individual streams on multiple classes, we propose to mine class relationships hidden in the data from the trained models. The automatically discovered relationships are then leveraged in the multi-stream multi-class fusion process as a prior, indicating which and how much information is needed from the remaining classes, to adaptively determine the optimal fusion weights for generating the final scores of each class. Our contributions are two-fold. First, the multi-stream framework is able to exploit multimodal features that are more comprehensive than those previously attempted. Second, our proposed fusion method not only learns the best weights of the multiple network streams for each class, but also takes class relationships into account, which are known to be a helpful clue in multi-class visual classification tasks. Our framework produces significantly better results than the state of the art on two popular benchmarks, 92.2% on UCF-101 (without using audio) and 84.9% on Columbia Consumer Videos.
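To make the class-wise fusion step concrete, the sketch below learns one softmax-normalized weight vector over the streams for every class and mixes the per-stream scores accordingly; the class-relationship prior described in the abstract is omitted for brevity, so this is only a simplified approximation.

```python
# Simplified class-wise fusion of several stream scores with learned weights.
import torch
import torch.nn as nn

class ClasswiseFusion(nn.Module):
    def __init__(self, num_streams=4, num_classes=101):
        super().__init__()
        # One weight vector over streams for every class, softmax-normalized.
        self.logits = nn.Parameter(torch.zeros(num_classes, num_streams))

    def forward(self, stream_scores):
        # stream_scores: (B, num_streams, num_classes) per-stream class scores
        w = torch.softmax(self.logits, dim=1)          # (C, S)
        return torch.einsum('bsc,cs->bc', stream_scores, w)
```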


Tumor Biology | 2015

Gene mutations in gastric cancer: a review of recent next-generation sequencing studies

Yandan Lin; Zuxuan Wu; Weijian Guo; Jiwei Li

Gastric cancer (GC) is one of the most common malignancies worldwide. Although some driver genes have been identified in GC, the molecular compositions of GC have not been fully understood. The development of next-generation sequencing (NGS) provides a high-throughput and systematic method to identify all genetic alterations in the cancer genome, especially in the field of mutation detection. NGS studies in GC have discovered some novel driver mutations. In this review, we focused on novel gene mutations discovered by NGS studies, along with some well-known driver genes in GC. We organized mutated genes from the perspective of related biological pathways. Mutations in genes relating to genome integrity (TP53, BRCA2), chromatin remodeling (ARID1A), cell adhesion (CDH1, FAT4, CTNNA1), cytoskeleton and cell motility (RHOA), Wnt pathway (CTNNB1, APC, RNF43), and RTK pathway (RTKs, RAS family, MAPK pathway, PIK pathway) are discussed. Efforts to establish a molecular classification based on NGS data, which is valuable for future targeted therapy for GC, are introduced. Comprehensive dissection of the molecular profile of GC can not only unveil the molecular basis for GC but also identify genes of clinical utility, especially potential and specific therapeutic targets for GC.


ACM Multimedia | 2017

Learning Fashion Compatibility with Bidirectional LSTMs

Xintong Han; Zuxuan Wu; Yu-Gang Jiang; Larry S. Davis

The ubiquity of online fashion shopping demands effective recommendation services for customers. In this paper, we study two types of fashion recommendation: (i) suggesting an item that matches existing components in a set to form a stylish outfit (a collection of fashion items), and (ii) generating an outfit with multimodal (images/text) specifications from a user. To this end, we propose to jointly learn a visual-semantic embedding and the compatibility relationships among fashion items in an end-to-end fashion. More specifically, we consider a fashion outfit to be a sequence (usually from top to bottom and then accessories) and each item in the outfit as a time step. Given the fashion items in an outfit, we train a bidirectional LSTM (Bi-LSTM) model to sequentially predict the next item conditioned on previous ones to learn their compatibility relationships. Further, we learn a visual-semantic space by regressing image features to their semantic representations aiming to inject attribute and category information as a regularization for training the LSTM. The trained network can not only perform the aforementioned recommendations effectively but also predict the compatibility of a given outfit. We conduct extensive experiments on our newly collected Polyvore dataset, and the results provide strong qualitative and quantitative evidence that our framework outperforms alternative methods.
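A minimal sketch of the sequential compatibility component: an outfit is treated as a sequence of item embeddings, and a bidirectional LSTM is trained so the forward direction predicts the next item and the backward direction the previous one. Dimensions are assumptions, and the jointly learned visual-semantic embedding from the paper is not shown.

```python
# Illustrative Bi-LSTM over outfit item embeddings, not the authors' code.
import torch
import torch.nn as nn

class OutfitBiLSTM(nn.Module):
    def __init__(self, feat_dim=512, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.fwd_proj = nn.Linear(hidden, feat_dim)  # predicts the next item
        self.bwd_proj = nn.Linear(hidden, feat_dim)  # predicts the previous item

    def forward(self, items):
        # items: (B, T, feat_dim) image embeddings of the outfit, top to bottom
        out, _ = self.lstm(items)                    # (B, T, 2 * hidden)
        fwd, bwd = out.chunk(2, dim=-1)
        next_pred = self.fwd_proj(fwd[:, :-1])       # targets: items[:, 1:]
        prev_pred = self.bwd_proj(bwd[:, 1:])        # targets: items[:, :-1]
        return next_pred, prev_pred
```

Training would compare these predictions against the true item embeddings, for example with a softmax over candidate items in the batch; compatibility of a new outfit can then be scored by how well its items predict one another.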


arXiv: Computer Vision and Pattern Recognition | 2016

Deep Learning for Video Classification and Captioning

Zuxuan Wu; Ting Yao; Yanwei Fu; Yu-Gang Jiang

Accelerated by the tremendous increase in Internet bandwidth and storage space, video data has been generated, published and spread explosively, becoming an indispensable part of today's big data. In this paper, we focus on reviewing two lines of research aiming to stimulate the comprehension of videos with deep learning: video classification and video captioning. While video classification concentrates on automatically labeling video clips based on their semantic contents like human actions or complex events, video captioning attempts to generate a complete and natural sentence, enriching the single label as in video classification, to capture the most informative dynamics in videos. In addition, we also provide a review of popular benchmarks and competitions, which are critical for evaluating the technical progress of this vibrant field.


ACM Multimedia | 2017

LSVC2017: Large-Scale Video Classification Challenge

Zuxuan Wu; Yu-Gang Jiang; Larry S. Davis; Shih-Fu Chang

Recognizing visual contents in unconstrained videos has become a very important problem for many applications, such as Web video search and recommendation, smart advertising, robotics, etc. This workshop and challenge aims at exploring new challenges and approaches for large-scale video classification with a large number of classes from open source videos in a realistic setting, based upon an extension of the Fudan-Columbia Video Dataset (FCVID). This newly collected dataset contains over 8000 hours of video data from YouTube and Flickr, annotated into 500 categories. We hope this dataset can stimulate innovative research on this challenging and important problem.


International Conference on Multimedia and Expo | 2014

Huawei Challenge: Fusing Multimodal Features with Deep Neural Networks for Mobile Video Annotation

Jian Tu; Zuxuan Wu; Qi Dai; Yu-Gang Jiang; Xiangyang Xue

We participated in the Huawei Accurate and Fast Mobile Video Annotation Challenge (MoVAC) at IEEE ICME 2014. Three result runs were submitted by combining different features and classification techniques, with emphasis on both accuracy and efficiency. In this paper, we briefly summarize the techniques used in our system, and the components used for generating each of the three submitted results. One novel component in our system is a specially tailored deep neural network (DNN) that can explore the relationships of multiple features for improved annotation performance, and is very efficient thanks to a GPU-based implementation. Only 18.8 seconds were needed by one of our DNN-based submissions to process a test video. By combining the DNN with traditional SVM learning, we achieved the best accuracy across all the worldwide submissions to this challenge.
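The final step described above, combining the DNN with traditional SVM learning, can be approximated by simple score-level fusion, as in the hedged sketch below (scikit-learn based, with a calibrated linear SVM and a fixed mixing weight; this is not the competition code).

```python
# Illustrative score-level fusion of DNN scores and an SVM trained on
# handcrafted features; alpha balances the two sources.
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

def fuse_dnn_and_svm(train_feats, train_labels, test_feats, dnn_test_scores, alpha=0.5):
    # Calibrate the SVM so its outputs are probabilities comparable to DNN scores.
    svm = CalibratedClassifierCV(LinearSVC())
    svm.fit(train_feats, train_labels)
    svm_scores = svm.predict_proba(test_feats)   # (N, C), same shape as dnn_test_scores
    return alpha * dnn_test_scores + (1 - alpha) * svm_scores
```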

Collaboration


Dive into Zuxuan Wu's collaborations.

Top Co-Authors

Jinhui Tang

Nanjing University of Science and Technology
