Publication


Featured research published by Wu Liu.


Computer Vision and Pattern Recognition | 2015

Multi-task deep visual-semantic embedding for video thumbnail selection

Wu Liu; Tao Mei; Yongdong Zhang; Cherry Che; Jiebo Luo

Given the tremendous growth of online videos, the video thumbnail, as the common visualization form of video content, increasingly influences users' browsing and searching experience. However, conventional methods for video thumbnail selection often fail to produce satisfying results because they ignore the side semantic information (e.g., title, description, and query) associated with the video. As a result, the selected thumbnail cannot always represent the video semantics, and the click-through rate is adversely affected even when the retrieved videos are relevant. In this paper, we develop a multi-task deep visual-semantic embedding model that can automatically select query-dependent video thumbnails according to both visual and side information. Different from most existing methods, the proposed approach employs the deep visual-semantic embedding model to directly compute the similarity between the query and video thumbnails by mapping them into a common latent semantic space, where even unseen query-thumbnail pairs can be correctly matched. In particular, we train the embedding model by exploiting large-scale and freely accessible click-through video and image data, and employ a multi-task learning strategy to holistically exploit the query-thumbnail relevance from these two highly related datasets. Finally, a thumbnail is selected by fusing both the representativeness and query-relevance scores. Evaluations on a 1,000 query-thumbnail dataset labeled by 191 workers on Amazon Mechanical Turk demonstrate the effectiveness of the proposed method.
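
As an illustration of the final fusion step described above, the sketch below ranks candidate frames by a weighted sum of query relevance (cosine similarity in a shared embedding space) and representativeness. The embedding inputs, the fusion weight `alpha`, and all names are assumptions made for illustration, not the paper's actual implementation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def select_thumbnail(query_vec, frame_vecs, repr_scores, alpha=0.5):
    """Pick the frame that best balances query relevance and representativeness.

    query_vec   -- embedding of the text query in the shared latent space
    frame_vecs  -- per-frame embeddings in the same space
    repr_scores -- per-frame representativeness scores in [0, 1]
    alpha       -- fusion weight (an assumption, not taken from the paper)
    """
    best_idx, best_score = -1, float("-inf")
    for i, (f, r) in enumerate(zip(frame_vecs, repr_scores)):
        score = alpha * cosine(query_vec, f) + (1.0 - alpha) * r
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score

# Usage with toy 3-D embeddings: frame 1 is less relevant but more representative.
print(select_thumbnail([1, 0, 0], [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]], [0.4, 0.8]))
```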


IEEE Transactions on Multimedia | 2014

Instant Mobile Video Search With Layered Audio-Video Indexing and Progressive Transmission

Wu Liu; Tao Mei; Yongdong Zhang

The proliferation of mobile devices is producing a new wave of applications that enable users to sense their surroundings with smartphones. People increasingly prefer mobile devices for searching and browsing video content on the move. In this paper, we have developed an innovative mobile video search system through which users can discover videos by simply pointing their phones at a screen to capture a few seconds of what they are watching. Unlike most existing mobile video search applications, the proposed system aims at instant and progressive video search by leveraging the lightweight computing capacity of mobile devices. In particular, the system is able to index large-scale video data using a new layered audio-video indexing approach in the cloud, as well as generate lightweight joint audio-video signatures with progressive transmission and perform progressive search on mobile devices. Furthermore, we showcase that the system can be applied to two novel applications: video entity search and video clip localization. Evaluations on a real-world mobile video query dataset show that our system significantly improves the user search experience in terms of search accuracy, low retrieval latency, and very short recording duration.
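
To give a rough idea of how a two-layer audio-video index could merge evidence, the sketch below lets audio-layer and video-layer fingerprint matches vote for candidate videos. The index structures, hash values, and voting weights are hypothetical simplifications of the paper's layered indexing, not its actual data structures.

```python
from collections import Counter

def lookup(audio_index, video_index, audio_hashes, video_hashes, w_audio=0.5):
    """Rank candidate videos by combined audio-layer and video-layer evidence."""
    votes = Counter()
    for h in audio_hashes:                  # audio layer casts coarse votes
        for vid in audio_index.get(h, ()):
            votes[vid] += w_audio
    for h in video_hashes:                  # video layer refines the candidates
        for vid in video_index.get(h, ()):
            votes[vid] += 1.0 - w_audio
    return votes.most_common()

# Usage with toy indexes mapping fingerprint hashes to video ids:
audio_idx = {0xA1: ["v1", "v2"], 0xA2: ["v1"]}
video_idx = {0xB7: ["v1"], 0xB9: ["v3"]}
print(lookup(audio_idx, video_idx, [0xA1, 0xA2], [0xB7]))  # "v1" ranks first
```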


IEEE Transactions on Systems, Man, and Cybernetics | 2013

Accurate Estimation of Human Body Orientation From RGB-D Sensors

Wu Liu; Yongdong Zhang; Sheng Tang; Jinhui Tang; Jintao Li

Accurate estimation of human body orientation can significantly enhance the analysis of human behavior, which is a fundamental task in the field of computer vision. However, existing orientation estimation methods cannot handle the wide variety of body poses and appearances. In this paper, we propose an innovative RGB-D-based orientation estimation method to address these challenges. By utilizing the RGB-D information, which can be acquired in real time by RGB-D sensors, our method is robust to cluttered environments, illumination changes, and partial occlusions. Specifically, efficient static and motion cue extraction methods are proposed based on RGB-D superpixels to reduce the noise of the depth data. Since it is hard to discriminate all 360° of orientation using static cues or motion cues alone, we propose to utilize a dynamic Bayesian network system (DBNS) to effectively exploit the complementary nature of both static and motion cues. To verify the proposed method, we build an RGB-D-based human body orientation dataset that covers a wide diversity of poses and appearances. Our intensive experimental evaluations on this dataset demonstrate the effectiveness and efficiency of the proposed method.
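
To make the cue-fusion idea concrete, here is a minimal sketch of a discrete Bayes filter over eight orientation bins that combines static-cue and motion-cue likelihoods at each frame. The actual method uses a dynamic Bayesian network system; the bin count, the transition model, and the independence assumption between the two cues below are simplifications introduced purely for illustration.

```python
import numpy as np

N_BINS = 8  # orientation discretized into 45-degree bins

def transition_matrix(stay_prob=0.7):
    """Adjacent-bin transitions: orientation rarely jumps far between frames."""
    T = np.zeros((N_BINS, N_BINS))
    for i in range(N_BINS):
        T[i, i] = stay_prob
        T[i, (i - 1) % N_BINS] = (1 - stay_prob) / 2
        T[i, (i + 1) % N_BINS] = (1 - stay_prob) / 2
    return T

def update(belief, static_lik, motion_lik, T):
    """One filtering step: predict with T, then reweight by both cue likelihoods."""
    predicted = T.T @ belief
    posterior = predicted * np.asarray(static_lik) * np.asarray(motion_lik)
    return posterior / posterior.sum()

# Usage: a uniform prior sharpened by two noisy cues that agree on bin 2.
belief = np.full(N_BINS, 1.0 / N_BINS)
static = np.array([0.05, 0.1, 0.5, 0.1, 0.05, 0.05, 0.1, 0.05])
motion = np.array([0.1, 0.1, 0.4, 0.2, 0.05, 0.05, 0.05, 0.05])
print(update(belief, static, motion, transition_matrix()).argmax())  # -> 2
```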


IEEE Transactions on Multimedia | 2012

Web Video Geolocation by Geotagged Social Resources

Yicheng Song; Yongdong Zhang; Juan Cao; Tian Xia; Wu Liu; Jintao Li

This paper considers the problem of web video geolocation: we hope to determine where on the Earth a web video was taken. By analyzing a 6.5-million geotagged web video dataset, we observe that there exist inherent geographic intimacies between a video and its relevant videos (related videos and same-author videos). This social relationship supplies a direct and effective cue to locate the video to a particular region on the Earth. Based on this observation, we propose an effective web video geolocation algorithm that propagates geotags over the web video social relationship graph. For videos that have no geotagged relevant videos, we collect geotagged relevant images that are similar in content to the video (i.e., share some visual or textual information with it) as the cue to infer its location. The experiments demonstrate the effectiveness of both methods, with geolocation accuracy much better than that of state-of-the-art approaches. Finally, an online web video geolocation system, Video2Location (V2L), is developed to provide public access to our algorithm.
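
The propagation idea can be sketched as iterative averaging of neighbour geotags over the relationship graph. The data structures and the naive latitude/longitude centroid below are assumptions made for illustration; they are not the published algorithm.

```python
def propagate_geotags(edges, geotags, iterations=3):
    """Propagate geotags over the video relationship graph.

    edges   -- {video_id: [neighbour video_ids]} from related-video / same-author links
    geotags -- {video_id: (lat, lon)} for videos that already carry a geotag
    """
    estimates = dict(geotags)
    for _ in range(iterations):
        new_estimates = dict(estimates)
        for vid, neighbours in edges.items():
            if vid in geotags:                   # keep original geotags fixed
                continue
            located = [estimates[n] for n in neighbours if n in estimates]
            if located:                          # naive centroid of neighbour locations
                new_estimates[vid] = (sum(p[0] for p in located) / len(located),
                                      sum(p[1] for p in located) / len(located))
        estimates = new_estimates
    return estimates

# Usage: "v3" has no geotag but two geotagged related videos.
edges = {"v3": ["v1", "v2"]}
geotags = {"v1": (39.9, 116.4), "v2": (40.1, 116.6)}
print(propagate_geotags(edges, geotags))         # v3 lands near (40.0, 116.5)
```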


Conference on Multimedia Modeling | 2012

RGB-D based multi-attribute people search in intelligent visual surveillance

Wu Liu; Tian Xia; Ji Wan; Yongdong Zhang; Jintao Li

Searching for people in surveillance videos is a typical task in intelligent visual surveillance (IVS). However, current IVS techniques can hardly handle multi-attribute queries, which are a natural way of finding people in the real world. The challenges arise from the extraction of multiple attributes, which largely suffers from illumination changes, shadows, and complicated backgrounds in real-world surveillance environments. In this paper, we investigate how these challenges can be addressed when IVS is equipped with RGB-D information obtained by an RGB-D camera. With the RGB-D information, we propose methods that accurately and robustly segment the human region and extract three groups of attributes: biometric attributes, appearance attributes, and motion attributes. Furthermore, we introduce a novel IVS system that is capable of handling multi-attribute queries for searching people in surveillance videos. Experimental evaluations demonstrate the effectiveness of the proposed method and system, as well as the promise of bringing RGB-D information into IVS.
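
A minimal sketch of the query side of such a system is shown below: once per-person attributes have been extracted, a multi-attribute query reduces to matching attribute values. The attribute names and the exact-match rule are illustrative assumptions, not the paper's schema.

```python
def search_people(people, query):
    """Return every detected person whose extracted attributes match the query.

    people -- list of attribute dicts produced by the extraction stage
    query  -- dict of required attribute values
    """
    def matches(person):
        return all(person.get(attr) == value for attr, value in query.items())
    return [p for p in people if matches(p)]

# Usage with illustrative attribute names:
gallery = [
    {"id": 1, "height": "tall",  "shirt": "blue", "motion": "running"},
    {"id": 2, "height": "short", "shirt": "red",  "motion": "walking"},
]
print(search_people(gallery, {"shirt": "red", "motion": "walking"}))  # -> person 2
```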


ACM Multimedia | 2013

Listen, look, and gotcha: instant video search with mobile phones by layered audio-video indexing

Wu Liu; Tao Mei; Yongdong Zhang; Jintao Li; Shipeng Li

Mobile video is quickly becoming a mass consumer phenomenon. More and more people are using their smartphones to search and browse video content while on the move. In this paper, we have developed an innovative instant mobile video search system through which users can discover videos by simply pointing their phones at a screen to capture a few seconds of what they are watching. The system is able to index large-scale video data using a new layered audio-video indexing approach in the cloud, as well as extract lightweight joint audio-video signatures in real time and perform progressive search on mobile devices. Unlike most existing mobile video search applications that simply send the original video query to the cloud, the proposed system is one of the first attempts at instant and progressive video search leveraging the lightweight computing capacity of mobile devices. The system is characterized by four unique properties: 1) a joint audio-video signature to deal with the large aural and visual variances associated with the query video captured by the mobile phone, 2) layered audio-video indexing to holistically exploit the complementary nature of audio and video signals, 3) lightweight fingerprinting to comply with mobile processing capacity, and 4) a progressive query process to significantly reduce computational costs and improve the user experience: the search process can stop at any time once a confident result is achieved. We have collected 1,400 query videos captured by 25 mobile users from a dataset of 600 hours of video. The experiments show that our system outperforms state-of-the-art methods by achieving 90.79% precision when the query video is less than 10 seconds and 70.07% even when the query video is less than 5 seconds.
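
The progressive query process (property 4) can be sketched as a loop that keeps extending the captured signature and re-querying until a candidate is confident enough. In the sketch below, `extract_signature` and `search_index` are hypothetical stand-ins for the system's fingerprinting and layered-index lookup, and the confidence threshold is an assumption.

```python
def progressive_search(capture_chunks, extract_signature, search_index,
                       confidence_threshold=0.9, max_seconds=10):
    """Query progressively: extend the signature each second, stop once confident.

    capture_chunks    -- iterable yielding roughly one second of audio/video at a time
    extract_signature -- hypothetical fingerprinting function for one chunk
    search_index      -- hypothetical lookup returning a ranked [(video_id, score), ...]
    """
    signature, candidates, seconds = [], [], 0
    for seconds, chunk in enumerate(capture_chunks, start=1):
        signature.extend(extract_signature(chunk))     # grow the joint signature
        candidates = search_index(signature)
        if candidates and candidates[0][1] >= confidence_threshold:
            break                                      # confident result: stop early
        if seconds >= max_seconds:
            break
    return (candidates[0] if candidates else None), seconds
```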


International Conference on Internet Multimedia Computing and Service | 2015

Multimodal tag localization based on deep learning

Rui Zhang; Sheng Tang; Wu Liu; Jintao Li

Tag localization, which localizes the relevant video clips for an associated semantic tag, has become an important research topic in the field of video retrieval and recommendation. Most existing approaches depend to a large degree on carefully selected features that are manually designed by experts and do not take multimodality into consideration. To take into account the complementarity of different modalities and take advantage of learned features, in this paper we propose a multimodal tag localization framework that exploits deep learning to learn both visual and textual features of videos for tag localization, followed by multimodal fusion of the visual and textual results. Extensive experiments on the public dataset show that our proposed approach achieves promising results. Tag localization based on visual deep learning greatly improves precision, and the multimodal fusion of the visual and textual modalities improves it further despite the low performance of the textual modality alone.
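
The fusion step can be sketched as a late fusion of per-clip relevance scores from the visual and textual models, followed by ranking the clips. The fusion weight and helper names below are assumptions made for illustration, not the framework's actual parameters.

```python
import numpy as np

def fuse_and_localize(visual_scores, textual_scores, w_visual=0.7, top_k=3):
    """Fuse per-clip scores from two modalities and return the top-ranked clips."""
    fused = (w_visual * np.asarray(visual_scores, dtype=float)
             + (1.0 - w_visual) * np.asarray(textual_scores, dtype=float))
    top_clips = np.argsort(fused)[::-1][:top_k]       # clip indices, best first
    return top_clips, fused[top_clips]

# Usage with toy per-clip scores for one tag:
print(fuse_and_localize([0.2, 0.9, 0.4, 0.8], [0.1, 0.6, 0.7, 0.2]))
```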


Multimedia Systems | 2017

Multi-modal tag localization for mobile video search

Rui Zhang; Sheng Tang; Wu Liu; Yongdong Zhang; Jintao Li

Given the tremendous growth of mobile videos, video tag localization, which localizes the relevant video clips for an associated semantic tag, increasingly influences users' browsing and searching experience. However, most existing approaches depend to a large degree on carefully selected visual features, which are manually designed by experts and do not take multi-modality into consideration. Aiming to take into account the complementarity of different modalities, in this paper we propose a multi-modal tag localization framework that exploits deep learning to learn visual, auditory, and semantic features of videos for tag localization. Furthermore, we showcase that the framework can be applied to two novel mobile video search applications: (1) automatic time-code-level tag generation and (2) query-dependent video thumbnail selection. Extensive experiments on the public dataset show that the proposed approach achieves promising results.
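
The first application, time-code-level tag generation, can be sketched as thresholding per-clip tag scores and emitting (tag, start, end) entries. The clip length and threshold below are illustrative assumptions, not values from the paper.

```python
def timecode_tags(tag, clip_scores, clip_len_sec=5, threshold=0.5):
    """Emit (tag, start_sec, end_sec) for each clip whose score passes the threshold."""
    return [(tag, i * clip_len_sec, (i + 1) * clip_len_sec)
            for i, score in enumerate(clip_scores) if score >= threshold]

# Usage: the tag is localized to the second and fourth five-second clips.
print(timecode_tags("goal", [0.2, 0.8, 0.4, 0.9]))
```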


ACM Multimedia | 2013

LAVES: an instant mobile video search system based on layered audio-video indexing

Wu Liu; Feibin Yang; Yongdong Zhang; Qinghua Huang; Tao Mei


Archive | 2011

Method and system for detecting network pornography videos in real time

Tian Xia; Lei Huang; Wu Liu; Ji Wan; Yongdong Zhang; Jintao Li


Collaboration


Dive into Wu Liu's collaboration.

Top Co-Authors

Yongdong Zhang (Chinese Academy of Sciences)
Jintao Li (Chinese Academy of Sciences)
Tian Xia (Chinese Academy of Sciences)
Ji Wan (Chinese Academy of Sciences)
Sheng Tang (Chinese Academy of Sciences)
Rui Zhang (Chinese Academy of Sciences)
Feibin Yang (South China University of Technology)
Jinhui Tang (Nanjing University of Science and Technology)
Juan Cao (Chinese Academy of Sciences)