
Publications


Featured research published by Yifang Yin.


ACM Transactions on Multimedia Computing, Communications, and Applications | 2015

Spatial-Temporal Tag Mining for Automatic Geospatial Video Annotation

Yifang Yin; Zhijie Shen; Luming Zhang; Roger Zimmermann

Videos are increasingly geotagged and used in practical and powerful GIS applications. However, video search and management operations are typically supported by manual textual annotations, which are subjective and laborious. Therefore, research has been conducted to automate or semi-automate this process. Since a diverse vocabulary for video annotations is of paramount importance for good search results, this article proposes to leverage crowdsourced data from social multimedia applications that host tags of diverse semantics to build a spatio-temporal tag repository, which then serves as input to our auto-annotation approach. In particular, to build the tag store, we retrieve the necessary data from several social multimedia applications, mine both the spatial and temporal features of the tags, and then refine and index them accordingly. To better integrate the tag repository, we extend our previous approach by leveraging the temporal characteristics of videos as well. Moreover, we set up additional ranking criteria on the basis of tag similarity, popularity, and location bias. Experimental results demonstrate that, by making use of such a tag repository, the generated tags have a wide range of semantics, and the resulting rankings are more consistent with human perception.
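
The ranking stage can be pictured as a weighted combination of the three criteria named above. The following is a minimal sketch of that idea, assuming hypothetical tag fields and weights; it is not the paper's actual scoring model.

```python
from dataclasses import dataclass
from math import exp, sqrt

@dataclass
class Tag:
    text: str
    lat: float          # mined spatial center of the tag
    lng: float
    popularity: int     # occurrence count in the crowdsourced repository

def location_bias(tag: Tag, cam_lat: float, cam_lng: float, scale: float = 0.01) -> float:
    """Decay the tag score with its distance from the camera position."""
    d = sqrt((tag.lat - cam_lat) ** 2 + (tag.lng - cam_lng) ** 2)
    return exp(-d / scale)

def score(tag: Tag, similarity: float, cam_lat: float, cam_lng: float,
          w_sim: float = 0.5, w_pop: float = 0.2, w_loc: float = 0.3) -> float:
    """Weighted combination of the three ranking criteria named in the abstract."""
    pop = tag.popularity / (tag.popularity + 10.0)  # squash raw counts into [0, 1)
    return w_sim * similarity + w_pop * pop + w_loc * location_bias(tag, cam_lat, cam_lng)
```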


ACM Multimedia | 2016

GeoUGV: user-generated mobile video dataset with fine granularity spatial metadata

Ying Lu; Hien To; Abdullah Alfarrarjeh; Seon Ho Kim; Yifang Yin; Roger Zimmermann; Cyrus Shahabi

When analyzing and processing videos, it has become increasingly important in many applications to also consider contextual information, in addition to the content. With the ubiquity of sensor-rich smartphones, acquiring a continuous stream of geo-spatial metadata that includes the location and orientation of a camera together with the video frames has become practical. However, no such detailed dataset is publicly available. In this paper, we present an extensive geo-tagged video dataset named GeoUGV that has been collected as part of the MediaQ [3] and GeoVid [1] projects. The key feature of the dataset is that each video file is accompanied by a metadata sequence of geo-tags consisting of GPS locations, compass directions, and spatial keywords at fine-grained intervals. The GeoUGV dataset has been collected by volunteer users and its statistics can be summarized as follows: 2,397 videos containing 208,976 geo-tagged video frames, collected by 289 users in more than 20 cities across the world over a period of 10 years (2007-2016). We hope that this dataset will be useful for researchers, scientists, and practitioners alike in their work.
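
To make the metadata layout concrete, here is a hypothetical record structure for one fine-grained geo-tag sample as the abstract describes it (GPS location, compass direction, spatial keywords). Field names are illustrative, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GeoTagSample:
    timestamp_ms: int        # position of the sample within the video
    lat: float               # GPS latitude of the camera
    lng: float               # GPS longitude of the camera
    heading_deg: float       # compass direction, 0-360, clockwise from north
    keywords: List[str] = field(default_factory=list)  # spatial keywords

@dataclass
class GeoTaggedVideo:
    video_id: str
    samples: List[GeoTagSample]  # one entry per sampled frame
```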


IEEE Transactions on Multimedia | 2017

Fusion of Magnetic and Visual Sensors for Indoor Localization: Infrastructure-Free and More Effective

Zhenguang Liu; Luming Zhang; Qi Liu; Yifang Yin; Li Cheng; Roger Zimmermann

Accurate and infrastructure-free indoor positioning can be very useful in a variety of applications. However, most existing approaches (e.g., WiFi and infrared-based methods) for indoor localization heavily rely on infrastructure, which is neither scalable nor pervasively available. In this paper, we propose a novel indoor localization and tracking approach, termed VMag, that does not require any infrastructure assistance. The user can be localized while simply holding a smartphone. To the best of our knowledge, the proposed method is the first exploration of fusing geomagnetic and visual sensing for indoor localization. More specifically, we conduct an in-depth study on both the advantageous properties and the challenges in leveraging the geomagnetic field and visual images for indoor localization. Based on these studies, we design a context-aware particle filtering framework to track the user with the goal of maximizing the positioning accuracy. We also introduce a neural-network-based method to extract deep features for the purpose of indoor positioning. We have conducted extensive experiments in four different indoor settings: a laboratory, a garage, a canteen, and an office building. Experimental results demonstrate the superior performance of VMag over the state of the art in these four indoor settings.
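
For readers unfamiliar with particle filtering, the skeleton below sketches one predict-weight-resample cycle against a magnetic fingerprint map. The map function `mag_map` and all noise parameters are hypothetical stand-ins; the paper's context-aware weighting and deep visual features are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(particles: np.ndarray, weights: np.ndarray,
         observed_mag: float, mag_map,
         motion_std: float = 0.3, obs_std: float = 2.0):
    """One predict-weight-resample cycle over (x, y) position particles."""
    # Predict: diffuse each particle with Gaussian motion noise.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight: likelihood of the magnetic observation at each hypothesis.
    expected = np.array([mag_map(x, y) for x, y in particles])
    weights = weights * np.exp(-0.5 * ((observed_mag - expected) / obs_std) ** 2)
    weights = weights / weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```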


ACM Transactions on Multimedia Computing, Communications, and Applications | 2015

Content vs. Context: Visual and Geographic Information Use in Video Landmark Retrieval

Yifang Yin; Beomjoo Seo; Roger Zimmermann

Due to the ubiquity of sensor-equipped smartphones, it has become increasingly feasible for users to capture videos together with associated geographic metadata, for example, the location and orientation of the camera. Such contextual information creates new opportunities for the organization and retrieval of geo-referenced videos. In this study we explore the task of landmark retrieval through the analysis of two types of state-of-the-art techniques, namely media-content-based and geocontext-based retrievals. For the content-based method, we choose the Spatial Pyramid Matching (SPM) approach combined with two advanced coding methods: Sparse Coding (SC) and Locality-Constrained Linear Coding (LLC). For the geo-based method, we present the Geo Landmark Visibility Determination (GeoLVD) approach, which computes the visibility of a landmark based on intersections of a camera's field-of-view (FOV) and the landmark's geometric information available from Geographic Information Systems (GIS) and services. We first compare the retrieval results of the two methods, and discuss the strengths and weaknesses of each approach in terms of precision, recall, and execution time. Next we analyze the factors that affect the effectiveness of the content-based and the geo-based methods, respectively. Finally we propose a hybrid retrieval method based on the integration of the visual (content) and geographic (context) information, which is shown to achieve significant improvements in our experiments. We believe that the results and observations in this work will inform the design of future geo-referenced video retrieval systems, improve our understanding of selecting the most appropriate visual features for indexing and searching, and help in selecting the most suitable retrieval method under different conditions.
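
The core of an FOV-based visibility check like GeoLVD's can be reduced to a distance test plus an angular test. A minimal sketch under a planar approximation follows; the point-landmark simplification and parameter names are illustrative, whereas the paper intersects the FOV with the landmark's full geometry from GIS sources.

```python
from math import atan2, degrees, hypot

def landmark_visible(cam_x: float, cam_y: float, heading_deg: float,
                     half_angle_deg: float, max_range: float,
                     lm_x: float, lm_y: float) -> bool:
    """True if a point landmark falls inside the camera's field-of-view."""
    dx, dy = lm_x - cam_x, lm_y - cam_y
    if hypot(dx, dy) > max_range:            # beyond the visible distance
        return False
    # Bearing measured clockwise from north (x east, y north).
    bearing = degrees(atan2(dx, dy)) % 360
    # Signed angular offset from the camera heading, wrapped to [-180, 180).
    diff = (bearing - heading_deg + 180) % 360 - 180
    return abs(diff) <= half_angle_deg
```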


Advances in Geographic Information Systems | 2013

Orientation data correction with georeferenced mobile videos

Guanfeng Wang; Yifang Yin; Beomjoo Seo; Roger Zimmermann; Zhijie Shen

Similar to positioning data, camera orientation information has become a powerful contextual feature utilized by a number of GIS and social media applications. Such auxiliary information facilitates higher-level semantic analysis and management of video assets in such applications, e.g., video summarization and video indexing systems. However, raw sensor data collected from current mobile devices is often not accurate enough for subsequent geospatial analysis. To date, an effective orientation data correction system for mobile video content has been lacking. Here we present a content-based approach that improves the accuracy of noisy orientation sensor measurements generated by mobile devices in conjunction with video acquisition. Our preliminary experimental results demonstrate significant accuracy enhancements, which allow downstream sensor-aided GIS applications to access video content more precisely.


International Workshop on GeoStreaming | 2016

A general feature-based map matching framework with trajectory simplification

Yifang Yin; Rajiv Ratn Shah; Roger Zimmermann

Accurate map matching has been a fundamental but challenging problem that has drawn great research attention in recent years. It aims to reduce the uncertainty in a trajectory by matching the GPS points to the road network on a digital map. Most existing work has focused on estimating the likelihood of a candidate path based on the GPS observations, while neglecting to model the probability of a route choice from the perspective of drivers. Here we propose a novel feature-based map matching algorithm that estimates the cost of a candidate path based on both GPS observations and human factors. Taking human factors into consideration is especially important when dealing with low-sampling-rate data, where most of the movement details are lost. Additionally, we simultaneously analyze a subsequence of coherent GPS points by utilizing a new segment-based probabilistic map matching strategy, which is less susceptible to the noisiness of the positioning data. We have evaluated the proposed approach on a public large-scale GPS dataset, which consists of 100 trajectories distributed all over the world. The experimental results show that our method is robust to sparse data with large sampling intervals (e.g., 60 s to 300 s) and challenging track features (e.g., U-turns and loops). Compared with two state-of-the-art map matching algorithms, our method substantially reduces the route mismatch error by 6.4% to 32.3% and obtains the best map matching results in all the different combinations of sampling rates and challenging features.
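
One way to read the abstract's cost model is as the sum of a GPS observation term and a route-choice (human-factor) term per candidate path. The sketch below is an illustrative composition only; the features, weights, and the assumption that GPS fixes are already snapped to the path are mine, not the paper's trained model.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]

def observation_cost(gps: List[Point], snapped: List[Point]) -> float:
    """Distance between each GPS fix and its snapped point on the candidate path."""
    return sum(hypot(g[0] - s[0], g[1] - s[1]) for g, s in zip(gps, snapped))

def route_choice_cost(path_length: float, straight_length: float,
                      num_turns: int,
                      w_detour: float = 1.0, w_turn: float = 5.0) -> float:
    """Human-factor term: drivers tend to avoid detours and extra turns."""
    detour = max(0.0, path_length - straight_length)
    return w_detour * detour + w_turn * num_turns

def path_cost(gps: List[Point], snapped: List[Point], path_length: float,
              straight_length: float, num_turns: int) -> float:
    """Total cost of one candidate path; the lowest-cost candidate wins."""
    return observation_cost(gps, snapped) + route_choice_cost(
        path_length, straight_length, num_turns)
```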


International Conference on Multimedia and Expo | 2017

Geographic information use in weakly-supervised deep learning for landmark recognition

Yifang Yin; Zhenguang Liu; Roger Zimmermann

The successful deep convolutional neural networks for visual object recognition typically rely on a massive number of training images that are well annotated by class labels or object bounding boxes with great human effort. Here we explore the use of geographic metadata, which is automatically retrieved from sensors such as GPS and compass, in weakly-supervised learning techniques for landmark recognition. The visibility of a landmark in a frame can be calculated based on the camera's field-of-view and the landmark's geometric information such as location and height. Subsequently, a training dataset is generated as the union of the frames in which at least one target landmark is present. To reduce the impact of the intrinsic noise in the geo-metadata, we present a frame selection method that removes mistakenly labeled frames with a two-step approach consisting of (1) Gaussian Mixture Model clustering based on camera location followed by (2) outlier removal based on visual consistency. We compare the classification results obtained from the ground truth labels and the noisy labels derived from the raw geo-metadata. Experiments show that training based on the raw geo-metadata achieves a Mean Average Precision (MAP) of 0.797. Moreover, by applying our proposed representative frame selection method, the MAP can be further improved by 6.4%, which indicates the promising use of geo-metadata in weakly-supervised learning techniques.
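
A rough sketch of the two-step frame selection the abstract outlines, assuming precomputed per-frame visual descriptors. The cluster count, keep ratio, and the distance-to-centroid notion of visual consistency are illustrative choices, not the paper's exact formulation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_frames(cam_locations: np.ndarray,  # (N, 2) camera lat/lng per frame
                  visual_feats: np.ndarray,   # (N, D) per-frame visual descriptors
                  n_clusters: int = 3,
                  keep_ratio: float = 0.8) -> np.ndarray:
    """Return indices of frames kept after location clustering + visual filtering."""
    # Step 1: cluster frames by camera location with a Gaussian Mixture Model.
    gmm = GaussianMixture(n_components=n_clusters, random_state=0)
    labels = gmm.fit_predict(cam_locations)

    kept = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:
            continue
        # Step 2: within a cluster, drop the frames least consistent with the
        # cluster's mean visual appearance (treated here as outliers).
        centroid = visual_feats[idx].mean(axis=0)
        dists = np.linalg.norm(visual_feats[idx] - centroid, axis=1)
        order = idx[np.argsort(dists)]
        kept.extend(order[: max(1, int(keep_ratio * len(idx)))])
    return np.sort(np.array(kept))
```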


ACM Multimedia | 2013

OSCOR: an orientation sensor data correction system for mobile generated contents

Guanfeng Wang; Beomjoo Seo; Yifang Yin; Roger Zimmermann; Zhijie Shen

In addition to positioning data, other sensor information, such as orientation data, has become a useful and powerful contextual feature. Such auxiliary information can facilitate higher-level semantic description inferences in many multimedia applications, e.g., video tagging and video summarization. However, sensor data collected from current mobile devices is often not accurate enough for subsequent multimedia analysis. An effective orientation data correction system for mobile multimedia content has been an elusive goal so far. Here we present a system, termed OSCOR, which aims to improve the accuracy of noisy orientation sensor measurements generated by mobile devices during image and video recording. We provide a user-friendly camera interface to facilitate the gathering of additional information, which enables the correction process on the server side. Geographic field-of-view (FOV) visualizations based on the original and corrected sensor data help users understand the corrected contextual information and how erroneous data may affect further processing.


Advances in Geographic Information Systems | 2016

Automatic geographic metadata correction for sensor-rich video sequences

Yifang Yin; Guanfeng Wang; Roger Zimmermann

Videos recorded with current mobile devices are increasingly geotagged at fine granularity and used in various location-based applications and services. However, raw sensor data collected is often noisy, resulting in subsequent inaccurate geospatial analysis. In this study, we focus on the challenging correction of compass readings and present an automatic approach to reduce these metadata errors. Given the small geo-distance between consecutive video frames, image-based localization does not work due to the high ambiguity in the depth reconstruction of the scene. As an alternative, we collect geographic context from OpenStreetMap and estimate the absolute viewing direction by comparing the image scene to world projections obtained with different external camera parameters. To design a comprehensive model, we further incorporate smooth approximation and feature-based rotation estimation when formulating the error terms. Experimental results show that our proposed pyramid-based method outperforms its competitors and reduces orientation errors by an average of 58.8%. Hence, for downstream applications, improved results can be obtained with these more accurate geo-metadata. To illustrate, we present the performance gain in landmark retrieval and tag suggestion by utilizing the accuracy-enhanced geo-metadata.
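
The abstract's formulation can be pictured as minimizing a sum of error terms over the per-frame headings. The composition below is illustrative: `render_match_cost` (matching a frame against an OpenStreetMap projection rendered at a candidate heading) and `relative_rotation` (a feature-based rotation estimate between consecutive frames) are hypothetical stand-ins, not the paper's exact objective.

```python
from typing import Callable, List

def total_error(headings: List[float],
                render_match_cost: Callable[[int, float], float],
                relative_rotation: Callable[[int], float],
                w_smooth: float = 1.0, w_feat: float = 1.0) -> float:
    """Sum of data, smoothness, and feature-based rotation error terms."""
    # Data term: how well each frame matches the map projection at its heading.
    cost = sum(render_match_cost(t, h) for t, h in enumerate(headings))
    for t in range(1, len(headings)):
        step = headings[t] - headings[t - 1]
        cost += w_smooth * step ** 2                         # smooth approximation
        cost += w_feat * (step - relative_rotation(t)) ** 2  # feature-based rotation
    return cost
```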


International Conference on Multimedia Retrieval | 2015

Exploiting Spatial Relationship between Scenes for Hierarchical Video Geotagging

Yifang Yin; Luming Zhang; Roger Zimmermann

Predicting the location of a video based on its content is a meaningful, yet very challenging problem. Most existing work has focused on developing representative visual features and then searching for visually nearest neighbors in the development set to achieve a prediction. Interestingly, the relationship between scenes has been overlooked in prior work. Two scenes that are visually different, but frequently co-occur in the same location, should naturally be considered similar for the geotagging problem. To build upon the above ideas, we propose to model the geo-spatial distributions of scenes by Gaussian Mixture Models (GMMs) and measure the distribution similarity by the Jensen-Shannon divergence (JSD). Subsequently, we present the Spatial Relationship Model (SRM) for geotagging, which integrates the geo-spatial relationship of scenes into a hierarchical framework. We segment the Earth's surface into multiple levels of grids and measure the likelihood of input videos with an adaptation to region granularities. We have evaluated our approach using the YFCC100M dataset in the context of the MediaEval 2014 placing task. The total set of 35,000 geotagged videos is further divided into a training set of 25,000 videos and a test set of 10,000 videos. Our experimental results demonstrate the effectiveness of our proposed framework, as our solution achieves good accuracy and outperforms existing visual approaches for video geotagging.
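
The JSD between two GMMs has no closed form, so it is commonly approximated by Monte Carlo sampling. A minimal sketch of that standard approximation follows, assuming already-fitted scikit-learn models; the paper may use a different estimator.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def js_divergence(gmm_p: GaussianMixture, gmm_q: GaussianMixture,
                  n_samples: int = 10000) -> float:
    """Monte Carlo estimate of the Jensen-Shannon divergence (in nats)."""
    def kl_to_mixture(src, other):
        # Sample from src, then evaluate log-densities under both models.
        x, _ = src.sample(n_samples)
        log_p = src.score_samples(x)
        log_q = other.score_samples(x)
        log_m = np.logaddexp(log_p, log_q) - np.log(2.0)  # log of the midpoint M
        return np.mean(log_p - log_m)                     # KL(src || M)
    return 0.5 * kl_to_mixture(gmm_p, gmm_q) + 0.5 * kl_to_mixture(gmm_q, gmm_p)
```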

Collaboration


Dive into Yifang Yin's collaborations.

Top Co-Authors

Roger Zimmermann, National University of Singapore
Guanfeng Wang, National University of Singapore
Beomjoo Seo, National University of Singapore
Zhenguang Liu, National University of Singapore
Zhijie Shen, National University of Singapore
Luming Zhang, Hefei University of Technology
Rajiv Ratn Shah, Indraprastha Institute of Information Technology
Qi Liu, National University of Singapore