Yilin Yan | Researchain

Publication

Featured research published by Yilin Yan.

International Symposium on Multimedia | 2015

Deep Learning for Imbalanced Multimedia Data Classification

Yilin Yan; Min Chen; Mei Ling Shyu; Shu-Ching Chen

Classification of imbalanced data is an important research problem as lots of real-world data sets have skewed class distributions in which the majority of data instances (examples) belong to one class and far fewer instances belong to others. While in many applications, the minority instances actually represent the concept of interest (e.g., fraud in banking operations, abnormal cell in medical data, etc.), a classifier induced from an imbalanced data set is more likely to be biased towards the majority class and show very poor classification accuracy on the minority class. Despite extensive research efforts, imbalanced data classification remains one of the most challenging problems in data mining and machine learning, especially for multimedia data. To tackle this challenge, in this paper, we propose an extended deep learning approach to achieve promising performance in classifying skewed multimedia data sets. Specifically, we investigate the integration of bootstrapping methods and a state-of-the-art deep learning approach, Convolutional Neural Networks (CNNs), with extensive empirical studies. Considering the fact that deep learning approaches such as CNNs are usually computationally expensive, we propose to feed low-level features to CNNs and prove its feasibility in achieving promising performance while saving a lot of training time. The experimental results show the effectiveness of our framework in classifying severely imbalanced data in the TRECVID data set.
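The abstract does not spell out the bootstrapping scheme. One common reading is that each training batch is resampled with replacement so that minority instances appear at a fixed ratio before being fed to the CNN. The sketch below illustrates only that sampling step, under that assumption; the function name `balanced_bootstrap_batch` and the `pos_fraction` parameter are hypothetical and not taken from the paper.

```python
import random

def balanced_bootstrap_batch(labels, batch_size, pos_fraction=0.5, rng=None):
    """Return indices for one training batch.

    Indices are drawn with replacement so that about `pos_fraction` of the
    batch comes from the minority class (label 1) and the rest from the
    majority class (label 0). This is an assumed scheme, not the paper's.
    """
    rng = rng or random.Random()
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    n_pos = round(batch_size * pos_fraction)
    batch = [rng.choice(pos) for _ in range(n_pos)]
    batch += [rng.choice(neg) for _ in range(batch_size - n_pos)]
    rng.shuffle(batch)  # avoid a fixed positive/negative ordering
    return batch
```

With 3 positives among 100 instances, a batch of 10 built this way still contains 5 positive draws, so the classifier sees the minority class in every batch.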

International Journal of Multimedia Data Engineering and Management | 2015

Spatio-Temporal Analysis for Human Action Detection and Recognition in Uncontrolled Environments

Dianting Liu; Yilin Yan; Mei Ling Shyu; Guiru Zhao; Min Chen

Understanding the semantic meaning of human actions captured in unconstrained environments has broad applications in fields ranging from patient monitoring and human-computer interaction to surveillance systems. However, while great progress has been achieved on automatic human action detection and recognition in videos that are captured in controlled/constrained environments, most existing approaches perform unsatisfactorily on videos with uncontrolled/unconstrained conditions, e.g., significant camera motion, background clutter, scaling, and lighting conditions. To address this issue, the authors propose a robust human action detection and recognition framework that works effectively on videos taken in controlled or uncontrolled environments. Specifically, the authors integrate the optical flow field and the Harris3D corner detector to generate a new spatial-temporal information representation for each video sequence, from which the general Gaussian mixture model (GMM) is learned. All the mean vectors of the Gaussian components in the generated GMM are concatenated to create the GMM supervector for video action recognition. They build a boosting classifier based on a set of sparse representation classifiers and Hamming distance classifiers to improve the accuracy of action recognition. The experimental results on two widely used public data sets, KTH and UCF YouTube Action, show that the proposed framework outperforms the other state-of-the-art approaches on both action detection and recognition.
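The supervector step described in this abstract can be illustrated in a few lines. The sketch assumes the GMM has already been fitted and that its component means are available as plain lists; the function name `gmm_supervector` is hypothetical, and the ordering of components is an assumption the abstract does not specify.

```python
def gmm_supervector(component_means):
    """Concatenate K mean vectors of dimension D into one K*D supervector.

    `component_means` is a list of equal-length lists, one per Gaussian
    component, taken in whatever order the fitted model stores them.
    """
    dims = {len(mean) for mean in component_means}
    if len(dims) != 1:
        raise ValueError("all mean vectors must have the same dimension")
    return [value for mean in component_means for value in mean]
```

Each video is then represented by a single fixed-length vector regardless of how many local features it produced, which is what lets a standard classifier compare videos directly.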

Information Reuse and Integration | 2014

Utilizing concept correlations for effective imbalanced data classification

Yilin Yan; Yang Liu; Mei Ling Shyu; Min Chen

Data imbalance is a challenging and common problem in data mining and machine learning areas, and has attracted significant research efforts. A data set is considered imbalanced when the data instances (samples) are not close to uniformly distributed across different classes/categories, which is very common in real-world data sets. It is likely to result in biased classification results. In this paper, a two-phase classification framework is proposed to make the classification of imbalanced data more accurate and stable. The proposed framework is based on the correlations generated between concepts. The general idea is to identify negative data instances which have certain positive correlations with data instances in the target concept to facilitate the classification task. The experimental results show that our framework is effective in imbalanced data classification and is robust to feature descriptors by comparing it with four existing approaches using four different kinds of feature representations.

IEEE International Conference on Semantic Computing | 2016

Negative Correlation Discovery for Big Multimedia Data Semantic Concept Mining and Retrieval

Yilin Yan; Mei Ling Shyu; Qiusha Zhu

With massive amounts of data produced each day in almost every field, traditional data processing techniques have become increasingly inadequate. However, research on effectively managing and retrieving such big data is still under development. Multimedia high-level semantic concept mining and retrieval in big data is one of the most challenging research topics, which requires joint efforts from researchers in both big data mining and multimedia domains. In order to bridge the semantic gap between high-level concepts and low-level visual features, correlation discovery in semantic concept mining is worth exploring. Meanwhile, correlation discovery is a computationally intensive task in the sense that it requires a deep analysis of very large and growing repositories. This paper presents a novel system for discovering negative correlations for semantic concept mining and retrieval. It is designed to adapt to the Hadoop MapReduce framework and is further extended to utilize Spark, a more efficient and general cluster computing engine. The experimental results demonstrate the feasibility of utilizing big data technologies in negative correlation discovery.
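To show why correlation discovery fits a MapReduce-style framework, the sketch below emulates a map step and a reduce step in a single Python process. It is not Hadoop or Spark code, and it is not the paper's system: the phi coefficient is used here as an assumed correlation measure, and the names `map_sample`, `reduce_counts`, and `phi` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def map_sample(concepts):
    """Map step: emit (key, 1) for each concept and each co-occurring pair."""
    present = sorted(set(concepts))
    for c in present:
        yield (c,), 1
    for a, b in combinations(present, 2):
        yield (a, b), 1

def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals

def phi(totals, n, a, b):
    """Phi coefficient between binary concepts a and b over n samples."""
    a, b = sorted((a, b))  # pair keys are stored in sorted order
    na, nb, nab = totals[(a,)], totals[(b,)], totals[(a, b)]
    denom = sqrt(na * (n - na) * nb * (n - nb))
    return (n * nab - na * nb) / denom if denom else 0.0

# Four annotated samples; "indoor" never co-occurs with "outdoor".
samples = [["sky", "outdoor"], ["sky", "outdoor"], ["indoor"], ["indoor"]]
totals = reduce_counts(kv for s in samples for kv in map_sample(s))
negative = phi(totals, len(samples), "indoor", "outdoor") < 0
```

Because the map step only needs one sample at a time and the reduce step only sums counts, both can be distributed across a cluster; concept pairs with a negative coefficient are the candidates for negative correlation.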

IEEE International Conference on Semantic Computing | 2014

Enhancing Multimedia Semantic Concept Mining and Retrieval by Incorporating Negative Correlations

Tao Meng; Yang Liu; Mei Ling Shyu; Yilin Yan; Chi Min Shu

In recent years, we have witnessed a deluge of multimedia data such as texts, images, and videos. However, research on managing and retrieving these data efficiently is still in the development stage. The conventional tag-based searching approaches suffer from noisy or incomplete tags. As a result, the content-based multimedia data management framework has become increasingly popular. In this research direction, multimedia high-level semantic concept mining and retrieval is one of the fastest developing research topics, requiring joint efforts from researchers in both data mining and multimedia domains. To solve this problem, one great challenge is to bridge the semantic gap, which is the gap between high-level concepts and low-level features. Recently, positive inter-concept correlations have been utilized to capture the context of a concept to bridge the gap. However, negative correlations have rarely been studied because they are difficult to mine and utilize. In this paper, a concept mining and retrieval framework utilizing negative inter-concept correlations is proposed. Several research problems such as negative correlation selection, weight estimation, and score integration are addressed. Experimental results on the TRECVID 2010 benchmark data set demonstrate that the proposed framework gives promising performance.
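The abstract names score integration as one of the problems addressed but does not give the formula. As a purely illustrative assumption, the sketch below lowers a target concept's detection score by a weighted sum of the scores of its negatively correlated concepts; the function name `integrate_scores`, the linear form, and the example weights are all hypothetical.

```python
def integrate_scores(target_score, related_scores, weights):
    """Adjust a target concept's score using negatively correlated concepts.

    related_scores : detector scores of the negatively correlated concepts
    weights        : non-negative weight per concept (assumed already estimated)
    A high score on a negatively correlated concept lowers the result.
    """
    penalty = sum(weights[c] * related_scores[c] for c in weights)
    return target_score - penalty

# An "outdoor" detector score is reduced when "indoor" also scores highly.
adjusted = integrate_scores(0.75, {"indoor": 0.5}, {"indoor": 0.5})
```

The intuition is that strong evidence for a concept that rarely co-occurs with the target should count against the target, even when the target's own detector is fairly confident.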

International Journal of Multimedia Data Engineering and Management | 2017

Efficient Imbalanced Multimedia Concept Retrieval by Deep Learning on Spark Clusters

Yilin Yan; Min Chen; Saad Sadiq; Mei Ling Shyu

The classification of imbalanced datasets has recently attracted significant attention due to its implications in several real-world use cases. Classifiers developed on datasets with skewed distributions tend to favor the majority classes and are biased against the minority class. Despite extensive research interest, imbalanced data classification remains a challenge in data mining research, especially for multimedia data. Our attempt to overcome this hurdle is to develop a convolutional neural network (CNN) based deep learning solution integrated with a bootstrapping technique. Considering that convolutional neural networks are computationally very expensive, especially when coupled with big training datasets, we propose to extract features from pre-trained convolutional neural network models and feed those features to another fully connected neural network. A Spark implementation shows promising performance of our model in handling big datasets with respect to feasibility and scalability.

Information Reuse and Integration | 2016

Domain Knowledge Assisted Data Processing for Florida Public Hurricane Loss Model (Invited Paper)

Yilin Yan; Samira Pouyanfar; Haiman Tian; Sheng Guan; Hsin Yu Ha; Shu-Ching Chen; Mei Ling Shyu; Shahid Hamid

Catastrophes have caused tremendous damage throughout human history and triggered record-high post-disaster relief from governments. Research on catastrophe modeling can help estimate the effects of natural disasters like hurricanes, floods, surges, and earthquakes. In every Atlantic hurricane season, the state of Florida in the United States has the potential to suffer economic and human losses from hurricanes. The Florida Public Hurricane Loss Model (FPHLM), funded by the Florida Office of Insurance Regulation, has assisted Florida and the residential insurance industry for more than a decade. How to process big data for historical hurricanes and insurance companies remains a challenging research topic for catastrophe (cat) models. In this paper, the FPHLM's novel integrated domain knowledge assisted big data processing system is introduced and its effectiveness in preventing data processing errors is presented.

International Journal of Semantic Computing | 2016

Supporting Semantic Concept Retrieval with Negative Correlations in a Multimedia Big Data Mining System

Yilin Yan; Mei Ling Shyu; Qiusha Zhu

With the extensive use of smart devices and the blooming popularity of social media websites such as Flickr, YouTube, Twitter, and Facebook, we have witnessed an explosion of multimedia data. The amount of data nowadays is formidable without effective big data technologies. It is well acknowledged that multimedia high-level semantic concept mining and retrieval has become an important research topic, while the semantic gap (i.e., the gap between the low-level features and high-level concepts) makes it even more challenging. Addressing these challenges requires joint research efforts from both the big data mining and multimedia areas. In particular, the correlations among the classes can provide important context cues to help bridge the semantic gap. However, correlation discovery is computationally expensive due to the huge amount of data. In this paper, a novel multimedia big data mining system based on the MapReduce framework is proposed to discover negative correlations for semantic concept mining and retrieval. Furthermore, the proposed multimedia big data mining system includes a big data processing platform with Mesos for efficient resource management and with Cassandra for handling data across multiple data centers. Experimental results on the TRECVID benchmark datasets demonstrate the feasibility and the effectiveness of the proposed multimedia big data mining system with negative correlation discovery for semantic concept mining and retrieval.

International Symposium on Multimedia | 2016

Enhancing Rare Class Mining in Multimedia Big Data by Concept Correlation

Yilin Yan; Mei Ling Shyu

IEEE International Conference on Multimedia Big Data | 2017

Mining Anomalies in Medicare Big Data Using Patient Rule Induction Method

Saad Sadiq; Yudong Tao; Yilin Yan; Mei Ling Shyu

The public health infrastructure delivers proper health care services as part of the basic needs of the general population. The health care system in the United States is rapidly changing in order to provide a better and more convenient healthcare system to the public. Unfortunately, this comprehensive expansion has also given rise to healthcare fraud in recent years, with losses surging up to $1.8 billion in the country. Organizations such as the Center for Medicare Services (CMS) have started providing access to comprehensive medical big data to promote the identification of healthcare fraud as an important research topic. In this paper, we use the Patient Rule Induction Method (PRIM) based bump hunting method to identify the spaces of higher modes and masses to indicate the peak anomalies in the CMS 2014 dataset. By applying our framework, we can observe anomalies, which can be attributed to fraud in legal medical practices or other interesting insights in the CMS dataset. This enables us to characterize the attribute space and explain the events incurring losses to the Medicare/Medicaid program. The proposed framework is compared with several methods to illustrate its efficiency and effectiveness for fraud detection.
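For readers unfamiliar with PRIM-style bump hunting, the following is a simplified sketch of its top-down peeling phase in plain Python. It is not the authors' implementation: it omits PRIM's pasting phase and box-trajectory selection, it stops as soon as no peel raises the mean, and the function name `prim_peel` and the defaults for `alpha` and `min_support` are assumptions.

```python
def prim_peel(points, targets, alpha=0.1, min_support=0.2):
    """Simplified peeling phase of PRIM on numeric data.

    points  : list of equal-length feature tuples
    targets : list of numeric outcomes, one per point
    Returns (box, mean): per-feature (low, high) bounds of the final box and
    the mean target inside it. Ties in feature values are split arbitrarily.
    """
    n, d = len(points), len(points[0])
    inside = list(range(n))
    box = [(min(p[j] for p in points), max(p[j] for p in points))
           for j in range(d)]

    def mean(idx):
        return sum(targets[i] for i in idx) / len(idx)

    while True:
        k = max(1, int(alpha * len(inside)))  # points removed per peel
        best = None
        for j in range(d):
            order = sorted(inside, key=lambda i: points[i][j])
            # Candidate peels: drop the k lowest or the k highest on feature j.
            for keep, side in ((order[k:], "low"), (order[:-k], "high")):
                if not keep or len(keep) < min_support * n:
                    continue
                if best is None or mean(keep) > best[0]:
                    best = (mean(keep), j, side, keep)
        if best is None or best[0] <= mean(inside):
            break  # no admissible peel improves the mean
        _, j, side, inside = best
        vals = [points[i][j] for i in inside]
        box[j] = (min(vals), box[j][1]) if side == "low" else (box[j][0], max(vals))
    return box, mean(inside)

# One feature; the high-target "bump" sits at x >= 7.
box, box_mean = prim_peel([(x,) for x in range(10)],
                          [0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
```

The returned box is a set of simple per-attribute ranges, which is what makes the discovered region interpretable as a rule describing where the target is unusually high.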
