Jie Cao

Nanjing University of Finance and Economics

Publication

Featured research published by Jie Cao.

World Wide Web | 2013

Shilling attack detection utilizing semi-supervised learning method for collaborative recommender system

Jie Cao; Zhiang Wu; Bo Mao; Yanchun Zhang

Collaborative filtering (CF) techniques can generate personalized recommendations. However, recommender systems that use CF as their key algorithm are vulnerable to shilling attacks, which insert malicious user profiles into the system to push or nuke the reputations of targeted items. Most practical recommender systems have only a small number of labeled users, while a large number of users remain unlabeled because obtaining their identities is expensive. In this paper, Semi-SAD, a new semi-supervised learning based shilling attack detection algorithm, is proposed to take advantage of both types of data. It first trains a naïve Bayes classifier on a small set of labeled users and then incorporates unlabeled users with EM-λ to improve the initial naïve Bayes classifier. Experiments on MovieLens datasets compare the efficiency of Semi-SAD with that of a supervised-learning-based detector and an unsupervised-learning-based detector. The results indicate that Semi-SAD detects various kinds of shilling attacks better than the other detectors, especially obfuscated and hybrid shilling attacks.
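The two-stage training described above, a naïve Bayes classifier fitted on labeled users and then refined by EM over unlabeled users whose contribution is scaled by λ, can be sketched as follows. This is a minimal illustration under assumptions not stated in the abstract: each user profile is already reduced to a numeric feature vector, features are modeled as class-conditional Gaussians, and all function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical model layout: (class priors, per-class feature means, per-class feature variances)
Model = Tuple[List[float], List[List[float]], List[List[float]]]


def fit_weighted_gnb(X: Sequence[Sequence[float]], weights: Sequence[Sequence[float]],
                     n_classes: int = 2, var_floor: float = 1e-3) -> Model:
    """Gaussian naive Bayes fitted from soft (fractional) class weights per sample."""
    d = len(X[0])
    total = sum(sum(w) for w in weights)
    priors, means, variances = [], [], []
    for c in range(n_classes):
        wc = [w[c] for w in weights]
        sc = sum(wc)  # assumes every class has at least one labeled sample
        priors.append(sc / total)
        mu = [sum(w * x[j] for w, x in zip(wc, X)) / sc for j in range(d)]
        var = [max(sum(w * (x[j] - mu[j]) ** 2 for w, x in zip(wc, X)) / sc, var_floor)
               for j in range(d)]
        means.append(mu)
        variances.append(var)
    return priors, means, variances


def posterior(x: Sequence[float], model: Model) -> List[float]:
    """Class posteriors P(c | x), computed in log space for numerical stability."""
    priors, means, variances = model
    logp = []
    for c, prior in enumerate(priors):
        lp = math.log(prior)
        for xj, mu, var in zip(x, means[c], variances[c]):
            lp += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
        logp.append(lp)
    top = max(logp)
    exps = [math.exp(v - top) for v in logp]
    z = sum(exps)
    return [e / z for e in exps]


def semi_supervised_nb(X_lab, y_lab, X_unl, lam=0.1, n_iter=10, n_classes=2) -> Model:
    """Naive Bayes + EM in which unlabeled samples count with weight lam (the EM-lambda idea)."""
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in y_lab]
    model = fit_weighted_gnb(X_lab, hard, n_classes)  # initial classifier: labeled users only
    X_all = list(X_lab) + list(X_unl)
    for _ in range(n_iter):
        soft = [posterior(x, model) for x in X_unl]                 # E-step: soft labels
        weights = hard + [[lam * p for p in post] for post in soft]
        model = fit_weighted_gnb(X_all, weights, n_classes)         # M-step: refit
    return model


# Toy usage: label 0 = genuine profile, 1 = attack profile (synthetic numbers).
X_lab = [[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.9, 3.2]]
y_lab = [0, 0, 1, 1]
X_unl = [[0.1, 0.0], [-0.1, 0.2], [3.2, 2.8], [3.1, 3.0]]
model = semi_supervised_nb(X_lab, y_lab, X_unl)
print(posterior([3.0, 3.0], model))
```

Keeping λ small limits how far the unlabeled users can pull the classifier away from what the labeled users support, which is the usual motivation for down-weighting them.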

World Wide Web | 2015

A Graph-based model for context-aware recommendation using implicit feedback data

Weilong Yao; Jing He; Guangyan Huang; Jie Cao; Yanchun Zhang

Recommender systems have been successful in dealing with the problem of information overload. However, most recommendation methods are suited to scenarios where explicit feedback, e.g. ratings, is available, and may not suit the most common scenarios, in which only implicit feedback exists. In addition, most existing methods focus only on the user and item dimensions and neglect additional contextual information such as time and location. In this paper, we propose a generic graph-based recommendation framework, which constructs a Multi-Layer Context Graph (MLCG) from implicit feedback data and then performs ranking algorithms on the MLCG for context-aware recommendation. Specifically, MLCG incorporates a variety of contextual information into the recommendation process and models the interactions between users and items. Moreover, based on MLCG, two novel ranking methods are developed: Context-aware Personalized Random Walk (CPRW) captures user preferences and current situations, and Semantic Path-based Random Walk (SPRW) incorporates the semantics of paths in MLCG into a random walk model for recommendation. Experiments on two real-world datasets demonstrate the effectiveness of our approach.
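CPRW and SPRW build on random walks over a graph that mixes users, items, and context. A minimal sketch of the generic building block, a random walk with restart (personalized PageRank) seeded at the target user and the current context, is shown below. The graph, node names, and restart probability are assumptions, and neither CPRW's nor SPRW's specific transition rules are reproduced.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def random_walk_with_restart(edges: Iterable[Tuple[Hashable, Hashable]], seeds: Set[Hashable],
                             alpha: float = 0.15, n_iter: int = 100) -> Dict[Hashable, float]:
    """Personalized PageRank on an undirected, unweighted graph.

    At every step the walker jumps back to a uniformly chosen seed node with
    probability alpha; otherwise it moves to a uniformly chosen neighbor.
    Assumes every seed node appears in at least one edge."""
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nbrs}
    score = dict(restart)
    for _ in range(n_iter):
        nxt = {n: alpha * restart[n] for n in nbrs}
        for u, neighbors in nbrs.items():
            share = (1 - alpha) * score[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        score = nxt
    return score


# Hypothetical graph: users (u*), items (i*), and context nodes (ctx:*).
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u2", "i3"),
         ("u3", "i1"), ("u3", "i4"), ("i3", "ctx:morning"), ("i4", "ctx:evening")]
scores = random_walk_with_restart(edges, seeds={"u1", "ctx:morning"})
seen = {"i1", "i2"}  # items u1 already interacted with
ranking = sorted((n for n in scores if n.startswith("i") and n not in seen),
                 key=scores.get, reverse=True)
print(ranking)  # ['i3', 'i4']
```

In this toy graph, i3 and i4 are structurally symmetric with respect to u1 alone; seeding the walk with the current context node is what lifts i3 above i4.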

Web-Age Information Management | 2012

Pick-Up Tree Based Route Recommendation from Taxi Trajectories

Haoran Hu; Zhiang Wu; Bo Mao; Yi Zhuang; Jie Cao; Jingui Pan

Recommending suitable routes to taxi drivers for picking up passengers helps raise their incomes and reduce gasoline consumption. In this paper, a pick-up tree based route recommender system is proposed to minimize the distance that a given set of taxis travels without passengers. First, we apply a clustering approach to GPS trajectory points of a large number of taxis that record a state change from “free” to “occupied”, and take the cluster centroids as potential pick-up points. Second, we propose a heuristic based on skyline computation to construct a pick-up tree whose root node is the current position and which connects all centroids. Then, we present a probability model to estimate the gasoline consumption of every route. Using the estimated gasoline consumption as the weight of every route, we propose a weighted Round-Robin recommendation method for the set of taxis. Experimental results on a real-world taxi trajectory data set show that the proposed method effectively reduces the driving distance before picking up passengers, especially when the number of cabs becomes large. Meanwhile, the time cost of our method is also lower than that of existing methods.
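The skyline step keeps only candidate pick-up points that no other candidate beats on every criterion. The sketch below assumes two hypothetical criteria per centroid (distance from the taxi, lower is better; estimated pick-up probability, higher is better); how the paper assembles skyline points into a pick-up tree and weights routes by estimated gasoline consumption is not reproduced.

```python
from typing import List, NamedTuple


class PickUpPoint(NamedTuple):
    name: str
    distance_km: float  # hypothetical: distance from the taxi's current position (lower is better)
    pickup_prob: float  # hypothetical: estimated chance of finding a passenger (higher is better)


def dominates(a: PickUpPoint, b: PickUpPoint) -> bool:
    """a dominates b if it is no worse on both criteria and strictly better on at least one."""
    no_worse = a.distance_km <= b.distance_km and a.pickup_prob >= b.pickup_prob
    strictly_better = a.distance_km < b.distance_km or a.pickup_prob > b.pickup_prob
    return no_worse and strictly_better


def skyline(points: List[PickUpPoint]) -> List[PickUpPoint]:
    """Keep the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [PickUpPoint("A", 1.0, 0.3), PickUpPoint("B", 2.0, 0.6),
              PickUpPoint("C", 2.5, 0.5), PickUpPoint("D", 0.5, 0.1)]
print([p.name for p in skyline(candidates)])  # ['A', 'B', 'D']
```

C is dropped because B is both closer and more promising; A, B, and D each trade distance against pick-up probability, so all three survive.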

Conference on Recommender Systems | 2011

Semi-SAD: applying semi-supervised learning to shilling attack detection

Zhiang Wu; Jie Cao; Bo Mao; Youquan Wang

Collaborative filtering (CF) based recommender systems are vulnerable to shilling attacks. Some leading e-commerce sites have a large number of unlabeled users whose identities are expensive to obtain. Existing research on shilling attack detection fails to exploit these unlabeled users. In this article, Semi-SAD, a new semi-supervised learning based shilling attack detection algorithm, is proposed. Semi-SAD is trained on both labeled and unlabeled user profiles using a combination of a naïve Bayes classifier and EM-λ, an augmented Expectation Maximization (EM) algorithm. Experiments on MovieLens datasets show that the proposed Semi-SAD is efficient and effective.

World Wide Web | 2014

Online mining abnormal period patterns from multiple medical sensor data streams

Guangyan Huang; Yanchun Zhang; Jie Cao; Michael Steyn; Kersi Taraporewalla

With advances in medical devices and sensors, an abundance of medical data streams is available. However, data analysis techniques are very limited, especially for processing massive multiple physiological streams that may only be understood by medical experts. State-of-the-art techniques only allow multiple medical devices to monitor different physiological parameters of the patient’s status independently, so they raise too many false alarms and create unnecessary noise, especially in the Intensive Care Unit (ICU). An effective solution studied recently is to integrate information from multiple physiological parameters to reduce alarms. However, detecting abnormalities in rapidly changing physiological stream data is challenging, since abnormalities occur gradually due to the complex situation of patients. An analysis of ICU physiological data streams shows that many vital physiological parameters (such as heart rate, arterial pressure, and respiratory impedance) change periodically, so abnormalities generally appear as abnormal period patterns. In this paper, we develop a Mining Abnormal Period Patterns from Multiple Physiological Streams (MAPPMPS) method to detect and rank abnormalities in medical sensor streams. The efficiency and effectiveness of MAPPMPS are demonstrated on a real-world massive database of multiple physiological streams sampled in an ICU, comprising 250 patients’ streams (each stream involving over 1.3 million data points) with a total size of 28 GB.
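The abstract does not spell out how MAPPMPS scores periods, so the following is only a generic sketch of the underlying idea that abnormalities in periodic vital signs show up as abnormal periods: estimate the dominant period by autocorrelation, cut the stream into whole periods, and rank each period by its Euclidean distance to the element-wise median period. The single synthetic stream and all function names are assumptions for illustration.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def dominant_period(xs: Sequence[float], min_lag: int = 2, max_lag: int = 0) -> int:
    """Lag with the highest (biased) autocorrelation, used as the signal's period."""
    n = len(xs)
    max_lag = max_lag or n // 2
    mean = statistics.fmean(xs)
    centered = [x - mean for x in xs]
    denom = sum(c * c for c in centered)
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / denom
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


def rank_abnormal_periods(xs: Sequence[float], period: int) -> Tuple[List[int], List[float]]:
    """Split xs into whole periods and score each by its distance to the median period."""
    segments = [list(xs[i:i + period]) for i in range(0, len(xs) - period + 1, period)]
    reference = [statistics.median(column) for column in zip(*segments)]
    scores = [math.dist(seg, reference) for seg in segments]
    ranking = sorted(range(len(segments)), key=lambda k: scores[k], reverse=True)
    return ranking, scores


# Synthetic stream with period 10; the 7th period (index 6) is shifted upward.
clean = [math.sin(2 * math.pi * t / 10) for t in range(100)]
stream = [v + (2.0 if 60 <= t < 70 else 0.0) for t, v in enumerate(clean)]
period = dominant_period(clean)
ranking, scores = rank_abnormal_periods(stream, period)
print(period, ranking[0])  # 10 6
```

Using the median rather than the mean as the reference period keeps a single abnormal period from distorting the baseline it is compared against.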

Asia-Pacific Web Conference | 2012

Multiple time series anomaly detection based on compression and correlation analysis: a medical surveillance case study

Zhi Qiao; Jing He; Jie Cao; Guangyan Huang; Peng Zhang

In this paper, we present a novel anomaly detection framework for multiple heterogeneous yet correlated time series, such as medical surveillance data. Within this framework, we propose an anomaly detection algorithm based on trend and correlation analysis. Moreover, to process huge amounts of observed time series efficiently, we propose a new clustering-based compression method. Experimental results indicate that our framework is more effective and efficient than its peers.
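As a generic illustration of the correlation side of this idea (not the paper's algorithm, and without its clustering-based compression), the sketch below flags windows in which two normally correlated series stop moving together. The window size, threshold, synthetic series, and function names are assumptions.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either window is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def correlation_breaks(a: Sequence[float], b: Sequence[float],
                       window: int, threshold: float = 0.5) -> List[int]:
    """Start indices of non-overlapping windows where two normally correlated
    series stop moving together (correlation below threshold)."""
    return [start for start in range(0, len(a) - window + 1, window)
            if pearson(a[start:start + window], b[start:start + window]) < threshold]


# Synthetic pair: b tracks a, except in samples 40-59 where it moves the other way.
a = [math.sin(t / 3) for t in range(100)]
b = [(-x if 40 <= t < 60 else 2 * x + 1) for t, x in enumerate(a)]
print(correlation_breaks(a, b, window=20))  # [40]
```

Neither series looks unusual on its own in samples 40 to 59; the anomaly only appears when the two are analysed jointly, which is the motivation for correlation-based detection on multiple series.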

Procedia Computer Science | 2015

A Framework for Food Traceability Information Extraction Based on a Video Surveillance System

Bo Mao; Jing He; Jie Cao; Stephen W. Bigger; Todor Vasiljevic

Food security is currently one of the most pressing concerns in China. A traceability system is an effective way to improve the quality of food production, and such systems have been widely applied in several countries and regions, such as the US, Japan and the EU. However, in China the credibility of traceability systems is weak, and many producers try to deceive the public by forging the data in such systems. In this paper, a traceability system based on video surveillance is proposed, which significantly increases the cost of forgery. In this system, subjects such as vehicles or people are first detected using a novel dynamic background model; their trajectories are then generated and linked across different cameras using a camera relation graph. The experimental results indicate that the proposed method can efficiently extract object information from the video surveillance system and generate image-based traceability information for further analysis.
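The paper's dynamic background model is not described in the abstract. A common baseline for the same step is an exponential running-average background with per-pixel thresholding, sketched below under stated assumptions: grayscale frames given as nested lists, illustrative values for `alpha` and the threshold, and hypothetical function names. The camera relation graph is not modeled.

```python
from typing import List

Frame = List[List[float]]  # assumption: grayscale frame as rows of pixel intensities


def foreground_mask(background: Frame, frame: Frame, threshold: float = 30.0) -> List[List[bool]]:
    """Mark pixels that differ from the background model by more than threshold."""
    return [[abs(f - b) > threshold for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Exponential running average: the model slowly absorbs lasting scene changes."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


background = [[100.0] * 3 for _ in range(3)]
frame = [row[:] for row in background]
frame[1][1] = 200.0  # a bright object enters the centre pixel
mask = foreground_mask(background, frame)
background = update_background(background, frame, alpha=0.5)
print(mask[1][1], mask[0][0], background[1][1])  # True False 150.0
```

The "dynamic" part is the update rule: a small `alpha` lets gradual lighting changes fade into the background, while a moving vehicle or person stays above the threshold long enough to be segmented.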

Procedia Computer Science | 2017

Analysis of Grain Storage Loss Based on Decision Tree Algorithm

Xueli Liu; Bingchan Li; Dongqin Shen; Jie Cao; Bo Mao

Different grain storage factors cause different degrees of grain loss. In this paper, data mining methods are used to study grain storage loss, and a grain loss analysis and forecasting model based on a decision tree algorithm is proposed. The paper analyzes and predicts the grain loss caused by different storage factors. The influence of model parameters on model fit and accuracy is examined with a validation curve. The decision tree model is then optimized by grid search with cross-validation, which improves its prediction accuracy for grain loss analysis.
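The tuning workflow described above (inspect one parameter with a validation curve, then tune with grid search and cross-validation) maps onto scikit-learn's `validation_curve` and `GridSearchCV`. The sketch below uses synthetic data: the three feature columns stand in for hypothetical storage factors and the target for grain loss, and the parameter grid is an assumption rather than the paper's setup.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, validation_curve
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in data: columns play the role of hypothetical storage factors
# (e.g. scaled temperature, humidity, storage duration); y plays the role of grain loss.
X = rng.uniform(size=(200, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.05, size=200)

# Validation curve: how tree depth affects training vs. cross-validated R^2.
depths = [1, 2, 4, 8]
train_scores, valid_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)
print("mean CV R^2 by depth:", valid_scores.mean(axis=1).round(3))

# Grid search with 5-fold cross-validation over two tree parameters.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": depths, "min_samples_leaf": [1, 5, 10]},
    cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The validation curve shows where deeper trees stop helping on held-out folds; the grid search then picks the best combination of parameters by the same cross-validated score.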

International Conference on Data Engineering | 2013

A real-time abnormality detection system for intensive care management

Guangyan Huang; Jing He; Jie Cao; Zhi Qiao; Michael Steyn; Kersi Taraporewalla

Detecting abnormalities in multiple correlated time series is valuable for applications where a credible real-time event prediction system can minimize economic losses (e.g. a stock market crash) and save lives (e.g. medical surveillance in the operating theatre). For example, in an intensive care scenario, anesthetists play a vital role in monitoring the patient and adjusting the flow and type of anesthetics delivered during an operation. Early awareness of possible complications is vital for an anesthetist to react correctly to a given situation. In this demonstration, we provide a comprehensive medical surveillance system that detects abnormalities in multiple physiological data streams to assist online intensive care management. In particular, a novel online support vector regression (OSVR) algorithm is developed to discover abnormalities in multiple correlated time series accurately and with real-time efficiency. We also use historical data streams to optimize the precision of the OSVR algorithm. Moreover, the system provides a user-friendly interface that integrates multiple physiological data streams and visualizes alarms of abnormalities.
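The abstract does not give the OSVR update rules, so the sketch below swaps in a simpler stand-in: a linear support vector regressor (epsilon-insensitive loss) trained online with stochastic subgradient steps. It predicts each new reading from the previous few and raises an alarm when the prediction residual exceeds a threshold. Assumptions: a single stream already standardized to unit scale; illustrative values for the lag count, learning rate, and alarm threshold; all class and function names are hypothetical.

```python
import math
from collections import deque
from typing import List, Sequence


class OnlineLinearSVR:
    """Linear epsilon-insensitive regression updated one sample at a time (SGD).
    A simplified stand-in, not the kernel-based incremental OSVR of the paper."""

    def __init__(self, n_features: int, epsilon: float = 0.1, lr: float = 0.01, l2: float = 1e-4):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.epsilon, self.lr, self.l2 = epsilon, lr, l2

    def predict(self, x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x: Sequence[float], y: float) -> None:
        err = self.predict(x) - y
        # Subgradient of max(0, |err| - epsilon): zero inside the epsilon tube.
        g = 0.0 if abs(err) <= self.epsilon else (1.0 if err > 0 else -1.0)
        self.w = [wi - self.lr * (g * xi + self.l2 * wi) for wi, xi in zip(self.w, x)]
        self.b -= self.lr * g


def monitor(stream: Sequence[float], lags: int = 3, alarm_threshold: float = 1.0, **svr_kwargs) -> List[int]:
    """Return time indices whose one-step-ahead prediction residual exceeds the threshold."""
    model = OnlineLinearSVR(lags, **svr_kwargs)
    window = deque(maxlen=lags)
    alarms = []
    for t, value in enumerate(stream):
        if len(window) == lags:
            x = list(window)
            if abs(model.predict(x) - value) > alarm_threshold:
                alarms.append(t)
            model.update(x, value)  # learn from the reading after scoring it
        window.append(value)
    return alarms


# Synthetic, already-standardized stream with one spike at t = 150.
stream = [0.1 * math.sin(t / 2) for t in range(300)]
stream[150] += 3.0
print(monitor(stream, alarm_threshold=1.0))  # [150]
```

Scoring each reading before updating on it keeps the alarm decision honest: the model never sees a value before judging whether it is abnormal.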

International World Wide Web Conferences | 2010

An Improved Protocol for Deadlock and Livelock Avoidance Resource Co-allocation in Network Computing

Jie Cao; Zhiang Wu

Many applications require simultaneous access to multiple kinds of resources scattered across distributed sites. This problem, known as resource co-allocation, has become a hot topic in network computing. Designing a high-performance protocol for resource co-allocation with deadlock and livelock avoidance is a challenging problem. In this paper, we propose a new protocol, OODP3 (Optimal ODP3), based on the currently popular protocol ODP3 (Order-based Deadlock Prevention Protocol with Parallel requests). OODP3 not only inherits the advantages of ODP3 but also guarantees that resource co-allocation is fulfilled within polynomial time. A theoretical proof verifies the correctness of OODP3, and experimental results show that OODP3 achieves better performance than the existing deadlock and livelock avoidance protocol.
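The order-based principle that ODP3 is named after can be illustrated on a single machine: if every requester acquires its resources in one global order, no cycle of "holds A, waits for B" can form. A minimal sketch, assuming threads stand in for requesters at distributed sites and a `threading.Lock` stands in for each resource; the class and function names are hypothetical, and the parallel-request handling and polynomial-time guarantee of ODP3/OODP3 are not modeled.

```python
import threading
from typing import Dict, Iterable, List


class OrderedCoAllocator:
    """Grants a whole set of resources by locking them in one global order.

    Every requester sorts its request by resource id before acquiring, so no
    circular wait can form and deadlock is ruled out."""

    def __init__(self, resource_ids: Iterable[int]):
        self.locks: Dict[int, threading.Lock] = {r: threading.Lock() for r in resource_ids}

    def acquire_all(self, wanted: Iterable[int]) -> List[int]:
        ordered = sorted(set(wanted))
        for r in ordered:
            self.locks[r].acquire()  # blocks until this resource is free
        return ordered

    def release_all(self, held: List[int]) -> None:
        for r in reversed(held):
            self.locks[r].release()


def run_demo(n_rounds: int = 200) -> int:
    """Four workers whose requests overlap in conflicting orders; returns how many finished."""
    alloc = OrderedCoAllocator(range(1, 4))
    requests = [[1, 2, 3], [3, 2, 1], [2, 3], [3, 1]]
    finished: List[int] = []

    def worker(idx: int, request: List[int]) -> None:
        for _ in range(n_rounds):
            held = alloc.acquire_all(request)
            alloc.release_all(held)
        finished.append(idx)

    threads = [threading.Thread(target=worker, args=(i, req), daemon=True)
               for i, req in enumerate(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return len(finished)


print(run_demo())  # 4
```

Without the `sorted` call, the workers requesting `[1, 2, 3]` and `[3, 2, 1]` could each hold one end of the range and wait forever for the other; the global order removes that possibility at the cost of sometimes holding a resource longer than strictly needed.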
