Chuanren Liu | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Chuanren Liu is active.

Explore More

Publication

Featured researches published by Chuanren Liu.

international conference on data mining | 2011

A Taxi Driving Fraud Detection System

Yong Ge; Hui Xiong; Chuanren Liu; Zhi-Hua Zhou

Advances in GPS tracking technology have enabled us to install GPS tracking devices in city taxis to collect a large amount of GPS traces under operational time constraints. These GPS traces provide unparallel opportunities for us to uncover taxi driving fraud activities. In this paper, we develop a taxi driving fraud detection system, which is able to systematically investigate taxi driving fraud. In this system, we first provide functions to find two aspects of evidences: travel route evidence and driving distance evidence. Furthermore, a third function is designed to combine the two aspects of evidences based on Dempster-Shafer theory. To implement the system, we first identify interesting sites from a large amount of taxi GPS logs. Then, we propose a parameter-free method to mine the travel route evidences. Also, we introduce route mark to represent a typical driving path from an interesting site to another one. Based on route mark, we exploit a generative statistical model to characterize the distribution of driving distance and identify the driving distance evidences. Finally, we evaluate the taxi driving fraud detection system with large scale real-world taxi GPS logs. In the experiments, we uncover some regularity of driving fraud activities and investigate the motivation of drivers to commit a driving fraud by analyzing the produced taxi fraud data.

knowledge discovery and data mining | 2015

Temporal Phenotyping from Longitudinal Electronic Health Records: A Graph Based Framework

Chuanren Liu; Fei Wang; Jianying Hu; Hui Xiong

The rapid growth in the development of healthcare information systems has led to an increased interest in utilizing the patient Electronic Health Records (EHR) for assisting disease diagnosis and phenotyping. The patient EHRs are generally longitudinal and naturally represented as medical event sequences, where the events include clinical notes, problems, medications, vital signs, laboratory reports, etc. The longitudinal and heterogeneous properties make EHR analysis an inherently difficult challenge. To address this challenge, in this paper, we develop a novel representation, namely the temporal graph, for such event sequences. The temporal graph is informative for a variety of challenging analytic tasks, such as predictive modeling, since it can capture temporal relationships of the medical events in each event sequence. By summarizing the longitudinal data, the temporal graphs are also robust and resistant to noisy and irregular observations. Based on the temporal graph representation, we further develop an approach for temporal phenotyping to identify the most significant and interpretable graph basis as phenotypes. This helps us better understand the disease evolving patterns. Moreover, by expressing the temporal graphs with the phenotypes, the expressing coefficients can be used for applications such as personalized medicine, disease diagnosis, and patient segmentation. Our temporal phenotyping framework is also flexible to incorporate semi-supervised/supervised information. Finally, we validate our framework on two real-world tasks. One is predicting the onset risk of heart failure. Another is predicting the risk of heart failure related hospitalization for patients with COPD pre-condition. Our results show that the diagnosis performance in both tasks can be improved significantly by the proposed approaches. Also, we illustrate some interesting phenotypes derived from the data.

knowledge discovery and data mining | 2011

A taxi business intelligence system

Yong Ge; Chuanren Liu; Hui Xiong; Jian Chen

The increasing availability of large-scale location traces creates unprecedent opportunities to change the paradigm for knowledge discovery in transportation systems. A particularly promising area is to extract useful business intelligence, which can be used as guidance for reducing inefficiencies in energy consumption of transportation sectors, improving customer experiences, and increasing business performances. However, extracting business intelligence from location traces is not a trivial task. Conventional data analytic tools are usually not customized for handling large, complex, dynamic, and distributed nature of location traces. To that end, we develop a taxi business intelligence system to explore the massive taxi location traces from different business perspectives with various data mining functions. Since we implement the system using the real-world taxi GPS data, this demonstration will help taxi companies to improve their business performances by understanding the behaviors of both drivers and customers. In addition, several identified technical challenges also motivate data mining people to develop more sophisticate techniques in the future.

European Journal of Operational Research | 2016

Instance-based credit risk assessment for investment decisions in P2P lending

Yanhong Guo; Wenjun Zhou; Chunyu Luo; Chuanren Liu; Hui Xiong

Recent years have witnessed increased attention on peer-to-peer (P2P) lending, which provides an alternative way of financing without the involvement of traditional financial institutions. A key challenge for personal investors in P2P lending marketplaces is the effective allocation of their money across different loans by accurately assessing the credit risk of each loan. Traditional rating-based assessment models cannot meet the needs of individual investors in P2P lending, since they do not provide an explicit mechanism for asset allocation. In this study, we propose a data-driven investment decision-making framework for this emerging market. We designed an instance-based credit risk assessment model, which has the ability of evaluating the return and risk of each individual loan. Moreover, we formulated the investment decision in P2P lending as a portfolio optimization problem with boundary constraints. To validate the proposed model, we performed extensive experiments on real-world datasets from two notable P2P lending marketplaces. Experimental results revealed that the proposed model can effectively improve investment performances compared with existing methods in P2P lending.

IEEE Transactions on Systems, Man, and Cybernetics | 2015

Popularity Modeling for Mobile Apps: A Sequential Approach

Hengshu Zhu; Chuanren Liu; Yong Ge; Hui Xiong; Enhong Chen

The popularity information in App stores, such as chart rankings, user ratings, and user reviews, provides an unprecedented opportunity to understand user experiences with mobile Apps, learn the process of adoption of mobile Apps, and thus enables better mobile App services. While the importance of popularity information is well recognized in the literature, the use of the popularity information for mobile App services is still fragmented and under-explored. To this end, in this paper, we propose a sequential approach based on hidden Markov model (HMM) for modeling the popularity information of mobile Apps toward mobile App services. Specifically, we first propose a popularity based HMM (PHMM) to model the sequences of the heterogeneous popularity observations of mobile Apps. Then, we introduce a bipartite based method to precluster the popularity observations. This can help to learn the parameters and initial values of the PHMM efficiently. Furthermore, we demonstrate that the PHMM is a general model and can be applicable for various mobile App services, such as trend based App recommendation, rating and review spam detection, and ranking fraud detection. Finally, we validate our approach on two real-world data sets collected from the Apple Appstore. Experimental results clearly validate both the effectiveness and efficiency of the proposed popularity modeling approach.

knowledge discovery and data mining | 2014

Temporal skeletonization on sequential data: patterns, categorization, and visualization

Chuanren Liu; Kai Zhang; Hui Xiong; Guofei Jiang; Qiang Yang

Sequential pattern analysis aims at finding statistically relevant temporal structures where the values are delivered in a sequence. With the growing complexity of real-world dynamic scenarios, more and more symbols are often needed to encode the sequential values. This is so-called “curse of cardinality”, which can impose significant challenges to the design of sequential analysis methods in terms of computational efficiency and practical use. Indeed, given the overwhelming scale and the heterogeneous nature of the sequential data, new visions and strategies are needed to face the challenges. To this end, in this paper, we propose a “temporal skeletonization” approach to proactively reduce the cardinality of the representation for sequences by uncovering significant, hidden temporal structures. The key idea is to summarize the temporal correlations in an undirected graph, and use the “skeleton” of the graph as a higher granularity on which hidden temporal patterns are more likely to be identified. As a consequence, the embedding topology of the graph allows us to translate the rich temporal content into a metric space. This opens up new possibilities to explore, quantify, and visualize sequential data. Our approach has shown to greatly alleviate the curse of cardinality in challenging tasks of sequential pattern mining and clustering. Evaluation on a business-to-business (B2B) marketing application demonstrates that our approach can effectively discover critical buying paths from noisy customer event data.

international conference on data mining | 2014

Exploiting Heterogeneous Human Mobility Patterns for Intelligent Bus Routing

Yanchi Liu; Chuanren Liu; Nicholas Jing Yuan; Lian Duan; Yanjie Fu; Hui Xiong; Songhua Xu; Junjie Wu

Optimal planning for public transportation is one of the keys to sustainable development and better quality of life in urban areas. Compared to private transportation, public transportation uses road space more efficiently and produces fewer accidents and emissions. In this paper, we focus on the identification and optimization of flawed bus routes to improve utilization efficiency of public transportation services, according to peoples real demand for public transportation. To this end, we first provide an integrated mobility pattern analysis between the location traces of taxicabs and the mobility records in bus transactions. Based on mobility patterns, we propose a localized transportation mode choice model, with which we can accurately predict the bus travel demand for different bus routing. This model is then used for bus routing optimization which aims to convert as many people from private transportation to public transportation as possible given budget constraints on the bus route modification. We also leverage the model to identify region pairs with flawed bus routes, which are effectively optimized using our approach. To validate the effectiveness of the proposed methods, extensive studies are performed on real world data collected in Beijing which contains 19 million taxi trips and 10 million bus trips.

Knowledge and Information Systems | 2014

High-dimensional clustering: a clique-based hypergraph partitioning framework

Tianming Hu; Chuanren Liu; Yong Tang; Jing Sun; Hui Xiong; Sam Yuan Sung

Hypergraph partitioning has been considered as a promising method to address the challenges of high-dimensional clustering. With objects modeled as vertices and the relationship among objects captured by the hyperedges, the goal of graph partitioning is to minimize the edge cut. Therefore, the definition of hyperedges is vital to the clustering performance. While several definitions of hyperedges have been proposed, a systematic understanding of desired characteristics of hyperedges is still missing. To that end, in this paper, we first provide a unified clique perspective of the definition of hyperedges, which serves as a guide to define hyperedges. With this perspective, based on the concepts of shared (reverse) nearest neighbors, we propose two new types of clique hyperedges and analyze their properties regarding purity and size issues. Finally, we present an extensive evaluation using real-world document datasets. The experimental results show that, with shared (reverse) nearest neighbor-based hyperedges, the clustering performance can be improved significantly in terms of various external validation measures without the need for fine tuning of parameters.

international conference on data mining | 2012

A Stochastic Model for Context-Aware Anomaly Detection in Indoor Location Traces

Chuanren Liu; Hui Xiong; Yong Ge; Wei Geng; Matt Perkins

Rapid growth in the development of real-time location system solutions has led to an increased interest in indoor location-aware services, such as hospital asset management. Although there are extensive studies in the literature on the analysis of outdoor location traces, the studies of indoor location traces are less touched and fragmented. To that end, in this paper, we provide a focused study of indoor location traces collected by the sensors attached to medical devices in a hospital environment. Along this line, we first introduce some unique properties of these indoor location traces. We show that they can capture the movement patterns of the medical devices, which are tightly coupled with the work flow in the controlled hospital environment. Based on this observation, we propose a stochastic model for context-aware anomaly detection in indoor location traces, which exploits the hospital work flow and models the movements of medical devices as transitions in finite state machines. In detail, we first develop a density-based method to identify the hotspots filled with high-level abnormal activities in the indoor environment. The discovered hotspots serve as the context for nearby trajectories. Then, we introduce an N-gram based method for measuring the degree of anomaly based on the detected hotspots, which is able to predict the missing events possibly due to the devices being stolen. Besides, to address the noisy nature of the indoor sensor networks, we also propose an iterative algorithm to estimate the transition probabilities. This algorithm allows to effectively recover the missing location records which are critical for the abnormality estimation. Finally, the experimental results on the real-world date sets validate the effectiveness of the proposed context-aware anomaly detection method for identifying abnormal events.

IEEE Transactions on Knowledge and Data Engineering | 2016