Yanchi Liu
Rutgers University
Publication
Featured research published by Yanchi Liu.
ubiquitous computing | 2011
Yu Zheng; Yanchi Liu; Jing Yuan; Xing Xie
Urban computing for city planning is one of the most significant applications in ubiquitous computing. In this paper we detect flawed urban planning using the GPS trajectories of taxicabs traveling in urban areas. The detected results consist of 1) pairs of regions with salient traffic problems and 2) the linking structure as well as the correlation among them. These results can be used to evaluate the effectiveness of planning that has already been carried out, such as newly built roads and subway lines in a city, and can remind city planners of problems that have not been recognized when they conceive future plans. We apply our method to the trajectories generated by 30,000 taxis from March to May in 2009 and 2010 in Beijing, and evaluate our results against the real urban planning of Beijing.
international conference on data mining | 2010
Yanchi Liu; Zhongmou Li; Hui Xiong; Xuedong Gao; Junjie Wu
Clustering validation has long been recognized as one of the vital issues essential to the success of clustering applications. In general, clustering validation can be categorized into two classes, external clustering validation and internal clustering validation. In this paper, we focus on internal clustering validation and present a detailed study of 11 widely used internal clustering validation measures for crisp clustering. From five conventional aspects of clustering, we investigate their validation properties. Experimental results show that S_Dbw is the only internal validation measure which performs well in all five aspects, while other measures have certain limitations in different application scenarios.
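As an illustration of what an internal validation measure computes, the following is a minimal sketch of the silhouette coefficient, a commonly used internal measure that scores a crisp clustering from the data alone by comparing within-cluster compactness with between-cluster separation. The abstract does not list the 11 measures it studies, so the choice of silhouette here is an assumption, and the function name `silhouette_score` and the toy data are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]],
                     labels: Sequence[Hashable]) -> float:
    """Mean silhouette over all points, using Euclidean distance.

    Illustrative sketch only; not the procedure of the paper above.
    """
    # Group point indices by cluster label.
    clusters: Dict[Hashable, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data: two well-separated groups of two points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # matches the groups
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # mixes the groups
```

On this toy data the labeling that matches the two groups scores higher than the labeling that mixes them, which is the behavior an internal measure is expected to show.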
IEEE Transactions on Systems, Man, and Cybernetics | 2013
Yanchi Liu; Zhongmou Li; Hui Xiong; Xuedong Gao; Junjie Wu; Sen Wu
Clustering validation has long been recognized as one of the vital issues essential to the success of clustering applications. In general, clustering validation can be categorized into two classes, external clustering validation and internal clustering validation. In this paper, we focus on internal clustering validation and present a study of 11 widely used internal clustering validation measures for crisp clustering. The results of this study indicate that these existing measures have certain limitations in different application scenarios. As an alternative choice, we propose a new internal clustering validation measure, named clustering validation index based on nearest neighbors (CVNN), which is based on the notion of nearest neighbors. This measure can dynamically select multiple objects as representatives for different clusters in different situations. Experimental results show that CVNN outperforms the existing measures on both synthetic data and real-world data in different application scenarios.
international conference on data mining | 2014
Yanchi Liu; Chuanren Liu; Nicholas Jing Yuan; Lian Duan; Yanjie Fu; Hui Xiong; Songhua Xu; Junjie Wu
Optimal planning for public transportation is one of the keys to sustainable development and better quality of life in urban areas. Compared to private transportation, public transportation uses road space more efficiently and produces fewer accidents and emissions. In this paper, we focus on the identification and optimization of flawed bus routes to improve the utilization efficiency of public transportation services, according to people's real demand for public transportation. To this end, we first provide an integrated mobility pattern analysis between the location traces of taxicabs and the mobility records in bus transactions. Based on mobility patterns, we propose a localized transportation mode choice model, with which we can accurately predict the bus travel demand for different bus routes. This model is then used for bus routing optimization, which aims to convert as many people from private transportation to public transportation as possible given budget constraints on the bus route modification. We also leverage the model to identify region pairs with flawed bus routes, which are effectively optimized using our approach. To validate the effectiveness of the proposed methods, extensive studies are performed on real-world data collected in Beijing, which contain 19 million taxi trips and 10 million bus trips.
Data Mining and Knowledge Discovery | 2012
Zhongmou Li; Hui Xiong; Yanchi Liu
Given a directed graph, the problem of blackhole mining is to identify groups of nodes, called blackhole patterns, such that the average in-weight of a group is significantly larger than the average out-weight of the same group. The problem of finding volcano patterns is the dual of mining blackhole patterns. Therefore, we focus on discovering the blackhole patterns. In this article, we develop a generalized blackhole mining framework. Specifically, we first design two pruning schemes that reduce the computational cost by reducing both the number of candidate patterns and the average computation cost for each candidate pattern. The first pruning scheme exploits the concept of combination dominance to reduce the exponentially growing search space. Based on this pruning approach, we develop the gBlackhole algorithm. The second pruning scheme, in contrast, is an approximate approach, named approxBlackhole, which can strike a balance between the efficiency and the completeness of blackhole mining. Finally, experimental results on real-world data show that approxBlackhole can be several orders of magnitude faster than gBlackhole, and both of them have huge computational advantages over the brute-force approach. Also, we show that the blackhole mining algorithm can be used to capture some suspicious financial fraud patterns.
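The in-weight versus out-weight criterion described above can be sketched as follows. This is an illustration under stated assumptions, not the gBlackhole or approxBlackhole algorithm: the abstract does not say how the averages are normalized or what counts as "significantly larger", so this sketch divides boundary weights by group size and uses a hypothetical `ratio` threshold. The function names and the toy graph are also hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def average_in_out_weight(weights: Dict[Edge, float],
                          group: Set[str]) -> Tuple[float, float]:
    """Return (average in-weight, average out-weight) of a node group.

    Assumption: only edges crossing the group boundary count, and totals
    are averaged over the number of nodes in the group.
    """
    if not group:
        raise ValueError("group must be non-empty")
    in_total = 0.0
    out_total = 0.0
    for (src, dst), w in weights.items():
        if src not in group and dst in group:
            in_total += w   # edge entering the group
        elif src in group and dst not in group:
            out_total += w  # edge leaving the group
    n = len(group)
    return in_total / n, out_total / n


def looks_like_blackhole(weights: Dict[Edge, float],
                         group: Set[str],
                         ratio: float = 2.0) -> bool:
    """Hypothetical test: in-weight exceeds `ratio` times out-weight."""
    avg_in, avg_out = average_in_out_weight(weights, group)
    return avg_in > ratio * avg_out


# Toy weighted directed graph; keys are (source, destination).
toy_weights = {
    ("a", "x"): 5.0,
    ("b", "x"): 3.0,
    ("x", "y"): 1.0,  # internal to the group {x, y}, so ignored
    ("y", "c"): 1.0,
    ("c", "a"): 2.0,
}
avg_in, avg_out = average_in_out_weight(toy_weights, {"x", "y"})
```

For the group {x, y} the entering weight is 8 and the leaving weight is 1, so the averages are 4.0 and 0.5. Swapping the direction of every edge turns the same check into a test for the dual volcano pattern.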
international conference on data mining | 2016
Zijun Yao; Yanjie Fu; Bin Liu; Yanchi Liu; Hui Xiong
Point of interest (POI) recommendation, which provides personalized recommendation of places to mobile users, is an important task in location-based social networks (LBSNs). However, quite different from traditional interest-oriented merchandise recommendation, POI recommendation is more complex due to timing effects: we need to examine whether the POI fits a user's availability. While some prior studies have incorporated temporal effects into POI recommendation, they overlooked the compatibility between the time-varying popularity of POIs and the regular availability of users, which we believe has a non-negligible impact on user decision-making. To this end, in this paper, we present a novel method which incorporates the degree of temporal matching between users and POIs into personalized POI recommendations. Specifically, we first profile the temporal popularity of POIs to show when a POI is popular to visit by mining spatio-temporal human mobility and POI category data. Second, we propose latent user regularities to characterize when a user is regularly available for exploring POIs, which are learned with a user-POI temporal matching function. Finally, results of extensive experiments with real-world POI check-in and human mobility data demonstrate that our proposed user-POI temporal matching method delivers substantial advantages over baseline models for POI recommendation tasks.
international conference on data mining | 2010
Zhongmou Li; Hui Xiong; Yanchi Liu; Aoying Zhou
In this paper, we formulate a novel problem of finding black hole and volcano patterns in a large directed graph. Specifically, a black hole pattern is a group made of a set of nodes such that there are only in-links to this group from the rest of the nodes in the graph. In contrast, a volcano pattern is a group which only has out-links to the rest of the nodes in the graph. Both patterns can be observed in the real world. For instance, in a trading network, a black hole pattern may represent a group of traders who are manipulating the market. In the paper, we first prove that the black hole mining problem is a dual problem of finding volcanoes. Therefore, we focus on finding the black hole patterns. Along this line, we design two pruning schemes to guide the black hole finding process. In the first pruning scheme, we strategically prune the search space based on a set of pattern-size-independent pruning rules and develop an iBlackhole algorithm. The second pruning scheme follows a divide-and-conquer strategy to further exploit the pruning results from the first pruning scheme. Indeed, a target directed graph can be divided into several disconnected subgraphs by the first pruning scheme, and thus the black hole finding can be conducted in each disconnected subgraph rather than in a large graph. Based on these two pruning schemes, we also develop an iBlackhole-DC algorithm. Finally, experimental results on real-world data show that the iBlackhole-DC algorithm can be several orders of magnitude faster than the iBlackhole algorithm, which has a huge computational advantage over a brute-force method.
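The two pattern definitions, and the duality between them, can be sketched as a membership check on a candidate group. This is only a check of the condition as worded in the abstract, not the iBlackhole or iBlackhole-DC search; any further requirements the paper places on a pattern are not covered. Requiring at least one in-link is an assumption of this sketch, and the function names and toy graph are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def is_black_hole(edges: Iterable[Edge], group: Set[str]) -> bool:
    """True if the group receives links from outside and sends none out.

    Assumption: at least one in-link is required, so an isolated group
    does not count as a black hole.
    """
    edge_list: List[Edge] = list(edges)
    has_in = any(s not in group and d in group for s, d in edge_list)
    has_out = any(s in group and d not in group for s, d in edge_list)
    return has_in and not has_out


def is_volcano(edges: Iterable[Edge], group: Set[str]) -> bool:
    """Dual check: a volcano is a black hole in the edge-reversed graph."""
    reversed_edges = [(d, s) for s, d in edges]
    return is_black_hole(reversed_edges, group)


# Toy directed graph; each pair is (source, destination).
toy_edges = [
    ("v", "a"), ("v", "b"),  # v only sends links out
    ("a", "x"), ("b", "y"),  # a and b pass links along
    ("x", "y"),              # internal to the group {x, y}
]
```

In the toy graph, {x, y} only receives links and {v} only sends them, while {a} does both and therefore matches neither pattern.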
knowledge discovery and data mining | 2017
Junming Liu; Leilei Sun; Qiao Li; Jingci Ming; Yanchi Liu; Hui Xiong
Bike sharing systems, which aim to provide the missing links in public transportation systems, are becoming popular in cities. Many providers of bike sharing systems are ready to expand their bike stations from the existing service area to surrounding regions. A key to success for a bike sharing system's expansion is bike demand prediction for the expansion areas. There are two major challenges in this demand prediction problem: first, bike transition records are not available for the expansion area, and second, station-level bike demand has large variance across the city. Previous research efforts mainly focus on discovering global features, assuming that station bike demands react equally to the global features, which leads to large prediction errors when the urban area is large and highly diversified. To address these challenges, in this paper, we develop a hierarchical station bike demand predictor which analyzes bike demands from the functional zone level to the station level. Specifically, we first divide the studied bike stations into functional zones using a novel Bi-clustering algorithm, which is designed to cluster together bike stations with similar POI characteristics and close geographical distances. Then, the hourly bike check-ins and check-outs of functional zones are predicted by integrating three influential factors: distance preference, zone-to-zone preference, and zone characteristics. The station demand is estimated by studying the demand distributions among the stations within the same functional zone. Finally, extensive experimental results on the NYC Citi Bike system with two expansion stages show the advantages of our approach for station demand and balance prediction in bike sharing system expansions.
knowledge discovery and data mining | 2017
Yanchi Liu; Chuanren Liu; Xinjiang Lu; Mingfei Teng; Hengshu Zhu; Hui Xiong
Point-of-Interest (POI) demand modeling in urban regions is critical for many applications such as business site selection and real estate investment. While some efforts have been made toward demand analysis for specific POI categories, such as restaurants, systematic means to support POI demand modeling are lacking. To this end, in this paper, we develop a systematic POI demand modeling framework, named Region POI Demand Identification (RPDI), to model POI demands by exploiting the daily needs of people identified from their large-scale mobility data. Specifically, we first partition the urban space into spatially differentiated neighborhood regions formed by many small local communities. Then, the daily activity patterns of people traveling in the city are extracted from human mobility data. Since the trip activities, even aggregated, are sparse and insufficient to directly identify the POI demands, especially for underdeveloped regions, we develop a latent factor model that integrates human mobility data, POI profiles, and demographic data to robustly model the POI demand of urban regions in a holistic way. In this model, POI preferences and supplies are used together with demographic features to estimate the POI demands simultaneously for all the interconnected urban regions in the city. Moreover, we also design efficient algorithms to optimize the latent model for large-scale data. Finally, experimental results on real-world data in New York City (NYC) show that our method is effective for identifying POI demands for different regions.
database systems for advanced applications | 2016
Hongting Niu; Junming Liu; Yanjie Fu; Yanchi Liu; Bo Lang
Advances in sensor, wireless communication, and information infrastructure such as GPS have enabled us to collect massive amounts of human mobility data, which are fine-grained and have global road coverage. These human mobility data, if properly encoded with semantic information (i.e., combined with points of interest (POIs)), are appealing for changing the paradigm for gas station site selection. To this end, in this paper, we investigate how to exploit newly generated human mobility data for enhancing gas station site selection. Specifically, we develop a ranking system for evaluating the business performance of gas stations based on the waiting time of refueling events by mining human mobility data. Along this line, we first design a method for detecting taxi refueling events by jointly tracking dwell times, GPS trace angles, location sequences, and refueling cycles of the vehicles. Also, we strategically extract fine-grained discriminative features from POI data, human mobility data, and road network data within the neighborhood of gas stations, and perform feature selection by simultaneously maximizing relevance and minimizing redundancy based on mutual information. In addition, we learn a ranking model for predicting gas station crowdedness by exploiting learning-to-rank techniques. The extensive experimental evaluation on real-world data also shows the advantages of the proposed method over existing approaches for gas station site selection.
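The feature selection step described above, maximizing relevance while minimizing redundancy based on mutual information, can be sketched as a greedy procedure over discrete features. The abstract does not state which variant of the criterion is used, so this sketch assumes the difference form (relevance minus mean redundancy with already selected features). The function names, feature names, and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c * n) / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def mrmr_select(features: Dict[str, Sequence[Hashable]],
                target: Sequence[Hashable],
                k: int) -> List[str]:
    """Greedily pick k features by relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected: List[str] = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking

    while remaining and len(selected) < k:
        best_name = remaining[0]
        best_score = -math.inf
        for name in remaining:
            if selected:
                # Mean mutual information with the features chosen so far.
                redundancy = sum(
                    mutual_information(features[name], features[s])
                    for s in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
        remaining.remove(best_name)
    return selected


# Toy data: the target is determined by "a" and "b" together;
# "a_copy" duplicates "a" and so adds no new information.
toy_features = {
    "a":      [0, 0, 1, 1, 0, 0, 1, 1],
    "a_copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "b":      [0, 1, 0, 1, 0, 1, 0, 1],
}
toy_target = [0, 1, 2, 3, 0, 1, 2, 3]
chosen = mrmr_select(toy_features, toy_target, k=2)
```

All three toy features are equally relevant on their own, but once `a` is selected the duplicate `a_copy` is fully redundant, so the procedure prefers `b` as the second feature.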