Leilei Sun

Dalian University of Technology

Publication

Featured research published by Leilei Sun.

Knowledge Discovery and Data Mining | 2017

Functional Zone Based Hierarchical Demand Prediction For Bike System Expansion

Junming Liu; Leilei Sun; Qiao Li; Jingci Ming; Yanchi Liu; Hui Xiong

Bike sharing systems, aiming at providing the missing links in public transportation systems, are becoming popular in urban cities. Many providers of bike sharing systems are ready to expand their bike stations from the existing service area to surrounding regions. A key to success for a bike sharing system's expansion is the bike demand prediction for expansion areas. There are two major challenges in this demand prediction problem: first, the bike transition records are not available for the expansion area, and second, station-level bike demands have large variances across the city. Previous research efforts mainly focus on discovering global features, assuming the station bike demands react equally to the global features, which brings large prediction errors when the urban area is large and highly diversified. To address these challenges, in this paper, we develop a hierarchical station bike demand predictor which analyzes bike demands from functional zone level to station level. Specifically, we first divide the studied bike stations into functional zones by a novel Bi-clustering algorithm which is designed to cluster bike stations with similar POI characteristics and close geographical distances together. Then, the hourly bike check-ins and check-outs of functional zones are predicted by integrating three influential factors: distance preference, zone-to-zone preference, and zone characteristics. The station demand is estimated by studying the demand distributions among the stations within the same functional zone. Finally, the extensive experimental results on the NYC Citi Bike system with two expansion stages show the advantages of our approach on station demand and balance prediction for bike sharing system expansions.

Knowledge and Information Systems | 2017

Fast affinity propagation clustering based on incomplete similarity matrix

Leilei Sun; Chonghui Guo; Chuanren Liu; Hui Xiong

Affinity propagation (AP) is a recently proposed clustering algorithm, which has been successfully used in many practical problems. Although effective in finding meaningful clustering solutions, a key disadvantage of AP is its efficiency, which has become the bottleneck when applying AP to large-scale problems. In the literature, most of the methods proposed to improve the efficiency of AP are based on implementing the message-passing on a sparse similarity matrix, while neither the decline in effectiveness nor the improvement in efficiency is theoretically analyzed. In this paper, we propose a two-stage fast affinity propagation (FastAP) algorithm. Different from previous work, the scale of the similarity matrix is first compressed by selecting only potential exemplars, then further reduced by sparseness according to k nearest neighbors. More importantly, we provide theoretical analysis, based on which the improvement of efficiency in our method is controllable with guaranteed clustering performance. In experiments, two synthetic data sets, seven publicly available data sets, and two real-world streaming data sets are used to evaluate the proposed method. The results demonstrate that FastAP can achieve clustering performance comparable to the original AP algorithm, while the computational efficiency has been improved with a several-fold speed-up on small data sets and a dozens-of-fold speed-up on larger-scale data sets.
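
As a rough illustration of the k-nearest-neighbor sparsification step mentioned in the abstract above, the sketch below keeps only the k largest off-diagonal similarities in each row. It is not the FastAP algorithm itself: the function name `knn_sparsify`, the use of negative infinity for dropped entries, and the toy data are all assumptions made for this example, and the exemplar-selection stage is not shown.

```python
import math

def knn_sparsify(similarity, k):
    """Keep, for each row, only the k largest off-diagonal similarities.

    Dropped entries are set to -inf (an assumed stand-in for "no link"
    between that pair); the diagonal entry is always kept.
    """
    n = len(similarity)
    sparse = [[-math.inf] * n for _ in range(n)]
    for i in range(n):
        # Rank the other points by similarity to point i, most similar first.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: similarity[i][j],
            reverse=True,
        )
        for j in neighbors[:k]:
            sparse[i][j] = similarity[i][j]
        sparse[i][i] = similarity[i][i]
    return sparse

# Toy example: negative squared distances between four 1-D points.
points = [0.0, 1.0, 5.0, 6.0]
S = [[-(a - b) ** 2 for b in points] for a in points]
S_sparse = knn_sparsify(S, k=1)
```

With k = 1, each point keeps only its single closest neighbor, so the two well-separated pairs in the toy data no longer share any finite similarity entries.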

Ubiquitous Computing | 2016

Characterizing the life cycle of point of interests using human mobility patterns

Xinjiang Lu; Zhiwen Yu; Leilei Sun; Chuanren Liu; Hui Xiong; Chu Guan

A Point of Interest (POI) refers to a specific location that people may find useful or interesting. While a large body of research has been focused on identifying and recommending POIs, there are few studies on characterizing the life cycle of POIs. Indeed, a comprehensive understanding of the POI life cycle can be helpful for various tasks, such as urban planning, business site selection, and real estate evaluation. In this paper, we develop a framework, named POLIP, for characterizing the POI life cycle with multiple data sources. Specifically, to investigate the POI evolution process over time, we first formulate a serial classification problem to predict the life status of POIs. The prediction approach is designed to integrate two important perspectives: 1) the spatial-temporal dependencies associated with the prosperity of POIs, and 2) the human mobility dynamics hidden in the citywide taxicab data related to the POIs at multiple granularity levels. In addition, based on the predicted life statuses in successive time windows for a given POI, we design an algorithm to characterize its life cycle. Finally, we performed extensive experiments using large-scale and real-world datasets. The results demonstrate the feasibility of automatically characterizing the POI life cycle and shed important light on future research directions.

Knowledge Discovery and Data Mining | 2016

Data-driven Automatic Treatment Regimen Development and Recommendation

Leilei Sun; Chuanren Liu; Chonghui Guo; Hui Xiong; Yanming Xie

The analysis of large-scale Electronic Medical Records (EMRs) has the potential to develop and optimize clinical treatment regimens. A treatment regimen usually includes a series of doctor orders containing rich temporal and heterogeneous information. However, in many existing studies, a doctor order is simplified as an event code and a treatment record is simplified as a code sequence. Thus, the information inherent in doctor orders is not fully used for in-depth analysis. In this paper, we aim at exploiting the rich information in doctor orders and developing data-driven approaches for improving clinical treatments. To this end, we first propose a novel method to measure the similarities between treatment records with consideration of sequential and multifaceted information in doctor orders. Then, we propose an efficient density-based clustering algorithm to summarize large-scale treatment records, and extract a semantic representation of each treatment cluster. Finally, we develop a unified framework to evaluate the discovered treatment regimens, and find the most effective treatment regimen for new patients. In the empirical study, we validate our methods with EMRs of 27,678 patients from 14 hospitals. The results show that: 1) Our method can successfully extract typical treatment regimens from large-scale treatment records. The extracted treatment regimens are intuitive and provide managerial implications for treatment regimen design and optimization. 2) By recommending the most effective treatment regimens, the total cure rate in our data improves from 19.89% to 21.28%, and the effective rate increases up to 98.29%.

Knowledge Discovery and Data Mining | 2017

A Data-driven Process Recommender Framework

Sen Yang; Xin Dong; Leilei Sun; Yichen Zhou; Richard A. Farneth; Hui Xiong; Randall S. Burd; Ivan Marsic

We present an approach for improving the performance of complex knowledge-based processes by providing data-driven step-by-step recommendations. Our framework uses the associations between similar historic process performances and contextual information to determine the prototypical way of enacting the process. We introduce a novel similarity metric for grouping traces into clusters that incorporates temporal information about activity performance and handles concurrent activities. Our data-driven recommender system selects the appropriate prototype performance of the process based on user-provided context attributes. Our approach for determining the prototypes discovers the commonly performed activities and their temporal relationships. We tested our system on data from three real-world medical processes and achieved recommendation accuracy up to an F1 score of 0.77 (compared to an F1 score of 0.37 using ZeroR) with 63.2% of recommended enactments being within the first five neighbors of the actual historic enactments in a set of 87 cases. Our framework works as an interactive visual analytic tool for process mining. This work shows the feasibility of a data-driven decision support system for complex knowledge-based processes.

Knowledge Discovery and Data Mining | 2017

Effective and Real-time In-App Activity Analysis in Encrypted Internet Traffic Streams

Junming Liu; Yanjie Fu; Jingci Ming; Yong Ren; Leilei Sun; Hui Xiong

The mobile in-App service analysis, aiming at classifying mobile internet traffic into different types of service usages, has become a challenging and emergent task for mobile service providers due to the increasing adoption of secure protocols for in-App services. While some efforts have been made for the classification of mobile internet traffic, existing methods rely on complex feature construction and a large storage cache, which lead to low processing speed, and are thus not practical for online real-time scenarios. To this end, we develop an iterative analyzer for classifying encrypted mobile traffic in real time. Specifically, we first select an optimal set of the most discriminative features from raw features extracted from traffic packet sequences by a novel Maximizing Inner activity similarity and Minimizing Different activity similarity (MIMD) measurement. To develop the online analyzer, we first represent a traffic flow with a series of time windows, which are described by the optimal feature vector and are updated iteratively at the packet level. Instead of extracting feature elements from a series of raw traffic packets, our feature elements are updated when a new traffic packet is observed and the storage of raw traffic packets is not required. The time windows generated from the same service usage activity are grouped by our proposed method, namely, recursive time continuity constrained KMeans clustering (rCKC). The feature vectors of cluster centers are then fed into a random forest classifier to identify corresponding service usages. Finally, we provide extensive experiments on real-world Internet traffic data from WeChat, WhatsApp, and Facebook to demonstrate the effectiveness and efficiency of our approach. The results show that the proposed analyzer provides high accuracy in real-world scenarios, and has a low storage cache requirement as well as fast processing speed.
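
The idea of updating feature elements as each packet arrives, without storing raw packets, can be illustrated with a generic running-statistics sketch. This is not the feature set or update rule from the paper, which the abstract does not specify: the class `RunningStats`, the choice of packet size as the tracked quantity, and the use of Welford's online method are assumptions made for this example.

```python
class RunningStats:
    """Running count, mean, and variance of a stream (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        # Fold one new observation into the summary; nothing else is stored.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Population variance; 0.0 before any value has been seen.
        return self._m2 / self.count if self.count else 0.0

# Hypothetical packet sizes (bytes) observed within one time window.
stats = RunningStats()
for packet_size in [100, 200, 300]:
    stats.update(packet_size)
```

Each update touches only three numbers, so memory use stays constant however many packets a window contains.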

Information Sciences | 2019

Unsupervised EEG feature extraction based on echo state network

Leilei Sun; Bo Jin; Haoyu Yang; Jianing Tong; Chuanren Liu; Hui Xiong

Advanced analytics such as event detection, pattern recognition, clustering, and classification with electroencephalogram (EEG) data often rely on extracted EEG features. Most of the existing EEG feature extraction approaches are hand-designed with expert knowledge or prior assumptions, which may lead to inferior analytical performances. In this paper, we develop a fully data-driven EEG feature extraction method by applying recurrent autoencoders on multivariate EEG signals. We use an Echo State Network (ESN) to encode EEG signals to EEG features, and then decode them to recover the original EEG signals. Therefore, we name our method feature extraction based on echo state network, or simply FE-ESN. We show that the well-known autoregression-based EEG feature extraction can be seen as a simplified variation of our FE-ESN method. We have conducted experiments on real-world EEG data to evaluate the effectiveness of FE-ESN for both classification tasks and clustering tasks. Experimental results demonstrate the superiority of FE-ESN over the state-of-the-art methods. This paper not only provides a novel EEG feature extraction method but also opens up a new way towards unsupervised EEG feature design.

Knowledge Discovery and Data Mining | 2018

A Treatment Engine by Predicting Next-Period Prescriptions

Bo Jin; Haoyu Yang; Leilei Sun; Chuanren Liu; Yue Qu; Jianing Tong

Recent years have witnessed an opportunity for improving healthcare efficiency and quality by mining Electronic Medical Records (EMRs). This paper is aimed at developing a treatment engine, which learns from historical EMR data and provides a patient with next-period prescriptions based on disease conditions, laboratory results, and treatment records of the patient. Importantly, the engine takes into consideration both treatment records and physical examination sequences, which are not only heterogeneous and temporal in nature but also often have different record frequencies and lengths. Moreover, the engine also combines static information (e.g., demographics) with the temporal sequences to provide personalized treatment prescriptions to patients. In this regard, a novel Long Short-Term Memory (LSTM) learning framework is proposed to model inter-correlations of different types of medical sequences by connections between hidden neurons. With this framework, we develop three multifaceted LSTM models: Fully Connected Heterogeneous LSTM, Partially Connected Heterogeneous LSTM, and Decomposed Heterogeneous LSTM. The experiments are conducted on two datasets: one is the public MIMIC-III ICU data, and the other comes from several Chinese hospitals. Experimental results reveal the effectiveness of the framework and the three models. The work is deemed important and meaningful for both academia and practitioners in the realm of medical treatment and prediction, as well as in other fields of applications where intelligent decision support becomes pervasive.

Marine Structures | 2014