Tieke He
Nanjing University
Publication
Featured research published by Tieke He.
ACM Transactions on Intelligent Systems and Technology | 2016
Tieke He; Hongzhi Yin; Zhenyu Chen; Xiaofang Zhou; Shazia Wasim Sadiq; Bin Luo
Semantic tags of points of interest (POIs) are a crucial prerequisite for location search, recommendation services, and data cleaning. However, most POIs in location-based social networks (LBSNs) are either tag-missing or tag-incomplete. This article aims to develop semantic annotation techniques to automatically infer tags for POIs. We first analyze two LBSN datasets and observe that there are two types of tags, category-related ones and sentimental ones, which have unique characteristics. Category-related tags are hierarchical, whereas sentimental ones are category-aware. All existing related work has adopted classification methods to predict high-level category-related tags in the hierarchy, but these methods cannot be applied to infer either low-level category tags or sentimental ones. In light of this, we propose a latent-class probabilistic generative model, namely the spatial-temporal topic model (STM), to infer personal interests, the temporal and spatial patterns of topics/semantics embedded in users’ check-in activities, the interdependence between category-topic and sentiment-topic, and the correlation between sentimental tags and rating scores from users’ check-in and rating behaviors. Then, this learned knowledge is utilized to automatically annotate all POIs with both category-related and sentimental tags in a unified way. We conduct extensive experiments to evaluate the performance of the proposed STM on a real large-scale dataset. The experimental results show the superiority of our proposed STM, and we also observe that the real challenge of inferring category-related tags for POIs lies in the low-level ones of the hierarchy and that the challenge of predicting sentimental tags lies in those with neutral ratings.
Australasian Database Conference | 2015
Mengyu Dou; Tieke He; Hongzhi Yin; Xiaofang Zhou; Zhenyu Chen; Bin Luo
Transit prediction has long been a hot research problem, central to public transport agencies and operators as evidence to support scheduling and urban planning. Several previous works address transit prediction, but they all take a macro perspective. In this paper, we study prediction at the level of individuals in the context of public transport. Existing research on the prediction of individual behaviour is mostly found in information retrieval and recommender systems, leaving the area of public transport untouched. We propose an NLP-based back-propagation neural network for this prediction task. Specifically, we adopt the concept of “bag of words” to build user profiles, and use the result of clustering as the input of a back-propagation neural network to generate predictions. To illustrate the effectiveness of our method, we conduct an extensive set of experiments on a dataset from a public transport fare collection system. Our detailed experimental evaluation demonstrates that our method performs well at predicting the behaviour of individual public transport users.
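The "bag of words" profile step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say what plays the role of a "word", so the trip tokens below (a stop name joined with a time slot) and the function name `build_profile` are purely hypothetical; the clustering and neural network stages are not shown.

```python
from collections import Counter

def build_profile(trips, vocabulary):
    """Turn a user's trip tokens into a normalized bag-of-words vector.

    `trips` is a list of hypothetical tokens such as "StopA_am";
    `vocabulary` fixes the order of the vector's dimensions.
    """
    counts = Counter(trips)
    total = sum(counts[word] for word in vocabulary)
    # Relative frequency of each vocabulary token; all zeros if nothing matches.
    return [counts[word] / total if total else 0.0 for word in vocabulary]

vocabulary = ["StopA_am", "StopB_pm", "StopC_am"]
profile = build_profile(["StopA_am", "StopB_pm", "StopA_am", "StopA_am"], vocabulary)
```

Vectors of this kind could then be clustered, with the cluster output fed to a classifier, but those steps depend on details the abstract does not give.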
World Wide Web | 2017
Tieke He; Zhenyu Chen; Jia Liu; Xiaofang Zhou; Xingzhong Du; Weiqing Wang
User-based collaborative filtering (CF) has been successfully applied in recommender systems for years. The main idea of user-based CF is to discover communities of users sharing similar interests, so the measurement of user similarity is the foundation of CF. However, existing user-based CF methods suffer from data sparsity, which means the user-item matrix is often too sparse to yield ideal outcomes in recommender systems. One possible way to alleviate this problem is to bring new data sources into user-based CF. Thanks to the rapid development of social annotation systems, we turn to using tags as new sources. In these approaches, user-topic rating based CF is proposed to extract topics from tags using different topic model methods, based on which we compute the similarities between users by measuring their preferences on topics. In this paper, we conduct comparisons between three user-topic rating based CF methods, using PLSA, Hierarchical Clustering and LDA. All three methods calculate user-topic preferences according to users' ratings of items and topic weights. We conduct the experiments using the MovieLens dataset. The experimental results show that the LDA-based and Hierarchical Clustering-based user-topic rating CF methods outperform traditional user-based CF in recommendation accuracy, while the PLSA-based user-topic rating CF performs worse than traditional user-based CF.
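A minimal sketch of the user-topic rating idea follows. It assumes that item-topic weights are already available (for example from a topic model run on tags) and combines them with ratings by a weighted average, then compares users by cosine similarity. The abstract does not give the exact formulas, so the aggregation rule, the function names, and the toy data are all assumptions for illustration.

```python
import math

def user_topic_preferences(ratings, item_topics, n_topics):
    """Aggregate each user's item ratings into per-topic preferences.

    Assumed rule: a topic's preference is the average of the user's
    ratings, weighted by how strongly each rated item belongs to the topic.
    """
    prefs = {}
    for user, rated in ratings.items():
        totals = [0.0] * n_topics
        weights = [0.0] * n_topics
        for item, rating in rated.items():
            for k, w in enumerate(item_topics[item]):
                totals[k] += rating * w
                weights[k] += w
        prefs[user] = [t / w if w > 0 else 0.0 for t, w in zip(totals, weights)]
    return prefs

def cosine(u, v):
    """Cosine similarity between two preference vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy data: ratings per user and hypothetical topic weights per item.
ratings = {
    "alice": {"m1": 5, "m2": 1},
    "bob": {"m1": 4, "m3": 2},
    "carol": {"m2": 5, "m3": 4},
}
item_topics = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [0.2, 0.8]}

prefs = user_topic_preferences(ratings, item_topics, 2)
sim_alice_bob = cosine(prefs["alice"], prefs["bob"])
sim_alice_carol = cosine(prefs["alice"], prefs["carol"])
```

In this toy example alice and bob share only one rated item, yet their topic-level preferences are comparable across all items, which is how a topic representation can soften the sparsity of the user-item matrix.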
Australasian Database Conference | 2015
Tieke He; Hongzhi Yin; Zhenyu Chen; Xiaofang Zhou; Bin Luo
Some E-commerce giants (e.g., Amazon and Jingdong) with abundant purchasing data achieve highly accurate recommendations, since people have to pay for their choices and their purchasing behaviors are therefore more valid for capturing users’ needs and preferences than other types of user behavior data (e.g., browsing). However, not enough purchasing data is available for most small and medium-sized E-commerce sites, or for newly established ones. In this paper, we aim to alleviate the sparsity of users’ purchasing data by exploiting users’ browsing data, which is more plentiful. The low validity and reliability of users’ browsing data pose a great challenge for accurately predicting users’ purchasing behaviors, since many factors lead to users’ browsing behaviors. To this end, we propose a novel semi-supervised method to make the most of both high-quality purchasing data and low-quality browsing data to predict users’ purchasing behaviors. Specifically, we first use a small amount of purchasing data to supervise the model training on browsing data, and then integrate the results into the item-based collaborative filtering method. We conduct extensive experiments on a real dataset, and the experimental results show the superiority of our method, which achieves a 25% improvement over traditional collaborative filtering methods.
Archive | 2018
Hao Lian; Zemin Qin; Hangcheng Song; Tieke He
Every day, millions of crowdsourcing tasks are accomplished in exchange for payments. Pricing plays an important role in crowdsourcing campaigns, not only for the interests of requesters and workers, but also for fair competition among crowdsourcing markets and their sustainable development. All previous pricing strategies are based on the evaluation of results; however, in the scenario of crowdsourced Android testing (CAT), the testing process of a worker is a factor that we cannot overlook. In this paper, we propose a unified model that combines Evaluation on both the process and the results of CAT (E-CAT). Based on the proposed E-CAT, we can construct a pricing strategy for CAT. On the one hand, E-CAT enables requesters to investigate the testing process of a worker in terms of both depth and width. On the other hand, it helps requesters evaluate the output of each worker.
Journal of Computer Science and Technology | 2018
Tieke He; Hao Lian; Zemin Qin; Zhenyu Chen; Bin Luo
Deciding the penalty of a law case has always been a complex process, which may involve much coordination. Apart from judicial studies based on rules and conditions, artificial intelligence and machine learning have rarely been used to study the problem of penalty inference, leaving the large number of law cases, as well as the various factors among them, untouched. This paper aims to incorporate state-of-the-art artificial intelligence methods to explore to what extent this problem can be alleviated. We first analyze 145 000 law cases and observe that there are two sorts of labels, temporal labels and spatial labels, which have unique characteristics. Temporal labels and spatial labels tend to converge towards the final penalty, provided that the cases are of the same category. In light of this, we propose a latent-class probabilistic generative model, namely the Penalty Topic Model (PTM), to infer the topic of law cases, and the temporal and spatial patterns of topics embedded in the case judgment. Then, the learnt knowledge is utilized to automatically cluster all cases accordingly in a unified way. We conduct extensive experiments to evaluate the performance of the proposed PTM on a real large-scale dataset of law cases. The experimental results show the superiority of our proposed PTM.
International Conference on Software Engineering | 2017
Yabin Wang; Tieke He; Weiqiang Zhang; Chunrong Fang; Bin Luo
To improve software quality, developers often open a bug repository and allow users to find bugs, describe them in the form of bug reports, and submit the reports to the repository. Based on the description, testers assign a priority to each bug report. Initially, priority assignment was performed manually. With the increasing number of bug reports, researchers introduced classification methods to assign priorities automatically with all the features considered. In this paper, feature selection methods are introduced to improve the effectiveness of bug report prioritization using classification models. The experimental results show that feature selection based on Information Gain and Pearson Correlation can improve the precision and recall of bug report prioritization on two models, i.e., SVM and Naive Bayes.
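As an illustration of Information Gain based feature selection, the sketch below scores discrete features against priority labels and keeps the top k. The feature names, the priority labels, and the toy reports are hypothetical, and the paper's actual feature set and pipeline are not described in this abstract.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def top_k_features(rows, labels, k):
    """Rank features by information gain and return the k best with all scores."""
    scores = {name: information_gain([row[name] for row in rows], labels)
              for name in rows[0]}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Hypothetical binary features extracted from four bug reports.
rows = [
    {"mentions_crash": 1, "mentions_typo": 0, "long_text": 1},
    {"mentions_crash": 1, "mentions_typo": 0, "long_text": 0},
    {"mentions_crash": 0, "mentions_typo": 1, "long_text": 1},
    {"mentions_crash": 0, "mentions_typo": 0, "long_text": 0},
]
labels = ["P1", "P1", "P3", "P3"]

selected, scores = top_k_features(rows, labels, k=2)
```

Only the selected features would then be passed to the classifier, which is the step where a smaller, more informative feature set can help precision and recall.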
Software Engineering and Knowledge Engineering | 2016
Zhengjie Xu; Tieke He; Weiqiang Zhang; Yabin Wang; Jia Liu; Zhenyu Chen
Time factors have been widely applied in a range of data mining areas, such as social networks and information retrieval. The main idea of taking time factors into consideration is that human activities may be related to temporal patterns. However, little attention has been paid to the study of time factors in the area of software engineering. In this paper, we endeavour to explore to what extent time factors affect the prioritization of bug reports, a specific yet important task in software engineering. Specifically, we test four time factors that may influence this task: time of day, normal time, day of week, and days to major version. After validating the relatedness of all these factors, we conduct an extensive set of experiments on two datasets to verify their effectiveness. The experimental results demonstrate that we can effectively improve the results in terms of both precision and recall on two classical models, i.e., the SVM model and the Naive Bayes model.
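The four time factors could be derived from a report's submission timestamp roughly as sketched below. The abstract does not define "normal time"; the sketch assumes it means submission during weekday working hours, and the 9:00 to 18:00 window, the function name, and the field names are all hypothetical.

```python
from datetime import datetime, date

def time_features(submitted, next_major_release, work_start=9, work_end=18):
    """Derive four time-based features for one bug report.

    ASSUMPTION: "normal time" is read here as weekday working hours;
    the source abstract does not define it.
    """
    is_weekday = submitted.weekday() < 5  # Monday=0 ... Sunday=6
    return {
        "hour_of_day": submitted.hour,
        "is_working_hours": is_weekday and work_start <= submitted.hour < work_end,
        "day_of_week": submitted.weekday(),
        "days_to_major_version": (next_major_release - submitted.date()).days,
    }

# A report filed late on a Saturday, 15 days before a hypothetical major release.
features = time_features(datetime(2016, 3, 5, 22, 30), date(2016, 3, 20))
```

Features like these would be appended to the textual features of each report before training the classifier.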
Archive | 2012
Zhenyu Chen; Xingzhong Du; Jia Liu; Chengfeng Hui; Tieke He
Software Engineering and Knowledge Engineering | 2013
Tieke He; Xingzhong Du; Weiqing Wang; Zhenyu Chen; Jia Liu