Jialong Han

Nanyang Technological University

Publication

Featured research published by Jialong Han.

Conference on Information and Knowledge Management | 2009

Efficient algorithms for approximate member extraction using signature-based inverted lists

Jiaheng Lu; Jialong Han; Xiaofeng Meng

We study the problem of approximate membership extraction (AME), i.e., how to efficiently extract substrings in a text document that approximately match some strings in a given dictionary. This problem is important in a variety of applications such as named entity recognition and data cleaning. We solve this problem in two steps. In the first step, for each substring in the text, we filter away the strings in the dictionary that are very different from the substring. In the second step, each candidate string is verified to decide whether the substring should be extracted. We develop an incremental algorithm using signature-based inverted lists to minimize the duplicate list-scan operations of overlapping windows in the text. Our experimental study of the proposed algorithms on real and synthetic datasets showed that our solutions significantly outperform existing methods in the literature.
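The two-step filter-and-verify idea in this abstract can be illustrated with a minimal sketch. It is not the paper's signature scheme or its incremental list-scan algorithm: it assumes word-level tokens, a plain token-to-entry inverted index as the filter, and Jaccard similarity as the verification measure. The function names, the `threshold` and `max_window` parameters, and the sample strings are all hypothetical.

```python
from collections import defaultdict


def build_inverted_index(dictionary):
    """Map each token to the ids of the dictionary entries that contain it."""
    index = defaultdict(set)
    for entry_id, entry in enumerate(dictionary):
        for token in set(entry.lower().split()):
            index[token].add(entry_id)
    return index


def jaccard(a, b):
    """Jaccard similarity of two token collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def extract_approximate_members(text, dictionary, threshold=0.6, max_window=4):
    """Return (substring, dictionary entry, score) triples with score >= threshold."""
    index = build_inverted_index(dictionary)
    entry_tokens = [entry.lower().split() for entry in dictionary]
    tokens = text.lower().split()
    matches = []
    for start in range(len(tokens)):
        for length in range(1, max_window + 1):
            end = start + length
            if end > len(tokens):
                break
            window = tokens[start:end]
            # Step 1 (filter): keep only entries sharing at least one token.
            candidates = set()
            for token in set(window):
                candidates |= index.get(token, set())
            # Step 2 (verify): compute the similarity for each candidate.
            for entry_id in sorted(candidates):
                score = jaccard(window, entry_tokens[entry_id])
                if score >= threshold:
                    matches.append(
                        (" ".join(window), dictionary[entry_id], round(score, 2))
                    )
    return matches


if __name__ == "__main__":
    entries = ["nanyang technological university", "new york"]
    sentence = "she studied at nanyang technology university in singapore"
    print(extract_approximate_members(sentence, entries, threshold=0.5))
```

Note that this naive version rescans the inverted lists for every overlapping window, which is exactly the duplicated work the abstract says the incremental algorithm is designed to minimize.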

International World Wide Web Conferences | 2016

Joint Recognition and Linking of Fine-Grained Locations from Tweets

Zongcheng Ji; Aixin Sun; Gao Cong; Jialong Han

Many users casually reveal their locations such as restaurants, landmarks, and shops in their tweets. Recognizing such fine-grained locations from tweets and then linking the location mentions to well-defined location profiles (e.g., with formal name, detailed address, and geo-coordinates etc.) offer a tremendous opportunity for many applications. Different from existing solutions which perform location recognition and linking as two sub-tasks sequentially in a pipeline setting, in this paper, we propose a novel joint framework to perform location recognition and location linking simultaneously in a joint search space. We formulate this end-to-end location linking problem as a structured prediction problem and propose a beam-search based algorithm. Based on the concept of multi-view learning, we further enable the algorithm to learn from unlabeled data to alleviate the dearth of labeled data. Extensive experiments are conducted to recognize locations mentioned in tweets and link them to location profiles in Foursquare. Experimental results show that the proposed joint learning algorithm outperforms the state-of-the-art solutions, and learning from unlabeled data improves both the recognition and linking accuracy.

Conference on Information and Knowledge Management | 2017

NeuPL: Attention-based Semantic Matching and Pair-Linking for Entity Disambiguation

Minh C. Phan; Aixin Sun; Yi Tay; Jialong Han; Chenliang Li

Entity disambiguation, also known as entity linking, is the task of mapping mentions in text to the corresponding entities in a given knowledge base, e.g., Wikipedia. Two key challenges are making use of the mentions' context to disambiguate (i.e., the local objective), and promoting coherence of all the linked entities (i.e., the global objective). In this paper, we propose a deep neural network model to effectively measure the semantic matching between a mention's context and the target entity. We are the first to employ the long short-term memory (LSTM) and attention mechanism for entity disambiguation. We also propose Pair-Linking, a simple but effective and significantly faster linking algorithm. Pair-Linking iteratively identifies and resolves pairs of mentions, starting from the most confident pair. It finishes linking all mentions in a document by scanning the pairs of mentions at most once. Our neural network model combined with Pair-Linking, named NeuPL, outperforms state-of-the-art systems over different types of documents including news, RSS, and tweets.
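The greedy "most confident pair first" idea behind Pair-Linking can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the confidence of a pair is assumed to be the sum of two local scores and a coherence score, and `pair_linking`, `local_score`, `coherence`, and the toy data are hypothetical.

```python
from itertools import combinations


def pair_linking(candidates, local_score, coherence):
    """Greedily link mentions by resolving the most confident pairs first.

    candidates:  dict mapping each mention to its list of candidate entities
    local_score: dict mapping (mention, entity) to a local matching score
    coherence:   function (entity_a, entity_b) -> pairwise coherence score
    """
    # Score every (mention, entity) pair combination once.
    scored = []
    for m1, m2 in combinations(candidates, 2):
        for e1 in candidates[m1]:
            for e2 in candidates[m2]:
                # Assumed confidence: local evidence plus pairwise coherence.
                conf = local_score[(m1, e1)] + local_score[(m2, e2)] + coherence(e1, e2)
                scored.append((conf, m1, e1, m2, e2))
    scored.sort(key=lambda item: item[0], reverse=True)

    # Scan the sorted pairs at most once, most confident first.
    linked = {}
    for _, m1, e1, m2, e2 in scored:
        # Skip pairs that contradict an assignment already made.
        if linked.get(m1, e1) != e1 or linked.get(m2, e2) != e2:
            continue
        linked.setdefault(m1, e1)
        linked.setdefault(m2, e2)
        if len(linked) == len(candidates):
            break

    # Fallback for mentions that never appeared in a pair (e.g., a lone mention).
    for mention, entities in candidates.items():
        if mention not in linked and entities:
            linked[mention] = max(entities, key=lambda e: local_score[(mention, e)])
    return linked


if __name__ == "__main__":
    # Toy data; all scores are made up for illustration.
    cands = {
        "Jordan": ["Michael Jordan (basketball)", "Jordan (country)"],
        "Bulls": ["Chicago Bulls", "Bull (animal)"],
    }
    local = {
        ("Jordan", "Michael Jordan (basketball)"): 0.4,
        ("Jordan", "Jordan (country)"): 0.6,
        ("Bulls", "Chicago Bulls"): 0.5,
        ("Bulls", "Bull (animal)"): 0.5,
    }
    related = {frozenset(["Michael Jordan (basketball)", "Chicago Bulls"])}
    print(pair_linking(cands, local, lambda a, b: 1.0 if frozenset([a, b]) in related else 0.0))
```

In the toy example the local scores alone would favor the country reading of "Jordan", but the coherent pair wins because it is resolved first.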

International Conference on Data Engineering | 2016

Discovering Neighborhood Pattern Queries by sample answers in knowledge base

Jialong Han; Kai Zheng; Aixin Sun; Shuo Shang; Ji-Rong Wen

Knowledge bases have shown their effectiveness in facilitating services like Web search and question-answering. Nevertheless, it remains challenging for ordinary users to fully understand the structure of a knowledge base and to issue structural queries. In many cases, users may have a natural language question and also know some popular (but not all) entities as sample answers. In this paper, we study the Reverse top-k Neighborhood Pattern Query problem, with the aim of discovering structural queries of the question based on: (i) the structure of the knowledge base, and (ii) the sample answers of the question. The proposed solution contains two phases: filter and refine. In the filter phase, a search space of candidate queries is systematically explored. The invalid queries whose result sets do not fully cover the sample answers are filtered out. In the refine phase, all surviving queries are verified to ensure that they are sufficiently relevant to the sample answers, with the assumption that the sample answers are more well-known or popular than other entities in the results of relevant queries. Several optimization techniques are proposed to accelerate the refine phase. For evaluation, we conduct extensive experiments using the DBpedia knowledge base and a set of real-life questions. Empirical results show that our algorithm is able to provide a small set of possible queries, which contains the query matching the user question in natural language.
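A minimal sketch of the filter-and-refine flow described in this abstract follows. It makes strong simplifying assumptions: candidate queries are given up front together with their result sets (the paper explores a search space of neighborhood patterns instead), and "sufficiently relevant" is read as "every sample answer ranks among the k most popular results". The function name, the query labels, and the popularity numbers are hypothetical.

```python
def discover_queries(candidate_queries, sample_answers, popularity, k=3):
    """Return the candidate queries that survive both phases.

    candidate_queries: dict mapping a query label to its list of result entities
    sample_answers:    entities the user already knows to be answers
    popularity:        dict mapping an entity to a popularity score
    """
    samples = set(sample_answers)

    # Filter phase: drop queries whose results do not cover every sample answer.
    surviving = {
        query: results
        for query, results in candidate_queries.items()
        if samples <= set(results)
    }

    # Refine phase (assumed criterion): the sample answers must all appear
    # among the k most popular entities in the query's result set.
    relevant = []
    for query, results in surviving.items():
        top_k = sorted(results, key=lambda e: popularity.get(e, 0), reverse=True)[:k]
        if samples <= set(top_k):
            relevant.append(query)
    return relevant


if __name__ == "__main__":
    # Toy data; entity names and popularity scores are made up.
    queries = {
        "pattern_A": ["e1", "e2", "e3"],
        "pattern_B": ["e1", "e2", "e4", "e5"],
        "pattern_C": ["e1", "e6"],
    }
    scores = {"e1": 90, "e2": 85, "e3": 10, "e4": 99, "e5": 95, "e6": 97}
    print(discover_queries(queries, ["e1", "e2"], scores, k=2))
```

Here `pattern_C` is removed by the filter phase because it misses a sample answer, and `pattern_B` is removed by the refine phase because more popular entities outrank the samples in its results.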

Conference on Information and Knowledge Management | 2014

Within-Network Classification Using Radius-Constrained Neighborhood Patterns

Jialong Han; Ji-Rong Wen; Jian Pei

Within-Network Classification (WNC) techniques are designed for applications where objects to be classified and those with known labels are interlinked. For WNC tasks like web page classification, the homophily principle succeeds by assuming that linked objects, represented as adjacent vertices in a network, are likely to have the same labels. However, in other tasks like chemical structure completion, recent works suggest that the label of a vertex should be related to the local structure it resides in, rather than equated with those of its neighbors. These works also propose structure-aware vertex features or methods to deal with such an issue. In this paper, we demonstrate that frequent neighborhood patterns, originally studied in the pattern mining literature, serve as a strong class of structure-aware features and provide satisfactory effectiveness in WNC. In addition, we identify the problem that the neighborhood pattern miner indiscriminately mines patterns of all radiuses, while heuristics and experiments both indicate that patterns with a large radius take much time only to bring negligible effectiveness gains. We develop a specially designed algorithm capable of working under radius threshold constraints, by which patterns with a large radius are not mined at all. Experiments suggest that our algorithm helps with the trade-off between efficiency and effectiveness in WNC tasks.

IEEE Transactions on Knowledge and Data Engineering | 2018

A Survey of Location Prediction on Twitter

Xin Zheng; Jialong Han; Aixin Sun

Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we conclude the survey and list future research directions.

Knowledge and Information Systems | 2018

A time-aware trajectory embedding model for next-location recommendation

Wayne Xin Zhao; Ningnan Zhou; Aixin Sun; Ji-Rong Wen; Jialong Han; Edward Y. Chang

Next-location recommendation is an emerging task with the proliferation of location-based services. It is the task of recommending the next location to visit for a user, given her past check-in records. Although several principled solutions have been proposed for this task, existing studies have not well characterized the temporal factors in the recommendation. From three real-world datasets, our quantitative analysis reveals that temporal factors play an important role in next-location recommendation, including the periodical temporal preference and dynamic personal preference. In this paper, we propose a Time-Aware Trajectory Embedding Model (TA-TEM) to incorporate three kinds of temporal factors in next-location recommendation. Based on distributed representation learning, the proposed TA-TEM jointly models multiple kinds of temporal factors in a unified manner. TA-TEM also enhances the sequential context by using a longer context window. Experiments show that TA-TEM outperforms several competitive baselines.

IEEE Transactions on Knowledge and Data Engineering | 2018

Linking Fine-Grained Locations in User Comments

Jialong Han; Aixin Sun; Gao Cong; Wayne Xin Zhao; Zongcheng Ji; Minh C. Phan

Many domain-specific websites host a profile page for each entity (e.g., locations on Foursquare, movies on IMDb, and products on Amazon) for users to post comments on. When commenting on an entity, users often mention other entities for reference or comparison. Compared with web pages and tweets, the problem of disambiguating the mentioned entities in user comments has not received much attention. This paper investigates linking fine-grained locations in Foursquare comments. We demonstrate that the focal location, i.e., the location that a comment is posted on, provides rich contexts for the linking task. To exploit such information, we represent the Foursquare data in a graph, which includes locations, comments, and their relations. A probabilistic model named FocalLink is proposed to estimate the probability that a user mentions a location when commenting on a focal location, by following different kinds of relations. Experimental results show that FocalLink is consistently superior under different collective linking settings.

Conference on Information and Knowledge Management | 2017

Semi-Supervised Event-related Tweet Identification with Dynamic Keyword Generation

Xin Zheng; Aixin Sun; Sibo Wang; Jialong Han

Twitter provides us with a convenient channel to access immediate information about major events. However, it is challenging to acquire a clean and complete set of event-related data due to the characteristics of tweets, e.g., short and noisy. In this paper, we propose a semi-supervised method to obtain high-quality event-related tweets from the Twitter stream, in terms of precision and recall. Specifically, candidate event-related tweets are selected based on a set of keywords. We propose to generate and update these keywords dynamically as the event develops. To be included in this keyword set, words are evaluated based on single-word properties, properties based on co-occurring words, and changes of word importance over time. Our solution is capable of capturing keywords of emerging aspects or aspects with increasing importance as the event evolves. By leveraging keyword importance information and a few labeled tweets, we propose a semi-supervised expectation maximization process to identify event-related tweets. This process significantly reduces human effort in acquiring high-quality tweets. Experiments on three real-world datasets show that our solution outperforms state-of-the-art approaches by up to 10% in F1 measure.

IEEE Transactions on Knowledge and Data Engineering | 2004