Jialu Liu

University of Illinois at Urbana–Champaign

Publication

Featured research published by Jialu Liu.

international conference on management of data | 2015

Mining Quality Phrases from Massive Text Corpora

Jialu Liu; Jingbo Shang; Chi Wang; Xiang Ren; Jiawei Han

Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency at manipulating such data using database technology. Thus mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quality phrases from text corpora integrated with phrasal segmentation. The framework requires only limited training but the quality of phrases so generated is close to human judgment. Moreover, the method is scalable: both computation time and required space grow linearly as corpus size increases. Our experiments on large text corpora demonstrate the quality and efficiency of the new method.

knowledge discovery and data mining | 2014

ClusCite: effective citation recommendation by information network-based clustering

Xiang Ren; Jialu Liu; Xiao Yu; Urvashi Khandelwal; Quanquan Gu; Lidan Wang; Jiawei Han

Citation recommendation is an interesting but challenging research problem. Most existing studies assume that all papers adopt the same criterion and follow the same behavioral pattern in deciding relevance and authority of a paper. However, in reality, papers have distinct citation behavioral patterns when looking for different references, depending on paper content, authors and target venues. In this study, we investigate the problem in the context of heterogeneous bibliographic networks and propose a novel cluster-based citation recommendation framework, called ClusCite, which explores the principle that citations tend to be softly clustered into interest groups based on multiple types of relationships in the network. Therefore, we predict each query's citations based on related interest groups, each having its own model for paper authority and relevance. Specifically, we learn group memberships for objects and the significance of relevance features for each interest group, while also propagating relative authority between objects, by solving a joint optimization problem. Experiments on both DBLP and PubMed datasets demonstrate the power of the proposed approach, with 17.68% improvement in Recall@50 and 9.57% growth in MRR over the best performing baseline.

international conference on data mining | 2016

Large-Scale Embedding Learning in Heterogeneous Event Data

Huan Gui; Jialu Liu; Fangbo Tao; Meng Jiang; Brandon Norick; Jiawei Han

Heterogeneous events, which are defined as events connecting strongly-typed objects, are ubiquitous in the real world. We propose a HyperEdge-Based Embedding (Hebe) framework for heterogeneous event data, where a hyperedge represents the interaction among a set of involving objects in an event. The Hebe framework models the proximity among objects in an event by predicting a target object given the other participating objects in the event (hyperedge). Since each hyperedge encapsulates more information on a given event, Hebe is robust to data sparseness. In addition, Hebe is scalable when the data size spirals. Extensive experiments on large-scale real-world datasets demonstrate the efficacy and robustness of Hebe.

Knowledge and Information Systems | 2015

Constructing topical hierarchies in heterogeneous information networks

Chi Wang; Jialu Liu; Nihit Desai; Marina Danilevsky; Jiawei Han

Many digital documentary data collections (e.g., scientific publications, enterprise reports, news articles, and social media) can be modeled as a heterogeneous information network, linking text with multiple types of entities. Constructing high-quality hierarchies that can represent topics at multiple granularities benefits tasks such as search, information browsing, and pattern mining. In this work, we present an algorithm for recursively constructing multi-typed topical hierarchies. Contrary to traditional text-based topic modeling, our approach handles both textual phrases and multiple types of entities by a newly designed clustering and ranking algorithm for heterogeneous network data, as well as mining and ranking topical patterns of different types. Our experiments on datasets from two different domains demonstrate that our algorithm yields high-quality, multi-typed topical hierarchies.

international conference on management of data | 2014

NewsNetExplorer: automatic construction and exploration of news information networks

Fangbo Tao; George Brova; Jiawei Han; Heng Ji; Chi Wang; Brandon Norick; Ahmed El-Kishky; Jialu Liu; Xiang Ren; Yizhou Sun

News data is one of the most abundant and familiar data sources. News data can be systematically utilized and explored by database, data mining, NLP and information retrieval researchers to demonstrate to the general public the power of advanced information technology. In our view, news data contains rich, inter-related and multi-typed data objects, forming one or a set of gigantic, interconnected, heterogeneous information networks. Much knowledge can be derived and explored with such an information network if we systematically develop effective and scalable data-intensive information network analysis technologies. By further developing a set of information extraction, information network construction, and information network mining methods, we extract types, topical hierarchies and other semantic structures from news data, and construct a semi-structured news information network, NewsNet. Further, we develop a set of news information network exploration and mining mechanisms that explore news in multi-dimensional space, which include (i) OLAP-based operations on the hierarchical dimensional and topical structures and rich-text, such as cell summary, single dimension analysis, and promotion analysis, (ii) a set of network-based operations, such as similarity search and ranking-based clustering, and (iii) a set of hybrid operations or network-OLAP operations, such as entity ranking at different granularity levels. These form the basis of our proposed NewsNetExplorer system. Although some of these functions have been studied in recent research, effective and scalable realization of such functions in large networks still poses multiple challenging research problems. Moreover, some functions are our on-going research tasks. By integrating these functions, NewsNetExplorer not only provides us with insightful recommendations in the NewsNet exploration system but also helps us gain insight on how to perform effective information extraction, integration and mining in large unstructured datasets.

knowledge discovery and data mining | 2013

EventCube: multi-dimensional search and mining of structured and text data

Fangbo Tao; Kin Hou Lei; Jiawei Han; ChengXiang Zhai; Xiao Cheng; Marina Danilevsky; Nihit Desai; Bolin Ding; Jing Ge; Heng Ji; Rucha Kanade; Anne Kao; Qi Li; Yanen Li; Cindy Xide Lin; Jialu Liu; Nikunj C. Oza; Ashok N. Srivastava; Rodney Tjoelker; Chi Wang; Duo Zhang; Bo Zhao

A large portion of real world data is either text or structured (e.g., relational) data. Moreover, such data objects are often linked together (e.g., structured specification of products linking with the corresponding product descriptions and customer comments). Even for text data such as news data, typed entities can be extracted with entity extraction tools. The EventCube project constructs TextCube and TopicCube from interconnected structured and text data (or from text data via entity extraction and dimension building), and performs multidimensional search and analysis on such datasets, in an informative, powerful, and user-friendly manner. This proposed EventCube demo will show the power of the system not only on the originally designed ASRS (Aviation Safety Report System) data sets, but also on news datasets collected from multiple news agencies, and academic datasets constructed from the DBLP and web data. The system has high potential to be extended in many powerful ways and serve as a general platform for search, OLAP (online analytical processing) and data mining on integrated text and structured data. After the system demo in the conference, the system will be put on the web for public access and evaluation.

knowledge discovery and data mining | 2013

Selective sampling on graphs for classification

Quanquan Gu; Charu C. Aggarwal; Jialu Liu; Jiawei Han

Selective sampling is an active variant of online learning in which the learner is allowed to adaptively query the label of an observed example. The goal of selective sampling is to achieve a good trade-off between prediction performance and the number of queried labels. Existing selective sampling algorithms are designed for vector-based data. In this paper, motivated by the ubiquity of graph representations in real-world applications, we propose to study selective sampling on graphs. We first present an online version of the well-known Learning with Local and Global Consistency method (OLLGC). It is essentially a second-order online learning algorithm, and can be seen as an online ridge regression in the Hilbert space of functions defined on graphs. We prove its regret bound in terms of the structural property (cut size) of a graph. Based on OLLGC, we present a selective sampling algorithm, namely Selective Sampling with Local and Global Consistency (SSLGC), which queries the label of each node based on the confidence of the linear function on graphs. Its bound on the label complexity is also derived. We analyze the low-rank approximation of graph kernels, which enables the online algorithms to scale to large graphs. Experiments on benchmark graph datasets show that OLLGC outperforms the state-of-the-art first-order algorithm significantly, and SSLGC achieves comparable or even better results than OLLGC while querying substantially fewer nodes. Moreover, SSLGC is overwhelmingly better than random sampling.

international conference on data mining | 2013

Constructing Topical Hierarchies in Heterogeneous Information Networks

Chi Wang; Marina Danilevsky; Jialu Liu; Nihit Desai; Heng Ji; Jiawei Han

A digital data collection (e.g., scientific publications, enterprise reports, news, and social media) can often be modeled as a heterogeneous information network, linking text with multiple types of entities. Constructing high-quality concept hierarchies that can represent topics at multiple granularities benefits tasks such as search, information browsing, and pattern mining. In this work, we present an algorithm for recursively constructing multi-typed topical hierarchies. Contrary to traditional text-based topic modeling, our approach handles both textual phrases and multiple types of entities by a newly designed clustering and ranking algorithm for heterogeneous network data, as well as mining and ranking topical patterns of different types. Our experiments on datasets from two different domains demonstrate that our algorithm yields high-quality, multi-typed topical hierarchies.

international world wide web conferences | 2016