Kenny Q. Zhu | Researchain

Publication

Featured research published by Kenny Q. Zhu.

international conference on management of data | 2012

Probase: a probabilistic taxonomy for text understanding

Wentao Wu; Hongsong Li; Haixun Wang; Kenny Q. Zhu

Knowledge is indispensable to understanding. The ongoing information explosion highlights the need to enable machines to better understand electronic text in human language. Much work has been devoted to creating universal ontologies or taxonomies for this purpose. However, none of the existing ontologies has the needed depth and breadth for universal understanding. In this paper, we present a universal, probabilistic taxonomy that is more comprehensive than any existing ones. It contains 2.7 million concepts harnessed automatically from a corpus of 1.68 billion web pages. Unlike traditional taxonomies that treat knowledge as black and white, it uses probabilities to model inconsistent, ambiguous and uncertain information it contains. We present details of how the taxonomy is constructed, its probabilistic modeling, and its potential applications in text understanding.
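To make the idea of a probabilistic (rather than black-and-white) isA relation concrete, here is a minimal sketch that turns evidence counts into a distribution over concepts for one instance. The counts, the `ISA_COUNTS` table, and the function `concept_distribution` are all hypothetical illustrations; the abstract does not specify the actual probability model, so plain count normalization is an assumption.

```python
# Hypothetical isA evidence: (instance, concept) -> number of supporting sentences.
# These numbers are made up for illustration and do not come from the paper.
ISA_COUNTS = {
    ("apple", "fruit"): 60,
    ("apple", "company"): 40,
    ("banana", "fruit"): 50,
}


def concept_distribution(instance, counts):
    """Estimate P(concept | instance) by normalizing evidence counts.

    Assumption: simple relative frequency; the paper's model may differ.
    """
    row = {c: n for (i, c), n in counts.items() if i == instance}
    total = sum(row.values())
    if total == 0:
        return {}  # unknown instance: no evidence, no distribution
    return {c: n / total for c, n in row.items()}


if __name__ == "__main__":
    print(concept_distribution("apple", ISA_COUNTS))
```

An ambiguous instance such as "apple" keeps weight on both concepts instead of being forced into a single category.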

symposium on principles of programming languages | 2008

From dirt to shovels: fully automatic tool generation from ad hoc data

Kathleen Fisher; David Walker; Kenny Q. Zhu; Peter White

An ad hoc data source is any semistructured data source for which useful data analysis and transformation tools are not readily available. Such data must be queried, transformed and displayed by systems administrators, computational biologists, financial analysts and hosts of others on a regular basis. In this paper, we demonstrate that it is possible to generate a suite of useful data processing tools, including a semi-structured query engine, several format converters, a statistical analyzer and data visualization routines directly from the ad hoc data itself, without any human intervention. The key technical contribution of the work is a multi-phase algorithm that automatically infers the structure of an ad hoc data source and produces a format specification in the PADS data description language. Programmers wishing to implement custom data analysis tools can use such descriptions to generate printing and parsing libraries for the data. Alternatively, our software infrastructure will push these descriptions through the PADS compiler, creating format-dependent modules that, when linked with format-independent algorithms for analysis and transformation, result in fully functional tools. We evaluate the performance of our inference algorithm, showing it scales linearly in the size of the training data - completing in seconds, as opposed to the hours or days it takes to write a description by hand. We also evaluate the correctness of the algorithm, demonstrating that generating accurate descriptions often requires less than 5% of the available data.

international conference on data engineering | 2015

False rumors detection on Sina Weibo by propagation structures

Ke Wu; Song Yang; Kenny Q. Zhu

This paper studies the problem of automatic detection of false rumors on Sina Weibo, the popular Chinese microblogging social network. Traditional feature-based approaches extract features from the false rumor message, its author, as well as the statistics of its responses to form a flat feature vector. This ignores the propagation structure of the messages and has not achieved very good results. We propose a graph-kernel based hybrid SVM classifier which captures the high-order propagation patterns in addition to semantic features such as topics and sentiments. The new model achieves a classification accuracy of 91.3% on a randomly selected Weibo dataset, significantly higher than state-of-the-art approaches. Moreover, our approach can be applied at the early stage of rumor propagation and is 88% confident in detecting an average false rumor just 24 hours after the initial broadcast.

international conference on tools with artificial intelligence | 2003

A diversity-controlling adaptive genetic algorithm for the vehicle routing problem with time windows

Kenny Q. Zhu

This paper presents an adaptive genetic algorithm (GA) to solve the vehicle routing problem with time windows (VRPTW) to near optimal solutions. The algorithm employs a unique decoding scheme with the integer strings. It also automatically adapts the crossover probability and the mutation rate to the changing population dynamics. The adaptive control maintains population diversity at user-defined levels, and therefore prevents premature convergence in search. Comparison between this algorithm and a normal fixed parameter GA clearly demonstrates the advantage of population diversity control. Our experiments with the 56 Solomon benchmark problems indicate that this algorithm is competitive and it paves the way for future research on population-based adaptive genetic algorithms.
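The diversity-controlling idea can be sketched as a simple feedback loop: measure population diversity, compare it with a user-defined target, and nudge a genetic rate accordingly. Everything below is an assumption made for illustration: the diversity measure (fraction of distinct individuals), the proportional update rule, and the names `diversity`, `adapt_rate`, `gain`, `lo`, and `hi` are hypothetical and not taken from the paper.

```python
def diversity(population):
    """Fraction of distinct individuals in the population (1.0 = all unique).

    Assumption: a deliberately simple measure chosen for this sketch.
    """
    return len({tuple(ind) for ind in population}) / len(population)


def adapt_rate(rate, current, target, gain=0.5, lo=0.01, hi=0.5):
    """Raise the rate when diversity is below target, lower it when above.

    The result is clamped to [lo, hi] so the rate stays in a usable range.
    """
    new_rate = rate + gain * (target - current)
    return max(lo, min(hi, new_rate))


if __name__ == "__main__":
    # Four routes encoded as integer strings; two of them are identical.
    pop = [[1, 2, 3], [1, 2, 3], [3, 2, 1], [2, 1, 3]]
    d = diversity(pop)
    print(d, adapt_rate(0.1, d, target=0.9))
```

In a full GA, a call like `adapt_rate` would run once per generation, so a population that starts to converge too early gets more mutation (or crossover) until diversity recovers.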

conference on information and knowledge management | 2013

Computing term similarity by large probabilistic isA knowledge

Peipei Li; Haixun Wang; Kenny Q. Zhu; Zhongyuan Wang; Xindong Wu

Computing semantic similarity between two terms is essential for a variety of text analytics and understanding applications. However, existing approaches are more suitable for semantic similarity between words rather than the more general multi-word expressions (MWEs), and they do not scale very well. Therefore, we propose a lightweight and effective approach for semantic similarity using a large scale semantic network automatically acquired from billions of web documents. Given two terms, we map them into the concept space, and compare their similarity there. Furthermore, we introduce a clustering approach to orthogonalize the concept space in order to improve the accuracy of the similarity measure. Extensive studies demonstrate that our approach can accurately compute the semantic similarity between terms with MWEs and ambiguity, and significantly outperforms 12 competing methods.
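The step of mapping two terms into a concept space and comparing them there can be illustrated with a small sketch. The concept vectors in `CONCEPTS` are invented toy values, cosine similarity is an assumed choice of comparison, and the clustering step that orthogonalizes the concept space is omitted entirely; none of the names below come from the paper.

```python
import math

# Hypothetical concept vectors: term -> {concept: weight}. Toy values only.
CONCEPTS = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "banana": {"fruit": 1.0},
    "microsoft": {"company": 1.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector has no defined direction
    return dot / (norm_u * norm_v)


def term_similarity(a, b, concepts=CONCEPTS):
    """Map both terms to concept vectors and compare them there.

    Terms missing from the table get an empty vector and similarity 0.0.
    """
    return cosine(concepts.get(a, {}), concepts.get(b, {}))


if __name__ == "__main__":
    print(term_similarity("apple", "banana"))
    print(term_similarity("apple", "microsoft"))
```

With these toy weights, "apple" comes out closer to "banana" than to "microsoft" because more of its weight sits on the shared concept "fruit".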

european conference on machine learning | 2004

Population diversity in permutation-based genetic algorithm

Kenny Q. Zhu; Ziwei Liu

This paper presents an empirical study of population diversity measures and adaptive control of diversity in the context of a permutation-based algorithm for Traveling Salesman Problems and Vehicle Routing Problems. We provide detailed graphical observations and discussion of the relationship among the four diversity measures and suggest a moderate correlation between diversity and search performance under simple conditions. We also study the effects of adapting key genetic control parameters such as crossover and mutation rates on the population diversity. We are able to show that adaptive control of the genetic operations based on population diversity effectively outperforms fixed parameter genetic algorithms.

international conference on data engineering | 2015

SAR: A sentiment-aspect-region model for user preference analysis in geo-tagged reviews

Kaiqi Zhao; Gao Cong; Quan Yuan; Kenny Q. Zhu

Many location based services, such as FourSquare, Yelp, TripAdvisor, Google Places, etc., allow users to compose reviews or tips on points of interest (POIs), each having geographical coordinates. These services have accumulated a large amount of such geo-tagged review data, which allows deep analysis of user preferences in POIs. This paper studies two types of user preferences to POIs: topical-region preference and category aware topical-aspect preference. We propose a unified probabilistic model to capture these two preferences simultaneously. In addition, our model is capable of capturing the interaction of different factors, including topical aspect, sentiment, and spatial information. The model can be used in a number of applications, such as POI recommendation and user recommendation, among others. In addition, the model enables us to investigate whether people like an aspect of a POI or whether people like a topical aspect of some type of POIs (e.g., bars) in a region, which offers explanations for recommendations. Experiments on real world datasets show that the model achieves significant improvement in POI recommendation and user recommendation in comparison to the state-of-the-art methods. We also propose an efficient online recommendation algorithm based on our model, which saves up to 90% computation time.

conference on tools with artificial intelligence | 2000

A reactive method for real time dynamic vehicle routing problem

Kenny Q. Zhu; Kar-Loon Ong

The real time dynamic vehicle routing problem (RT-DVRP) is an extension of VRPTW, in which the problem parameters change in real time. We present a solution to RT-DVRP: a concurrent, agent-based reactive vehicle routing system (RVRS) and the implementation of the RVRS, which combines a generic, concurrent infrastructure and a powerful incremental local optimization heuristic.

international conference on tools with artificial intelligence | 2004

Scalable distributed depth-first search with greedy work stealing

Joxan Jaffar; Andrew E. Santosa; Roland H. C. Yap; Kenny Q. Zhu

We present a framework for the parallelization of depth-first combinatorial search algorithms on a network of computers. Our architecture is intended for a distributed setting and uses a work stealing strategy coupled with a small number of primitives for the processors (which we call workers) to obtain new work and to communicate to other workers. These primitives are a minimal imposition and integrate easily with constraint programming systems. The main contribution is an adaptive architecture, which allows workers to incrementally join and leave and has good scaling properties as the number of workers increases. Our empirical results illustrate that near-linear speedup for backtrack search is achieved for up to 61 workers. It suggests that near-linear speedup is possible with even more workers. The experiments also demonstrate where departures from linearity can occur for small problems, and also for problems where the parallelism can itself affect the search as in branch and bound.

Operating Systems Review | 2010

Incremental learning of system log formats

Kenny Q. Zhu; Kathleen Fisher; David Walker

System logs come in a large and evolving variety of formats, many of which are semi-structured and/or non-standard. As a consequence, off-the-shelf tools for processing such logs often do not exist, forcing analysts to develop their own tools, which is costly and time-consuming. In this paper, we present an incremental algorithm that automatically infers the format of system log files. From the resulting format descriptions, we can generate a suite of data processing tools automatically. The system can handle large-scale data sources whose formats evolve over time. Furthermore, it allows analysts to modify inferred descriptions as desired and incorporates those changes in future revisions.
