Wentao Wu | Researchain

Publication

Featured research published by Wentao Wu.

international conference on management of data | 2012

Probase: a probabilistic taxonomy for text understanding

Wentao Wu; Hongsong Li; Haixun Wang; Kenny Q. Zhu

Knowledge is indispensable to understanding. The ongoing information explosion highlights the need to enable machines to better understand electronic text in human language. Much work has been devoted to creating universal ontologies or taxonomies for this purpose. However, none of the existing ontologies has the needed depth and breadth for universal understanding. In this paper, we present a universal, probabilistic taxonomy that is more comprehensive than any existing one. It contains 2.7 million concepts harnessed automatically from a corpus of 1.68 billion web pages. Unlike traditional taxonomies that treat knowledge as black and white, it uses probabilities to model the inconsistent, ambiguous, and uncertain information it contains. We present details of how the taxonomy is constructed, its probabilistic modeling, and its potential applications in text understanding.

extending database technology | 2010

k-symmetry model for identity anonymization in social networks

Wentao Wu; Yanghua Xiao; Wei Wang; Zhenying He; Zhihui Wang

With more and more social network data being released, protecting the sensitive information within social networks from leakage has become an important concern of publishers. Adversaries with some background structural knowledge about a target individual can easily re-identify that individual from the network, even if the identifiers have been replaced by randomized integers (i.e., the network is naively anonymized). Since there is abundant topological information that can be used to attack a victim's privacy, resisting such structural re-identification becomes a great challenge. Previous works investigated only a minority of such structural attacks, without considering protection against re-identification under any potential structural knowledge about a target. To achieve this objective, in this paper we propose the k-symmetry model, which modifies a naively anonymized network so that for any vertex in the network, there exist at least k - 1 structurally equivalent counterparts. We also propose sampling methods to extract approximate versions of the original network from the anonymized network so that statistical properties of the original network can be evaluated. Extensive experiments show that we can successfully recover a variety of such properties of the original network through aggregations on quite a small number of sample graphs.

international conference on data engineering | 2013

Predicting query execution time: Are optimizer cost models really unusable?

Wentao Wu; Yun Chi; Shenghuo Zhu; Junichi Tatemura; Hakan Hacigümüs; Jeffrey F. Naughton

Predicting query execution time is useful in many database management issues including admission control, query scheduling, progress monitoring, and system sizing. Recently the research community has been exploring the use of statistical machine learning approaches to build predictive models for this task. An implicit assumption behind this work is that the cost models used by query optimizers are insufficient for query execution time prediction. In this paper we challenge this assumption and show that while the simple approach of scaling the optimizer's estimated cost indeed fails, a properly calibrated optimizer cost model is surprisingly effective. However, even a well-tuned optimizer cost model will fail in the presence of errors in cardinality estimates. Accordingly we investigate the novel idea of spending extra resources to refine estimates for the query plan after it has been chosen by the optimizer but before execution. In our experiments we find that a well-calibrated query optimizer model along with cardinality estimation refinement provides a low-overhead way to provide estimates that are always competitive and often much better than the best reported numbers from the machine learning approaches.
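The contrast between scaling an optimizer's total cost and calibrating its individual cost units can be sketched with a toy two-unit cost model. Everything below is an assumption made for illustration: the two cost units, the default weights, and the synthetic timings are invented, and the abstract does not describe the paper's actual calibration procedure.

```python
# Toy cost model with two cost units: pages read and tuples processed.
# All numbers are synthetic and chosen only to illustrate the idea.

DEFAULT_PAGE_COST = 1.0    # assumed default weight per page
DEFAULT_TUPLE_COST = 0.01  # assumed default weight per tuple


def optimizer_cost(pages, tuples):
    """Abstract optimizer cost using the default unit weights."""
    return DEFAULT_PAGE_COST * pages + DEFAULT_TUPLE_COST * tuples


def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e and c*x + d*y = f with Cramer's rule."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - e * c) / det


# (pages, tuples, measured seconds) for two hypothetical profiling queries.
calib_a = (1_000, 100_000, 0.6)
calib_b = (10_000, 10_000, 1.05)

# Query whose running time we want to predict (pages, tuples).
target = (5_000, 200_000)

# Approach 1: one global factor that converts optimizer cost to seconds,
# fitted on a single profiling query.
scale = calib_a[2] / optimizer_cost(calib_a[0], calib_a[1])
scaled_prediction = scale * optimizer_cost(*target)

# Approach 2: calibrate each cost unit separately, in seconds per unit.
page_sec, tuple_sec = solve_2x2(
    calib_a[0], calib_a[1],
    calib_b[0], calib_b[1],
    calib_a[2], calib_b[2],
)
calibrated_prediction = page_sec * target[0] + tuple_sec * target[1]

print(f"scaled cost prediction:      {scaled_prediction:.3f} s")
print(f"calibrated units prediction: {calibrated_prediction:.3f} s")
```

In this synthetic setup the default weights imply a page-to-tuple cost ratio that differs from the ratio in the measured timings, so no single scaling factor fits queries with different page and tuple mixes, while per-unit calibration does.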

extending database technology | 2009

Efficiently indexing shortest paths by exploiting symmetry in graphs

Yanghua Xiao; Wentao Wu; Jian Pei; Wei Wang; Zhenying He

Shortest path queries (SPQ) are essential in many graph analysis and mining tasks. However, answering shortest path queries on-the-fly on large graphs is costly. To answer shortest path queries online, we may materialize and index shortest paths. However, a straightforward index of all shortest paths in a graph of N vertices takes O(N^2) space. In this paper, we tackle the problem of indexing shortest paths and answering shortest path queries online. As many large real graphs have been shown to be richly symmetric, the central idea of our approach is to use graph symmetry to reduce the index size while retaining the correctness and the efficiency of shortest path query answering. Technically, we develop a framework to index a large graph at the orbit level instead of the vertex level so that the number of breadth-first search trees materialized is reduced from O(N) to O(|Δ|), where |Δ| ≤ N is the number of orbits in the graph. We explore orbit adjacency and local symmetry to obtain compact breadth-first-search trees (compact BFS-trees). An extensive empirical study using both synthetic data and real data shows that compact BFS-trees can be built efficiently and the space cost can be reduced substantially. Moreover, online shortest path query answering can be achieved using compact BFS-trees.
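As a rough illustration of indexing at the orbit level (not the paper's compact BFS-tree structure), the sketch below stores one BFS distance table per orbit representative. It relies on the fact that automorphisms preserve distances: if an automorphism g maps u to its orbit representative r, then d(u, v) = d(r, g(v)). Automorphisms are enumerated by brute force, which is feasible only for tiny graphs, and all function names are hypothetical.

```python
from collections import deque
from itertools import permutations


def automorphisms(n, edges):
    """Brute-force all automorphisms of an undirected simple graph."""
    edge_set = {frozenset(e) for e in edges}
    return [p for p in permutations(range(n))
            if all(frozenset((p[u], p[v])) in edge_set for u, v in edges)]


def bfs(n, edges, src):
    """Hop distances from src to every vertex (None if unreachable)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [None] * n
    dist[src] = 0
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if dist[y] is None:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def build_orbit_index(n, edges):
    """Keep one BFS table per orbit instead of one per vertex."""
    autos = automorphisms(n, edges)
    to_rep = {}        # vertex -> automorphism sending it to its representative
    bfs_from_rep = {}  # representative -> BFS distance table
    for v in range(n):
        # The representative is the smallest vertex in v's orbit.
        g = min(autos, key=lambda p: p[v])
        to_rep[v] = g
        rep = g[v]
        if rep not in bfs_from_rep:
            bfs_from_rep[rep] = bfs(n, edges, rep)
    return to_rep, bfs_from_rep


def distance(index, u, v):
    """Answer d(u, v) as d(rep, g(v)), where g maps u to rep."""
    to_rep, bfs_from_rep = index
    g = to_rep[u]
    return bfs_from_rep[g[u]][g[v]]


# A 6-cycle is vertex-transitive, so it has a single orbit and the
# index holds one BFS table rather than six.
cycle_edges = [(i, (i + 1) % 6) for i in range(6)]
cycle_index = build_orbit_index(6, cycle_edges)
print(len(cycle_index[1]), distance(cycle_index, 2, 5))
```

This sketch stores one full automorphism per vertex, which would defeat the space savings on a real graph; it is meant only to show why one BFS tree per orbit suffices for correctness.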

Physica A-statistical Mechanics and Its Applications | 2008

Symmetry-based structure entropy of complex networks

Yanghua Xiao; Wentao Wu; Hui Wang; Momiao Xiong; Wei Wang

Precisely quantifying the heterogeneity or disorder of network systems is important and desired in studies of behaviors and functions of network systems. Although various degree-based entropies have been available to measure the heterogeneity of real networks, heterogeneity implicit in the structures of networks cannot be precisely quantified yet. Hence, we propose a new structure entropy based on automorphism partition. Analysis of extreme cases shows that entropy based on automorphism partition can quantify the structural heterogeneity of networks more precisely than degree-based entropies. We also summarize symmetry and heterogeneity statistics of many real networks, finding that real networks are more heterogeneous in the view of automorphism partition than has been depicted under the measurement of degree-based entropies, and that structural heterogeneity is strongly negatively correlated with symmetry of real networks.
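To make the comparison concrete, the following sketch computes a Shannon entropy over two vertex partitions of a tiny graph: cells of equal degree, and orbits of the automorphism group. It assumes the entropy is taken over the relative cell sizes using natural logarithms; the paper's exact definitions and normalization may differ. The brute-force orbit computation and all function names are hypothetical and suitable only for very small graphs.

```python
from collections import Counter
from itertools import permutations
from math import log


def automorphism_orbits(n, edges):
    """Orbits of the automorphism group, found by brute force."""
    edge_set = {frozenset(e) for e in edges}
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for perm in permutations(range(n)):
        # A vertex bijection that maps every edge to an edge is an automorphism.
        if all(frozenset((perm[u], perm[v])) in edge_set for u, v in edges):
            for v in range(n):
                parent[find(v)] = find(perm[v])

    orbits = {}
    for v in range(n):
        orbits.setdefault(find(v), []).append(v)
    return sorted(orbits.values())


def degree_cells(n, edges):
    """Partition of the vertices into cells of equal degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    cells = {}
    for v in range(n):
        cells.setdefault(deg[v], []).append(v)
    return sorted(cells.values())


def partition_entropy(cells, n):
    """Shannon entropy (in nats) of the relative cell sizes."""
    return -sum((len(c) / n) * log(len(c) / n) for c in cells)


# Path on 5 vertices: the three inner vertices share degree 2, but the
# middle vertex is not structurally equivalent to its two neighbours.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
orbits = automorphism_orbits(5, path)
cells = degree_cells(5, path)
print(orbits, partition_entropy(orbits, 5))
print(cells, partition_entropy(cells, 5))
```

On this path the automorphism partition is finer than the degree partition, so its entropy is higher, which matches the abstract's observation that degree-based measures can understate structural heterogeneity.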

cloud data management | 2009

Personalization as a service: the architecture and a case study

Hang Guo; Jidong Chen; Wentao Wu; Wei Wang

Cloud computing has become a hot topic in the IT industry. Great efforts have been made to establish cloud computing platforms for enterprise users, mostly small businesses. However, there is little research on the impact of cloud computing on individual users. In this paper we focus on how to provide personalized services for individual users in the cloud environment. We argue that a personalized cloud service should consist of two parts. The client-side program records user activities on personal devices such as PCs. In addition, the user model is computed on the client side to avoid server overhead. The cloud-side program fetches the user model periodically and adjusts its results accordingly. We build a personalized cloud data search engine prototype to prove our idea.

conference on information and knowledge management | 2009

iMecho: an associative memory based desktop search system

Jidong Chen; Hang Guo; Wentao Wu; Wei Wang

Traditional desktop search engines only support keyword-based search that needs exact keyword matching to find resources. However, users generally have a vague picture of what is stored but forget the exact location and keywords of the resource. According to observations of human associative memory, people tend to remember things from some memory fragments in their brains, and these memory fragments are connected by memory cues of user activity context. We developed iMecho (My Memory Echo), an associative memory based desktop search system, which exploits such associations and contexts to enhance traditional desktop search. Desktop resources are connected with semantic links mined from explicit and implicit user activities according to specific access patterns. Using these semantic links, associations among memory fragments can be built or rebuilt in a user's brain during a search. Moreover, our personalized ranking scheme uses these links together with a user's personal preferences to rank results by both relevance and importance to the user. In addition, the system provides a faceted search feature and association graph navigation to help users refine and associate search results generated by full-text keyword search. Our experiments investigating the precision and recall quality of the iMecho prototype show that the association-based search system is superior to traditional keyword search in personal search engines since it is closer to the way that human associative memory works.

Pattern Recognition | 2008

Structure-based graph distance measures of high degree of precision

Yanghua Xiao; Hua Dong; Wentao Wu; Momiao Xiong; Wei Wang; Baile Shi

In recent years, evaluating graph distance has become more and more important in a variety of real applications and many graph distance measures have been proposed. Among all of those measures, structure-based graph distance measures have become the research focus due to their independence of the definition of cost functions. However, existing structure-based graph distance measures have a low degree of precision because only node and edge information of graphs is employed in these measures. To improve the precision of graph distance measures, we define the substructure abundance vector (SAV) to capture more substructure information of a graph. Furthermore, based on SAV, we propose unified graph distance measures which are generalizations of the existing structure-based graph distance measures. In general, the unified graph distance measures can evaluate graph distance at a much finer grain. We also show that unified graph distance measures based on occurrence mapping and some of their variants are metrics. Finally, we apply the unified graph distance metric and its variants to the population evolution analysis and construct distance graphs of marker networks in three populations, which reflect the single nucleotide polymorphism (SNP) linkage disequilibrium (LD) differences among these populations.

international conference on management of data | 2009

Search your memory! - an associative memory based desktop search system

Jidong Chen; Hang Guo; Wentao Wu; Chunxin Xie

We present XSearcher, an associative memory based desktop search system, which exploits associations by creating semantic links of personal desktop resources from explicit and implicit user activities. With these links, associations among memory fragments can be built or rebuilt in a user's brain during a search. The personalized ranking scheme uses these links together with a user's personal preferences to rank results by both relevance and importance. XSearcher enhances traditional keyword-based search systems since it is closer to the way that human associative memory works.

very large data bases | 2014

Uncertainty aware query execution time prediction

Wentao Wu; Xi Wu; Hakan Hacigümüs; Jeffrey F. Naughton

Predicting query execution time is a fundamental issue underlying many database management tasks. Existing predictors rely on information such as cardinality estimates and system performance constants that are difficult to know exactly. As a result, accurate prediction still remains elusive for many queries. Moreover, existing predictors provide a single point estimate of the true execution time but fail to characterize the uncertainty in the prediction. In this paper, we take a first step towards providing uncertainty information along with query execution time predictions. We use the query optimizer's cost model to represent the query execution time as a function of the selectivities of operators in the query plan as well as the constants that describe the cost of CPU and I/O operations in the system. By treating these quantities as random variables rather than constants, we show that with low overhead we can infer the distribution of likely prediction errors. We further show that the prediction errors estimated by our proposed techniques are strongly correlated with the actual prediction errors.
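The idea of treating selectivities and cost constants as random variables can be sketched with a small Monte Carlo simulation. This substitutes plain sampling for whatever inference technique the paper uses, which the abstract does not detail. The toy plan, the cost formula, and every distribution parameter below are assumptions made for illustration.

```python
import random
import statistics


def sample_execution_time(rng):
    """Draw one execution time for a hypothetical scan-plus-filter plan."""
    pages, tuples = 10_000, 1_000_000
    c_io = rng.gauss(1.0e-4, 1.0e-5)   # seconds per page read (assumed)
    c_cpu = rng.gauss(1.0e-7, 2.0e-8)  # seconds per tuple processed (assumed)
    # Filter selectivity, clamped to the valid range [0, 1] (assumed).
    sel = min(1.0, max(0.0, rng.gauss(0.1, 0.03)))
    # Toy cost formula: I/O for the scan, CPU for every scanned tuple,
    # plus CPU for each tuple that passes the filter.
    return c_io * pages + c_cpu * tuples + c_cpu * tuples * sel


def predict_with_uncertainty(n_samples=10_000, seed=0):
    """Return the mean and standard deviation of the sampled times."""
    rng = random.Random(seed)
    samples = [sample_execution_time(rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)


mean_time, std_time = predict_with_uncertainty()
print(f"predicted time: {mean_time:.3f} s +/- {std_time:.3f} s")
```

Reporting a spread alongside the point estimate is the essence of the approach: a query whose inputs are poorly known yields a wide distribution, which signals that the single-number prediction should be trusted less.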
