Publication


Featured research published by Zhongyuan Wang.


International Joint Conference on Artificial Intelligence | 2011

Short text conceptualization using a probabilistic knowledgebase

Yangqiu Song; Haixun Wang; Zhongyuan Wang; Hongsong Li; Weizhu Chen

Most text mining tasks, including clustering and topic detection, are based on statistical methods that treat text as bags of words. Semantics in the text is largely ignored in the mining process, and mining results often have low interpretability. One particular challenge faced by such approaches lies in short text understanding, as short texts lack enough content from which statistical conclusions can be drawn easily. In this paper, we improve text understanding by using a probabilistic knowledgebase that is as rich as our mental world in terms of the concepts (of worldly facts) it contains. We then develop a Bayesian inference mechanism to conceptualize words and short text. We conducted comprehensive experiments on conceptualizing textual terms, and clustering short pieces of text such as Twitter messages. Compared to purely statistical methods such as latent semantic topic modeling or methods that use existing knowledge-bases (e.g., WordNet, Freebase and Wikipedia), our approach brings significant improvements in short text understanding as reflected by the clustering accuracy.
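The Bayesian conceptualization step described in the abstract can be illustrated with a minimal sketch. It assumes a naive Bayes model over isA co-occurrence counts; the counts, the `conceptualize` function, and the smoothing constant are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import exp, log

# Toy (concept, instance) isA counts. These numbers are invented for
# illustration and do not come from any real knowledgebase.
ISA_COUNTS = {
    ("company", "apple"): 60,
    ("fruit", "apple"): 40,
    ("company", "microsoft"): 90,
    ("fruit", "pear"): 30,
    ("company", "pear"): 1,
}


def conceptualize(terms, counts=ISA_COUNTS, smoothing=1e-3):
    """Return P(concept | terms), assuming terms are independent given the concept."""
    concept_totals = defaultdict(float)
    for (concept, _instance), n in counts.items():
        concept_totals[concept] += n
    grand_total = sum(concept_totals.values())
    vocab_size = len({instance for _concept, instance in counts})

    log_scores = {}
    for concept, total in concept_totals.items():
        score = log(total / grand_total)  # log prior P(concept)
        for term in terms:
            n = counts.get((concept, term), 0.0)
            # Smoothed likelihood P(term | concept); smoothing avoids log(0).
            score += log((n + smoothing) / (total + smoothing * vocab_size))
        log_scores[concept] = score

    # Normalize in log space for numerical stability.
    peak = max(log_scores.values())
    unnormalized = {c: exp(s - peak) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}


if __name__ == "__main__":
    print(conceptualize(["apple", "microsoft"]))
    print(conceptualize(["apple", "pear"]))
```

In this toy setting the ambiguous term "apple" is pulled toward different concepts depending on the term it appears with, which is the behaviour the abstract attributes to context in short texts.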


International Conference on Data Engineering | 2015

Short text understanding through lexical-semantic analysis

Wen Hua; Zhongyuan Wang; Haixun Wang; Kai Zheng; Xiaofang Zhou

Understanding short texts is crucial to many applications, but challenges abound. First, short texts do not always observe the syntax of a written language. As a result, traditional natural language processing methods cannot be easily applied. Second, short texts usually do not contain sufficient statistical signals to support many state-of-the-art approaches for text processing such as topic modeling. Third, short texts are usually more ambiguous. We argue that knowledge is needed in order to better understand short texts. In this work, we use lexical-semantic knowledge provided by a well-known semantic network for short text understanding. Our knowledge-intensive approach disrupts traditional methods for tasks such as text segmentation, part-of-speech tagging, and concept labeling, in the sense that we focus on semantics in all these tasks. We conduct a comprehensive performance evaluation on real-life data. The results show that knowledge is indispensable for short text understanding, and our knowledge-intensive approaches are effective in harvesting semantics of short texts.


International Conference on Data Engineering | 2013

Attribute extraction and scoring: A probabilistic approach

Taesung Lee; Zhongyuan Wang; Haixun Wang; Seung-won Hwang

Knowledge bases, which consist of concepts, entities, attributes and relations, are increasingly important in a wide range of applications. We argue that knowledge about attributes (of concepts or entities) plays a critical role in inference. In this paper, we propose methods to derive attributes for millions of concepts and we quantify the typicality of the attributes with regard to their corresponding concepts. We employ multiple data sources such as web documents, search logs, and existing knowledge bases, and we derive typicality scores for attributes by aggregating different distributions derived from different sources using different methods. To the best of our knowledge, ours is the first approach to integrate concept- and instance-based patterns into probabilistic typicality scores that scale to a broad concept space. We have conducted extensive experiments to show the effectiveness of our approach.


Conference on Information and Knowledge Management | 2013

Computing term similarity by large probabilistic isA knowledge

Peipei Li; Haixun Wang; Kenny Q. Zhu; Zhongyuan Wang; Xindong Wu

Computing semantic similarity between two terms is essential for a variety of text analytics and understanding applications. However, existing approaches are more suitable for semantic similarity between words rather than the more general multi-word expressions (MWEs), and they do not scale very well. Therefore, we propose a lightweight and effective approach for semantic similarity using a large scale semantic network automatically acquired from billions of web documents. Given two terms, we map them into the concept space, and compare their similarity there. Furthermore, we introduce a clustering approach to orthogonalize the concept space in order to improve the accuracy of the similarity measure. Extensive studies demonstrate that our approach can accurately compute the semantic similarity between terms with MWEs and ambiguity, and significantly outperforms 12 competing methods.
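The idea of mapping two terms into a concept space and comparing them there can be sketched as follows. This is only an illustration: the concept weights are invented, cosine similarity is an assumed choice of measure (the abstract does not name one), and the clustering step that orthogonalizes the concept space is omitted.

```python
from math import sqrt

# Hypothetical concept distributions for a few terms (weights are invented).
CONCEPT_VECTORS = {
    "apple": {"company": 0.6, "fruit": 0.4},
    "microsoft": {"company": 1.0},
    "pear": {"fruit": 0.97, "company": 0.03},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an unknown term has no concepts, so similarity is 0
    return dot / (norm_u * norm_v)


def term_similarity(a, b, vectors=CONCEPT_VECTORS):
    """Compare two terms by the concepts they map to, not by their surface form."""
    return cosine(vectors.get(a, {}), vectors.get(b, {}))


if __name__ == "__main__":
    print(term_similarity("apple", "microsoft"))
    print(term_similarity("pear", "microsoft"))
```

Because the comparison happens over concepts, a multi-word expression is handled the same way as a single word, provided the lookup table has an entry for it.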


International Conference on Data Engineering | 2014

Head, modifier, and constraint detection in short texts

Zhongyuan Wang; Haixun Wang; Zhirui Hu

Head and modifier detection is an important problem for applications that handle short texts such as search queries, ads keywords, titles, captions, etc. In many cases, short texts such as search queries do not follow grammar rules, and existing approaches for head and modifier detection are coarse-grained, domain specific, and/or require labeling of large amounts of training data. In this paper, we introduce a semantic approach for head and modifier detection. We first obtain a large number of instance level head-modifier pairs from search logs. Then, we develop a conceptualization mechanism to generalize the instance level pairs to concept level. Finally, we derive weighted concept patterns that are concise, accurate, and have strong generalization power in head and modifier detection. Furthermore, we identify a subset of modifiers that we call constraints. Constraints are usually specific and not negligible as far as the intent of the short text is concerned, while non-constraint modifiers are more subjective. The mechanism we developed has been used in production for search relevance and ads matching. We use extensive experimental results to demonstrate the effectiveness of our approach.


International Conference on Management of Data | 2012

Optimizing index for taxonomy keyword search

Bolin Ding; Haixun Wang; Ruoming Jin; Jiawei Han; Zhongyuan Wang

Query substitution is an important problem in information retrieval. Much work focuses on how to find substitutes for any given query. In this paper, we study how to efficiently process a keyword query whose substitutes are defined by a given taxonomy. This problem is challenging because each term in a query can have a large number of substitutes, and the original query can be rewritten into any of their combinations. We propose to build an additional index (besides the inverted index) to efficiently process queries. For a query workload, we formulate an optimization problem which chooses the additional index structure, aiming at minimizing the query evaluation cost, under given index space constraints. We show the NP-hardness of the problem, and propose a pseudo-polynomial time algorithm using dynamic programming, as well as a (1/4)(1-1/e)-approximation algorithm to solve the problem. Experimental results show that, with only 10% additional index space, our approach can greatly reduce the query evaluation cost.
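To see why taxonomy-defined substitutes make query processing expensive, the following sketch enumerates every rewrite of a small query. The taxonomy and the helper names are hypothetical, and the code only illustrates the combinatorial growth; it is not the index structure or the optimization algorithm proposed in the paper.

```python
from itertools import product
from math import prod

# Hypothetical taxonomy: each term maps to the terms that may replace it
# (the term itself is included).
TAXONOMY_SUBSTITUTES = {
    "laptop": ["laptop", "notebook", "ultrabook"],
    "repair": ["repair", "fix"],
    "seattle": ["seattle"],
}


def rewrites(query_terms, substitutes=TAXONOMY_SUBSTITUTES):
    """List every query obtained by picking one substitute per term."""
    options = [substitutes.get(term, [term]) for term in query_terms]
    return [" ".join(combo) for combo in product(*options)]


def rewrite_count(query_terms, substitutes=TAXONOMY_SUBSTITUTES):
    """Count the rewrites without materializing them."""
    return prod(len(substitutes.get(term, [term])) for term in query_terms)


if __name__ == "__main__":
    query = ["laptop", "repair", "seattle"]
    for rewritten in rewrites(query):
        print(rewritten)
    print(rewrite_count(query))
```

The count is the product of the per-term substitute counts, so it grows multiplicatively with query length; evaluating each rewrite separately against an inverted index quickly becomes impractical.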


IEEE Transactions on Knowledge and Data Engineering | 2017

Understand Short Texts by Harvesting and Analyzing Semantic Knowledge

Wen Hua; Zhongyuan Wang; Haixun Wang; Kai Zheng; Xiaofang Zhou

Understanding short texts is crucial to many applications, but challenges abound. First, short texts do not always observe the syntax of a written language. As a result, traditional natural language processing tools, ranging from part-of-speech tagging to dependency parsing, cannot be easily applied. Second, short texts usually do not contain sufficient statistical signals to support many state-of-the-art approaches for text mining such as topic modeling. Third, short texts are more ambiguous and noisy, and are generated in an enormous volume, which further increases the difficulty of handling them. We argue that semantic knowledge is required in order to better understand short texts. In this work, we build a prototype system for short text understanding which exploits semantic knowledge provided by a well-known knowledgebase and automatically harvested from a web corpus. Our knowledge-intensive approaches disrupt traditional methods for tasks such as text segmentation, part-of-speech tagging, and concept labeling, in the sense that we focus on semantics in all these tasks. We conduct a comprehensive performance evaluation on real-life data. The results show that semantic knowledge is indispensable for short text understanding, and our knowledge-intensive approaches are both effective and efficient in discovering semantics of short texts.


Conference on Information and Knowledge Management | 2015

Contextual Text Understanding in Distributional Semantic Space

Jianpeng Cheng; Zhongyuan Wang; Ji-Rong Wen; Jun Yan; Zheng Chen

Representing discrete words in a continuous vector space turns out to be useful for natural language applications related to text understanding. Meanwhile, it poses extensive challenges, one of which is due to the polysemous nature of human language. A common solution (a.k.a. word sense induction) is to separate each word into multiple senses and create a representation for each sense respectively. However, this approach is usually computationally expensive and prone to data sparsity, since each sense needs to be managed discriminatively. In this work, we propose a new framework for generating context-aware text representations without diving into the sense space. We model the concept space shared among senses, resulting in a framework that is efficient in both computation and storage. Specifically, the framework we propose is one that: i) projects both words and concepts into the same vector space; ii) obtains unambiguous word representations that not only preserve the uniqueness among words, but also reflect their context-appropriate meanings. We demonstrate the effectiveness of the framework in a number of tasks on text understanding, including word/phrase similarity measurements, paraphrase identification and question-answer relatedness classification.


IEEE Transactions on Knowledge and Data Engineering | 2015

A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity

Peipei Li; Haixun Wang; Kenny Q. Zhu; Zhongyuan Wang; Xuegang Hu; Xindong Wu

Measuring semantic similarity between two terms is essential for a variety of text analytics and understanding applications. Currently, there are two main approaches for this task, namely the knowledge based and the corpus based approaches. However, existing approaches are more suitable for semantic similarity between words rather than the more general multi-word expressions (MWEs), and they do not scale very well. Contrary to these existing techniques, we propose an efficient and effective approach for semantic similarity using a large scale semantic network. This semantic network is automatically acquired from billions of web documents. It consists of millions of concepts, which explicitly model the context of semantic relationships. In this paper, we first show how to map two terms into the concept space, and compare their similarity there. Then, we introduce a clustering approach to orthogonalize the concept space in order to improve the accuracy of the similarity measure. Finally, we conduct extensive studies to demonstrate that our approach can accurately compute the semantic similarity between terms with MWEs and ambiguity, and significantly outperforms 12 competing methods under Pearson Correlation Coefficient. Meanwhile, our approach is much more efficient than all competing algorithms, and can be used to compute semantic similarity at large scale.


Conference on Information and Knowledge Management | 2015

An Inference Approach to Basic Level of Categorization

Zhongyuan Wang; Haixun Wang; Ji-Rong Wen; Yanghua Xiao

Humans understand the world by classifying objects into an appropriate level of categories. This process is often automatic and subconscious. Psychologists and linguists call it Basic-level Categorization (BLC). BLC can benefit many applications such as knowledge panels, advertising and recommendation. However, how to quantify basic-level concepts is still an open problem. Recently, much work focuses on constructing knowledge bases or semantic networks from web scale text corpora, which makes it possible for the first time to analyze computational approaches for deriving BLC. In this paper, we introduce a method based on typicality and PMI for BLC. We compare it with a few existing measures such as NPMI and commute time to understand its essence, and conduct extensive experiments to show the effectiveness of our approach. We also give a real application example to show how BLC can help sponsored search.
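The quantities named in the abstract, typicality and PMI, can be computed from isA co-occurrence counts as in the sketch below. The counts are invented, and the combined score `rep` (the product of the two typicalities) is an assumption chosen for illustration; the abstract does not state how the measures are combined. One reason the product is a natural candidate is the identity log(P(e|c) P(c|e)) = PMI(e, c) + log P(e, c), which the code checks numerically.

```python
from math import log

# Toy (concept, instance) counts, invented for illustration.
COUNTS = {
    ("company", "microsoft"): 90,
    ("software company", "microsoft"): 50,
    ("thing", "microsoft"): 5,
    ("company", "apple"): 60,
    ("fruit", "apple"): 40,
    ("thing", "apple"): 5,
    ("fruit", "pear"): 30,
    ("thing", "pear"): 5,
}


def _marginals(counts):
    """Return the total count and the per-concept and per-instance totals."""
    total = sum(counts.values())
    by_concept, by_instance = {}, {}
    for (concept, instance), n in counts.items():
        by_concept[concept] = by_concept.get(concept, 0) + n
        by_instance[instance] = by_instance.get(instance, 0) + n
    return total, by_concept, by_instance


def scores(concept, instance, counts=COUNTS):
    """Typicality, PMI, and an assumed combined score for one (concept, instance) pair."""
    total, by_concept, by_instance = _marginals(counts)
    n = counts.get((concept, instance), 0)
    if n == 0:
        return None  # the pair was never observed
    p_e_given_c = n / by_concept[concept]    # typicality of the instance in the concept
    p_c_given_e = n / by_instance[instance]  # typicality of the concept for the instance
    p_joint = n / total
    p_c = by_concept[concept] / total
    p_e = by_instance[instance] / total
    return {
        "p_e_given_c": p_e_given_c,
        "p_c_given_e": p_c_given_e,
        "pmi": log(p_joint / (p_c * p_e)),
        "log_joint": log(p_joint),
        "rep": p_e_given_c * p_c_given_e,  # assumed combination, see lead-in
    }


def basic_level_concept(instance, counts=COUNTS):
    """Pick the concept with the highest combined score, or None if unseen."""
    candidates = sorted({c for c, e in counts if e == instance})
    if not candidates:
        return None
    return max(candidates, key=lambda c: scores(c, instance, counts)["rep"])


if __name__ == "__main__":
    print(basic_level_concept("microsoft"))
    print(scores("company", "microsoft"))
```

With these toy counts, a very general concept such as "thing" scores low because the instance is not typical of it, while a very specific concept is penalized because it covers less of the instance's usage.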

Collaboration


Dive into Zhongyuan Wang's collaborations.

Top Co-Authors

Ji-Rong Wen
Renmin University of China

Xiaofeng Meng
Renmin University of China

Taesung Lee
Pohang University of Science and Technology

Kenny Q. Zhu
Shanghai Jiao Tong University