Xiaoxun Zhang
IBM
Publications
Featured research published by Xiaoxun Zhang.
International World Wide Web Conference | 2008
Qi Su; Xinying Xu; Honglei Guo; Zhili Guo; Xian Wu; Xiaoxun Zhang; Bin Swen; Zhong Su
The boom of product review websites, blogs and forums on the web has attracted many research efforts on opinion mining. Recently, there has been growing interest in finer-grained opinion mining, which detects opinions on individual review features rather than at the whole-review level. Research on feature-level opinion mining mainly relies on identifying the explicit relatedness between product feature words and opinion words in reviews. However, the sentiment relatedness between the two objects is usually complicated: in many cases, product feature words are only implied by the opinion words in reviews. Detecting such hidden sentiment associations remains a major challenge in opinion mining, and feature-level opinion mining is even harder on Chinese reviews due to the nature of the Chinese language. In this paper, we propose a novel mutual reinforcement approach to the feature-level opinion mining problem. More specifically, 1) the approach clusters product features and opinion words simultaneously and iteratively by fusing both their content information and their sentiment-link information; 2) under the same framework, based on the product feature categories and opinion word groups, we construct the sentiment association set between the two groups of data objects by identifying their strongest n sentiment links. Moreover, knowledge from multiple sources is incorporated to enhance clustering in the procedure. Based on the pre-constructed association set, our approach can predict opinions on different product features even when product feature words do not appear explicitly in reviews, and thus provides a more accurate opinion evaluation. The experimental results demonstrate that our method outperforms state-of-the-art algorithms.
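The abstract describes the algorithm only at a high level, so the following is a minimal, hedged sketch of the mutual-reinforcement idea: product-feature words and opinion words are clustered alternately, with each side's representation augmented by its sentiment-link distribution over the other side's current clusters. The function names, the use of k-means, and the weighting scheme are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only; not the paper's implementation.
import numpy as np
from sklearn.cluster import KMeans

def link_profile(links, labels, n_clusters):
    """Aggregate a sentiment-link matrix into an object-to-cluster
    distribution, given the other side's current cluster labels."""
    profile = np.zeros((links.shape[0], n_clusters))
    for c in range(n_clusters):
        profile[:, c] = links[:, labels == c].sum(axis=1)
    return profile / np.maximum(profile.sum(axis=1, keepdims=True), 1e-9)

def mutual_reinforce(feat_content, opin_content, links,
                     k_feat=5, k_opin=5, alpha=0.5, n_iter=10):
    """feat_content: (n_feat, d1) content vectors of product-feature words
       opin_content: (n_opin, d2) content vectors of opinion words
       links: (n_feat, n_opin) sentiment-link strengths from co-occurrence"""
    opin_labels = KMeans(k_opin, n_init=10).fit_predict(opin_content)
    for _ in range(n_iter):
        # cluster features from content + link distribution over opinion clusters
        f_repr = np.hstack([(1 - alpha) * feat_content,
                            alpha * link_profile(links, opin_labels, k_opin)])
        feat_labels = KMeans(k_feat, n_init=10).fit_predict(f_repr)
        # cluster opinion words from content + link distribution over feature clusters
        o_repr = np.hstack([(1 - alpha) * opin_content,
                            alpha * link_profile(links.T, feat_labels, k_feat)])
        opin_labels = KMeans(k_opin, n_init=10).fit_predict(o_repr)
    return feat_labels, opin_labels

Given the resulting feature categories and opinion-word groups, the sentiment association set described in the abstract could then be built by keeping the strongest n aggregated links between cluster pairs.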
North American Chapter of the Association for Computational Linguistics | 2009
Honglei Guo; Huijia Zhu; Zhili Guo; Xiaoxun Zhang; Xian Wu; Zhong Su
Domain adaptation is an important problem in named entity recognition (NER). NER classifiers usually lose accuracy in domain transfer because of the different data distributions of the source and target domains. The major reason for this performance degradation is that each entity type often has many domain-specific term representations in the different domains. Existing approaches usually need a certain amount of labeled target-domain data for tuning the original model; however, it is a labor-intensive and time-consuming task to build an annotated training data set for every target domain. We present a domain adaptation method with latent semantic association (LaSA). This method effectively overcomes the data distribution difference without leveraging any labeled target-domain data. The LaSA model is constructed to capture latent semantic associations among words from the unlabeled corpus. It groups words into a set of concepts according to their related context snippets. In domain transfer, the original term spaces of both domains are first projected to a concept space using the LaSA model, and then the original NER model is tuned based on the semantic association features. Experimental results on English and Chinese corpora show that LaSA-based domain adaptation significantly enhances the performance of NER.
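As a rough illustration of the projection step (not the paper's exact LaSA model), the sketch below learns word "concepts" from an unlabeled corpus via SVD over a word-context co-occurrence matrix and clusters the resulting embeddings, then augments each token's NER features with its concept id so that source- and target-domain terms meet in a shared space. All names and the SVD/k-means choices are assumptions for illustration.

# Illustrative sketch of concept learning and feature augmentation.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

def learn_concepts(sentences, window=2, dim=50, n_concepts=100):
    """sentences: list of token lists from an unlabeled corpus.
       Returns a word -> concept-id mapping."""
    vocab = {w: i for i, w in enumerate({w for s in sentences for w in s})}
    cooc = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if i != j:
                    cooc[vocab[w], vocab[s[j]]] += 1.0
    emb = TruncatedSVD(n_components=min(dim, len(vocab) - 1)).fit_transform(cooc)
    labels = KMeans(n_clusters=min(n_concepts, len(vocab)), n_init=10).fit_predict(emb)
    return {w: int(labels[i]) for w, i in vocab.items()}

def token_features(sent, i, concept_of):
    """Typical NER token features plus the latent-concept feature."""
    w = sent[i]
    return {"word": w.lower(),
            "is_upper": w[0].isupper(),
            "concept": concept_of.get(w.lower(), -1)}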
Knowledge Discovery and Data Mining | 2009
Honglei Guo; Huijia Zhu; Zhili Guo; Xiaoxun Zhang; Zhong Su
Address standardization is a very challenging task in data cleansing. To provide better customer relationship management and business intelligence for customer-oriented corporations, millions of free-text addresses need to be converted to a standard format for data integration, de-duplication and householding. Existing commercial tools usually employ many hand-crafted, domain-specific rules and reference dictionaries of cities, states, etc. These rules work well for the regions they were designed for; however, rule-based methods usually require additional human effort to rewrite the rules for each new domain, since address data are very irregular and vary across countries and regions. Supervised learning methods are usually more adaptable than rule-based approaches, but they need large-scale labeled training data, and it is a labor-intensive and time-consuming task to build a large-scale annotated corpus for each target domain. To minimize human effort and the size of the labeled training data set, we present a free-text address standardization method with latent semantic association (LaSA). The LaSA model is constructed to capture latent semantic associations among words from the unlabeled corpus. The original term space of the target domain is first projected to a concept space using the LaSA model, and then the address standardization model is actively learned from LaSA features and informative samples. The proposed method effectively captures the data distribution of the domain. Experimental results on large-scale English and Chinese corpora show that the proposed method significantly enhances standardization performance with less effort and less training data.
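The active-learning component can be illustrated with a simple uncertainty-sampling loop. This is a hedged sketch under the assumption of a record-level classifier over LaSA-style features; the paper's model is presumably a sequence labeler, and the classifier, batch size, and query strategy shown here are illustrative assumptions.

# Illustrative uncertainty-sampling loop; not the paper's procedure.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learn(X_pool, oracle_label, seed_idx, rounds=5, batch=20):
    """X_pool: feature matrix over unlabeled addresses (e.g. LaSA concept features).
       oracle_label(i): returns the human label for pool item i (simulated annotator).
       seed_idx: initial labeled items covering at least two classes."""
    labeled = list(seed_idx)
    y = {i: oracle_label(i) for i in labeled}
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf.fit(X_pool[labeled], [y[i] for i in labeled])
        proba = clf.predict_proba(X_pool)
        sorted_p = np.sort(proba, axis=1)
        margin = sorted_p[:, -1] - sorted_p[:, -2]
        # query the most ambiguous (smallest-margin) unlabeled samples
        candidates = [int(i) for i in np.argsort(margin) if i not in y][:batch]
        for i in candidates:
            y[i] = oracle_label(i)
        labeled.extend(candidates)
    return clf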
IBM Journal of Research and Development | 2010
Li Zhang; Shenghua Bao; Honglei Guo; Huijia Zhu; Xiaoxun Zhang; Keke Cai; Ben Fei; Xian Wu; Zhenyu Guo; Zhong Su
This paper describes EagleEye, an intelligent system that provides business intelligence through advanced data mining and text analytics. Unlike traditional search engines, EagleEye is entity oriented, where an entity can be an organization, a person, or a place. Given an entity name, the basic function of EagleEye is to generate a consolidated view of the entity information it gathers from many disparate data sources, to organize and categorize that information, and to automatically detect entity relationships. EagleEye can also analyze opinions about entities, evaluate whether they are positive or negative, and provide insight into many aspects of consumer sentiment toward product brands. This type of information enables enterprises to manage the reputation of their brands and to respond more quickly to changes in the marketplace. We present the key technologies, such as entity-name grouping, entity-relation extraction, and entity-oriented opinion mining, that were developed to support these functions. EagleEye has been successfully deployed to a number of clients across a variety of industries in China. Several case studies are presented to demonstrate the capability and business value of EagleEye in practice.
Conference on Information and Knowledge Management | 2010
Xiaoxun Zhang; Zhili Guo; Honglei Guo; Huijia Zhu; Zhong Su
We are concerned with the problem of similarity joins over text data, where the task is to find all pairs of documents whose similarity exceeds a given threshold. Such a problem often serves as an indispensable step in many web applications. A crucial issue is to prune as many unnecessary candidate pairs as possible before the expensive similarity evaluation. In this paper, we introduce the idea of adopting a cascade structure in text joins for a large speedup, where a later stage can exclude a considerable number of invalid pairs that survived earlier stages. The proposed algorithm is referred to as CasJoin. We further adopt a prefix filter to build the stages of CasJoin by introducing a novel view of the dynamic generation of document vectors: a vector is partitioned into a chain of multiple prefixes that are appended one by one for cascade joining. We evaluate CasJoin on a typical web corpus, ODP. Experiments indicate that, compared to the state-of-the-art prefix algorithms, CasJoin achieves a drastic reduction in candidates of as much as 98.15% and a speedup in joining of up to 13.34x.
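For context, the sketch below shows the standard prefix-filter candidate generation that CasJoin builds on (here for Jaccard similarity with a rare-token-first global ordering); the cascade itself, which splits this prefix into a chain of shorter prefixes applied stage by stage, is not reproduced. Function names and the threshold value are illustrative assumptions.

# Standard prefix-filter similarity join sketch (not CasJoin itself).
from collections import defaultdict
from math import ceil

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def prefix_join(docs, t=0.8):
    """docs: list of non-empty token sets; returns all pairs (i, j), i < j,
       with Jaccard similarity >= t."""
    # global ordering: rare tokens first, so prefixes are maximally selective
    freq = defaultdict(int)
    for d in docs:
        for tok in d:
            freq[tok] += 1
    ordered = [sorted(d, key=lambda tok: (freq[tok], tok)) for d in docs]

    index = defaultdict(list)  # token -> ids of docs whose prefix contains it
    results = []
    for i, toks in enumerate(ordered):
        # two sets with Jaccard >= t must share a token within this prefix
        prefix_len = len(toks) - ceil(t * len(toks)) + 1
        candidates = set()
        for tok in toks[:prefix_len]:
            candidates.update(index[tok])
            index[tok].append(i)
        # verification stage: exact similarity only on surviving candidates
        for j in candidates:
            if jaccard(ordered[i], ordered[j]) >= t:
                results.append((j, i))
    return results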
Conference on Information and Knowledge Management | 2009
Honglei Guo; Huijia Zhu; Zhili Guo; Xiaoxun Zhang; Zhong Su
Conference on Information and Knowledge Management | 2010
Honglei Guo; Huijia Zhu; Zhili Guo; Xiaoxun Zhang; Zhong Su
Conference on Information and Knowledge Management | 2009
Xiaoxun Zhang; Lichun Yang; Xian Wu; Honglei Guo; Zhili Guo; Shenghua Bao; Yong Yu; Zhong Su
Archive | 2009
Zhili Guo; Xiaoxun Zhang; Honglei Guo; Zhong Su
Archive | 2009
Ben Fei; Bo Hu; Xian Wu; Xiaoxun Zhang; Zhong Su