Guanggang Geng | Researchain

Publication

Featured research published by Guanggang Geng.

Fuzzy Systems and Knowledge Discovery | 2007

Boosting the Performance of Web Spam Detection with Ensemble Under-Sampling Classification

Guanggang Geng; Chun-Heng Wang; Qiudan Li; Lei Xu; Xiao-Bo Jin

Anti-spam has become one of the top challenges for Web search. In this paper, we treat Web spam detection as a binary classification problem. Because reputable pages on the Web are easier to obtain than spam pages, we adopt an ensemble under-sampling classification strategy that takes full advantage of the information in the large number of reputable Websites. The strategy is based on the spamicity predicted by every sub-classifier, and both content-based and link-based features are taken into account. Experiments on the standard WEBSPAM-UK2006 benchmark showed that the ensemble strategy effectively improves Web spam detection performance.
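The abstract gives no implementation details. As a rough illustration of ensemble under-sampling in general (not the authors' code), the sketch below splits the abundant reputable hosts into chunks about the size of the spam set, trains one sub-classifier per balanced chunk, and averages their predicted spamicity. The nearest-centroid base learner and the names `ensemble_undersample` and `predict` are hypothetical stand-ins; the paper's actual sub-classifiers and content/link features are not modeled.

```python
import random
from statistics import mean


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidClassifier:
    """Hypothetical base learner: nearest class centroid."""

    def fit(self, spam, normal):
        self.c_spam = centroid(spam)
        self.c_normal = centroid(normal)
        return self

    def spamicity(self, x):
        # In [0, 1]; approaches 1 as x moves closer to the spam centroid.
        d_s, d_n = sq_dist(x, self.c_spam), sq_dist(x, self.c_normal)
        return d_n / (d_s + d_n) if d_s + d_n else 0.5


def ensemble_undersample(spam, normal, seed=0):
    """Train one sub-classifier per chunk of the (larger) normal class."""
    shuffled = normal[:]
    random.Random(seed).shuffle(shuffled)
    k = len(spam)
    chunks = [shuffled[i:i + k] for i in range(0, len(shuffled), k)]
    # Every sub-classifier sees all spam hosts plus one chunk of normal hosts.
    return [CentroidClassifier().fit(spam, chunk) for chunk in chunks]


def predict(models, x):
    """Ensemble spamicity: mean of the sub-classifiers' predictions."""
    return mean(m.spamicity(x) for m in models)
```

With 2 spam vectors and 6 normal vectors, `ensemble_undersample` builds 3 balanced sub-classifiers, so no normal host is discarded, which is the point of ensembling instead of a single random under-sample.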

International World Wide Web Conferences | 2008

Improving personalized services in mobile commerce by a novel multicriteria rating approach

Qiudan Li; Chunheng Wang; Guanggang Geng

With the rapid growth of wireless technologies and mobile devices, there is great demand for personalized services in m-commerce. Collaborative filtering (CF) is a successful technique for producing personalized recommendations for users. This paper proposes a novel approach to improving CF algorithms in which, besides the typical information on users and items, the contextual information of a user and the multicriteria ratings of an item are considered. The multilinear singular value decomposition (MSVD) technique is used to explore both explicit and implicit relations among users, items and criteria. We implemented the approach in an existing m-commerce platform, and encouraging experimental results demonstrate its effectiveness.

International World Wide Web Conferences | 2009

Link based small sample learning for web spam detection

Guanggang Geng; Qiudan Li; Xinchang Zhang

Robust web spam detection systems based on statistical learning often require large amounts of labeled training data. However, labeled samples are more difficult, expensive and time-consuming to obtain than unlabeled ones. This paper proposes link-based semi-supervised learning algorithms that boost classifier performance by integrating traditional self-training with topological-dependency-based link learning. Experiments with only a few labeled samples on the standard WEBSPAM-UK2006 benchmark showed that the algorithms are effective.
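A minimal sketch of self-training combined with link-based smoothing, assuming one numeric feature per host, a nearest-centroid base learner, and a simple rule that blends each host's score with the mean score of the hosts it links to. These choices, the `alpha` weight, and all function names are illustrative assumptions, not the algorithm from the paper.

```python
def fit_centroids(labeled, features):
    """Hypothetical base learner: mean feature value of each labeled class.
    Needs at least one labeled spam host (1) and one normal host (0)."""
    spam = [features[h] for h, y in labeled.items() if y == 1]
    normal = [features[h] for h, y in labeled.items() if y == 0]
    return sum(spam) / len(spam), sum(normal) / len(normal)


def content_score(x, c_spam, c_normal):
    """Spamicity in [0, 1]: near 1 when x is closer to the spam centroid."""
    d_s, d_n = abs(x - c_spam), abs(x - c_normal)
    return d_n / (d_s + d_n) if d_s + d_n else 0.5


def link_smooth(scores, graph, alpha):
    """Blend each host's score with the mean score of the hosts it links to."""
    smoothed = {}
    for h, s in scores.items():
        nbrs = [scores[n] for n in graph.get(h, []) if n in scores]
        smoothed[h] = (1 - alpha) * s + alpha * sum(nbrs) / len(nbrs) if nbrs else s
    return smoothed


def self_train(features, labeled, graph, rounds=3, per_round=1, alpha=0.3):
    """Grow the labeled set by pseudo-labeling the most confident hosts each round."""
    labeled = dict(labeled)
    for _ in range(rounds):
        unlabeled = [h for h in features if h not in labeled]
        if not unlabeled:
            break
        c_spam, c_normal = fit_centroids(labeled, features)
        scores = {h: content_score(features[h], c_spam, c_normal) for h in features}
        scores = link_smooth(scores, graph, alpha)
        # Confidence = distance of the smoothed score from the 0.5 decision boundary.
        ranked = sorted(unlabeled, key=lambda h: abs(scores[h] - 0.5), reverse=True)
        for h in ranked[:per_round]:
            labeled[h] = int(scores[h] >= 0.5)
    return labeled
```

Each round retrains on the enlarged labeled set, so hosts pseudo-labeled early (and their links) influence the scores of hosts labeled later.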

International World Wide Web Conferences | 2007

A novel collaborative filtering-based framework for personalized services in m-commerce

Qiudan Li; Chunheng Wang; Guanggang Geng; Ruwei Dai

With the rapid growth of wireless technologies and handheld devices, m-commerce is becoming a promising research area, and personalization is especially important to its success. This paper proposes a novel collaborative filtering-based framework for personalized services in m-commerce. The framework extends our previous work by using Online Analytical Processing (OLAP) to represent the relations among user, content and context information, and by adopting a multi-dimensional collaborative filtering model to perform inference. It provides a powerful and well-founded personalization mechanism for m-commerce. We implemented it in an existing m-commerce platform, and experimental results demonstrate its feasibility and correctness.

Asia Information Retrieval Symposium | 2008

Improving spamdexing detection via a two-stage classification strategy

Guanggang Geng; Chunheng Wang; Qiudan Li

Spamdexing is any of various methods of manipulating the relevance or prominence of resources indexed by a search engine, usually in a manner inconsistent with the purpose of the indexing system. Combating spamdexing has become one of the top challenges for web search. Machine learning based methods have proven superior because they adapt easily to newly developed spam techniques. In this paper, we propose a two-stage classification strategy to detect web spam, based on the spamicity predicted by learning algorithms and on hyperlink propagation. Preliminary experiments on the standard WEBSPAM-UK2006 benchmark show that the two-stage strategy is reasonable and effective.
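One way to picture the two-stage idea, under stated assumptions: stage 1 is any classifier that outputs a spamicity per host (given here as fixed numbers), and stage 2 repeatedly mixes each host's stage-1 score with the mean current score of its linked neighbours before thresholding. The propagation rule, `alpha`, `iters`, the 0.5 threshold and the function names are hypothetical; the abstract does not describe the exact propagation scheme.

```python
def propagate(stage1, links, alpha=0.5, iters=20):
    """Stage 2 (hypothetical rule): s <- (1 - alpha) * s0 + alpha * mean(neighbour s)."""
    scores = dict(stage1)
    for _ in range(iters):
        new_scores = {}
        for host, s0 in stage1.items():
            nbrs = [scores[n] for n in links.get(host, []) if n in scores]
            if nbrs:
                new_scores[host] = (1 - alpha) * s0 + alpha * sum(nbrs) / len(nbrs)
            else:
                new_scores[host] = s0  # isolated hosts keep their stage-1 score
        scores = new_scores
    return scores


def two_stage_labels(stage1, links, threshold=0.5, **kwargs):
    """Label a host as spam (1) if its propagated spamicity reaches the threshold."""
    final = propagate(stage1, links, **kwargs)
    return {host: int(s >= threshold) for host, s in final.items()}
```

For example, with `stage1 = {"x": 0.45, "s1": 0.95, "s2": 0.9, "ok": 0.1}` and `x` linked to both `s1` and `s2`, host `x` is just below the threshold after stage 1 but is pulled above it by its spam neighbours in stage 2, while the isolated host `ok` stays normal.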

Knowledge-Based Systems | 2012

Statistical cross-language Web content quality assessment

Guanggang Geng; Liming Wang; Wei Wang; An-Lei Hu; Shuo Shen

Cross-language Web content quality assessment plays an important role in many Web content processing applications. In previous research, statistical systems based on natural language processing, heuristic content, and term frequency-inverse document frequency (TF-IDF) features have proven effective for Web content quality assessment. However, these features are language-dependent and therefore unsuitable for cross-language ranking. This paper proposes a cross-language Web content quality assessment method. First, multi-modal language-independent features are extracted, including character features, domain registration features, two-layer hyperlink analysis features and third-party Web service features. All the extracted features are then fused, and feature selection is carried out on the fused features to obtain a new eigenspace. Finally, a cross-language Web content quality model is learned on this eigenspace. Experiments on the ECML/PKDD 2010 Discovery Challenge cross-language datasets demonstrate that features at every scale are discriminative, that different feature modalities complement each other, and that feature selection is effective for statistical-learning-based cross-language Web content quality assessment.

International World Wide Web Conferences | 2008

Improving web spam detection with re-extracted features

Guanggang Geng; Chunheng Wang; Qiudan Li

Web spam detection has become one of the top challenges for the Internet search industry. Instead of relying on heuristic rules, we propose a feature re-extraction strategy to optimize the detection result. Based on the spamicity predicted by a preliminary detection, three types of features are extracted through the host-level web graph. Experiments on the WEBSPAM-UK2006 benchmark show that this strategy clearly improves web spam detection performance.
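The abstract mentions three types of re-extracted features without naming them. Purely as an illustration, the sketch below derives three plausible host-graph features from preliminary spamicity scores: the mean spamicity of in-linking hosts, the mean spamicity of out-linked hosts, and the fraction of out-linked hosts scored at or above a threshold. These particular features, the threshold, and the function name are assumptions, not the paper's feature set.

```python
from collections import defaultdict


def reextract_features(spamicity, out_links, threshold=0.5):
    """Map host -> (mean in-link spamicity, mean out-link spamicity, spam out-link fraction)."""
    in_links = defaultdict(list)
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links[dst].append(src)

    def mean_score(hosts):
        vals = [spamicity[h] for h in hosts if h in spamicity]
        return sum(vals) / len(vals) if vals else 0.0  # 0.0 when no scored neighbours

    features = {}
    for host in spamicity:
        outs = [d for d in out_links.get(host, []) if d in spamicity]
        spam_frac = sum(spamicity[d] >= threshold for d in outs) / len(outs) if outs else 0.0
        features[host] = (mean_score(in_links[host]), mean_score(outs), spam_frac)
    return features
```

The resulting tuples would be appended to each host's original features and fed to a second round of classification.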

European Conference on Information Retrieval | 2007

Fighting link spam with a two-stage ranking strategy

Guanggang Geng; Chunheng Wang; Qiudan Li; Yuanping Zhu

Most existing techniques for combating web spam focus on spam detection itself, separately from the ranking process. In this paper, we propose a two-stage ranking strategy that makes good use of both the hyperlink information among Websites and the internal structure information of each Website. The proposed method incorporates web spam detection into the ranking process and penalizes the ranking scores of potential spam pages instead of removing them arbitrarily. Preliminary experimental results show that our method is feasible and effective.
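A hedged sketch of penalizing rather than removing suspected spam: each page's ranking score is scaled by `(1 - spamicity) ** gamma`, so a likely spam page drops in the ordering but stays in the result list. The multiplicative penalty, the `gamma` parameter and the function name are hypothetical; the abstract does not state the penalty function used.

```python
def penalized_ranking(rank_scores, spamicity, gamma=1.0):
    """Return page ids ordered by rank score * (1 - spamicity) ** gamma.

    Suspected spam is demoted, never dropped; pages without a spamicity
    estimate are treated as spamicity 0 and are not penalized.
    """
    adjusted = {
        page: score * (1.0 - spamicity.get(page, 0.0)) ** gamma
        for page, score in rank_scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)
```

With `rank_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.5}` and `spamicity = {"p1": 0.7, "p2": 0.1}`, page `p1` falls from first to last place but remains in the list; larger `gamma` values punish suspected spam more sharply.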

NLPCC/ICCPOL | 2016

Statistical Entity Ranking with Domain Knowledge

Xiao-Bo Jin; Guanggang Geng; Kaizhu Huang; Zhi-Wei Yan

Entity search is a new application that serves either precise or vague requirements from search engine users. The Baidu Cup 2016 Challenge provided an opportunity to tackle the entity search problem. We achieved first place in average MAP score over four tasks: movie, tvShow, celebrity and restaurant. In this paper, we propose a series of similarity features based on both word frequency features and word semantic features, and describe our ranking architecture and experimental details.
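Since the challenge results are reported as average MAP, here is a short sketch of mean average precision under its common definition: for each query, average the precision at each rank where a relevant item appears (dividing by the number of relevant items), then take the mean over queries. The challenge's official evaluation script may differ, for example in cutoffs or in how unjudged items are handled.

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@k over the ranks k of relevant items."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    # Relevant items that were never retrieved contribute 0.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For the ranking `["a", "b", "c", "d"]` with relevant set `{"a", "c"}`, AP is (1/1 + 2/3) / 2 = 5/6. Averaging the per-task MAP values over the four tasks would then give an overall average MAP.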

Fuzzy Systems and Knowledge Discovery | 2013

Co-training based semi-supervised Web spam detection

Wei Wang; Xiaodong Lee; An-Lei Hu; Guanggang Geng

Traditional Web spam classifiers are trained only on labeled data (feature/label pairs). Labeled spam instances, however, are very difficult, expensive, or time-consuming to obtain, as they require the efforts of experienced human annotators, while unlabeled samples are relatively easy to collect. Semi-supervised learning addresses the classification problem by using a large amount of unlabeled data, together with the labeled data, to build better classifiers. This paper proposes two new semi-supervised learning algorithms to boost the performance of Web spam classifiers. The algorithms integrate traditional co-training with topological-dependency-based hyperlink learning, extending our previous work on self-training-based semi-supervised Web spam detection. Experimental results with 100/200 labeled samples on the standard WEBSPAM-UK2006 benchmark showed that the algorithms are effective.
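A minimal co-training sketch, assuming each host has two views (a content feature and a link feature, one number each) and using a nearest-centroid scorer per view. In each step one view pseudo-labels the unlabeled host it is most confident about, and that label becomes training data for both views. This shared-pool variant, the learner, and the function names are illustrative assumptions; the hyperlink-learning component described in the paper is not modeled.

```python
def view_scores(samples, labeled, view):
    """Fit a 1-D nearest-centroid scorer on one view; return spamicity for every host.
    samples: host -> (content_feature, link_feature); labeled: host -> 0/1."""
    spam = [samples[h][view] for h, y in labeled.items() if y == 1]
    normal = [samples[h][view] for h, y in labeled.items() if y == 0]
    c_s, c_n = sum(spam) / len(spam), sum(normal) / len(normal)
    scores = {}
    for h, feats in samples.items():
        d_s, d_n = abs(feats[view] - c_s), abs(feats[view] - c_n)
        scores[h] = d_n / (d_s + d_n) if d_s + d_n else 0.5
    return scores


def co_train(samples, labeled, rounds=2):
    """Alternate views; each one pseudo-labels its most confident unlabeled host."""
    labeled = dict(labeled)
    for _ in range(rounds):
        for view in (0, 1):  # 0 = content view, 1 = link view (hypothetical split)
            unlabeled = [h for h in samples if h not in labeled]
            if not unlabeled:
                return labeled
            scores = view_scores(samples, labeled, view)
            # The label this view is surest about also trains the other view.
            best = max(unlabeled, key=lambda h: abs(scores[h] - 0.5))
            labeled[best] = int(scores[best] >= 0.5)
    return labeled
```

The benefit of two views shows when a host is ambiguous in one view but clear in the other: a host with a borderline content feature can still be labeled confidently by the link view, and vice versa.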
