Shen Huang

Shanghai Jiao Tong University

Publication

Featured research published by Shen Huang.

web intelligence | 2004

TSSP: A Reinforcement Algorithm to Find Related Papers

Shen Huang; Gui-Rong Xue; Ben-Yu Zhang; Zheng Chen; Yong Yu; Wei-Ying Ma

Content analysis and citation analysis are two common methods in recommender systems. Compared with content analysis, citation analysis can discover more implicitly related papers. However, citation-based methods may introduce more noise into the citation graph and cause topic drift. Some work combines content with citation to improve similarity measurement. The problem is that the two features are not used to reinforce each other to get better results. To solve this problem, we propose a new algorithm, Topic Sensitive Similarity Propagation (TSSP), to effectively integrate content similarity into similarity propagation. TSSP has two parts: citation-context-based propagation and iterative reinforcement. First, citation contexts provide clues about which papers are topically related and filter out less relevant citations. Second, iteratively integrating content and citation similarity enables them to reinforce each other during the propagation. The experimental results of a user study show that TSSP outperforms other algorithms in almost all cases.

asia-pacific web conference | 2005

Block-based language modeling approach towards web search

Shengping Li; Shen Huang; Gui-Rong Xue; Yong Yu

In the probabilistic language modeling approach to information retrieval, a model is estimated for each document individually. However, as Web pages become more complex, each of them may contain several blocks discussing different topics. Consequently, the performance of a statistical model for a Web document tends to be degraded by the mixture of topics. In this paper, we argue that segmenting a Web page into several relatively independent blocks will assist the language modeling, and a Block-based Language Modeling (BLM) approach is proposed. Unlike the standard method, BLM refines the modeling process into two parts: the probability of a query occurring in a block, and the probability of a block occurring in a Web page. Given a query, pages with more relevant blocks then tend to be retrieved. Experimental results show that when a unigram model is used, our approach outperforms the original language modeling for Web search in most cases.
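The two-part decomposition described in this abstract can be sketched as a page score of the form sum over blocks of P(query | block) * P(block | page). The abstract does not give the estimators, so everything below is an assumption made for illustration: the function names are hypothetical, P(term | block) uses Jelinek-Mercer smoothing against a collection model, and P(block | page) is taken as proportional to block length.

```python
from collections import Counter


def unigram_prob(term, block_counts, block_len, coll_counts, coll_len, lam=0.5):
    """Smoothed unigram P(term | block).

    Jelinek-Mercer smoothing with weight `lam` is an assumption of this
    sketch, not something stated in the abstract.
    """
    p_block = block_counts[term] / block_len if block_len else 0.0
    p_coll = coll_counts[term] / coll_len if coll_len else 0.0
    return lam * p_block + (1 - lam) * p_coll


def score_page(query, blocks, coll_counts, coll_len, lam=0.5):
    """Return sum_b P(query | b) * P(b | page) for a single page.

    query:  list of query terms
    blocks: list of token lists, one list per page block
    P(b | page) is assumed proportional to block length (hypothetical choice).
    """
    page_len = sum(len(tokens) for tokens in blocks)
    score = 0.0
    for tokens in blocks:
        if not tokens:
            continue
        counts = Counter(tokens)
        p_block_given_page = len(tokens) / page_len
        p_query_given_block = 1.0
        for term in query:
            p_query_given_block *= unigram_prob(
                term, counts, len(tokens), coll_counts, coll_len, lam
            )
        score += p_query_given_block * p_block_given_page
    return score


# Toy collection: page_a has one block focused on the query topic.
page_a = [["python", "tutorial", "python"], ["soup", "recipe"]]
page_b = [["soup", "recipe", "salt"], ["bread", "flour"]]
coll_counts = Counter(t for page in (page_a, page_b) for b in page for t in b)
coll_len = sum(coll_counts.values())

query = ["python", "tutorial"]
score_a = score_page(query, page_a, coll_counts, coll_len)
score_b = score_page(query, page_b, coll_counts, coll_len)
```

In this toy setup the page with a topically focused block scores higher than the page whose blocks never mention the query terms, which is the behavior the abstract attributes to pages with more relevant blocks.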

web information systems engineering | 2004

Optimizing Web Search Using Spreading Activation on the Clickthrough Data

Gui-Rong Xue; Shen Huang; Yong Yu; Hua-Jun Zeng; Zheng Chen; Wei-Ying Ma

In this paper, we propose a mining algorithm that uses user click-through data to improve search performance. The algorithm first explores the relationship between queries and Web pages and mines co-visiting relationships as virtual links among the Web pages; a Spreading Activation mechanism is then used to perform query-dependent search. Our approach could overcome the challenges discussed above, and the experimental results on a large set of MSN click-through log data show a significant improvement in search performance over the DirectHit algorithm as well as the baseline search engine.
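A minimal sketch of the two steps named in this abstract: mining co-visiting links from a click-through log, then spreading activation over those links. The abstract gives no formulas, so the details here are assumptions: two pages are linked when they were clicked for the same query, the link weight is the number of such shared queries, and each step passes a `decay` fraction of a page's energy to its neighbors in proportion to link weight. All function and parameter names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_covisit_links(click_log):
    """Build virtual links between pages clicked for the same query.

    click_log: iterable of (query, page) pairs.
    Returns {page: {neighbor: weight}}, where weight counts the distinct
    queries for which both pages were clicked (an assumed weighting).
    """
    pages_by_query = defaultdict(set)
    for query, page in click_log:
        pages_by_query[query].add(page)

    links = defaultdict(lambda: defaultdict(int))
    for pages in pages_by_query.values():
        for a, b in combinations(sorted(pages), 2):
            links[a][b] += 1
            links[b][a] += 1
    return links


def spread_activation(links, initial, decay=0.5, steps=2):
    """Propagate activation from the initially activated pages.

    initial: {page: energy}, e.g. baseline scores for the current query.
    At each step, every page in the frontier sends decay * energy to its
    neighbors, split in proportion to link weight.
    """
    activation = dict(initial)
    frontier = dict(initial)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for page, energy in frontier.items():
            neighbors = links.get(page, {})
            total = sum(neighbors.values())
            if total == 0:
                continue
            for neighbor, weight in neighbors.items():
                next_frontier[neighbor] += decay * energy * weight / total
        for page, energy in next_frontier.items():
            activation[page] = activation.get(page, 0.0) + energy
        frontier = next_frontier
    return activation


click_log = [
    ("jaguar", "a"), ("jaguar", "b"),
    ("big cat", "b"), ("big cat", "c"),
]
links = build_covisit_links(click_log)
scores = spread_activation(links, {"a": 1.0}, decay=0.5, steps=2)
```

Page "c" was never clicked for the query that activated "a", but it still receives some activation through its co-visit link with "b", which is the kind of query-dependent propagation the abstract describes.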

web information systems engineering | 2004

Multi-type Features Based Web Document Clustering

Shen Huang; Gui-Rong Xue; Benyu Zhang; Zheng Chen; Yong Yu; Wei-Ying Ma

Clustering has been demonstrated as a feasible way to explore the contents of a document collection and organize search engine results. For this task, many features of a Web page, such as content, anchor text, URL, hyperlinks, etc., can be exploited, and different results can be obtained. We expect to provide a unified and even better result for end users. Some work has studied how to use several types of features together to perform clustering. Most of it focuses on ensemble methods or combinations of similarity. In this paper, we propose a novel algorithm: Multi-type Features based Reinforcement Clustering (MFRC). This algorithm does not use a single combined score for all feature spaces, but uses the intermediate clustering result in one feature space as additional information to gradually enhance clustering in other spaces. Finally, a consensus can be achieved by such mutual reinforcement. The experimental results show that MFRC also provides some performance improvement.

Knowledge Engineering Review | 2003

OntoVote: a scalable distributed vote-collecting mechanism for ontology drift on a P2P platform

Yanfeng Ge; Yong Yu; Xing Zhu; Shen Huang; Min Xu

Ontologies provide potential support for knowledge and content management on a P2P platform. Although we can design ontologies beforehand for an application, it is argued that in P2P environments static or predefined ontologies cannot satisfy the ever-changing requirements of all users. So we propose that every user should make proposals for what kind of ontology best suits their needs. Collecting all these proposals (or votes) helps the drift of ontologies. This paper introduces OntoVote, a scalable distributed vote-collecting mechanism based on application-level broadcast trees, and describes how OntoVote can be applied to ontology drift on a P2P platform by discussing several problems involved in the voting process.

web age information management | 2005

Interactive Chinese search results clustering for personalization

Wei Liu; Gui-Rong Xue; Shen Huang; Yong Yu

Searching for information on the Web has attracted great attention in many research communities. Results returned by most Chinese web search engines usually reach up to thousands or even millions of documents, so efficient interfaces for search and navigation are critically needed. In this paper, we propose an interactive search results clustering system to facilitate browsing Chinese web pages in a more compact and thematic form. Users can select the clusters that best match the implicit meanings of their queries and personalize those search results on the fly. Our experiments show that this highly efficient approach outperforms the traditional Chinese search engines.

web age information management | 2004

DHT Based Searching Improved by Sliding Window

Shen Huang; Gui-Rong Xue; Xing Zhu; Yanfeng Ge; Yong Yu

Efficient full-text searching is a big challenge in Peer-to-Peer (P2P) systems. Recently, the Distributed Hash Table (DHT) has become one of the reliable communication schemes for P2P. Some research efforts perform keyword searching and result intersection on a DHT substrate. Two or more search requests must be issued for a multi-keyword query. This article proposes a Sliding Window improved Multi-keyword Searching method (SWMS) to index and search full text for short queries on DHT. The main assumptions behind SWMS are: (1) the query overhead of standard inverted list intersection is prohibitive in a distributed P2P system; (2) most of the documents relevant to a multi-keyword query have those keywords appearing near each other. The experimental results demonstrate that our method guarantees the search quality while reducing the cost of communication.
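One plausible reading of the two assumptions in this abstract is that term pairs co-occurring within a small window can be indexed under a single DHT key, so that a two-keyword query needs one lookup instead of two lookups plus an intersection. The sketch below illustrates that idea only; it is not the paper's actual SWMS design. The DHT is simulated with an in-memory dictionary, and `dht_key`, `index_document`, `search`, and the `window` parameter are hypothetical names.

```python
import hashlib
from collections import defaultdict


def dht_key(terms):
    """Hash an unordered set of terms to a key, as a DHT node ID would be."""
    joined = " ".join(sorted(terms))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def index_document(dht, doc_id, tokens, window=3):
    """Index single terms and term pairs that fall inside a sliding window.

    window=3 means each term is paired with the next two tokens.
    dht: dict-like mapping key -> set of document IDs (simulated DHT).
    """
    for i, term in enumerate(tokens):
        dht[dht_key([term])].add(doc_id)
        for other in tokens[i + 1 : i + window]:
            if other != term:
                dht[dht_key([term, other])].add(doc_id)


def search(dht, query_terms):
    """Answer a one- or two-term query with a single key lookup."""
    return set(dht.get(dht_key(query_terms), set()))


dht = defaultdict(set)
index_document(dht, "d1", "peer to peer full text search".split())
index_document(dht, "d2", "full moon rises over text".split())

near_hits = search(dht, ["full", "text"])   # only d1 has the terms adjacent
single_hits = search(dht, ["text"])         # both documents contain "text"
```

The trade-off in this sketch is a larger index in exchange for fewer lookups, and documents where the two keywords are far apart (such as "d2" above) are not returned for the pair query, which matches assumption (2).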

IEEE Transactions on Knowledge and Data Engineering | 2006