Shen Huang
Shanghai Jiao Tong University
Publication
Featured research published by Shen Huang.
web intelligence | 2004
Shen Huang; Gui-Rong Xue; Ben-Yu Zhang; Zheng Chen; Yong Yu; Wei-Ying Ma
Content analysis and citation analysis are two common methods in recommender systems. Compared with content analysis, citation analysis can discover more implicitly related papers. However, citation-based methods may introduce more noise into the citation graph and cause topic drift. Some work combines content with citation to improve similarity measurement. The problem is that the two features are not used to reinforce each other to get a better result. To solve this problem, we propose a new algorithm, Topic Sensitive Similarity Propagation (TSSP), to effectively integrate content similarity into similarity propagation. TSSP has two parts: citation-context-based propagation and iterative reinforcement. First, citation contexts provide clues about which papers are topically related and filter out less relevant citations. Second, iteratively integrating content and citation similarity enables them to reinforce each other during propagation. The experimental results of a user study show that TSSP outperforms other algorithms in almost all cases.
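The idea of letting content and citation similarity reinforce each other can be illustrated with a minimal sketch. This is not the TSSP algorithm from the paper: the update rule, the `alpha` weight, and the function name `propagate_similarity` are assumptions chosen for illustration, and the citation-context filtering step is not modeled. Each round blends a fixed content similarity with the average current similarity of the papers that the two papers cite.

```python
def propagate_similarity(cites, content_sim, alpha=0.5, iterations=5):
    """Blend content similarity with similarity propagated over citations.

    cites: dict mapping every paper id to the list of paper ids it cites
           (every cited paper must also appear as a key).
    content_sim: dict mapping frozenset({a, b}) to a value in [0, 1].
    alpha: hypothetical weight given to content similarity.
    """
    papers = sorted(cites)

    def content(a, b):
        return content_sim.get(frozenset((a, b)), 0.0)

    # Start from content similarity alone; a paper is fully similar to itself.
    sim = {(a, b): 1.0 if a == b else content(a, b)
           for a in papers for b in papers}

    for _ in range(iterations):
        new_sim = {}
        for a in papers:
            for b in papers:
                if a == b:
                    new_sim[(a, b)] = 1.0
                    continue
                # Average current similarity over all pairs of cited papers.
                pairs = [(c, d) for c in cites[a] for d in cites[b]]
                link = sum(sim[p] for p in pairs) / len(pairs) if pairs else 0.0
                new_sim[(a, b)] = alpha * content(a, b) + (1 - alpha) * link
        sim = new_sim
    return sim


cites = {"p1": ["p3"], "p2": ["p3"], "p3": [], "p4": []}
content_sim = {frozenset(("p1", "p2")): 0.2, frozenset(("p1", "p4")): 0.2}
sim = propagate_similarity(cites, content_sim)
```

In this toy input, `p1` and `p2` cite the same paper, so their score rises above that of `p1` and `p4`, which have the same content similarity but no shared citations.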
asia-pacific web conference | 2005
Shengping Li; Shen Huang; Gui-Rong Xue; Yong Yu
In the probabilistic language modeling approach to information retrieval, a model is estimated individually for each document. However, as Web pages become more complex, each of them may contain several blocks discussing different topics. Consequently, the performance of a statistical model for a Web document tends to be degraded by the mixture of topics. In this paper, we argue that segmenting a Web page into several relatively independent blocks assists language modeling, and we propose a Block-based Language Modeling (BLM) approach. Unlike the normal method, BLM splits the modeling process into two parts: the probability of a query occurring in a block, and the probability of a block occurring in a Web page. Given a query, pages with more relevant blocks then tend to be retrieved. Experimental results show that when a unigram model is used, our approach outperforms the original language modeling for Web search in most cases.
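The two-part decomposition described above can be sketched as a sum over blocks of P(block | page) times P(query | block). The abstract does not say how either probability is estimated, so the choices here are assumptions: P(block | page) is taken as proportional to block length, and P(query | block) is a unigram model with additive smoothing. The names `block_unigram_prob`, `page_score`, and `mu` are hypothetical.

```python
from collections import Counter


def block_unigram_prob(term, block_tokens, vocab_size, mu=1.0):
    """Additively smoothed unigram probability of a term within one block."""
    counts = Counter(block_tokens)
    return (counts[term] + mu) / (len(block_tokens) + mu * vocab_size)


def page_score(query, blocks):
    """Score a page as sum over blocks of P(block | page) * P(query | block).

    blocks: list of non-empty token lists, one per page block.
    Assumption: P(block | page) is proportional to the block's length.
    """
    vocab = {t for block in blocks for t in block} | set(query)
    total_tokens = sum(len(block) for block in blocks)
    score = 0.0
    for block in blocks:
        p_block = len(block) / total_tokens
        p_query = 1.0
        for term in query:
            p_query *= block_unigram_prob(term, block, len(vocab))
        score += p_block * p_query
    return score


query = ["cheap", "flights"]
# Page A keeps both query terms in one block; page B splits them.
page_a = [["cheap", "flights", "paris"], ["football", "scores", "today"]]
page_b = [["cheap", "football", "paris"], ["flights", "scores", "today"]]
score_a = page_score(query, page_a)
score_b = page_score(query, page_b)
```

Both pages contain exactly the same words, so a whole-page unigram model would score them identically; the block-level score prefers page A because one block matches the full query.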
web information systems engineering | 2004
Gui-Rong Xue; Shen Huang; Yong Yu; Hua-Jun Zeng; Zheng Chen; Wei-Ying Ma
In this paper, we propose a mining algorithm that uses user click-through data to improve search performance. The algorithm first explores the relationship between queries and Web pages and mines co-visiting relationships as virtual links among the Web pages; a Spreading Activation mechanism is then used to perform query-dependent search. Our approach could overcome the challenges discussed above, and the experimental results on a large set of MSN click-through log data show a significant improvement in search performance over the DirectHit algorithm as well as the baseline search engine.
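A rough sketch of the two stages, under stated assumptions: pages clicked for the same query are treated as co-visited and linked with a weight equal to the number of shared queries, and activation then spreads from an initial relevance score along those links. The log format, the `decay` factor, the number of steps, and the function names are all illustrative and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def co_visit_links(click_log):
    """Build symmetric virtual links between pages clicked for the same query.

    click_log: iterable of (query, url) pairs (assumed log format).
    Returns a dict mapping (src, dst) to the number of shared queries.
    """
    pages_by_query = defaultdict(set)
    for query, url in click_log:
        pages_by_query[query].add(url)
    weights = defaultdict(float)
    for pages in pages_by_query.values():
        for a, b in combinations(sorted(pages), 2):
            weights[(a, b)] += 1.0
            weights[(b, a)] += 1.0
    return weights


def spread_activation(initial, weights, decay=0.5, steps=2):
    """Spread initial relevance scores along the virtual links."""
    out_total = defaultdict(float)
    for (src, _dst), w in weights.items():
        out_total[src] += w
    activation = dict(initial)
    for _ in range(steps):
        incoming = defaultdict(float)
        for (src, dst), w in weights.items():
            # Each page passes its activation on in proportion to link weight.
            incoming[dst] += activation.get(src, 0.0) * w / out_total[src]
        pages = set(activation) | set(incoming)
        activation = {p: initial.get(p, 0.0) + decay * incoming.get(p, 0.0)
                      for p in pages}
    return activation


log = [("jaguar car", "a.com"), ("jaguar car", "b.com"),
       ("jaguar price", "b.com"), ("jaguar price", "c.com")]
links = co_visit_links(log)
result = spread_activation({"a.com": 1.0}, links)
```

Here only `a.com` matches the query initially, yet `b.com` (co-visited with it) and `c.com` (two hops away) both receive some activation, with the nearer page receiving more.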
web information systems engineering | 2004
Shen Huang; Gui-Rong Xue; Benyu Zhang; Zheng Chen; Yong Yu; Wei-Ying Ma
Clustering has been demonstrated to be a feasible way to explore the contents of a document collection and organize search engine results. For this task, many features of a Web page, such as content, anchor text, URL, and hyperlinks, can be exploited, and different results can be obtained. We aim to provide a unified and even better result for end users. Some work has studied how to use several types of features together to perform clustering. Most of it focuses on ensemble methods or combinations of similarity. In this paper, we propose a novel algorithm: Multi-type Features based Reinforcement Clustering (MFRC). This algorithm does not use a single combined score for all feature spaces, but uses the intermediate clustering result in one feature space as additional information to gradually enhance clustering in the other spaces. Finally, a consensus can be achieved by such mutual reinforcement. The experimental results show that MFRC also provides some performance improvement.
Knowledge Engineering Review | 2003
Yanfeng Ge; Yong Yu; Xing Zhu; Shen Huang; Min Xu
Ontologies provide potential support for knowledge and content management on a P2P platform. Although we can design ontologies beforehand for an application, it is argued that in P2P environments static or predefined ontologies cannot satisfy the ever-changing requirements of all users. So we propose that every user should make proposals for what kind of ontology best suits their needs. Collecting all these proposals (or votes) helps the drift of ontologies. This paper introduces OntoVote, a scalable distributed vote-collecting mechanism based on application-level broadcast trees, and describes how OntoVote can be applied to ontology drift on a P2P platform by discussing several problems involved in the voting process.
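Collecting votes over a broadcast tree can be pictured as each peer merging its own vote with the tallies reported by its children and passing the result upward. The sketch below simulates that in a single process; the tree layout, the vote strings, and the function name `collect_votes` are hypothetical, and the actual OntoVote protocol is not described in enough detail here to reproduce.

```python
from collections import Counter


def collect_votes(tree, votes, node):
    """Return the vote tally for the subtree rooted at `node`.

    tree: dict mapping a peer to the list of its child peers.
    votes: dict mapping a peer to the proposal it votes for (peers may abstain).
    """
    tally = Counter({votes[node]: 1}) if node in votes else Counter()
    for child in tree.get(node, []):
        # Each child reports the aggregated tally of its own subtree.
        tally += collect_votes(tree, votes, child)
    return tally


tree = {"root": ["a", "b"], "a": ["c"]}
votes = {"root": "add Author", "a": "add Author",
         "b": "add Venue", "c": "add Author"}
tally = collect_votes(tree, votes, "root")
```

Because every peer forwards only one merged tally to its parent, the root sees the full result without receiving a separate message from each voter.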
web age information management | 2005
Wei Liu; Gui-Rong Xue; Shen Huang; Yong Yu
Searching for information on the Web has attracted great attention in many research communities. Results returned by most Chinese web search engines often run to thousands or even millions of documents, so efficient interfaces for search and navigation are critically needed. In this paper, we propose an interactive search results clustering system to facilitate browsing Chinese web pages in a more compact and thematic form. Users can select the clusters that best match the implicit meanings of their queries and personalize those search results on the fly. Our experiments show that this highly efficient approach outperforms traditional Chinese search engines.
web age information management | 2004
Shen Huang; Gui-Rong Xue; Xing Zhu; Yanfeng Ge; Yong Yu
Efficient full-text searching is a big challenge in Peer-to-Peer (P2P) systems. Recently, the Distributed Hash Table (DHT) has become one of the reliable communication schemes for P2P. Some research efforts perform keyword searching and result intersection on a DHT substrate, where two or more search requests must be issued for a multi-keyword query. This article proposes a Sliding Window improved Multi-keyword Searching method (SWMS) to index and search full text for short queries on a DHT. The main assumptions behind SWMS are: (1) the query overhead of standard inverted list intersection is prohibitive in a distributed P2P system; (2) most of the documents relevant to a multi-keyword query have those keywords appearing near each other. The experimental results demonstrate that our method guarantees the search quality while reducing the cost of communication.
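Assumption (2) suggests one way a sliding window could help: index pairs of keywords that co-occur within a small window under a single combined key, so a two-keyword query becomes one lookup instead of two lookups plus an intersection. The sketch below is an illustration of that idea only, not the SWMS scheme itself. A local dictionary stands in for the DHT, and `pair_key`, `build_window_index`, `search_pair`, and the window size are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pair_key(a, b):
    """Order-independent key for a keyword pair (would be hashed onto the DHT)."""
    return "|".join(sorted((a, b)))


def build_window_index(docs, window=3):
    """Index every keyword pair that co-occurs within `window` adjacent tokens.

    docs: dict mapping document id to its token list.
    Returns a dict from pair key to the set of matching document ids.
    """
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for start in range(len(tokens)):
            span = set(tokens[start:start + window])
            for a, b in combinations(sorted(span), 2):
                index[pair_key(a, b)].add(doc_id)
    return index


def search_pair(index, a, b):
    """Answer a two-keyword query with a single key lookup."""
    return index.get(pair_key(a, b), set())


docs = {
    "d1": ["peer", "to", "peer", "full", "text", "search"],
    "d2": ["full", "moon", "tonight", "long", "text"],
}
index = build_window_index(docs, window=3)
hits = search_pair(index, "text", "full")
```

Both documents contain "full" and "text", but only `d1` has them close together, so only `d1` is returned. The trade-off is a larger index in exchange for fewer lookups per query.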
IEEE Transactions on Knowledge and Data Engineering | 2006
Shen Huang; Zheng Chen; Yong Yu; Wei-Ying Ma
Web Intelligence and Agent Systems: An International Journal | 2006
Shen Huang; Yong Yu; Gui-Rong Xue; Benyu Zhang; Zheng Chen; Wei-Ying Ma
international world wide web conferences | 2005
Shen Huang; Yong Yu; Shengping Li; Gui-Rong Xue; Lei Zhang