Dingyi Han

Shanghai Jiao Tong University

Publication

Featured research published by Dingyi Han.

International Conference on Computational Linguistics | 2008

Understanding and Summarizing Answers in Community-Based Question Answering Services

Yuanjie Liu; Shasha Li; Yunbo Cao; Chin-Yew Lin; Dingyi Han; Yong Yu

Community-based question answering (cQA) services have accumulated millions of questions and their answers over time. In the process of accumulation, cQA services assume that questions always have unique best answers. However, with an in-depth analysis of questions and answers on cQA services, we find that the assumption cannot be true. According to the analysis, at least 78% of the cQA best answers are reusable when similar questions are asked again, but no more than 48% of them are indeed the unique best answers. We conduct the analysis by proposing taxonomies for cQA questions and answers. To better reuse the cQA content, we also propose applying automatic summarization techniques to summarize answers. Our results show that question-type oriented summarization techniques can improve cQA answer quality significantly.

Asia Pacific Web Conference | 2006

A statistical study of today’s Gnutella

Shicong Meng; Cong Shi; Dingyi Han; Xing Zhu; Yong Yu

As a developing P2P system, Gnutella has upgraded its protocol to 0.6, which significantly changed the characteristics of its hosts. However, few previous works have studied the new version of Gnutella at a wide scale. In addition, various kinds of P2P models are used to evaluate P2P systems or mechanisms, but the reliability of some hypotheses used in the models has not been carefully studied or proved. In this paper, we try to remedy this situation by performing a large-scale measurement study on Gnutella with the help of some new crawling approaches. In particular, we characterize Gnutella by its queries, shared files and peer roles. Our measurements show that the assumption that query arrival follows a Poisson distribution may not hold in Gnutella, and that most peers tend to share files of very limited types, even when MP3 files are excluded. We also find that many ultrapeers in Gnutella are not well selected. Statistical data provided in this paper can also be useful for P2P modeling and simulation.
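
One common sanity check for the Poisson assumption mentioned above is the index of dispersion: for Poisson arrivals, the number of queries per fixed-size window has a variance roughly equal to its mean. The sketch below is only an illustration of that check, not the measurement method used in the paper; the function names and the window size are assumptions made for the example.

```python
from statistics import mean, pvariance


def counts_per_window(arrival_times, window):
    """Bucket arrival timestamps (in seconds) into consecutive fixed-size windows."""
    if not arrival_times:
        return []
    start = min(arrival_times)
    n_windows = int((max(arrival_times) - start) // window) + 1
    counts = [0] * n_windows
    for t in arrival_times:
        counts[int((t - start) // window)] += 1
    return counts


def dispersion_index(counts):
    """Variance-to-mean ratio of window counts.

    Values near 1 are consistent with Poisson arrivals; values well above 1
    suggest bursty traffic, values well below 1 suggest regular traffic.
    """
    return pvariance(counts) / mean(counts)


# Hypothetical data: evenly spaced arrivals versus one burst plus a straggler.
regular = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
bursty = [0, 0.1, 0.2, 0.3, 9]
regular_ratio = dispersion_index(counts_per_window(regular, window=2))
bursty_ratio = dispersion_index(counts_per_window(bursty, window=2))
```

A ratio far from 1 on real traces would be a reason to doubt Poisson-based query models, although a formal goodness-of-fit test would be needed to draw a firm conclusion.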

Computer Communications | 2008

A dynamic routing protocol for keyword search in unstructured peer-to-peer networks

Cong Shi; Dingyi Han; Yuanjie Liu; Shicong Meng; Yong Yu

The idea of building query-oriented routing indices has fundamentally changed how keyword search efficiency is improved, because such indices can learn the content distribution from the query routing process. They gradually improve search efficiency without excessive network overhead for the construction and maintenance of routing indices. However, the previously proposed protocol is not practically effective due to the slow improvement of routing efficiency. In this paper, we propose a novel protocol for query-oriented routing indices which quickly achieves high search efficiency at low cost. The maintenance mechanism employs reinforcement learning to exploit mass peer behavior. It explicitly uses the expected number of returned results to depict the content distribution, which helps quickly approximate the real distribution. The routing mechanism aims to retrieve as much content as possible, which also helps speed up the learning process. To further improve the search efficiency, several methods are applied to optimize the routing and maintenance mechanisms. In dealing with multi-keyword queries, the information of the corresponding keywords is also used to forward the queries. In addition, to accelerate learning, a rough description of the content distribution is obtained when a query is first seen. The experimental evaluation shows that the mechanism achieves high routing efficiency, quick learning ability, and satisfactory performance under churn.
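
The core idea in the abstract, keeping a per-neighbor estimate of the expected number of returned results and refining it as queries are routed, can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol from the paper: the class name `RoutingIndex`, the fixed learning rate, the incremental update rule, and the choice to sum estimates across keywords for multi-keyword queries are all hypothetical.

```python
from collections import defaultdict


class RoutingIndex:
    """Per-peer table: keyword -> neighbor -> expected number of results."""

    def __init__(self, neighbors, learning_rate=0.3):
        self.neighbors = list(neighbors)
        self.learning_rate = learning_rate  # assumed constant step size
        self.expected = defaultdict(dict)

    def update(self, keyword, neighbor, returned_results):
        """Move the estimate toward the observed result count (incremental update)."""
        old = self.expected[keyword].get(neighbor, 0.0)
        new = old + self.learning_rate * (returned_results - old)
        self.expected[keyword][neighbor] = new
        return new

    def score(self, keywords, neighbor):
        """Assumed scoring for multi-keyword queries: sum of per-keyword estimates."""
        return sum(self.expected[k].get(neighbor, 0.0) for k in keywords if k in self.expected)

    def choose(self, keywords, fanout=2):
        """Forward the query to the neighbors with the highest expected results."""
        ranked = sorted(self.neighbors, key=lambda n: self.score(keywords, n), reverse=True)
        return ranked[:fanout]


# Hypothetical usage: neighbor "b" has returned more results for "jazz" than "c".
index = RoutingIndex(["a", "b", "c"])
index.update("jazz", "b", 10)
index.update("jazz", "c", 2)
targets = index.choose(["jazz"], fanout=2)
```

A real protocol would also need exploration (occasionally trying low-scoring neighbors) and handling of peers that leave the network, which this sketch omits.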

International Conference on Peer-to-Peer Computing | 2006

Reinforcement Learning for Query-Oriented Routing Indices in Unstructured Peer-to-Peer Networks

Cong Shi; Shicong Meng; Yuanjie Liu; Dingyi Han; Yong Yu

The idea of building query-oriented routing indices has fundamentally changed how routing efficiency is improved, because such indices can learn the content distribution during the query routing process. They gradually improve routing efficiency without excessive network overhead for routing index construction and maintenance. However, the previously proposed mechanism is not practically effective due to the slow improvement of routing efficiency. In this paper, we propose a novel mechanism for query-oriented routing indices which quickly achieves high routing efficiency at low cost. The maintenance method employs reinforcement learning to utilize mass peer behaviors to construct and maintain routing indices. It explicitly uses the expected number of returned results to depict the content distribution, which helps quickly approximate the real distribution. Meanwhile, the routing method aims to retrieve as much content as possible, which further speeds up the learning process. The experimental evaluation shows that the mechanism has high routing efficiency, quick learning ability and satisfactory performance under churn.

International Conference on Peer-to-Peer Computing | 2006

Cuckoo Ring: Balancing Workload for Locality Sensitive Hash

Dingyi Han; Ting Shen; Shicong Meng; Yong Yu

Locality sensitive hash (LSH) is widely used in peer-to-peer (P2P) systems. Although it can support range or similarity queries, it breaks the load balance mechanism of traditional distributed hash table (DHT) based systems by replacing consistent hash with LSH. To solve the imbalance problem, current systems either weaken the locality-preserving ability from similarity-preserving to order-preserving or adopt a load-aware peer join mechanism. The first method does not support similarity queries as it loses the similarity information, and the second method is greatly affected by the dynamic nature of P2P networks. In this paper, we propose a novel system, cuckoo ring, which can preserve similarity information while balancing load. It does not guide newly joining peers to the hot areas but moves the items in the hot areas to cold areas, so that short-lived peers are distributed uniformly across the network instead of being guided to the hot areas. Compared to traditional DHT systems, cuckoo ring only maintains a little more information about the global lightly loaded peers and the moved indexed items.

International Conference on Peer-to-Peer Computing | 2005

An effective resource description based approach to find similar peers

Xing Zhu; Dingyi Han; Weibin Zhu; Yong Yu

In text document sharing peer-to-peer (P2P) applications, similar peers are peers which share documents about the same topic. Finding similar peers can benefit document retrieval in P2P systems. In this paper, the authors suggest an effective resource description based approach to find similar peers. By combining the topic model (an extension of the language model) technique and fuzzy set theory, the key problems of resource description generation and peer similarity calculation are solved. Experiments performed on the standard data sets show that this approach is effective. In particular, compared with the traditional approach, experimental results in the simulated P2P environment show that this approach works much better when only local information is available.

International Conference on Data Mining | 2006

Mining and Predicting Duplication over Peer-to-Peer Query Streams

Shicong Meng; Yifeng Shao; Cong Shi; Dingyi Han; Yong Yu

Many previous works on mining user queries in peer-to-peer systems focused on the distribution of query contents. However, little has been done towards a better understanding of the time series distribution of these queries, which is vital for system performance. To remedy this situation, this paper mines query streams by using automatic time series analysis to evaluate different linear models (Box-Jenkins models and some simple windowed-mean models) for predicting the number of duplicated queries from 10 minutes to 2 hours into the future. Both the predictive power and the computational costs of these models are evaluated over 318,942,450 real-world Gnutella queries collected over 3 months. We find the number of duplicated queries is consistently predictable. Simple, practical models like AR perform well on prediction.

International Conference on Peer-to-Peer Computing | 2007
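
To make the two model families named in the abstract concrete, the sketch below fits a first-order autoregressive model, x[t] = c + phi * x[t-1], by ordinary least squares and compares it with a windowed-mean baseline. It is an illustrative assumption rather than the paper's setup: the model order, the function names, and the sample series are invented for the example, and the paper's automatic model selection is not reproduced.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var if var else 0.0
    c = mean_next - phi * mean_prev
    return c, phi


def forecast_ar1(series, steps, c, phi):
    """Iterate the fitted recurrence forward from the last observation."""
    preds, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        preds.append(last)
    return preds


def windowed_mean(series, window):
    """Baseline forecast: mean of the most recent `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)


# Hypothetical counts of duplicated queries per interval, generated by
# x[t] = 10 + 0.5 * x[t-1] so the fit can be checked by hand.
duplicates = [0, 10, 15, 17.5, 18.75]
c, phi = fit_ar1(duplicates)
ar_forecast = forecast_ar1(duplicates, steps=2, c=c, phi=phi)
baseline = windowed_mean(duplicates, window=2)
```

On real traces, the two forecasts would be scored against held-out intervals to compare predictive power and computational cost, as the abstract describes.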

Predicting Query Duplication with Box-Jenkins Models and Its Applications

Xinyao Hu; Shicong Meng; Cong Shi; Dingyi Han; Yong Yu

Many previous works on peer-to-peer traffic characterization and modeling focused on the distribution of query contents. However, little has been done towards a better understanding of the time series distribution of these queries, which is vital for system performance. To remedy this situation, this paper characterizes query traffic by using automatic time series analysis to evaluate different linear models (Box-Jenkins models and some simple windowed-mean models) for predicting the number of duplicated queries from 10 minutes to 2 hours into the future. Both the predictive power and the computational costs of these models are evaluated over 318,942,450 real-world Gnutella queries collected over 3 months. We find the number of duplicated queries is consistently predictable. Simple, practical models like AR perform well on prediction. To show that these characteristics have a wide range of potential applications, we propose two enhancements to existing search result caching and load balancing algorithms. Our simulation study shows that our methodology works quite well in both scenarios in terms of efficiency and effectiveness. The main contributions of this paper are: (1) proposing new measurement techniques on Gnutella, (2) characterizing and modeling peer-to-peer query traffic with Box-Jenkins models, and (3) presenting a general enhancement to existing performance optimization algorithms in P2P systems.

Web Information Systems Engineering | 2006

Exploiting rating behaviors for effective collaborative filtering

Dingyi Han; Yong Yu; Gui-Rong Xue

Collaborative Filtering (CF) is important in the e-business era as it can help companies predict customer preferences. However, sparsity is still a major problem preventing it from achieving better effectiveness: many ratings in the training matrix are unknown. Few current CF methods try to fill in those blanks before predicting the ratings of an active user. In this work, we validate the effectiveness of matrix filling methods for collaborative filtering. Moreover, we try three different matrix filling methods based on the whole training dataset and its clustered subsets with different weights to show the different effects. By comparison, we analyze the characteristics of those methods and find that the mainstream method, Personality Diagnosis (PD), works better with most matrix filling methods. Its MAE can reach 0.935 on a 2%-density EachMovie training dataset with the item-based matrix filling method, which is a 10.1% improvement. Similar improvements can be found on both the EachMovie and MovieLens datasets. Our experiments also show that there is no need to do cluster-based matrix filling, but the filled values should be assigned a lower weight during the prediction process.

Web Age Information Management | 2005
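
A rough sketch of the item-based filling step and the MAE metric mentioned in the abstract is shown below. It assumes the simplest possible filler, the mean of each item's known ratings, and a single hypothetical `fill_weight` for filled cells; the paper's three filling methods and the Personality Diagnosis predictor are not reproduced here.

```python
def fill_with_item_means(ratings, fill_weight=0.5):
    """Fill unknown ratings (None) with the item's mean observed rating.

    ratings: list of user rows, each a list of ratings or None.
    Returns (filled, weights): observed cells get weight 1.0, filled cells
    get the lower `fill_weight` (an assumed value) for later prediction.
    """
    n_items = len(ratings[0])
    observed = [r for row in ratings for r in row if r is not None]
    global_mean = sum(observed) / len(observed)

    item_means = []
    for j in range(n_items):
        column = [row[j] for row in ratings if row[j] is not None]
        # Fall back to the global mean for items nobody has rated.
        item_means.append(sum(column) / len(column) if column else global_mean)

    filled = [[row[j] if row[j] is not None else item_means[j]
               for j in range(n_items)] for row in ratings]
    weights = [[1.0 if row[j] is not None else fill_weight
                for j in range(n_items)] for row in ratings]
    return filled, weights


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical 3-user by 3-item training matrix with unknown ratings.
training = [[5, None, 1],
            [3, 4, None],
            [None, 2, 3]]
filled, weights = fill_with_item_means(training)
error = mae([1, 2, 3], [1, 4, 3])
```

The returned weights are what a downstream predictor would use to trust filled values less than observed ones, in line with the abstract's recommendation.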

A novel resource description based approach for clustering peers

Xing Zhu; Dingyi Han; Yong Yu; Weibin Zhu

Clustering similar peers could benefit document retrieval in P2P systems. In this paper, we propose a resource description based approach to cluster peers according to their topics. By combining the topic model (an extension of the language model) technique and fuzzy set theory, we solve the key problems of resource generation and peer similarity calculation. Experiments performed on the standard data sets show that our approach is more effective than the traditional method.