Archive | 2021

WIKI STREAMS: Wikipedia Article Recent Edit Retrieval System using Hierarchical Stream Clustering

 
 

Abstract


Stream analytics, a new paradigm in data analytics, has gained momentum due to the voluminous generation of stream data. With the huge increase in edits performed on Wikipedia topics, it is tedious for digital knowledge discovery users to find updates in their domain immediately. Users must go through a large amount of information and spend more time finding the relevant data. There is a need to retrieve Wikipedia edits later based on the metadata of the article edits. Clustering may therefore be employed to group Wikipedia article edits by domain. In this paper, hierarchical stream clustering is applied to retrieve the edits based on user interest. Data collected from Wikipedia over a period of one month is used as the dataset. Our method is compared with the state-of-the-art clustering system WikiAutoCat, and it is observed that accuracy is improved by 10% and clustering time is reduced by 20%.

DOI 10.21203/RS.3.RS-452931/V1
Language English

Full Text