Int. J. Web Inf. Syst. | 2019

Latent Dirichlet allocation-based temporal summarization

 
 

Abstract


Purpose

During crises such as accidents or disasters, an enormous volume of information is generated on the Web. Both the public and decision-makers often need to identify relevant and timely content, as soon as it appears online, that can help them understand what is happening and make the right decisions. However, relevant content can be scattered across document streams. The available information can also contain redundant content published by different sources. Therefore, the need for automatic construction of summaries that aggregate important, non-redundant and non-outdated pieces of information is becoming critical.

Design/methodology/approach

This paper presents a new temporal summarization approach based on Latent Dirichlet Allocation, a popular topic model in the information retrieval field. The approach consists of filtering documents over streams, extracting relevant parts of information and then using topic modeling to reveal their underlying aspects, so that the most relevant and novel pieces of information can be extracted and added to the summary.

Findings

The performance evaluation of the proposed temporal summarization approach based on Latent Dirichlet Allocation, performed on the TREC Temporal Summarization 2014 framework, demonstrates its effectiveness in providing short and precise summaries of events.

Originality/value

Unlike most state-of-the-art approaches, the proposed method determines the importance of the pieces of information to be added to the summaries by relying solely on their representation in the topic space provided by Latent Dirichlet Allocation, without using any external source of evidence.
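The selection step described in the abstract (keep pieces of information that are relevant and novel in the topic space) can be illustrated with a minimal sketch. This is not the paper's scoring function, which the abstract does not specify. It assumes each candidate sentence already has a topic distribution inferred by an LDA model, and it uses cosine similarity with two hypothetical thresholds, `min_relevance` and `max_redundancy`. The names `cosine` and `select_updates` and the example data are also illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_updates(sentences, event_topics, min_relevance=0.5, max_redundancy=0.8):
    """Greedy, time-ordered selection of relevant and novel sentences.

    sentences: iterable of (timestamp, text, topic_vector) tuples, where
    topic_vector is assumed to come from an already trained LDA model.
    event_topics: topic vector assumed to represent the tracked event.
    Both thresholds are hypothetical values chosen for illustration.
    """
    summary = []
    for timestamp, text, topics in sorted(sentences, key=lambda s: s[0]):
        # Relevance: the sentence must be close to the event in topic space.
        if cosine(topics, event_topics) < min_relevance:
            continue
        # Novelty: skip it if it is too close to something already selected.
        if any(cosine(topics, kept) > max_redundancy for _, _, kept in summary):
            continue
        summary.append((timestamp, text, topics))
    return summary


if __name__ == "__main__":
    event = [0.7, 0.2, 0.1]
    stream = [
        (1, "Train derails near the station", [0.8, 0.1, 0.1]),
        (2, "A train has derailed close to the station", [0.75, 0.15, 0.1]),
        (3, "Local team wins the cup", [0.05, 0.05, 0.9]),
        (4, "Rescue teams evacuate passengers", [0.4, 0.55, 0.05]),
    ]
    for timestamp, text, _ in select_updates(stream, event):
        print(timestamp, text)
```

In this toy stream, the second sentence is dropped as redundant with the first and the third as off-topic, so only the first and fourth sentences reach the summary. Processing sentences in timestamp order reflects the temporal setting, where earlier reports take precedence over later repetitions.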

Volume 15
Pages 83-102
DOI 10.1108/IJWIS-04-2018-0023
Language English
Journal Int. J. Web Inf. Syst.
