International Journal of Advanced Computer Science and Applications | 2021

A Systematic Review Web Content Mining Tools and its Applications

 
 

Abstract


In recent years, the emergence of WWW (World Wide Web) led to the accumulation of huge amount of information and data. Hence the web is found to consist of unstructured and structured information that impacts the day to day life of the society. Because of such availability of huge information, utilization of the required information becomes more challenging. This paper provided a comprehensive survey on the current situation and recent trends on web content mining (WCM) and its applications thereby contributing to the enhancement of the upcoming research in WCM. The paper focused mainly on the mining and retrieval techniques, various WCM approaches, challenges and process of information retrieval and information extraction. The paper describes the four major tasks of web content mining that is information retrieval, information extraction, generalization and validation in detail. WCM concentrates on orchestrating, sorting, classifying, collecting, congregating of web data and provide the improved data which can be easily accessed by the users. Web content mining tools were needed to scan text, images and HTML documents and provide results to the search engine. It guides the search engine to provide better productive results for every search based on their importance. The paper also analysed different web content mining tools for the extraction of relevant information from the corresponding web page. Keywords—Web content mining; web structure mining; web usage mining; data mining; information retrieval; information extraction

Volume None
Pages None
DOI 10.14569/ijacsa.2021.0120886
Language English
Journal International Journal of Advanced Computer Science and Applications

Full Text