Progress in Artificial Intelligence | 2021

Text categorization based on a new classification by thresholds

 
 
 

Abstract


Automated text categorization attempts to provide an effective solution to today’s unprecedented growth of textual data. Due to its capacity to organize a huge and varied amount of texts from which it is possible to gain invaluable insights, it has become an emerging investigative field for the research community. However, although several mathematical approaches have been studied to formalize the main components of a text categorization system: text representation, features extraction, and the classification process; such systems still face many difficulties due both to the complex nature of text databases and to the high dimensionality of texts representations. In this sense, this paper introduces an alternative way to process this problem. First, it starts by reducing the original set of features by using a newly proposed metric. And second, the added advantage of the proposed approach is that it automatically classifies a text without necessarily processing all its features. Moreover, some standard pretreatments such as stemming can be abandoned with this approach. The experimental results showed that this new text categorization method outperforms the state-of-the-art methods. As a result, the obtained f-measures on the 20 Newsgroups, BBC News, Reuters, and AG news datasets were, respectively, 95.06%, 98.21%, 88.44%, 95.70%, while standard approaches returned considerably lower scores.

Volume None
Pages 1-15
DOI 10.1007/S13748-021-00247-1
Language English
Journal Progress in Artificial Intelligence

Full Text