Corpus Linguistics and Linguistic Theory | 2019

On classification trees and random forests in corpus linguistics: Some words of caution and suggestions for improvement

 

Abstract


This paper is a discussion of methodological problems that (can) arise in the analysis of multifactorial data analyzed with tree-based or forest-based classifiers in (corpus) linguistics. I showcase a data set that highlights where such methods can fail at providing optimal results and then discuss solutions to this problem as well as the interpretation of random forests more generally.

Volume 16
Pages 617 - 647
DOI 10.1515/cllt-2018-0078
Language English
Journal Corpus Linguistics and Linguistic Theory
