Archive | 2019

CRFPOST: Part-of-Speech Tagger for Filipino Texts using Conditional Random Fields


Abstract


Classifying and tagging words into lexical classes is a fundamental process in language processing, and it needs continued attention given the constant evolution of language, in this case the Filipino language. As part of this effort, this paper introduces a linear-chain Conditional Random Fields (CRF) part-of-speech tagger for Filipino texts; CRFs offer an advantage in sequence labelling over generative models and other classifiers. The tool uses a tag set of 218 POS tags (69 basic and 161 compound) and a Filipino text corpus of 15,166 sentences randomly selected from Wikipedia and translated into Filipino by students under linguist supervision. Experiments show that CRF-based POS tagging achieves a 90.59% accuracy rate on Filipino texts. Although CRFPOST's use of word and tag sequence features yields high tagging performance, there is still room for improvement in future work. Recommended extensions include linguistic tools such as a morphological analyzer and named entity recognition.
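At tagging time, a linear-chain CRF combines per-word scores (from word features) with tag-to-tag scores (from tag sequence features) and picks the single best tag sequence with Viterbi decoding. The record above does not describe CRFPOST's implementation, so the following is only a minimal decoding sketch under stated assumptions: the learned weights are assumed to be already collapsed into two hypothetical lookup tables, `emission[(word, tag)]` and `transition[(prev_tag, tag)]`, and the tags `VERB`, `DET`, `NOUN` are placeholders, not labels from the paper's 218-tag set. Training the weights is not shown.

```python
from typing import Dict, List, Tuple


def viterbi(
    tokens: List[str],
    tags: List[str],
    emission: Dict[Tuple[str, str], float],
    transition: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain model.

    emission[(word, tag)]    -- hypothetical score from word features
    transition[(prev, cur)]  -- hypothetical score from tag-sequence features
    Missing entries score 0.0.
    """
    if not tokens:
        return []
    # best[i][t]: best total score of any tag prefix ending in tag t at position i
    best: List[Dict[str, float]] = [
        {t: emission.get((tokens[0], t), 0.0) for t in tags}
    ]
    # back[i][t]: the previous tag on that best prefix (used for traceback)
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Choose the previous tag that maximises prefix score + transition score.
            prev_best = max(
                tags, key=lambda p: best[i - 1][p] + transition.get((p, cur), 0.0)
            )
            scores[cur] = (
                best[i - 1][prev_best]
                + transition.get((prev_best, cur), 0.0)
                + emission.get((tokens[i], cur), 0.0)
            )
            pointers[cur] = prev_best
        best.append(scores)
        back.append(pointers)
    # Trace back from the best-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))


if __name__ == "__main__":
    # Toy weights, invented for illustration only.
    emission = {("Kumain", "VERB"): 2.0, ("ang", "DET"): 2.0,
                ("bata", "NOUN"): 1.5, ("bata", "VERB"): 0.5}
    transition = {("VERB", "DET"): 1.0, ("DET", "NOUN"): 1.0}
    print(viterbi(["Kumain", "ang", "bata"], ["VERB", "DET", "NOUN"],
                  emission, transition))  # ['VERB', 'DET', 'NOUN']
```

Here "bata" gets a small `VERB` score, but the `DET` to `NOUN` transition pulls the decoder to `NOUN`. This is what sequence-level features add over classifying each word in isolation.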

DOI 10.1145/3377713.3377788
Language English
