Archive | 2021

Word Recognition using Embedded Prototype Subspace Classifiers on a New Imbalanced Dataset

 
 

Abstract


This paper presents an approach to word recognition based on embedded prototype subspace classification. The purpose of the paper is threefold. Firstly, a new dataset for word recognition is presented, extracted from the Esposalles database of Barcelona cathedral marriage records. Secondly, different clustering techniques are evaluated for Embedded Prototype Subspace Classifiers. The dataset, which contains 30 different word classes, is heavily imbalanced, and some word classes are very similar, which makes the classification task challenging. For ease of use, no stratified sampling is done in advance, and the impact of different data splits is evaluated for different clustering techniques. It is demonstrated that the original clustering technique, based on scaling the bandwidth, has to be adjusted for this new dataset. Thirdly, an algorithm is therefore proposed that finds k clusters, striving to obtain a certain number of feature points in each cluster, rather than finding clusters by scaling Silverman's rule of thumb. Furthermore, Self-Organising Maps are evaluated as both a clustering and an embedding technique.

DOI 10.24132/jwscg.2021.29.5
Language English
