J. Inf. Technol. Res. | 2019

A Hybrid Hindi Printed Document Classification System Using SVM and Fuzzy: An Advancement

 
 

Abstract


This article introduces a new tri-layered segmentation and bi-level classifier-based system for classifying printed Hindi documents. It assigns imaged documents to predefined, mutually exclusive categories, using SVM for character classification and Fuzzy matching for document classification. During training, the enhanced, noise-free image is segmented into lines and words by profiling. The system then obtains Shirorekha Less (SL) isolated characters, along with upper, left and right modifier components, from the SL words. Each modifier component is associated only with its corresponding character, based on its location and the inter character-modifier component distance. Further, confidence values of all characters are calculated with SVM training, and all characters are mapped to Romanized labels to generate the words. Finally, documents are classified by Fuzzy-based matching of the detected Romanized words against the predefined classes. The average execution times for SL characters are 0.22675 sec. and 0.20375 sec., and the classification accuracies are 74.61% and 80.73%, for training and testing, respectively.
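The line and word segmentation "by profiling" mentioned above can be illustrated with projection profiles. The following is only a minimal sketch, not the authors' implementation: it assumes a binarized page held in a NumPy array with ink pixels equal to 1, and the function names and the `min_gap` parameter are hypothetical.

```python
import numpy as np

def segment_runs(profile, threshold=0):
    """Return (start, end) pairs, end exclusive, where profile > threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:                  # run reaches the end of the profile
        runs.append((start, len(profile)))
    return runs

def segment_lines(binary):
    """Split a page into text lines using the horizontal projection profile."""
    return segment_runs(binary.sum(axis=1))

def segment_words(line_img, min_gap=2):
    """Split a line into words using the vertical projection profile.

    Runs separated by fewer than `min_gap` blank columns are merged, on the
    assumption that such narrow gaps fall inside a word rather than between words.
    """
    merged = []
    for start, end in segment_runs(line_img.sum(axis=0)):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

# Tiny synthetic page: two "lines", the first containing two "words".
page = np.zeros((10, 12), dtype=int)
page[1:3, 0:4] = 1
page[1:3, 7:11] = 1
page[6:8, 2:9] = 1
line_bounds = segment_lines(page)
word_bounds = segment_words(page[1:3])
```

In unprocessed Devanagari text the Shirorekha (headline) joins the characters of a word, which is presumably why the system works on SL words before isolating characters; this sketch does not attempt that step.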
Keywords: Character Recognition Mapping, Confidence, Document Classification, Feature Extraction, Fuzzy Matching, Hindi Printed Images, Shirorekha Less Characters, Shirorekha Less Words, SVM, Word Association
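The final step, matching detected Romanized words against predefined classes, might look roughly like the sketch below. The abstract does not specify the Fuzzy matching scheme, so this substitutes the similarity ratio from Python's standard `difflib` module as a stand-in; the class names, keyword lists, and the `cutoff` value are all hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity in [0, 1] between two Romanized words (difflib ratio)."""
    return SequenceMatcher(None, a, b).ratio()

def classify(words, class_keywords, cutoff=0.8):
    """Score each class by summing the best keyword similarity of every word.

    Only similarities at or above `cutoff` count, so slightly misrecognized
    words still contribute while unrelated words are ignored.
    Returns (label or None, per-class scores).
    """
    scores = {label: 0.0 for label in class_keywords}
    for word in words:
        for label, keywords in class_keywords.items():
            best = max(similarity(word, keyword) for keyword in keywords)
            if best >= cutoff:
                scores[label] += best
    best_label = max(scores, key=scores.get)
    return (best_label if scores[best_label] > 0 else None), scores

# Hypothetical classes and keyword lists, for illustration only.
class_keywords = {
    "sports": ["khel", "kriket"],
    "politics": ["chunav", "sarkar"],
}
# Words as they might come out of character recognition, with small errors.
label, scores = classify(["khell", "kriket", "sarkaar"], class_keywords)
```

Because each document is assigned the single highest-scoring class, the result respects the mutually exclusive categories described in the abstract.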

Volume 12
Pages 107-131
DOI 10.4018/jitr.2019100106
Language English
Journal J. Inf. Technol. Res.
