2021 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC) | 2021

Interrelated information pair extraction algorithm of visual attention for form documents


Abstract


Information extraction from form documents plays an important role in the retention and retrieval of forms. Existing algorithms for form information extraction often lack accuracy and universality. This paper proposes an algorithm for extracting structured key information from forms that takes into account both the image features and the text layout of forms and applies to different kinds of forms. First, a detection network for keywords and key information is designed to locate information blocks. Then, drawing on the characteristics of text layout implied by the human visual attention mechanism, an algorithm is proposed that matches keywords to their corresponding information based on positional relationships. Finally, the information is classified by fuzzy matching of keywords, so that structured information is extracted quickly and accurately. Experiments show that, compared with existing form information extraction systems, the proposed algorithm extracts structured information from form documents using image features and text layout, and greatly improves accuracy and universality.
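
The abstract does not specify the matching rules, so the following is only a minimal sketch of the last two stages (position-based pairing and fuzzy keyword classification) under stated assumptions. The `Block` type, the "value lies to the right or directly below the keyword" rule, the penalty factor for values below, the `FIELDS` list, and the use of Python's `difflib` as the fuzzy matcher are all hypothetical choices for illustration, not the authors' method. The `is_keyword` flag stands in for the output of the detection network.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Dict, List, Optional


@dataclass
class Block:
    """A detected text block (hypothetical stand-in for detector output)."""
    text: str
    x: float          # left edge
    y: float          # top edge
    w: float          # width
    h: float          # height
    is_keyword: bool  # True for a keyword block, False for an information block


def _overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def match_value(key: Block, values: List[Block]) -> Optional[Block]:
    """Pick the closest information block to the right of or below a keyword."""
    best, best_cost = None, float("inf")
    for v in values:
        same_row = _overlap(key.y, key.y + key.h, v.y, v.y + v.h) > 0
        same_col = _overlap(key.x, key.x + key.w, v.x, v.x + v.w) > 0
        if v.x >= key.x + key.w and same_row:
            cost = v.x - (key.x + key.w)          # horizontal gap
        elif v.y >= key.y + key.h and same_col:
            cost = 2.0 * (v.y - (key.y + key.h))  # assumed penalty for "below"
        else:
            continue
        if cost < best_cost:
            best, best_cost = v, cost
    return best


# Hypothetical canonical field names used for classification.
FIELDS = ["name", "date of birth", "address"]


def classify(keyword_text: str, cutoff: float = 0.6) -> Optional[str]:
    """Map a possibly misrecognized keyword to a canonical field name."""
    cleaned = keyword_text.lower().strip(" :")
    hits = get_close_matches(cleaned, FIELDS, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def extract(blocks: List[Block]) -> Dict[str, str]:
    """Pair each keyword with its information block and classify the pair."""
    keys = [b for b in blocks if b.is_keyword]
    values = [b for b in blocks if not b.is_keyword]
    result: Dict[str, str] = {}
    for k in keys:
        field = classify(k.text)
        v = match_value(k, values)
        if field is not None and v is not None:
            result[field] = v.text
    return result


blocks = [
    Block("Nane:", 10, 10, 40, 10, True),   # OCR-style typo for "Name:"
    Block("Alice", 60, 10, 50, 10, False),  # value to the right
    Block("Address", 10, 40, 60, 10, True),
    Block("12 Main St", 10, 55, 90, 10, False),  # value directly below
]
pairs = extract(blocks)
```

In this sketch the fuzzy step tolerates a recognition error such as "Nane" for "Name", and the positional step prefers a value on the same row over one further down the page.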

Pages 560-567
DOI 10.1109/IPEC51340.2021.9421311
Language English
Journal 2021 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC)
