Archive | 2019

A Novel Conditional Random Fields Aided Fuzzy Matching in Vietnamese Address Standardization

 
 

Abstract


Address standardization is the process of recognizing free-form addresses and normalizing them into a common standard format. In today's digital economy, this process is increasingly challenging in areas such as e-commerce fulfillment, logistics planning, geographical data analysis, real estate, and social network mining. Traditional approaches mostly follow two directions: Named Entity Recognition (NER) and fuzzy matching. For Vietnamese addresses in particular, neither approach is effective on its own because the data is sparse and error-prone. In this paper, we propose a novel approach that uses the output of an NER model as a suggestion to re-rank the address candidates produced by a fuzzy matching stage. We develop a log-linear model for this re-ranking. Our experiments show that it outperforms both the NER and the fuzzy matching approaches, reaching an accuracy of 88%, and suggest that the approach could be applied to data in other languages.
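The re-ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's features, weights, or candidate generation, so everything below is an assumption made for illustration: the two features (a string-similarity score and the fraction of NER-tagged fields a candidate agrees with), the hand-picked weights, the tiny candidate list, and the function names `fuzzy_score`, `ner_agreement`, and `rerank`. The NER output is hard-coded here; in the paper it would come from the CRF-based tagger.

```python
import math
from difflib import SequenceMatcher

FIELDS = ("ward", "district", "city")

def fuzzy_score(raw, candidate):
    """String similarity in [0, 1] between the raw input and a candidate."""
    target = ", ".join(candidate[k] for k in FIELDS)
    return SequenceMatcher(None, raw.lower(), target.lower()).ratio()

def ner_agreement(ner_fields, candidate):
    """Fraction of NER-tagged fields whose value equals the candidate's."""
    if not ner_fields:
        return 0.0
    hits = sum(
        1 for field, value in ner_fields.items()
        if candidate.get(field, "").lower() == value.lower()
    )
    return hits / len(ner_fields)

def rerank(raw, ner_fields, candidates, weights=(1.0, 2.0)):
    """Return (probability, candidate) pairs, best first.

    Log-linear model: score = w . f(candidate), normalized with a softmax.
    The weights are illustrative, not learned and not taken from the paper.
    """
    scores = [
        weights[0] * fuzzy_score(raw, c)
        + weights[1] * ner_agreement(ner_fields, c)
        for c in candidates
    ]
    max_s = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_s) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    return sorted(zip(probs, candidates), key=lambda p: p[0], reverse=True)

# Hypothetical candidates, standing in for the fuzzy matching stage's output.
candidates = [
    {"ward": "Ben Thanh", "district": "Quan 1", "city": "Ho Chi Minh"},
    {"ward": "Ben Nghe", "district": "Quan 1", "city": "Ho Chi Minh"},
    {"ward": "Vo Thi Sau", "district": "Quan 3", "city": "Ho Chi Minh"},
]

# Hypothetical NER suggestion for the noisy input below.
raw = "ben nge q1 hcm"
ner_fields = {"ward": "Ben Nghe", "district": "Quan 1"}

ranked = rerank(raw, ner_fields, candidates)
for prob, cand in ranked:
    print(f"{prob:.3f}  {cand['ward']}, {cand['district']}, {cand['city']}")
```

With these weights the NER agreement dominates the string similarity, so the candidate matching both tagged fields is ranked first. In practice the weights of a log-linear model would be fit on labeled data rather than set by hand.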

Pages 23-28
DOI 10.1145/3368926.3369687
Language English
