Archive | 2019
A Novel Conditional Random Fields Aided Fuzzy Matching in Vietnamese Address Standardization
Abstract
Address standardization is the process of recognizing free-form addresses and normalizing them into a common standard format. In today's digital economy, this process has become increasingly challenging in applications such as e-commerce fulfillment, logistics planning, geographic data analysis, real estate, and social network mining. Traditional approaches mostly follow one of two directions: Named Entity Recognition (NER) or fuzzy matching. For Vietnamese addresses in particular, neither approach is effective on its own, because the data is sparse and error-prone. In this paper, we propose a novel approach that uses a Conditional Random Fields (CRF) based NER model as a suggestion to re-rank the candidate addresses returned by a fuzzy matching stage. We develop a log-linear model for this re-ranking step. Our experiments show that the method outperforms both the NER and the fuzzy matching approaches, reaching an accuracy of 88%, and suggest that it could be applied to address data in other languages.
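The re-ranking idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation. Python's `difflib.SequenceMatcher` stands in for the fuzzy matcher. The NER (CRF) output is supplied directly as a dictionary. The candidate fields (`ward`, `district`, `province`), the four features, and the weights are all hypothetical, and place names are written without diacritics for simplicity. Each candidate `c` for input `x` is scored log-linearly as P(c | x) ∝ exp(w · f(x, c)). In practice the weights would be fit to labeled data. Here they are hand-picked.

```python
import math
from difflib import SequenceMatcher

FIELDS = ("ward", "district", "province")  # hypothetical address schema


def fuzzy_score(raw: str, candidate: dict) -> float:
    """String similarity in [0, 1]; a stand-in for the fuzzy-matching stage."""
    joined = " ".join(candidate[f] for f in FIELDS)
    return SequenceMatcher(None, raw.lower(), joined).ratio()


def features(raw: str, ner: dict, candidate: dict) -> list:
    """Hypothetical feature vector: fuzzy score plus one NER-agreement flag per field."""
    agree = [1.0 if ner.get(f) == candidate[f] else 0.0 for f in FIELDS]
    return [fuzzy_score(raw, candidate)] + agree


def rerank(raw: str, ner: dict, candidates: list, weights: list) -> list:
    """Log-linear re-ranking: P(c | x) is proportional to exp(w . f(x, c))."""
    scores = [
        sum(w * x for w, x in zip(weights, features(raw, ner, c)))
        for c in candidates
    ]
    top = max(scores)  # subtract the max for numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    return sorted(zip(candidates, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical example: a misspelled input, a NER suggestion, and two fuzzy candidates.
raw_input = "ben ngh q1 hcm"
ner_output = {"ward": "ben nghe", "district": "quan 1", "province": "ho chi minh"}
candidates = [
    {"ward": "ben thanh", "district": "quan 1", "province": "ho chi minh"},
    {"ward": "ben nghe", "district": "quan 1", "province": "ho chi minh"},
]
weights = [1.0, 3.0, 2.0, 1.0]  # [fuzzy, ward, district, province]; hand-picked

ranked = rerank(raw_input, ner_output, candidates, weights)
for cand, prob in ranked:
    print(cand["ward"], round(prob, 3))
```

With these weights, agreement with the NER suggestion on the ward (weight 3.0) outweighs any possible difference in fuzzy score (at most 1.0 × 1.0). As a result, the NER suggestion decides between candidates that the string matcher alone could not reliably separate. This mirrors the role the abstract gives NER: a suggestion that re-ranks fuzzy candidates, not a replacement for them.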