Applied Intelligence | 2021

A two-domain coordinated sentence similarity scheme for question-answering robots regarding unpredictable outliers and non-orthogonal categories

 
 
 
 
 

Abstract


It is crucial and challenging for a question-answering robot (Qabot) to match customer-input questions with the a priori identification questions because of highly diversified expressions, especially in the case of Chinese. This article proposes a coordinated scheme that analyzes sentence similarity in two independent domains instead of relying on a single deep learning model. In the structure domain, BLEU and data preprocessing are applied for binary analysis to discriminate unpredictable outliers (illegal questions) from the existing question library. In the semantics domain, the MC-BERT model, which integrates a BERT encoder with a multi-kernel convolutional top classifier, is developed to handle the non-orthogonality among classes of identification questions. The two domain analyses run in parallel, and the two similarity scores are coordinated to produce the final response. The linguistic features of Chinese are also taken into account. A realistic Qabot case in energy trading services and finance is numerically studied. Computational results validate the effectiveness and accuracy of the proposed algorithm: Top-1 and Top-3 accuracies are 90.5% and 95.5%, respectively, which are significantly superior to the latest published results.
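The abstract describes the two-domain coordination only at a high level. The sketch below is a minimal illustration, not the authors' implementation: it assumes character-level BLEU for the structure domain, a plain bert-base-chinese encoder with cosine similarity as a stand-in for the MC-BERT classifier in the semantics domain, and a hypothetical threshold-plus-weighted-sum rule for coordinating the two scores. The model name, thresholds, and weights are assumptions, not values from the paper.

```python
# Hedged sketch of a two-domain coordinated matcher (illustrative only).
import torch
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")
encoder.eval()

def structure_score(query: str, candidate: str) -> float:
    """Structure domain: character-level BLEU between the input question
    and a library question (bigram weights are an assumption)."""
    return sentence_bleu(
        [list(candidate)], list(query),
        weights=(0.5, 0.5),
        smoothing_function=SmoothingFunction().method1,
    )

def semantic_score(query: str, candidate: str) -> float:
    """Semantics domain: cosine similarity of BERT [CLS] embeddings,
    used here as a simple stand-in for the MC-BERT top classifier."""
    with torch.no_grad():
        vecs = []
        for text in (query, candidate):
            inputs = tokenizer(text, return_tensors="pt", truncation=True)
            vecs.append(encoder(**inputs).last_hidden_state[:, 0])
        return torch.nn.functional.cosine_similarity(vecs[0], vecs[1]).item()

def coordinated_match(query: str, library: list[str],
                      outlier_threshold: float = 0.1, alpha: float = 0.3):
    """Reject the query as an unpredictable outlier (illegal question) if its
    best structural score is too low; otherwise rank library questions by a
    hypothetical weighted sum of the two domain scores."""
    structural = [structure_score(query, q) for q in library]
    if max(structural) < outlier_threshold:
        return None  # treated as an outlier, no library question is returned
    semantic = [semantic_score(query, q) for q in library]
    scores = [alpha * s + (1 - alpha) * m for s, m in zip(structural, semantic)]
    return library[max(range(len(library)), key=scores.__getitem__)]
```

The weighted-sum coordination and the rejection threshold are placeholders; the paper's actual coordination rule, preprocessing pipeline, and MC-BERT architecture are described in the full text.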

Pages 1-17
DOI 10.1007/s10489-021-02269-7
Language English
Journal Applied Intelligence
