2019 11th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC) | 2019
Research on Multi-Attribute Data Completion Method Considering Data Distribution Characteristics
Abstract
Among data completion methods, maximum likelihood estimation is suited to large data sets, while the K-nearest neighbor method considers only the linear relationship between the same attributes of different records. A BP neural network does capture the nonlinear relationship between data attributes, but the sample distribution strongly influences its completion accuracy. To address this, DBSCAN is used to cluster the sample data, analyze its distribution characteristics, eliminate noise data, and select training samples; a BP neural network is then trained to fit the nonlinear relationship between data attributes and predict the missing values. Analysis of an example data set shows that the BP neural network that accounts for data distribution characteristics achieves the highest completion accuracy.
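The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic two-attribute data, the injected outliers, the DBSCAN parameters (`eps`, `min_pts`), and the network size and learning rate are all assumptions chosen for illustration, and the training-sample selection is reduced to simply dropping the points DBSCAN labels as noise.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nb)
    return labels


def train_bp(X, y, hidden=6, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer network (tanh units, linear output) by backpropagation."""
    rng = random.Random(seed)
    d = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(W1[k], x)) + b1[k])
             for k in range(hidden)]
        return h, sum(W2[k] * h[k] for k in range(hidden)) + b2

    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward(x)
            err = out - t  # gradient of 0.5 * (out - t)**2 with respect to out
            for k in range(hidden):
                grad_h = err * W2[k] * (1.0 - h[k] ** 2)  # error propagated through tanh
                W2[k] -= lr * err * h[k]
                b1[k] -= lr * grad_h
                for m in range(d):
                    W1[k][m] -= lr * grad_h * x[m]
            b2 -= lr * err
    return lambda x: forward(x)[1]


# Assumed synthetic data: complete records (a, b) with b = a**2, plus two outliers.
records = [(i / 20, (i / 20) ** 2) for i in range(21)] + [(0.2, 0.9), (0.8, 0.1)]

labels = dbscan(records, eps=0.15, min_pts=3)
train = [r for r, lab in zip(records, labels) if lab != -1]  # eliminate noise data

# Fit the nonlinear relationship a -> b on the retained samples only.
predict = train_bp([[a] for a, _ in train], [b for _, b in train])

# Complete a record whose second attribute is missing (first attribute a = 0.5).
completed = predict([0.5])
print(labels.count(-1), "noise points removed; completed value:", round(completed, 3))
```

In this sketch the clustering runs on complete records only, so that outliers are excluded before the network sees them. With real multi-attribute data, attributes would normally be scaled before the distance-based clustering step, and `eps` and `min_pts` would need tuning to the data.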