IEEE Internet of Things Journal | 2021
Deep Q-Network-Based Feature Selection for Multisourced Data Cleaning
Abstract
The Internet of Things (IoT) integrates information collected from multiple sources and supports various smart city applications, such as industrial manufacturing, power systems, and mobile healthcare. In the big data era, multisourced data are collected on a daily basis, yet a large part of the data may be irrelevant, redundant, noisy, or even malicious from a machine learning perspective. Feature selection is a powerful data cleaning technique that reduces data redundancy and improves the performance of machine learning systems. Inspired by reinforcement learning, which learns from experience, in this article we propose a novel and efficient deep $Q$-network (DQN)-based feature selection method for multisourced data cleaning. In particular, we model the feature selection problem as a competition between an agent and the environment in dynamic states, which is solved by a DQN. A traditional DQN suffers from high computational complexity and requires a significant amount of time to converge during training. To tackle these challenges, we develop a space searching algorithm called SS to speed up the training of the DQN agent. To validate the efficacy and efficiency of the proposed method, we conduct extensive experiments on various types of IoT data. Simulation results show that the proposed DQN-based feature selection algorithms achieve much better performance than state-of-the-art methods and are robust under data poisoning attacks.
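To make the agent-environment formulation concrete, the following is a minimal sketch and not the method of this article. It replaces the DQN with tabular Q-learning so that it stays short and dependency-light, and it does not reproduce the SS algorithm. The synthetic data, the nearest-centroid scorer, the reward (accuracy gain minus a per-feature cost), the explicit stop action, and all function names and hyperparameters are assumptions chosen for illustration only.

```python
import numpy as np

def make_data(n=300, d=6, seed=0):
    """Synthetic data (assumption): only features 0 and 1 determine the label."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return X, y

def score(X, y, mask):
    """Hold-out accuracy of a nearest-centroid classifier on the masked features."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.5  # assumed chance level for roughly balanced binary labels
    half = len(y) // 2
    Xtr, ytr, Xte, yte = X[:half, idx], y[:half], X[half:, idx], y[half:]
    c0 = Xtr[ytr == 0].mean(axis=0)
    c1 = Xtr[ytr == 1].mean(axis=0)
    d0 = np.linalg.norm(Xte - c0, axis=1)
    d1 = np.linalg.norm(Xte - c1, axis=1)
    return float(((d1 < d0).astype(int) == yte).mean())

def select_features(X, y, episodes=300, alpha=0.5, gamma=0.9,
                    eps=0.2, cost=0.02, seed=0):
    """Tabular Q-learning over feature masks (a stand-in for a DQN agent).

    State: tuple of 0/1 flags, one per feature. Action k < d adds feature k;
    action d stops the episode. Reward: accuracy gain minus `cost`.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Q = {}  # state -> array of d + 1 action values

    def q(s):
        if s not in Q:
            Q[s] = np.zeros(d + 1)
        return Q[s]

    def valid(s):
        # Unselected features plus the stop action.
        return [a for a in range(d) if s[a] == 0] + [d]

    def add(s, a):
        s2 = list(s)
        s2[a] = 1
        return tuple(s2)

    for _ in range(episodes):
        s = (0,) * d
        while True:
            acts = valid(s)
            if rng.random() < eps:          # explore
                a = int(rng.choice(acts))
            else:                           # exploit
                a = max(acts, key=lambda k: q(s)[k])
            if a == d:                      # stop: terminal, zero reward
                q(s)[a] += alpha * (0.0 - q(s)[a])
                break
            s2 = add(s, a)
            r = score(X, y, s2) - score(X, y, s) - cost
            target = r + gamma * max(q(s2)[k] for k in valid(s2))
            q(s)[a] += alpha * (target - q(s)[a])
            s = s2

    # Greedy rollout with the learned values.
    s = (0,) * d
    while True:
        a = max(valid(s), key=lambda k: q(s)[k])
        if a == d:
            break
        s = add(s, a)
    return [i for i in range(d) if s[i]]

if __name__ == "__main__":
    X, y = make_data()
    print("selected features:", select_features(X, y))
```

A table indexed by feature masks grows exponentially with the number of features, which is one reason a function approximator such as a DQN is used instead of a table when the feature space is large.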