IEEE Internet of Things Journal | 2021

Deep Q-Network-Based Feature Selection for Multisourced Data Cleaning


Abstract


The Internet of Things (IoT) integrates information collected from multiple sources and supports a variety of smart city applications, such as industrial manufacturing, power systems, and mobile healthcare. In the big data era, multisourced data are collected on a daily basis, yet a large part of the data may be irrelevant, redundant, noisy, or even malicious from a machine learning perspective. Feature selection is a powerful data cleaning technique that reduces data redundancy and improves the performance of machine learning systems. Inspired by reinforcement learning, which learns from experience, in this article we propose a novel, efficient deep $Q$-network (DQN)-based feature selection method for multisourced data cleaning. In particular, we model the feature selection problem as a competition between an agent and the environment in dynamic states, which is solved by a DQN. A traditional DQN suffers from high computational complexity and requires a significant amount of time to converge during training. To tackle these challenges, we develop a space searching algorithm, called SS, to speed up the training of the DQN agent. To validate the efficacy and efficiency of the proposed method, we conduct extensive experiments on various types of IoT data. Simulation results show that the proposed DQN-based feature selection algorithms achieve much better performance than state-of-the-art methods and are robust under data poisoning attacks.
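The abstract does not spell out the state, action, or reward design, so the following is only a minimal sketch of how feature selection can be cast as an agent acting in an environment. It makes several assumptions that are not taken from the article: the state is the set of features chosen so far, an action either adds one feature or stops, and the reward is the change in a hypothetical `subset_score`. For brevity it uses tabular Q-learning rather than a deep Q-network, and it does not implement the SS algorithm. The toy data and all function names are illustrative.

```python
import random

def subset_score(X, y, subset):
    """Hypothetical reward signal (an assumption, not the article's):
    training accuracy of a nearest-centroid classifier on the chosen
    columns, minus a small penalty per selected feature."""
    if not subset:
        return 0.0
    cols = sorted(subset)
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(
            centroids,
            key=lambda l: sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[l])),
        )
        correct += pred == t
    return correct / len(y) - 0.01 * len(subset)

def q_learning_select(X, y, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over feature subsets (a stand-in for a DQN)."""
    rng = random.Random(seed)
    n = len(X[0])
    STOP = n  # extra action index meaning "end the episode"
    Q = {}    # maps (state, action) to an estimated return

    def actions(state):
        # STOP is listed first so that ties are broken toward stopping.
        return [STOP] + [a for a in range(n) if a not in state]

    def q(state, a):
        return Q.get((state, a), 0.0)

    for _ in range(episodes):
        state = frozenset()
        while True:
            acts = actions(state)
            if rng.random() < epsilon:
                a = rng.choice(acts)                      # explore
            else:
                a = max(acts, key=lambda b: q(state, b))  # exploit
            if a == STOP:
                # Terminal action: zero reward and no bootstrapping.
                Q[(state, a)] = q(state, a) + alpha * (0.0 - q(state, a))
                break
            nxt = state | {a}
            reward = subset_score(X, y, nxt) - subset_score(X, y, state)
            target = reward + gamma * max(q(nxt, b) for b in actions(nxt))
            Q[(state, a)] = q(state, a) + alpha * (target - q(state, a))
            state = nxt

    # Greedy rollout with the learned table.
    state = frozenset()
    while True:
        a = max(actions(state), key=lambda b: q(state, b))
        if a == STOP:
            return state
        state = state | {a}

# Toy data (illustrative only): column 0 separates the classes,
# columns 1 and 2 are unrelated to the label.
X = [
    [0.0, 0.3, 0.9], [0.1, 0.8, 0.1], [0.2, 0.5, 0.4], [0.1, 0.1, 0.7],
    [0.9, 0.4, 0.8], [1.0, 0.7, 0.2], [0.8, 0.6, 0.5], [0.9, 0.2, 0.6],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = q_learning_select(X, y)
print(sorted(selected))
```

The table `Q` grows with the number of visited subsets, which is exponential in the number of features in the worst case. That growth is one reason a function approximator such as a DQN, together with some way of narrowing the search, becomes attractive on real multisourced data.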

Volume 8
Pages 16153-16164
DOI 10.1109/jiot.2020.3016297
Language English
Journal IEEE Internet of Things Journal
