2020 28th European Signal Processing Conference (EUSIPCO) | 2021
Theoretical Tuning of the Autoencoder Bottleneck Layer Dimension: A Mutual Information-based Algorithm
Abstract
In the transportation field, the literature states that forecasting with an excessive number of features can be computationally inefficient and risks over-fitting. For this reason, several authors have proposed autoencoders (AE) as a way of learning fewer but useful features to enhance road traffic forecasting. Notably, the adequacy of the AE bottleneck layer dimension has not been addressed, so there is no standard way to select this dimensionality automatically. We address the problem from an information theory perspective, since the reconstruction error is not a reliable indicator of the performance of the subsequent supervised learning algorithm. Hence, we propose an algorithm based on how the mutual information and entropy of the data evolve during training of the AE. We validate it on two real-world traffic datasets and discuss why the entropy of the codes is a reliable performance indicator. Compared with the prevailing approach in the literature, which relies on trial-and-error methods, the advantage of our proposal is that a practitioner can efficiently find the dimension that guarantees maximal data compression and a reliable traffic forecast.
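As an illustration of the two quantities the abstract refers to, the following is a minimal sketch of how the entropy of the codes and a mutual information term could be estimated from samples. It assumes a simple plug-in (histogram) estimator with equal-width bins; the abstract does not state which estimator the algorithm uses, and the function names `discretize`, `entropy`, and `mutual_information`, as well as the `n_bins` parameter, are hypothetical.

```python
import numpy as np


def discretize(x, n_bins=10):
    """Bin every column of x and map each row to one discrete label.

    Assumption: equal-width bins per column (a plain histogram estimator).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    symbols = np.empty(x.shape, dtype=np.int64)
    for j in range(x.shape[1]):
        edges = np.linspace(x[:, j].min(), x[:, j].max(), n_bins + 1)
        # Use interior edges only, so bin indices fall in 0..n_bins-1.
        symbols[:, j] = np.digitize(x[:, j], edges[1:-1])
    # Collapse each row of bin indices into a single integer label.
    _, labels = np.unique(symbols, axis=0, return_inverse=True)
    return labels.ravel()


def entropy(labels):
    """Plug-in Shannon entropy (in bits) of a sequence of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_information(a, b):
    """Plug-in estimate of I(A;B) = H(A) + H(B) - H(A,B) for discrete labels."""
    pairs = np.stack([np.asarray(a), np.asarray(b)], axis=1)
    _, joint = np.unique(pairs, axis=0, return_inverse=True)
    return entropy(a) + entropy(b) - entropy(joint.ravel())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inputs = rng.normal(size=(1000, 1))                    # stand-in for input data
    codes = np.tanh(inputs) + 0.1 * rng.normal(size=(1000, 1))  # stand-in for AE codes
    x_lab, z_lab = discretize(inputs), discretize(codes)
    print("H(codes):", entropy(z_lab))
    print("I(inputs; codes):", mutual_information(x_lab, z_lab))
```

In practice, one would recompute such estimates at each training epoch and for each candidate bottleneck dimension to observe how they evolve. Note that histogram estimators degrade quickly as the number of code dimensions grows, so this sketch is only suitable for low-dimensional codes.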