Environmental science & technology | 2021

Predicting Micropollutant Removal by Reverse Osmosis and Nanofiltration Membranes: Is Machine Learning Viable?

 
 
 

Abstract


Predictive models for micropollutant removal by membrane separation are highly desirable for the design and selection of appropriate membranes. While machine learning (ML) models have been applied for such purposes, their reliability might be compromised by data leakage due to inappropriate data splitting. More importantly, it remains unclear whether ML models can truly capture the mechanisms of membrane separation. In this study, we evaluate the capability of the XGBoost model to predict micropollutant removal efficiencies of reverse osmosis and nanofiltration membranes. Our results demonstrate that data leakage leads to falsely high prediction accuracy. Using a model interpretation method based on cooperative game theory, we test the knowledge of XGBoost on the mechanisms of membrane separation by quantifying the contributions of input variables to the model predictions. We reveal that XGBoost possesses an adequate understanding of size exclusion, but its knowledge of electrostatic interactions and adsorption is limited. Our findings suggest that future work should focus more on avoiding data leakage and evaluating the mechanistic knowledge of ML models. In addition, high-quality data from more diverse experimental conditions, as well as more informative variables, are needed to improve the accuracy of ML models for predicting membrane performance.
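The data-leakage concern raised in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the data were split, so everything below is an assumption made for illustration: the records are hypothetical placeholders, the grouping key (`compound`) is one plausible choice, and the helper names `random_split`, `group_split`, and `shared_groups` are invented here. The idea is that a row-wise random split can place measurements of the same compound in both the training and test sets, whereas a group-wise split keeps each compound entirely on one side.

```python
import random
from collections import defaultdict


def random_split(records, test_fraction=0.3, seed=0):
    """Row-wise random split: rows from one group can land on both sides."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def group_split(records, group_key, test_fraction=0.3, seed=0):
    """Group-wise split: every group goes entirely to train or to test."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])
    train = [r for k in keys if k not in test_keys for r in groups[k]]
    test = [r for k in keys if k in test_keys for r in groups[k]]
    return train, test


def shared_groups(train, test, group_key):
    """Return the groups that appear in both train and test (leakage candidates)."""
    return {r[group_key] for r in train} & {r[group_key] for r in test}


# Hypothetical placeholder records: 5 compounds, each measured on 4 membranes.
records = [
    {"compound": f"compound_{c}", "membrane": f"membrane_{m}"}
    for c in "ABCDE"
    for m in range(1, 5)
]

train_r, test_r = random_split(records)
train_g, test_g = group_split(records, "compound")

print("random split, shared compounds:", sorted(shared_groups(train_r, test_r, "compound")))
print("group split, shared compounds: ", sorted(shared_groups(train_g, test_g, "compound")))
```

In practice, the same check could be run on whichever variable defines a "repeat" in the dataset (compound, membrane, or a compound-membrane pair). Which grouping is appropriate depends on the question the model is meant to answer, and the abstract does not settle that choice.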

DOI 10.1021/acs.est.1c04041
Language English
Journal Environmental science & technology

Full Text