Prediction of Polycyclic Aromatic Hydrocarbons (PAHs) Removal from Wastewater Treatment Sludge Using Machine Learning Methods


WATER AIR AND SOIL POLLUTION, vol.232, no.3, 2021 (Journal Indexed in SCI)

  • Publication Type: Article
  • Volume: 232 Issue: 3
  • Publication Date: 2021
  • Doi Number: 10.1007/s11270-021-05049-8
  • Keywords: PAH, Wastewater treatment sludge, UV-C light, Data mining, Machine learning, Over-sampling methods, Prediction of PAH removal efficiency


Removing polycyclic aromatic hydrocarbons (PAHs) from wastewater treatment sludge with appropriate technologies is of great importance for the environment and public health. UV technology is one of the most frequently used methods for PAH removal. Although various photodegradation applications with UV-C (ultraviolet-C) light and photocatalysts can remove these compounds, a large number of tests must be carried out to determine the optimum removal conditions, which increases time and cost. Data mining classification makes it possible to predict the removal efficiency of PAHs and to reveal hidden knowledge in the data. This study aims to determine appropriate machine learning (ML) methods for predicting PAH removal efficiency from wastewater treatment sludge with respect to the initial PAH levels. The samples have multi-class imbalanced outputs; thus, random over-sampling and the Synthetic Minority Over-sampling Technique (SMOTE) are used to improve the prediction results. Well-known data mining classification/machine learning methods, namely artificial neural network (multi-layer perceptron, MLP), k-nearest neighbors (k-NN), support vector machine (SVM), decision tree (C4.5), random forest (RF), and Bagging, are applied to predict removal efficiencies. Several evaluation metrics, namely Accuracy, multi-class AUC (MAUC, multi-class area under the ROC curve), F-measure, Precision, Recall, and Specificity, are used for performance comparison. RF and k-NN perform better than the other methods, with average prediction accuracies of 92.35% and 92.36%, respectively. In addition, RF outperforms the other methods with a MAUC value of 0.97. RF and k-NN can therefore be used successfully for removal efficiency prediction on multi-class imbalanced datasets, and removal efficiencies can be predicted with high accuracy from the input components at lower cost and effort.
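
As an illustration of two steps named in the abstract, the following is a minimal sketch of random over-sampling and of macro-averaged multi-class metrics, written in plain Python. It is not the authors' implementation: the function names, the one-vs-rest macro averaging, and the class labels "low", "medium", and "high" are assumptions made for the example, and SMOTE, MAUC, and the classifiers themselves are not reproduced here.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged (one-vs-rest) precision, recall,
    specificity, and F-measure for a multi-class problem."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    sums = {"precision": 0.0, "recall": 0.0,
            "specificity": 0.0, "f_measure": 0.0}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        sums["precision"] += precision
        sums["recall"] += recall
        sums["specificity"] += specificity
        sums["f_measure"] += f_measure
    result = {name: total / len(labels) for name, total in sums.items()}
    result["accuracy"] = sum(t == p for t, p in pairs) / n
    return result


if __name__ == "__main__":
    # Hypothetical toy data: one feature per sample, imbalanced labels.
    X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
    y = ["high", "high", "high", "medium", "medium", "low"]
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))
    print(macro_metrics(["low", "low", "medium", "high"],
                        ["low", "medium", "medium", "high"]))
```

In practice, over-sampling would be applied only to the training portion of the data, so that duplicated samples do not leak into the evaluation set.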