Prediction of Polycyclic Aromatic Hydrocarbons (PAHs) Removal from Wastewater Treatment Sludge Using Machine Learning Methods


ÇAĞLAR GENÇOSMAN B., EKER ŞANLI G.

WATER AIR AND SOIL POLLUTION, vol.232, no.3, 2021 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 232 Issue: 3
  • Publication Date: 2021
  • DOI Number: 10.1007/s11270-021-05049-8
  • Journal Name: WATER AIR AND SOIL POLLUTION
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, ABI/INFORM, Agricultural & Environmental Science Database, Aqualine, Aquatic Science & Fisheries Abstracts (ASFA), Arctic & Antarctic Regions, BIOSIS, Biotechnology Research Abstracts, CAB Abstracts, Chemical Abstracts Core, Chimica, Compendex, EMBASE, Environment Index, Geobase, Greenfile, Pollution Abstracts, Veterinary Science Database, Civil Engineering Abstracts
  • Keywords: PAH, Wastewater treatment sludge, UV-C light, Data mining, machine learning, Over-sampling methods, Prediction of PAH removal efficiency, SEWAGE-SLUDGE, NEURAL-NETWORK, SOIL SURFACES, PHOTOCATALYTIC DEGRADATION, POLYCHLORINATED-BIPHENYLS, TREATMENT-PLANT, MODEL, PHOTODEGRADATION, CLASSIFICATION, BIOREMEDIATION
  • Bursa Uludag University Affiliated: Yes

Abstract

Removal of polycyclic aromatic hydrocarbons (PAHs) from wastewater treatment sludge with appropriate technologies is of great importance for the environment and public health. UV technology is one of the most frequently used methods for the removal of PAHs. Although various photodegradation applications using UV-C (ultraviolet-C) light and photocatalysts can be performed to remove these compounds, a large number of tests must be carried out to determine the optimum removal conditions, which increases time and cost. Data mining classification makes it possible to predict the removal efficiency of PAHs and to reveal hidden knowledge in the data. This study aims to determine appropriate machine learning (ML) methods for predicting the PAH removal efficiency of wastewater treatment sludges based on the initial PAH levels. The samples have multi-class imbalanced outputs; thus, random over-sampling and the Synthetic Minority Over-sampling TEchnique (SMOTE) are used to improve the prediction results. Well-known data mining classification/machine learning methods, namely artificial neural network (multi-layer perceptron, MLP), k-nearest neighbors (k-NN), support vector machine (SVM), decision tree (C4.5), random forest (RF), and Bagging, are proposed for predicting the removal efficiencies. Different evaluation metrics, namely Accuracy, multi-class AUC (MAUC, multi-class area under the ROC curve), F-measure, Precision, Recall, and Specificity, are used for the performance comparisons. RF and k-NN perform better than the other methods, with average prediction accuracies of 92.35% and 92.36%, respectively. In addition, RF outperforms the other methods with an MAUC value of 0.97. RF and k-NN can therefore be used successfully to predict removal efficiency on multi-class imbalanced datasets, and removal efficiencies can be predicted with high accuracy from the input components at lower cost and effort.
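The abstract names two techniques without giving implementation details: SMOTE-style over-sampling for the imbalanced multi-class outputs, and the multi-class AUC used to rank the classifiers. The following is a minimal plain-Python sketch of both under our own assumptions, not the study's configuration. The function names `smote_oversample` and `hand_till_mauc`, the default `k=5`, the fixed seed, balancing every class up to the majority-class count, and the duplication fallback for single-sample classes are all hypothetical choices. MAUC is assumed to follow the Hand and Till (2001) definition, which is its usual meaning. The sketch requires Python 3.8+ for `math.dist`.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def smote_oversample(X, y, k=5, seed=0):
    """Hypothetical SMOTE-style sketch: synthesize rows for every class
    until it matches the majority-class count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [list(r) for r in X], list(y)
    for label, rows in by_class.items():
        for _ in range(target - len(rows)):
            i = rng.randrange(len(rows))
            base = rows[i]
            others = [rows[j] for j in range(len(rows)) if j != i]
            if not others:
                # Assumption: a lone sample has no neighbour to interpolate
                # towards, so fall back to random over-sampling (duplication).
                X_out.append(list(base))
            else:
                # k nearest same-class neighbours of the base sample
                neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
                nb = rng.choice(neighbours)
                gap = rng.random()  # random position on the segment base -> nb
                X_out.append([b + gap * (n - b) for b, n in zip(base, nb)])
            y_out.append(label)
    return X_out, y_out


def hand_till_mauc(y_true, proba, classes):
    """MAUC as in Hand & Till (2001): mean over all class pairs (i, j) of
    [A(i|j) + A(j|i)] / 2. proba[n][c] is the predicted probability of
    classes[c] for sample n; every class is assumed to occur in y_true."""
    idx = {c: k for k, c in enumerate(classes)}

    def a_given(i, j):
        # Probability that a random class-i sample gets a higher class-i
        # score than a random class-j sample; ties count as one half.
        si = [p[idx[i]] for p, t in zip(proba, y_true) if t == i]
        sj = [p[idx[i]] for p, t in zip(proba, y_true) if t == j]
        wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
                   for a in si for b in sj)
        return wins / (len(si) * len(sj))

    pairs = list(combinations(classes, 2))
    return sum((a_given(i, j) + a_given(j, i)) / 2 for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy data (not from the study): 4 "high" rows vs 2 "low" rows.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]]
    y = ["high"] * 4 + ["low"] * 2
    Xb, yb = smote_oversample(X, y, k=1)
    print(len(Xb), yb.count("high"), yb.count("low"))  # each class now has 4 rows

    perfect = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    print(hand_till_mauc(["a", "b", "c"], perfect, ["a", "b", "c"]))
```

Because MAUC is computed from ranked probability scores rather than from hard labels, it complements Accuracy on imbalanced data: a classifier that always predicts the majority class can reach high accuracy, but it cannot separate the classes by score.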