A novel LOF-based ensemble regression tree methodology

Ongelen, Gozde; İNKAYA, TÜLİN

doi:10.1007/s00521-023-08773-w

NEURAL COMPUTING & APPLICATIONS, vol. 35, no. 26, pp. 19453-19463, 2023 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 35 Issue: 26
Publication Date: 2023
DOI: 10.1007/s00521-023-08773-w
Journal Name: NEURAL COMPUTING & APPLICATIONS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Biotechnology Research Abstracts, Compendex, Computer & Applied Sciences, Index Islamicus, INSPEC, zbMATH
Pages: pp. 19453-19463
Keywords: Prediction, Regression tree, Ensemble learning, Local outlier factor, Outlier removal, CLASSIFICATION, PREDICTION, SELECTION, FOREST, MODEL
Bursa Uludağ University Affiliated: Yes

Abstract

With the emergence of digitalization, numeric prediction has become a prominent problem in various fields, including finance, engineering, industry, and medicine. Among machine learning methods, the regression tree is widely preferred for its simplicity, interpretability, and robustness. Motivated by this, we introduce a novel ensemble regression-tree-based methodology, namely LOF-BRT+OR. The proposed methodology integrates outlier removal, a regression tree, and ensemble learning. First, irregular data points are removed using the local outlier factor (LOF), which measures the degree to which each point is an outlier. Next, a novel regression tree with an LOF-weighted node model is introduced. In the proposed node model, the weights of the points in a node are determined according to their surrounding neighborhood, as a function of LOF values and neighbor ranks. Finally, ensemble learning is adopted to increase prediction performance. In particular, bootstrap aggregation is used to generate multiple regression trees with the LOF-weighted node model. The experimental study shows that the proposed methodology yields the best root mean squared error (RMSE) values in five out of nine data sets. In addition, non-parametric tests show that the improvement of the proposed approach over the benchmark methods is statistically significant. The proposed methodology is applicable to various prediction problems.
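
The first two stages described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It computes standard LOF scores, drops points above a cutoff, and forms a weighted node prediction. The function names, the neighborhood size `k`, the cutoff `threshold=1.5`, and the weight `1 / LOF` are all assumptions made for illustration: the abstract states only that the weights depend on LOF values and neighbor ranks, so neighbor ranks are ignored here. Tree growing and bootstrap aggregation are omitted.

```python
import math


def k_nearest(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    return dists[:k]


def lof_scores(points, k=2):
    """Standard local outlier factor for each point (assumes distinct points)."""
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]


def remove_outliers(points, targets, k=2, threshold=1.5):
    """Keep (point, target, lof) triples with LOF <= threshold (assumed cutoff)."""
    scores = lof_scores(points, k)
    return [
        (p, t, s)
        for p, t, s in zip(points, targets, scores)
        if s <= threshold
    ]


def weighted_node_prediction(targets, scores):
    """Weighted mean of a node's targets; weight = 1 / LOF is a placeholder."""
    weights = [1.0 / s for s in scores]
    return sum(w * t for w, t in zip(weights, targets)) / sum(weights)


# Toy data: five evenly spaced points and one isolated point.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (20.0,)]
targets = [0.1, 1.2, 1.9, 3.1, 4.0, 50.0]

kept = remove_outliers(points, targets)
prediction = weighted_node_prediction(
    [t for _, t, _ in kept], [s for _, _, s in kept]
)
print(len(kept), round(prediction, 3))
```

In this toy set the isolated point at 20.0 receives a much higher LOF than the others and is dropped before the node prediction is formed. In a full pipeline, `weighted_node_prediction` would be evaluated per leaf of each tree, and several trees grown on bootstrap samples would be averaged.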