PLOS ONE, vol. 20, no. 12, December 2025 (SCI-Expanded, Scopus)
Environmental monitoring networks face critical data gaps that compromise public health protection and regulatory compliance, with missing data rates often exceeding 40% in operational settings. This study validates DynamicSeq2SeqXGB, a novel hybrid model that integrates a sequence-to-sequence encoder–decoder for temporal pattern extraction with an XGBoost regressor for robust gap reconstruction under extreme sparsity. Data from five monitoring stations in Pavlodar, Kazakhstan, collected over a 15-month period from May 23, 2024 to July 19, 2025, were analyzed; the stations represent severely compromised infrastructure, with completeness rates of 23.3–57.5%. The methodology employs adaptive context processing and implements hierarchical decomposition for extended outages. Two data preparation strategies were evaluated: selective compression, which applies quality thresholds, and full compression, which retains all available observations. Benchmarking against classical methods on synthetic gaps of 5–72 hours showed that DynamicSeq2SeqXGB outperformed them in 96% of cases under full compression and 100% under selective compression (average 48.8% improvement for both strategies), with corresponding MAE values of 3.7–8.5 μg/m³ across the Pavlodar stations. Notably, full and selective compression showed equal overall effectiveness (50% win rate each), with the optimal strategy depending on station-specific characteristics. External validation on the Beijing dataset (Guanyuan station, 2016) with controlled degradation confirmed cross-regional transferability, achieving an MAE of 8.50 μg/m³ and a coefficient of determination (R²) of 0.944 (68–79% improvement over baselines). The method successfully reconstructed PM2.5 time series even at 23.3% completeness, demonstrating robust performance for operational deployment in severely degraded monitoring networks.
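The synthetic-gap benchmarking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce DynamicSeq2SeqXGB: it hides a contiguous run of known values, reconstructs it, and scores the reconstruction by MAE on the hidden points only. Linear interpolation is used here as an assumed stand-in for a classical baseline (the abstract does not name the classical methods), and the function names `linear_fill` and `synthetic_gap_mae`, as well as the sinusoidal test series, are hypothetical.

```python
import math


def linear_fill(series):
    """Fill None entries by linear interpolation between the nearest observed
    neighbours; gaps at either end take the nearest observed value.
    Assumed stand-in for a classical baseline, not the paper's method."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def synthetic_gap_mae(series, start, length, imputer=linear_fill):
    """Hide series[start:start+length], impute it, and return the mean
    absolute error computed on the hidden points only."""
    truth = series[start:start + length]
    if len(truth) != length or any(v is None for v in truth):
        raise ValueError("synthetic gap must cover observed values only")
    masked = list(series)
    masked[start:start + length] = [None] * length
    recon = imputer(masked)
    errors = [abs(r - t) for r, t in zip(recon[start:start + length], truth)]
    return sum(errors) / length


# Hypothetical hourly series with a 24-hour cycle (illustrative values only).
hourly = [20.0 + 10.0 * math.sin(2 * math.pi * h / 24) for h in range(96)]
mae_5h = synthetic_gap_mae(hourly, start=30, length=5)
mae_24h = synthetic_gap_mae(hourly, start=30, length=24)
```

Any other imputer with the same signature (a list containing `None` in, a fully filled list out) can be passed through the `imputer` argument, which allows several methods to be scored on identical gaps. On this toy series the 24-hour gap yields a larger error than the 5-hour gap, since a straight line cannot follow a full daily cycle.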