A Clustering-Based Social Media Analysis Framework for Disaster Management: A Case Study of the 2023 Kahramanmaraş/Türkiye Earthquakes


Değirmen-Bektaş S., İNKAYA T., ÇAVDUR F.

Applied Sciences (Switzerland), vol.16, no.11, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 16 Issue: 11
  • Publication Date: 2026
  • DOI: 10.3390/app16115318
  • Journal Name: Applied Sciences (Switzerland)
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Applied Science & Technology Source, Compendex, INSPEC, Directory of Open Access Journals
  • Keywords: demand prediction, disaster management, humanitarian crisis, multi-view learning, social media, unsupervised learning
  • Bursa Uludağ University Affiliated: Yes

Abstract

Social media posts by individuals affected by disasters and their relatives provide a significant source of data for identifying emergencies and needs, assessing the situation, and determining affected areas. These posts often contain not only text but also text embedded within images. Therefore, focusing solely on text data may compromise the integrity of the information and lead to incomplete or limited analyses. In this study, a topic modelling-based clustering approach is proposed that accounts for the complementary nature of text and image text in social media posts, as well as the limitations of manual annotation during disasters. In this context, data pre-processing was performed on text and text extracted from images. Text extracted from images via Optical Character Recognition (OCR) was corrected using the GPT-4.0-mini model. Then, both data types were clustered separately using BERTopic with k-means, and the resulting clusters were integrated. A dictionary-based analysis was conducted to identify humanitarian relief categories and locations within the clusters. The proposed framework was applied to the social media dataset related to the Kahramanmaraş earthquakes, one of the largest disasters in recent times. The findings show that text and image text data complement each other. The resulting clusters are meaningful, with average coherence scores of 0.710 for text and 0.687 for image text. LLM-based post-OCR correction also yielded a 62.81% reduction in average character error rate and a 56.91% decrease in average word error rate compared to the normalized ground truth image text. Furthermore, the proposed approach outperformed both keyword-based filtering with k-means and BERTopic with HDBSCAN. In summary, the results demonstrate that the proposed unsupervised learning approach is effective for extracting humanitarian needs and locations from social media in disaster response.
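The abstract reports relative reductions in character error rate (CER) and word error rate (WER) after post-OCR correction. The sketch below shows one common way such figures can be computed, assuming the usual definitions (Levenshtein edit distance divided by reference length); the abstract does not state the exact formulas or normalization used, and the function names here are illustrative, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def relative_reduction(before, after):
    """Percentage decrease of an error rate relative to its starting value."""
    return 100.0 * (before - after) / before
```

Under these assumptions, a figure such as the reported CER reduction would correspond to `relative_reduction(cer(truth, raw_ocr), cer(truth, corrected))` averaged over the evaluated images, where `truth`, `raw_ocr`, and `corrected` are hypothetical variables holding the ground truth, raw OCR output, and LLM-corrected text.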