DPATransLLM: Detection of Pronominal Anaphora in Turkish Sentences Using Transformer-Based, Large Language Models and Hybrid Ensemble Approach


Demir E., BİLGİN M.

APPLIED SCIENCES-BASEL, vol.15, no.23, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 15 Issue: 23
  • Publication Date: 2025
  • DOI: 10.3390/app152312480
  • Journal Name: APPLIED SCIENCES-BASEL
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Affiliated with Bursa Uludağ University: Yes

Abstract

As data volumes and language-based applications grow exponentially, accurately resolving intra-contextual relationships in texts has become indispensable for both academic research and industrial Natural Language Processing (NLP) systems. This study focuses on the detection of pronominal anaphora in Turkish sentences. A dedicated dataset comprising 2000 sentences and 72,239 tokens was created and labeled with a BIO tagging scheme customized for this study. Transformer-based language models pre-trained on Turkish data, such as BERT and RoBERTa, were fine-tuned on this dataset. Large Language Models (LLMs) trained on Turkish data, including Turkcell-LLM-7b-v1 and ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1, as well as multilingual models such as Microsoft's Phi-3 Mini-4K-Instruct and OpenAI's GPT-4o-mini, were fine-tuned on the same dataset to detect pronominal anaphora in sentences. The trained models were evaluated using pronoun accuracy, antecedent accuracy, exact match, and F1-score. The highest performance was achieved by a novel hybrid ensemble approach that combines multiple Transformer models with linguistic rules. This hybrid system attained scores of 0.987 for pronoun accuracy, 0.977 for antecedent accuracy, 0.505 for exact match, and 0.960 for F1-score, surpassing all individual models, including GPT-4o-mini. These findings indicate that ensemble methods combined with Turkish-specific linguistic rules outperform standalone models in Turkish anaphora resolution. The study is presented as novel because it is the first work to apply hybrid ensemble methods with linguistic rule integration to this task for the Turkish language.
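To make the tagging and evaluation vocabulary above concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not specify the custom BIO label set, the metric definitions, or how the ensemble combines its models, so everything here is an assumption: the labels `B-PRON`/`I-PRON`/`B-ANT`/`I-ANT`, the span-level F1, the sentence-level exact match, and the plain per-token majority vote, which stands in for the hybrid ensemble and omits its linguistic rules. The example sentence is illustrative and not taken from the dataset.

```python
from collections import Counter

# Hypothetical labels: B-/I-PRON marks a pronoun, B-/I-ANT its antecedent, O anything else.

def extract_spans(tags):
    """Return the set of (label, start, end_exclusive) spans in one BIO tag sequence."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing sentinel closes an open span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

def span_f1(gold_seqs, pred_seqs):
    """Micro-averaged F1 over exactly matching spans (assumed metric definition)."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def exact_match(gold_seqs, pred_seqs):
    """Fraction of sentences whose whole tag sequence equals the gold sequence."""
    hits = sum(list(g) == list(p) for g, p in zip(gold_seqs, pred_seqs))
    return hits / len(gold_seqs) if gold_seqs else 0.0

def majority_vote(model_tag_seqs):
    """Per-token majority vote; ties go to the tag of the earliest-listed model."""
    voted = []
    for token_tags in zip(*model_tag_seqs):
        counts = Counter(token_tags)
        best = max(counts.values())
        voted.append(next(t for t in token_tags if counts[t] == best))
    return voted

# Illustrative sentence: "Ali kitabı aldı ve onu okudu ." ("onu" refers to "kitabı").
gold = ["O", "B-ANT", "O", "O", "B-PRON", "O", "O"]
model_outputs = [
    ["O", "B-ANT", "O", "O", "B-PRON", "O", "O"],
    ["B-ANT", "O", "O", "O", "B-PRON", "O", "O"],
    ["O", "B-ANT", "O", "O", "O", "O", "O"],
]
voted = majority_vote(model_outputs)
print(span_f1([gold], [voted]), exact_match([gold], [voted]))
```

In this toy case the vote recovers the gold sequence even though two of the three hypothetical models make a mistake, which is the intuition behind combining models. Sentence-level exact match is stricter than span-level F1 because a single wrong token fails the whole sentence.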