Drug Sensitivity Prediction Using Machine Learning on Integrated COSMIC, DGIdb, and GDSC Data


Mergen B., Çoban M., Özkan Ş. S., Başaran Ö. F., Özcan G.

IEEE ACCESS, vol. 14, pp. 17825-17841, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 14
  • Publication Date: 2026
  • DOI: 10.1109/access.2026.3659340
  • Journal Name: IEEE ACCESS
  • Journal Indexed In: Scopus, Science Citation Index Expanded (SCI-EXPANDED), Compendex, INSPEC, Directory of Open Access Journals
  • Pages: pp. 17825-17841
  • Bursa Uludağ University Affiliation: Yes

Abstract

Analyzing how drug efficacy and drug sensitivity relate to mutational profiles is necessary for the effective treatment of complex diseases such as cancer. In particular, cancerous tissues change constantly as mutations accumulate, and the sensitivity of cancer cells to a drug may shift as new mutations arise. This study therefore presents a statistical analysis of drug–disease–gene interactions. Furthermore, a general processing pipeline and machine learning models were developed to predict the drug sensitivity of cancer cells from their genetic mutations. To this end, four well-known open-source databases (a drug sensitivity resource for cancer cell lines, two somatic mutation resources, and a gene–drug interaction database) were integrated to build an enriched database. Next, various preprocessing techniques, including text encoding, filtering, and optimization, were applied to obtain an efficient new dataset for statistical analysis and machine learning. Statistical analyses were conducted on the enriched database to investigate gene–drug interactions and to quantify their relative contributions to drug sensitivity. In addition, the developed machine learning models predict drug sensitivity from somatic mutation or drug interaction datasets. The study also includes ablation studies and feature importance analyses that provide a thorough examination of gene and drug sensitivity. The developed pipeline yielded an R² of 0.91 in initial evaluations and demonstrated robust generalizability, maintaining an R² of 0.73 when predicting AUC values across independent data sources. Overall, the statistical analyses, machine learning performance, and ablation studies offer a new perspective on drug sensitivity prediction.
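The general shape of such a pipeline (join mutation and drug-response tables on cell line, encode mutations as features, fit a regressor, score it with R²) can be illustrated with a minimal, standard-library-only Python sketch. Everything in it is an assumption for illustration: the records in `MUTATIONS` and `RESPONSES` are hypothetical toy values, not COSMIC or GDSC data; the binary gene encoding is a simple stand-in for the paper's preprocessing; and the 1-nearest-neighbour regressor with leave-one-out evaluation is swapped in for brevity and is not one of the authors' models. The score is the usual coefficient of determination, R² = 1 - SS_res / SS_tot.

```python
"""Toy mutation-to-drug-sensitivity sketch (hypothetical data, stdlib only)."""
from collections import defaultdict

# Hypothetical somatic-mutation records: (cell_line, mutated_gene).
MUTATIONS = [
    ("CL1", "TP53"), ("CL1", "KRAS"),
    ("CL2", "TP53"),
    ("CL3", "BRAF"),
    ("CL4", "KRAS"), ("CL4", "BRAF"),
]
# Hypothetical drug-response records: (cell_line, drug, auc).
RESPONSES = [
    ("CL1", "drugA", 0.80), ("CL2", "drugA", 0.70),
    ("CL3", "drugA", 0.40), ("CL4", "drugA", 0.50),
    ("CL5", "drugA", 0.90),  # no mutation record -> dropped by the join
    ("CL1", "drugB", 0.30),  # different drug -> filtered out
]


def encode_mutations(mutations):
    """Return the sorted gene vocabulary and a 0/1 vector per cell line."""
    genes = sorted({gene for _, gene in mutations})
    mutated = defaultdict(set)
    for cell_line, gene in mutations:
        mutated[cell_line].add(gene)
    vectors = {cl: [1 if g in hits else 0 for g in genes]
               for cl, hits in mutated.items()}
    return genes, vectors


def build_dataset(mutations, responses, drug):
    """Inner-join one drug's responses with mutation vectors on cell line."""
    _, vectors = encode_mutations(mutations)
    X, y = [], []
    for cell_line, d, auc in responses:
        if d == drug and cell_line in vectors:
            X.append(vectors[cell_line])
            y.append(auc)
    return X, y


def hamming(a, b):
    """Number of genes whose mutation status differs between two vectors."""
    return sum(u != v for u, v in zip(a, b))


def knn_predict(X_train, y_train, x, k=1):
    """Average the responses of the k training vectors closest to x."""
    order = sorted(range(len(X_train)), key=lambda i: hamming(X_train[i], x))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def leave_one_out_r2(X, y, k=1):
    """Predict each sample from all the others, then score with R^2."""
    preds = []
    for i in range(len(X)):
        preds.append(knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k))
    return r2_score(y, preds)


if __name__ == "__main__":
    X, y = build_dataset(MUTATIONS, RESPONSES, "drugA")
    print(f"LOO R^2: {leave_one_out_r2(X, y, k=1):.2f}")  # prints "LOO R^2: 0.60"
```

On this four-cell-line toy table the leave-one-out R² comes out to 0.60. A realistic version would work with far more cell lines, richer features (possibly including gene–drug interaction flags), and stronger regressors; the sketch only shows how the join, encoding, and R² scoring fit together.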