IEEE Access, vol. 14, pp. 17825-17841, 2026 (SCI-Expanded, Scopus)
Analyzing the relationship between drug efficacy and sensitivity to mutational profiles is
necessary for the effective treatment of complex diseases such as cancer. Particularly, cancerous tissues
undergo constant change as a result of ongoing mutations, and their sensitivity to drugs may change
as new mutations arise. To this end, this study presents a statistical analysis of drug–disease–gene
interactions. Furthermore, a general processing pipeline and machine learning models were developed
to predict the drug sensitivity of cancer cells according to genetic mutations. To achieve this, four well-known
open-source databases, namely a drug sensitivity dataset for cancer cell lines, two somatic mutation data
resources, and a gene–drug interaction database, were integrated to build an enriched database. Next, various
preprocessing techniques, including text encoding, filtering, and optimization, were applied to obtain
an efficient new dataset for statistical analysis and machine learning. Statistical analyses were conducted
to investigate gene–drug interactions on the enriched database and to quantify their relative contributions to
drug sensitivity. The machine learning models, in turn, predict drug sensitivity from the somatic
mutation or drug interaction datasets. The study also includes ablation studies and feature importance
analyses for a thorough examination of genes and drug sensitivity. The developed pipeline not only yielded an
R² of 0.91 in initial evaluations but also demonstrated robust generalizability by maintaining an R² of 0.73
in predicting AUC values across independent data sources. Overall, the statistical analysis, machine learning
results, and ablation studies offer a new perspective on drug sensitivity prediction.
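
For readers unfamiliar with the metric, the scores quoted above are coefficients of determination, which compare a model's squared prediction error with the variance of the observed values. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `r2_score` and the AUC values are hypothetical and chosen purely for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical drug-response AUC values (illustration only, not study data).
observed_auc = [0.2, 0.4, 0.6, 0.8]
predicted_auc = [0.25, 0.35, 0.65, 0.75]

print(round(r2_score(observed_auc, predicted_auc), 3))
```

A score of 1 means the predictions match the observations exactly, and a score of 0 means the model does no better than always predicting the mean observed value.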