Journal of Biostatistics and Epidemiology, vol.6, no.4, pp.275-281, 2020 (Scopus)
Objective: Our aim was to use big data to analyse cases aged 18 and over who presented to our hospital and were diagnosed with primary hypothyroidism, by evaluating the patients' laboratory and socio-demographic data. Cluster analysis was performed on the big dataset to search for structure in the data.
Methods: Based on ICD-10 diagnoses of hypothyroidism recorded in our hospital between 2005 and 2018, 130,159 patients aged 18 and over with E03 and E06 diagnosis codes were included in the study. Because levothyroxine-containing drugs used to treat primary hypothyroidism affect measured hormone levels, we analysed, according to demographic characteristics, the TSH, fT3 and fT4 laboratory values at first diagnosis of cases who had not yet received any treatment. Patients with one or more missing laboratory values were excluded, and data from 2680 patients with complete data and TSH values above 4.94 mU/L were retained. The analysis used the k-means clustering technique, with the data separated into two sets. The variables included in k-means clustering were age, TSH, fT3 and fT4. Cliff's delta effect size coefficients and their confidence intervals were calculated to quantify the size of the differences.
Results: A higher prevalence of primary hypothyroidism in females and a peak in hypothyroidism in the 4th-5th decades of life in both genders were observed. Among males, in the group with lower ages, fT3 and fT4 values were higher, whereas TSH values were lower. Among females, in the group with lower ages, TSH values were higher, whereas fT4 values were lower.
Conclusion: This study is the first big data analysis of primary hypothyroidism carried out in our country. Despite the difficulties in implementation, studies like this are an important means of generating data in our country.
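As an illustration of the effect size named in the Methods, the following is a minimal sketch of Cliff's delta between two groups, with a confidence interval. The abstract does not state how the confidence intervals were obtained, so the percentile bootstrap used here is an assumption. The function names, the sample values and the cluster labels are hypothetical and are not study data.

```python
import random


def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Cliff's delta (assumed method)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stats = sorted(
        # Resample each group with replacement, keeping group sizes.
        cliffs_delta(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)))
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up TSH values (mU/L) for two hypothetical clusters; not study data.
tsh_cluster_a = [5.2, 6.1, 7.4, 8.0, 9.3]
tsh_cluster_b = [5.0, 5.5, 5.9, 6.3, 7.0]

delta = cliffs_delta(tsh_cluster_a, tsh_cluster_b)
ci_low, ci_high = bootstrap_ci(tsh_cluster_a, tsh_cluster_b)
print(delta, (ci_low, ci_high))
```

Cliff's delta ranges from -1 to 1. It depends only on the ordering of values across the two groups, not on their distribution, which suits skewed laboratory measurements such as TSH.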