An Active Learning Approach Using Clustering-Based Initialization for Time Series Classification


Koyuncu F. S., İnkaya T.

12th International Symposium on Intelligent Manufacturing and Service Systems, IMSS 2023, İstanbul, Turkey, 26 - 28 May 2023, pp.224-235

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1007/978-981-99-6062-0_21
  • City: İstanbul
  • Country: Turkey
  • Page Numbers: pp.224-235
  • Keywords: Active Learning, Clustering, Initialization, Machine Learning, Time Series
  • Bursa Uludag University Affiliated: Yes

Abstract

Growing digitalization has increased the collection of sensor-based time series data in production and service systems such as manufacturing, energy, transportation, and healthcare. To manage these systems efficiently and effectively, artificial intelligence techniques are widely used to make predictions and draw inferences from time series data. These methods require a sufficient amount of labeled data for learning. However, most data in real-life systems are unlabeled, and annotation is costly or difficult. Active learning addresses this problem: it is a machine learning approach in which the model interacts with its environment and requests labels for the most informative samples. In this study, we introduce an active learning-based approach for the time series classification problem. In the proposed approach, the k-medoids clustering method is first used to identify representative samples in the dataset, and these cluster representatives are labeled to initialize active learning. The k-nearest-neighbor (KNN) algorithm is then used for classification. For query selection, uncertainty sampling is applied so that the samples with the least certain labels are prioritized. The proposed approach was evaluated using sensor data from production and healthcare systems. The experimental study analyzed the impact of the initialization technique, the number of queries, and the neighborhood size. The results showed promising performance for the proposed approach compared with competing approaches.
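
The pipeline described in the abstract (k-medoids initialization, KNN classification, uncertainty-based querying) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the distance measure, the uncertainty score, or the datasets, so the sketch assumes Euclidean distance, a least-confidence score, and a synthetic two-class dataset. All function names (`k_medoids`, `knn_proba`, `active_learn`) and the `oracle` callback, which stands in for a human annotator, are hypothetical.

```python
import math
import random
from collections import Counter

def euclidean(a, b):
    # Assumed distance; the abstract does not name one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_medoids(X, k, dist=euclidean, max_iter=100):
    """Alternating k-medoids; returns the indices of the k medoids."""
    n = len(X)
    D = [[dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    medoids = [0]  # farthest-first seeding keeps the sketch deterministic
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(D[i][m] for m in medoids)))
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for i in range(n):
            # A medoid always stays in its own cluster; others join the nearest medoid.
            owner = i if i in clusters else min(medoids, key=lambda m: D[i][m])
            clusters[owner].append(i)
        # New medoid of a cluster: the member with the smallest total distance to the rest.
        new = [min(c, key=lambda i: sum(D[i][j] for j in c)) for c in clusters.values()]
        if set(new) == set(medoids):
            break
        medoids = new
    return sorted(medoids)

def knn_proba(x, labeled, k, dist=euclidean):
    """Class-vote fractions among the k nearest labeled (series, label) pairs."""
    nearest = sorted(labeled, key=lambda s: dist(x, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return {c: v / len(nearest) for c, v in votes.items()}

def active_learn(X, oracle, n_init, n_queries, k=3):
    """Label the k-medoids representatives first, then query the least certain samples."""
    labeled = {i: oracle(i) for i in k_medoids(X, n_init)}
    for _ in range(n_queries):
        pool = [i for i in range(len(X)) if i not in labeled]
        if not pool:
            break
        train = [(X[i], lab) for i, lab in labeled.items()]
        # Least confidence: the sample whose top class probability is lowest.
        query = min(pool, key=lambda i: max(knn_proba(X[i], train, k).values()))
        labeled[query] = oracle(query)
    return labeled

# Synthetic data (an assumption): noisy sine waves at two frequencies.
random.seed(0)

def make_series(freq, length=30):
    return [math.sin(freq * t / 5) + random.gauss(0, 0.1) for t in range(length)]

X = [make_series(1.0) for _ in range(20)] + [make_series(2.0) for _ in range(20)]
y = [0] * 20 + [1] * 20

labeled = active_learn(X, oracle=lambda i: y[i], n_init=4, n_queries=6, k=3)
train = [(X[i], lab) for i, lab in labeled.items()]
pred = [max(knn_proba(x, train, 3).items(), key=lambda kv: kv[1])[0] for x in X]
accuracy = sum(p == t for p, t in zip(pred, y)) / len(y)
print(f"labeled samples: {len(labeled)}, accuracy on all series: {accuracy:.2f}")
```

In this sketch, `n_init`, `n_queries`, and `k` correspond to the three factors the experimental study varies: the initialization, the number of queries, and the neighborhood size. For real time series a different distance, such as dynamic time warping, could be passed in place of `euclidean`.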