SPEAKER IDENTIFICATION FROM SHOUTED SPEECH: ANALYSIS AND COMPENSATION


Hanilci C., Kinnunen T., Saeidi R., Pohjalainen J., Alku P., Ertaş F.

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, 26-31 May 2013, pp. 8027-8031

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume:
  • DOI: 10.1109/icassp.2013.6639228
  • City: Vancouver
  • Country: Canada
  • Page Numbers: pp. 8027-8031
  • Affiliated with Bursa Uludağ University: Yes

Abstract

Text-independent speaker identification is studied using neutral and shouted speech in Finnish to analyze the effect of vocal mode mismatch between training and test utterances. Standard mel-frequency cepstral coefficient (MFCC) features with a Gaussian mixture model (GMM) recognizer are used for speaker identification. The results indicate that speaker identification accuracy drops from perfect (100%) to 8.71% under vocal mode mismatch. Because of this dramatic degradation in recognition accuracy, we propose a joint-density GMM mapping technique to compensate the MFCC features. This mapping is trained on a disjoint emotional speech corpus to create a completely speaker- and speech-mode-independent emotion-neutralizing mapping. As a result of the compensation, the 8.71% identification accuracy increases to 32.00%, without substantially degrading accuracy in the matched train-test conditions.
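
The sketch below illustrates the two components the abstract describes: per-speaker GMMs scored on MFCC frames, and a joint-density GMM mapping that converts shouted-speech features toward neutral-speech features before scoring. This is a minimal illustration, not the authors' implementation: it assumes librosa and scikit-learn are available, that parallel shouted/neutral frames for the mapping have been time-aligned beforehand (e.g., by DTW), and the file lists, mixture counts, and feature settings are placeholders rather than the configuration reported in the paper.

    import numpy as np
    import librosa
    from scipy.stats import multivariate_normal
    from sklearn.mixture import GaussianMixture

    def extract_mfcc(wav_path, n_mfcc=13):
        # (n_frames, n_mfcc) MFCC matrix for one utterance
        signal, sr = librosa.load(wav_path, sr=None)
        return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T

    def train_speaker_models(train_files, n_components=64):
        # One diagonal-covariance GMM per speaker, trained on neutral speech
        models = {}
        for speaker, paths in train_files.items():
            feats = np.vstack([extract_mfcc(p) for p in paths])
            models[speaker] = GaussianMixture(
                n_components=n_components, covariance_type="diag").fit(feats)
        return models

    def identify(test_file, models, mapping=None):
        # Maximum average log-likelihood decision over the speaker GMMs;
        # 'mapping' optionally neutralizes shouted-speech features first
        feats = extract_mfcc(test_file)
        if mapping is not None:
            feats = mapping(feats)
        return max(models, key=lambda spk: models[spk].score(feats))

    def train_joint_mapping(shout_feats, neutral_feats, n_components=8):
        # Joint-density GMM on stacked [shouted; neutral] frame pairs
        # (assumes the two feature streams are frame-aligned)
        z = np.hstack([shout_feats, neutral_feats])
        return GaussianMixture(n_components=n_components,
                               covariance_type="full").fit(z)

    def map_features(x, joint_gmm, dim):
        # MMSE estimate of neutral features from shouted features under the
        # joint GMM: sum_m P(m|x) [mu_y_m + C_yx_m C_xx_m^-1 (x - mu_x_m)]
        mu, cov, w = joint_gmm.means_, joint_gmm.covariances_, joint_gmm.weights_
        px = np.array([w[m] * multivariate_normal.pdf(x, mu[m, :dim], cov[m, :dim, :dim])
                       for m in range(len(w))])
        post = px / px.sum(axis=0)   # P(m | x), shape (n_components, n_frames)
        y_hat = np.zeros_like(x, dtype=float)
        for m in range(len(w)):
            A = cov[m, dim:, :dim] @ np.linalg.inv(cov[m, :dim, :dim])
            y_hat += post[m][:, None] * (mu[m, dim:] + (x - mu[m, :dim]) @ A.T)
        return y_hat

A mismatched test would call identify() with mapping=lambda f: map_features(f, joint_gmm, 13), so the shouted-speech frames are mapped toward neutral-speech statistics before being scored against the neutral-trained speaker models.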