IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, 26-31 May 2013, pp. 8027-8031
Text-independent speaker identification is studied using neutral and shouted speech in Finnish to analyze the effect of vocal mode mismatch between training and test utterances. Standard mel-frequency cepstral coefficient (MFCC) features with a Gaussian mixture model (GMM) recognizer are used for speaker identification. The results indicate that identification accuracy drops from perfect (100 %) to 8.71 % under vocal mode mismatch. Because of this dramatic degradation, we propose a joint density GMM mapping technique to compensate the MFCC features. The mapping is trained on a disjoint emotional speech corpus, yielding a completely speaker- and vocal-mode-independent emotion-neutralizing transformation. With this compensation, identification accuracy under mismatch increases from 8.71 % to 32.00 %, while the matched train-test conditions are degraded only slightly.
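The MFCC-GMM baseline described above can be sketched as a closed-set identification rule: each enrolled speaker is represented by a GMM over MFCC frames, and a test utterance is assigned to the speaker whose model gives the highest average per-frame log-likelihood. The sketch below is illustrative only, assuming diagonal-covariance GMMs with invented toy parameters; real systems estimate the models with EM on MFCC frames, which is not shown here.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X (N, D) under a
    diagonal-covariance GMM with K components.

    weights   : (K,)    mixture weights
    means     : (K, D)  component means
    variances : (K, D)  diagonal covariances
    """
    diff = X[:, None, :] - means[None, :, :]                      # (N, K, D)
    log_comp = (np.log(weights)[None, :]
                - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))
    # log-sum-exp over components, then average over frames
    m = log_comp.max(axis=1, keepdims=True)
    return float(np.mean(m.squeeze(1)
                         + np.log(np.sum(np.exp(log_comp - m), axis=1))))

def identify(X, speaker_models):
    """Return the enrolled speaker whose GMM best explains the frames."""
    return max(speaker_models, key=lambda s: gmm_loglik(X, *speaker_models[s]))
```

In a matched condition the correct speaker's model dominates; under vocal mode mismatch the test-frame distribution shifts away from all enrolled models, which is what causes the accuracy collapse reported in the abstract.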
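The joint density GMM mapping used for compensation can be sketched as follows: a GMM is fitted to joint source-target feature vectors (here, shouted and neutral MFCCs), and at test time each source frame x is mapped through the standard minimum mean-square-error regression y(x) = sum_k P(k|x) (mu_y_k + S_yx_k S_xx_k^-1 (x - mu_x_k)). The code below is a minimal sketch of that regression step only, assuming the joint GMM parameters are already estimated; the function name and parameter layout are illustrative, not from the paper.

```python
import numpy as np

def jdgmm_map(x, weights, mu_x, mu_y, S_xx, S_yx):
    """Map one source feature vector x (D,) to the target domain.

    weights : (K,)       mixture weights of the joint GMM
    mu_x    : (K, D)     source-part means
    mu_y    : (K, D)     target-part means
    S_xx    : (K, D, D)  source-part covariances
    S_yx    : (K, D, D)  target-source cross-covariances
    """
    K, D = mu_x.shape
    # posterior P(k|x) under the marginal source GMM
    log_post = np.empty(K)
    for k in range(K):
        diff = x - mu_x[k]
        inv = np.linalg.inv(S_xx[k])
        _, logdet = np.linalg.slogdet(S_xx[k])
        log_post[k] = (np.log(weights[k])
                       - 0.5 * diff @ inv @ diff
                       - 0.5 * logdet)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # MMSE regression: y = sum_k P(k|x) (mu_y_k + S_yx_k S_xx_k^-1 (x - mu_x_k))
    y_hat = np.zeros(D)
    for k in range(K):
        y_hat += post[k] * (mu_y[k]
                            + S_yx[k] @ np.linalg.inv(S_xx[k]) @ (x - mu_x[k]))
    return y_hat
```

Because the mapping is trained on a disjoint emotional corpus, the same transformation can be applied to any speaker's shouted MFCCs before scoring them against the neutral-speech GMMs.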