Enhancing Audio Replay Attack Detection with Silence-Based Blind Channel Impulse Response Estimation


BEKİRYAZICI Ş., Hanilçi C., ÖZCAN SEMERCİ N.

27th International Conference on Speech and Computer, SPECOM 2025, Szeged, Hungary, 13 - 15 October 2025, vol.16187 LNCS, pp.333-344, (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • Volume: 16187 LNCS
  • DOI: 10.1007/978-3-032-07956-5_24
  • City of Publication: Szeged
  • Country of Publication: Hungary
  • Pages: pp.333-344
  • Keywords: ASVspoof 2019, ASVspoof 2021, Replay attack detection, ResNet
  • Affiliated with Bursa Uludağ University: Yes

Abstract

Replay attacks pose a major threat to automatic speaker verification (ASV) systems and considerably degrade their performance. Because replayed utterances are captured and reproduced using external microphones and loudspeakers, they inherently carry the acoustic imprint of these devices. Such acoustic distortions serve as valuable cues for differentiating between genuine and spoofed speech, provided they can be effectively extracted and modeled. In this context, blind channel impulse response estimation has been shown to be an effective approach to replay attack detection, as it characterizes the acoustic path through which the signal has propagated without requiring explicit knowledge of the original source or environment. Furthermore, prior studies have highlighted the importance of silence segments in this task, noting that these regions, being free of speech content, primarily capture the characteristics of the transmission channel. Silence segments therefore offer a robust opportunity for extracting channel-related features that are less influenced by speaker variability and phonetic content, which improves the discriminability between bonafide and replayed signals. In this paper, we argue that channel impulse response estimates derived from silence parts contain more discriminative information than those obtained from the entire signal or from voiced parts. To exploit this insight, we propose using the log-magnitude channel frequency response estimated from the silence parts for replay attack detection. Experiments on the ASVspoof 2019 and ASVspoof 2021 datasets show that, compared to using the entire signal, silence-based channel response features reduce the equal error rate (EER) from 4.21% to 3.17% and from 29.16% to 24.43%, respectively.
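
To make the idea of a silence-based log-magnitude feature concrete, the following is a minimal Python sketch. It is not the blind channel impulse response estimator used in the paper, which the abstract does not specify. As a simplified stand-in, it selects the lowest-energy frames as "silence" and averages their log-magnitude spectra, which gives a rough proxy for the spectral coloration of the recording channel. The function names, frame sizes, FFT length, and the energy quantile are all illustrative assumptions.

```python
import numpy as np


def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (assumes len(x) >= frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def silence_log_magnitude(x, frame_len=400, hop=160, n_fft=512,
                          energy_quantile=0.2):
    """Average log-magnitude spectrum over the lowest-energy frames.

    Assumption: frames whose energy falls at or below the given quantile
    are treated as silence. This is a crude energy-based detector, not the
    silence detection or channel estimation method of the paper.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)
    threshold = np.quantile(energy, energy_quantile)
    silent = frames[energy <= threshold]

    # Window each silent frame and take the one-sided magnitude spectrum.
    window = np.hanning(frame_len)
    spectrum = np.abs(np.fft.rfft(silent * window, n=n_fft, axis=1))

    # A small constant avoids log(0) on all-zero frames.
    return np.mean(np.log(spectrum + 1e-10), axis=0)
```

With the default settings, the function returns a vector of `n_fft // 2 + 1 = 257` values per utterance. In a detection pipeline of the kind the abstract describes, a feature like this would be passed to a classifier such as a ResNet; that stage is not shown here.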