Fast bitwise pattern-matching algorithm for DNA sequences on modern hardware


ÖZCAN G., Unsal O. S.

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, cilt.23, sa.5, ss.1405-1417, 2015 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 23 Sayı: 5
  • Basım Tarihi: 2015
  • Doi Numarası: 10.3906/elk-1304-165
  • Dergi Adı: TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, TR DİZİN (ULAKBİM)
  • Sayfa Sayıları: ss.1405-1417
  • Anahtar Kelimeler: Computer architecture, bitwise string match, short DNA pattern, packed variables, 32-bit word
  • Bursa Uludağ Üniversitesi Adresli: Hayır

Özet

We introduce a fast bitwise exact pattern-matching algorithm, which speeds up short-length pattern searches on large-sized DNA databases. Our contributions are two-fold. First, we introduce a novel exact matching algorithm designed specifically for modern processor architectures. Second, we conduct a detailed comparative performance analysis of bitwise exact matching algorithms by utilizing hardware counters. Our algorithmic technique is based on condensed bitwise operators and multifunction variables, which minimize register spills and instruction counts during searches. In addition, the technique aims to efficiently utilize CPU branch predictors and to ensure smooth instruction flow through the processor pipeline. Analyzing letter occurrence probability estimations for DNA databases, we develop a skip mechanism to reduce memory accesses. For comparison, we exploit the complete Mus muse alas sequence, a commonly used DNA sequence that is larger than 2 GB. Compared to five state-of-the-art pattern-matching algorithms, experimental results show that our technique outperforms the best algorithm even for the worst-case DNA pattern for our technique.