Fast bitwise pattern-matching algorithm for DNA sequences on modern hardware


ÖZCAN G., Unsal O. S.

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, vol.23, no.5, pp.1405-1417, 2015 (SCI-Expanded) identifier identifier

  • Publication Type: Article / Article
  • Volume: 23 Issue: 5
  • Publication Date: 2015
  • Doi Number: 10.3906/elk-1304-165
  • Journal Name: TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, TR DİZİN (ULAKBİM)
  • Page Numbers: pp.1405-1417
  • Keywords: Computer architecture, bitwise string match, short DNA pattern, packed variables, 32-bit word
  • Bursa Uludag University Affiliated: No

Abstract

We introduce a fast bitwise exact pattern-matching algorithm, which speeds up short-length pattern searches on large-sized DNA databases. Our contributions are two-fold. First, we introduce a novel exact matching algorithm designed specifically for modern processor architectures. Second, we conduct a detailed comparative performance analysis of bitwise exact matching algorithms by utilizing hardware counters. Our algorithmic technique is based on condensed bitwise operators and multifunction variables, which minimize register spills and instruction counts during searches. In addition, the technique aims to efficiently utilize CPU branch predictors and to ensure smooth instruction flow through the processor pipeline. Analyzing letter occurrence probability estimations for DNA databases, we develop a skip mechanism to reduce memory accesses. For comparison, we exploit the complete Mus muse alas sequence, a commonly used DNA sequence that is larger than 2 GB. Compared to five state-of-the-art pattern-matching algorithms, experimental results show that our technique outperforms the best algorithm even for the worst-case DNA pattern for our technique.