TARA: An algorithm for fast searching patterns on text files of multiple


Külekci M. O.

22nd International Symposium on Computer and Information Sciences, Ankara, Turkey, 7 - 09 November 2007, pp.136-141 identifier identifier

  • Publication Type: Conference Paper / Full Text
  • Volume:
  • Doi Number: 10.1109/iscis.2007.4456850
  • City: Ankara
  • Country: Turkey
  • Page Numbers: pp.136-141

Abstract

This work introduces a new multi-pattern matching algorithm that performs searching of fixed-length strings on text files very fast by benefiting from bit-parallelism. The algorithm is given name tara. Bounded gaps as well as character classes in keywords are also supported. Although the worst case time complexity is quadratic, it performs very fast in practise. Experiments are conducted to compare the performance of the proposed algorithm with widely used GNU grep file search utility and also with 9 variants of Aho&Corasick and Comentz&Walter algorithms on natural language text. On the average tara is approximately 10% faster then grep, where up to 70% percent speed up is observed. The benchmark with the AC and CW variants results that the speed up obtained by tara is 3,5 times relative to its nearest successor.