Towards Faster String Matching

Hannu Peltola

    Research output: ThesisDoctoral ThesisCollection of Articles

    Abstract

    Exact string matching is a much studied and popular problem. The task is to find the occurrences of a string pattern in a long text. We introduce several new algorithms for online exact string matching and study their experimental performance and compare them with some earlier algorithms. Moreover, we give examples of anomalies caused by repetitive texts, and present algorithms for that kind of data. Specialized algorithms for long patterns (longer than about 50 characters) are also studied. In addition, we discuss the principles of fair testing of string matching algorithms. Among the designed algorithms there are novel ones and variations of previous algorithms. Most algorithms are based on bit-parallelism, which is one of the most effective techniques adopted increasingly during the last few years. For example classical exact string matching algorithms were significantly slower than the best ones of the new algorithms. Exact string matching is so common task that even a small improvement is useful in practice. It turned out that the uncomplicated algorithms are often the fastest in practice, especially for short patterns. Reducing the number of tests in algorithms often accelerates search. Surprisingly, the number of examined text characters is not always correlated with practical search speed among various exact string matching algorithms. Typically, there is clear dependence on the number of examined text characters and performance only among closely related algorithms. However, no exact string matching algorithm is the best one for all alphabets and pattern lengths on all kinds of computer hardware.
    Translated title of the contributionNopeampaa merkkijononhakua
    Original languageEnglish
    QualificationDoctor's degree
    Awarding Institution
    • Aalto University
    Supervisors/Advisors
    • Tarhio, Jorma, Supervising Professor
    • Tarhio, Jorma, Thesis Advisor
    Publisher
    Print ISBNs978-952-60-5156-7
    Electronic ISBNs978-952-60-5157-4
    Publication statusPublished - 2013
    MoE publication typeG5 Doctoral dissertation (article)

    Keywords

    • exact string matching
    • q-gram
    • bit-parallelism
    • BNDM
    • Horspool's algorithm

    Fingerprint

    Dive into the research topics of 'Towards Faster String Matching'. Together they form a unique fingerprint.

    Cite this