Abstract
Exact string matching is a much studied and popular problem. The task is to find the occurrences of a string pattern in a long text. We introduce several new algorithms for online exact string matching and study their experimental performance and compare them with some earlier algorithms. Moreover, we give examples of anomalies caused by repetitive texts, and present algorithms for that kind of data. Specialized algorithms for long patterns (longer than about 50 characters) are also studied. In addition, we discuss the principles of fair testing of string matching algorithms. Among the designed algorithms there are novel ones and variations of previous algorithms. Most algorithms are based on bit-parallelism, which is one of the most effective techniques adopted increasingly during the last few years. For example classical exact string matching algorithms were significantly slower than the best ones of the new algorithms. Exact string matching is so common task that even a small improvement is useful in practice. It turned out that the uncomplicated algorithms are often the fastest in practice, especially for short patterns. Reducing the number of tests in algorithms often accelerates search. Surprisingly, the number of examined text characters is not always correlated with practical search speed among various exact string matching algorithms. Typically, there is clear dependence on the number of examined text characters and performance only among closely related algorithms. However, no exact string matching algorithm is the best one for all alphabets and pattern lengths on all kinds of computer hardware.
Translated title of the contribution | Nopeampaa merkkijononhakua |
---|---|
Original language | English |
Qualification | Doctor's degree |
Awarding Institution |
|
Supervisors/Advisors |
|
Publisher | |
Print ISBNs | 978-952-60-5156-7 |
Electronic ISBNs | 978-952-60-5157-4 |
Publication status | Published - 2013 |
MoE publication type | G5 Doctoral dissertation (article) |
Keywords
- exact string matching
- q-gram
- bit-parallelism
- BNDM
- Horspool's algorithm