I recently finished an algorithm problem that involved implementing the Rabin-Karp search algorithm.

After reading several articles about it, I thought it would be fun to run some tests to measure how much faster it is than a naive search algorithm.

Unfortunately, most of my tests show that it is actually far slower than the two naive/brute-force approaches I compared it against. This surprised me because on TalentBuddy my initial naive approaches were too slow to pass their tests; only after implementing Rabin-Karp could I get the tests to pass.

I’ve been eager to start using R to visualize data, and this was a great opportunity to do so.

Benchmark details:

1. For each graph, the tests were run 10 times back to back; the graphs show the mean runtime for each algorithm.

2. The random data was generated from an English dictionary.

3. 100,000 bytes were taken from the end of the randomly generated file and used as the search pattern. Then I concatenated four copies of the random file together so the pattern appears four times.

4. It was a bit hard to find long common phrases in the Bible and Shakespeare.

5. The slow behavior of Rabin-Karp seemed to come from the “sliding” part of the algorithm: the loop that moves the window forward one character at a time, updating the hash and comparing, was the slowest part.
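To make the “sliding” step concrete, here is a minimal Rabin-Karp sketch in Python. This is illustrative, not my benchmarked implementation; the base and modulus constants are arbitrary choices, and the per-window rolling-hash update in the second loop is the part described above as the bottleneck.

```python
def rabin_karp(text, pattern, base=256, mod=101):
    """Return the start indices of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    matches = []
    # base^(m-1) % mod, used to remove the leading character from the hash.
    high = pow(base, m - 1, mod)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    for i in range(n - m + 1):
        # Only do a full character comparison when the hashes collide.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        # "Slide" the window one character: drop text[i], add text[i + m].
        if i < n - m:
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return matches
```

This sliding loop runs once per window position, so even though each hash update is O(1), the constant-factor work per position (multiply, subtract, modulo) can easily exceed the cost of the early-exit character comparison a naive search does on most positions.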

Naive 1 = my best attempt at a naive search.

Naive 2 = http://www.geeksforgeeks.org/searching-for-patterns-set-1-naive-pattern-searching/

Rabin Karp = http://www.geeksforgeeks.org/searching-for-patterns-set-3-rabin-karp-algorithm/
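For comparison, a naive search in the same spirit as the GeeksforGeeks version linked above can be sketched like this (again illustrative Python, not the exact code I benchmarked):

```python
def naive_search(text, pattern):
    """Return the start indices of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):
        # Compare character by character, bailing out on the first mismatch.
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(i)
    return matches
```

On typical text, the inner loop usually exits after one or two characters, which helps explain why the naive approach can beat Rabin-Karp’s constant per-position hash update in practice.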

I used this article to generate my random word file.

All my code, scripts, data, etc. can be found here: https://github.com/ryan1234/talentbuddy/tree/master/mapreduce/hashstring/analysis/hash-string