Project Euler!

Well it appears I am quite late to the party. The other day while trying to solve a CodeChef problem I stumbled across Project Euler. It was right up my alley as I’ve been trying to get into sites that help me practice algorithms and math.

Briefly some notes after my first 37 problems:

1. Project Euler encourages me to write really bad code. Since there are no time or memory constraints (just give us the right answer), I end up writing the worst possible code to brute force answers.
2. I love Python. What else is there to say? There is a reason Python is one of the most popular languages for solving Project Euler problems. String parsing, math functions, the terseness of the language, the ability to easily handle large numbers … it’s just fantastic.
3. I am about done with C++ for problems. The syntax and constant Googling of how to use various data structures became too annoying for me. Plus the extra few steps of having to compile and then run … it’s not that much, but it adds up. Again I love just running Python code without having to compile first.
4. It’s addicting. The site was made for someone with my personality. Goals, levels, competition, etc.

I read a blog the other day of a guy talking about how Project Euler was boring. He had solved about 150 problems (he even contributed a problem!) and then got tired of the site. Something along those lines.

Well I plan to beat that guy. We’ll see how it goes. =)

Project Euler!

Code Chef Problem – Distinct Prime Factors (KPrime)

Whew. This problem took me the better part of a week to solve. The first few days I misunderstood the problem and went off on a bit of a rabbit trail. Even though I have consistently learned to not rush head long into solving a problem before properly understanding it, I keep repeating this mistake.

The Problem

The problem is in two parts:

1. Find the number of distinct primes for a number n. (For example, 10=2*5, so that is 2 distinct primes. 20=5*2*2, so again, 2 distinct primes)
2. For a range of numbers, for example, 1 to 100, how many numbers have k number of distinct primes?

An example:

If we consider the numbers 4 to 10, with k=2. The question is … 4,5,6,7,8,9,10 … out of those numbers, how many have 2 distinct prime factors?

The answer?

– 4=2*2, so it has 1 distinct prime factor
– 5=5*1, so it has 1 distinct prime factor
– 6=2*3, so it has 2 distinct prime factors
– 7=7*1, so it has 1 distinct prime factor
– 8=2*2*2, so it has 1 distinct prime factor
– 9=3*3, so it has 1 distinct prime factor
– 10=2*5, so it has 2 distinct prime factors

The answer must be 2 then. 6 and 10 have two distinct prime factors.

The constraints were that the range could go up to 100,000 and k could range from 1 to 5.

First Swings

Initially I thought of actually generating prime number combinations. For example, if k=3, I would generate [2,3,5], [3,5,7], [5,7,11], etc. The idea being that order doesn’t matter and numbers can’t be repeated. The problem was that this would only yield numbers that had uniquely distinct prime factors. In other words, [2,2,3,3,5,5,5] also have three distinct prime factors, so in that sense it is the same as [2,3,5].

I abandoned this approach after trying it out for way too long.

My Good Friend OEIS

After Googling around a bit I ended up on this page: OEIS A001221.

For algorithm competitions, OEIS (Online Encyclopedia of Integer Sequences) is invaluable.

Unfortunately there were no code samples for generating k-distinct primes on the OEIS page. I tried to read the math specifics on Wikipedia and various other math sites and found the function I was looking for was omega(n). Even though I knew the function, I couldn’t find any good implementations of it.

New Approach

I did actually see something on OEIS that gave me an idea. There was a MAPLE implementation that inspired me to do this:

1. Loop over all the primes under 100,000.
2. For each prime, go through the multiples of it within a limit. (The limit ended up being ceil(100000 / prime(i)) )
3. For each number in (2), we add 1 to an index in a vector. For example, if we’re looking at 7 for the prime, we would have:
a. nums[7*1] = nums[7*1] + 1
b. nums[7*2] = nums[7*2] + 1
c. nums[7*3] = nums[7*3] + 1
d. …

In this way I could create a vector that cached the distinct prime count for numbers 1 to 100,000. Then when a range was given to me, I could iterate over the vector and count how many times a given number appeared.

Temptation to Cheat

At one point in my search for an implementation – or at least good written explanation – of omega(n), I came across this page: http://stackoverflow.com/questions/17545888/number-of-distinct-prime-factors-of-a-number.

A guy was doing the same problem and gave up and asked Stack Overflow to finish it for him. Not cool.

The temptation was there to just give in after a week of work. I could just read the answers on SO and submit the problem and be done with it. However I hadn’t spent all that time just to quit and cheat.

First Submission

My first submission failed from the time constraint. A few weeks ago I made a subtle transition over to C++ for algorithm competitions because I had to use it for a problem on TalentBuddy. The language is still fairly new to me, so when my first submission failed I wasn’t surprised. I must have missed some major performance issues with my code.

The good news was that I didn’t get a “Wrong Answer” reply.

Micro Optimizations Work!

Every single time an algorithm has failed on CodeChef, micro optimizations have not worked. I’ll try minimizing variables, making loops shorter, etc. and it never works. There is always a major issue with the algorithm that only a major re-factor can fix.

For the first time, I was able to do micro optimizations and it worked.

I did some performance profiling and was shocked that my prime generation function only took 2ms to run. Moreover, the caching of k-distinct primes only took 23ms! 25ms out of 1000ms allowed was spent caching the data I needed.

That meant looping through the vector of k-distinct primes was too slow! This really surprised me because I figured that code was as good as it could get.

The big change was accessing a vector by index instead of using .at().

nums.at(i)

… became …

nums[i]

Just like that my vector caching of k-distinct primes went from 23ms to 10ms. My submission finally worked!

Conclusion

C++ is very powerful and micro optimizations add up. Over time I figure I’ll learn more and more of them. I’m excited to see how other people solved this problem.

Code: https://github.com/ryan1234/codechef/tree/master/jul-2013/kprime

Code Chef Problem – Distinct Prime Factors (KPrime)

String Searching Analysis – Rabin Karp versus Naive Search

I recently finished an algorithm problem that involved implementing the Rabin Karp search algorithm.

After reading several articles about it, I thought it would be fun to run some tests to prove how much faster it is than a naive search algorithm.

Unfortunately most of my tests show that it is actually far worse than two naive/brute force approaches that I used. This surprised me because on TalentBuddy my initial naive approaches were too slow to pass their tests. Only after implementing Rabin Karp could I get the tests to pass.

I’ve been anxious to start using R to visualize data, and this was a great opportunity to do so.

Benchmark details:

1. For each graph, the tests were run 10 times back to back and the graphs show the mean of each algorithm.
2. The random data was generated from an English dictionary.
3. 100,000 bytes were taken from the end of the randomly generated file and this was used as the search pattern. Then I concatenated the random file together four times to make the pattern appear four times.
4. It was a bit hard to find long common phrases in the Bible and Shakespeare.
5. The slow behavior of Rabin Karp seemed to take place in the “sliding” part of the algorithm. When it moves up one character to do a comparison … that loop was the slowest part.

Naive 1 = My best attempt at a naive search.
Naive 2 = http://www.geeksforgeeks.org/searching-for-patterns-set-1-naive-pattern-searching/
Rabin Karp = http://www.geeksforgeeks.org/searching-for-patterns-set-3-rabin-karp-algorithm/

results4

results3

results2

results1

I used this article to generate my random word file.

All my code, scripts, data, etc. can be found here: https://github.com/ryan1234/talentbuddy/tree/master/mapreduce/hashstring/analysis/hash-string

String Searching Analysis – Rabin Karp versus Naive Search

CodeChef problem: PREDICT

The Problem

Each month CodeChef runs a coding competition in which I try to solve at least two problems. Usually the easiest two are simple and can be written and verified in a few hours each.

The problem that had me a bit stuck this month was one involving probability. I haven’t done much work with probability since college (over 10 years ago), so I had to brush up on some ideas.

The outline of the problem (with an example):

1. Two teams are playing each other, team A and team B.
2. The probability that team A wins is pX. (For example, 0.51, meaning a 51% chance that team A wins.)
3. The probability that team B wins is 1 – pX.
4. We will always bet 10,000 dollars between the two teams.
5. We can bet on both teams at once.
6. If a team wins, then the money made is calculated by (2 * (1 – pX)) * x, where “x” is the amount we bet. For example, we bet 6,000 on team A, who has a 51% chance of winning … if they win, we make (2 * (1 – 0.51)) * 6000, which is 5,880 dollars.
7. If a team loses, then we lose the amount we bet. If we bet 4,000 dollars on team B and they lose, we’ve only lost 4,000 dollars.

The problem is to find the optimal amount to bet on each team so that we maximize our money.

This means that if I bet 1,000 on team A and 9,000 on team B, I make X amount of money. If I change the amount for team A from 1,000 to 2,000 (and subsequently the team B amount from 9,000 to 8,000), then I’ll make Y amount of money. We just keep changing the amounts until we find one that we know is the highest.

Thought Process

Whenever I read the word “optimal” I assume the problem is going to be difficult. Specifically with this problem I thought it was going to mean I would come up with a function, f(x), where I would have to find a maximum – or a place where the slope was 0. It made me think the problem was harder than it was.

I tried to do some math based on the problem details (http://www.codechef.com/JUNE13/problems/PREDICT) and could not seem to generate the example shown.

This lead me to Google around for betting formulas which lead me to the Kelly Criterion – http://en.wikipedia.org/wiki/Kelly_criterion. Aha! Complex math and formulas! I’ve done CodeChef problems before where a formula was at the heart of a problem, so I assumed this must be my answer.

Long story short, the Kelly Criterion was a dead end. I could never make it work given my variables, so I assumed it didn’t match my problem space.

Progress…

The real break through came when I sat down and went slowly and thought through the problem. I had gone through the problem before, but this time I tried not to skip steps.

The example on the website had two inputs and one output. Eventually I worked through their example by hand so that I could re-produce their result. The next step was to generalize the formula. This is the result:

p = 0.51
x = 6000
total = 10000
result = ((total + ((2 * (1 - p)) * x) - (total - x)) * p) + ((total + (2 * (1 - (1 - p))) * (total - x) - (total - (total - x))) * (1 - p))

When I printed the result I got 10018, which matched perfectly.

I decided to use Python to solve this problem because it’s quick and easy to get going and several people in the comments on the site had mentioned that Python gave easy/accurate floating point accuracy which was important.

Surprising Results

Now that I had a formula that worked in one case, it was time to graph some general results. I just ran the formula through a loop and graphed the output. This is what I got:

chart2

It’s linear. I couldn’t believe it, so I picked some other probability for team A and ran my test again. I thought I’d see a curve somehow, but got this:

chart1

Not a curve. I suppose it shouldn’t surprise me since the formula earlier shows no possibility of being a curve.

Since it was linear, this meant that based on the percentage of winning, I would either bet 0 dollars or all 10,000 on team A. Then it is simply a matter of plugging that into the formula to find the optimal amount.

Conclusion

This took me a lot longer than it should. Each time I struggle with a problem like this I learn to take my time and patiently work through a problem. I also learn to not make assumptions and follow rabbit trails.

Solution: https://github.com/ryan1234/codechef/tree/master/jun-2013/predict

CodeChef problem: PREDICT

Algorithm Practice

I’ve been trying to stay current on algorithms, math and harder computer science concepts for about a year now.

To do this I’ve come across the following sites/resources:

1. TopCoder
2. Code Chef
3. Career Cup
4. Talent Buddy

Quick thoughts on each:

1. Top Coder

This site sucks. Maybe if you spend quite a bit of time to learn it and wade through their horrific UI, it’s great fun. But from my experience it’s impossible to understand. It feels like a website from 1998 and it does everything it can to make you hate it.

Maybe one day I’ll figure it out, but then again, I doubt it. There are too many other good sites to improve you algorithm skills.

2. Code Chef

The UI is pretty good and they run regular contests with problems ranging from easy to very difficult. It’s a fun format and the only negative I can find is that for a while it was hard to understand just how they’d pipe input into my programs and the approach for producing my answers.

Once you get one or two programs running it is great fun.

They run a contest every month for a week so it gives me plenty of time to finish a few problems in a few hours. It’s been a great experience for me and up until recently, it was my favorite site for coding practice.

3. Career Cup

This site is run by the lady that wrote “Cracking the Coding Interview”. The book itself has tons of great problems and is very focused on computer science basics. The advice and the problems are priceless. I’ve worked through the first 5-6 chapters and completed each homework problem. Some of them are quite challenging and it’s fun to see how close you can come to the right answers.

Looking forward to finishing the rest of the chapters.

4. Talent Buddy

I found this site on Facebook and quickly got addicted.

The first 4-5 problems are very easy and it gets you hooked. I worked pretty fast until I got to a GPS themed problem that involved some non-trivial math. After Googling around about line segments and closest paths etc. I was able to break through.

I’ve started to realize this site has some pretty serious problems and it represents conceptually some new areas that Code Chef and Career Cup don’t cover.

Lots of geometry and math, which is something I need to get better at.

Algorithm Practice

All About Mod 1000000007

I recently finished a problem on Codechef that had a statement like this in it:

Take the result and mod the result by 1000000007.

Why do we do this?

In Java, an integer’s maximum value is 2^31-1, or 2,147,483,647. That is obviously bigger than 1,000,000,007.

In Java, a long’s maximum value is 9,223,372,036,854,775,807!

If we look at the 88th Fibonacci number, we will overflow a long with a value of 1,100,087,778,366,101,931.

If we have an algorithm problem that suggests we mod the result, that means we can mod the numbers as we operate on them.

Addition

Let’s consider a simple example with a maximum value of 10. We don’t want to overflow the number 10, so we need a prime number a little less than 10, so let’s pick 7.

We might try to do something like this:

(1 % 7) + (2 % 7) = 3.

This seems to be the same as if we had done:

1 + 2 = (3 % 7).

Seems logical. If we add the two numbers and mod the result, it’s the same as if we had modded each number separately and then added those results.

This was my approach to a recent algorithm problem and I eventually realized that this could happen:

(5 % 7) + (6 % 7) = 11.

Not quite the same as:

5 + 6 = (11 % 7) – which is of course not true.

The reason is that at some point the sum should exceed the number we are modding by and so the sum of two numbers should be modded, not the individual numbers.

The number 1000000007 is special in the regard that any sum that is modded by 1000000007 will stay below the maximum value of an integer.

You can calculate Fibonacci numbers and always mod the result by 1000000007 and you will never overflow an integer.

All About Mod 1000000007