Code Chef Problem – Distinct Prime Factors (KPrime)

Whew. This problem took me the better part of a week to solve. The first few days I misunderstood the problem and went off on a bit of a rabbit trail. Even though I have consistently learned to not rush head long into solving a problem before properly understanding it, I keep repeating this mistake.

The Problem

The problem is in two parts:

1. Find the number of distinct primes for a number n. (For example, 10=2*5, so that is 2 distinct primes. 20=5*2*2, so again, 2 distinct primes)
2. For a range of numbers, for example, 1 to 100, how many numbers have k number of distinct primes?

An example:

If we consider the numbers 4 to 10, with k=2. The question is … 4,5,6,7,8,9,10 … out of those numbers, how many have 2 distinct prime factors?

The answer?

– 4=2*2, so it has 1 distinct prime factor
– 5=5*1, so it has 1 distinct prime factor
– 6=2*3, so it has 2 distinct prime factors
– 7=7*1, so it has 1 distinct prime factor
– 8=2*2*2, so it has 1 distinct prime factor
– 9=3*3, so it has 1 distinct prime factor
– 10=2*5, so it has 2 distinct prime factors

The answer must be 2 then. 6 and 10 have two distinct prime factors.

The constraints were that the range could go up to 100,000 and k could range from 1 to 5.

First Swings

Initially I thought of actually generating prime number combinations. For example, if k=3, I would generate [2,3,5], [3,5,7], [5,7,11], etc. The idea being that order doesn’t matter and numbers can’t be repeated. The problem was that this would only yield numbers that had uniquely distinct prime factors. In other words, [2,2,3,3,5,5,5] also have three distinct prime factors, so in that sense it is the same as [2,3,5].

I abandoned this approach after trying it out for way too long.

My Good Friend OEIS

After Googling around a bit I ended up on this page: OEIS A001221.

For algorithm competitions, OEIS (Online Encyclopedia of Integer Sequences) is invaluable.

Unfortunately there were no code samples for generating k-distinct primes on the OEIS page. I tried to read the math specifics on Wikipedia and various other math sites and found the function I was looking for was omega(n). Even though I knew the function, I couldn’t find any good implementations of it.

New Approach

I did actually see something on OEIS that gave me an idea. There was a MAPLE implementation that inspired me to do this:

1. Loop over all the primes under 100,000.
2. For each prime, go through the multiples of it within a limit. (The limit ended up being ceil(100000 / prime(i)) )
3. For each number in (2), we add 1 to an index in a vector. For example, if we’re looking at 7 for the prime, we would have:
a. nums[7*1] = nums[7*1] + 1
b. nums[7*2] = nums[7*2] + 1
c. nums[7*3] = nums[7*3] + 1
d. …

In this way I could create a vector that cached the distinct prime count for numbers 1 to 100,000. Then when a range was given to me, I could iterate over the vector and count how many times a given number appeared.

Temptation to Cheat

At one point in my search for an implementation – or at least good written explanation – of omega(n), I came across this page: http://stackoverflow.com/questions/17545888/number-of-distinct-prime-factors-of-a-number.

A guy was doing the same problem and gave up and asked Stack Overflow to finish it for him. Not cool.

The temptation was there to just give in after a week of work. I could just read the answers on SO and submit the problem and be done with it. However I hadn’t spent all that time just to quit and cheat.

First Submission

My first submission failed from the time constraint. A few weeks ago I made a subtle transition over to C++ for algorithm competitions because I had to use it for a problem on TalentBuddy. The language is still fairly new to me, so when my first submission failed I wasn’t surprised. I must have missed some major performance issues with my code.

The good news was that I didn’t get a “Wrong Answer” reply.

Micro Optimizations Work!

Every single time an algorithm has failed on CodeChef, micro optimizations have not worked. I’ll try minimizing variables, making loops shorter, etc. and it never works. There is always a major issue with the algorithm that only a major re-factor can fix.

For the first time, I was able to do micro optimizations and it worked.

I did some performance profiling and was shocked that my prime generation function only took 2ms to run. Moreover, the caching of k-distinct primes only took 23ms! 25ms out of 1000ms allowed was spent caching the data I needed.

That meant looping through the vector of k-distinct primes was too slow! This really surprised me because I figured that code was as good as it could get.

The big change was accessing a vector by index instead of using .at().

nums.at(i)

… became …

nums[i]

Just like that my vector caching of k-distinct primes went from 23ms to 10ms. My submission finally worked!

Conclusion

C++ is very powerful and micro optimizations add up. Over time I figure I’ll learn more and more of them. I’m excited to see how other people solved this problem.

Code: https://github.com/ryan1234/codechef/tree/master/jul-2013/kprime

Code Chef Problem – Distinct Prime Factors (KPrime)

CodeChef problem: PREDICT

The Problem

Each month CodeChef runs a coding competition in which I try to solve at least two problems. Usually the easiest two are simple and can be written and verified in a few hours each.

The problem that had me a bit stuck this month was one involving probability. I haven’t done much work with probability since college (over 10 years ago), so I had to brush up on some ideas.

The outline of the problem (with an example):

1. Two teams are playing each other, team A and team B.
2. The probability that team A wins is pX. (For example, 0.51, meaning a 51% chance that team A wins.)
3. The probability that team B wins is 1 – pX.
4. We will always bet 10,000 dollars between the two teams.
5. We can bet on both teams at once.
6. If a team wins, then the money made is calculated by (2 * (1 – pX)) * x, where “x” is the amount we bet. For example, we bet 6,000 on team A, who has a 51% chance of winning … if they win, we make (2 * (1 – 0.51)) * 6000, which is 5,880 dollars.
7. If a team loses, then we lose the amount we bet. If we bet 4,000 dollars on team B and they lose, we’ve only lost 4,000 dollars.

The problem is to find the optimal amount to bet on each team so that we maximize our money.

This means that if I bet 1,000 on team A and 9,000 on team B, I make X amount of money. If I change the amount for team A from 1,000 to 2,000 (and subsequently the team B amount from 9,000 to 8,000), then I’ll make Y amount of money. We just keep changing the amounts until we find one that we know is the highest.

Thought Process

Whenever I read the word “optimal” I assume the problem is going to be difficult. Specifically with this problem I thought it was going to mean I would come up with a function, f(x), where I would have to find a maximum – or a place where the slope was 0. It made me think the problem was harder than it was.

I tried to do some math based on the problem details (http://www.codechef.com/JUNE13/problems/PREDICT) and could not seem to generate the example shown.

This lead me to Google around for betting formulas which lead me to the Kelly Criterion – http://en.wikipedia.org/wiki/Kelly_criterion. Aha! Complex math and formulas! I’ve done CodeChef problems before where a formula was at the heart of a problem, so I assumed this must be my answer.

Long story short, the Kelly Criterion was a dead end. I could never make it work given my variables, so I assumed it didn’t match my problem space.

Progress…

The real break through came when I sat down and went slowly and thought through the problem. I had gone through the problem before, but this time I tried not to skip steps.

The example on the website had two inputs and one output. Eventually I worked through their example by hand so that I could re-produce their result. The next step was to generalize the formula. This is the result:

p = 0.51
x = 6000
total = 10000
result = ((total + ((2 * (1 - p)) * x) - (total - x)) * p) + ((total + (2 * (1 - (1 - p))) * (total - x) - (total - (total - x))) * (1 - p))

When I printed the result I got 10018, which matched perfectly.

I decided to use Python to solve this problem because it’s quick and easy to get going and several people in the comments on the site had mentioned that Python gave easy/accurate floating point accuracy which was important.

Surprising Results

Now that I had a formula that worked in one case, it was time to graph some general results. I just ran the formula through a loop and graphed the output. This is what I got:

chart2

It’s linear. I couldn’t believe it, so I picked some other probability for team A and ran my test again. I thought I’d see a curve somehow, but got this:

chart1

Not a curve. I suppose it shouldn’t surprise me since the formula earlier shows no possibility of being a curve.

Since it was linear, this meant that based on the percentage of winning, I would either bet 0 dollars or all 10,000 on team A. Then it is simply a matter of plugging that into the formula to find the optimal amount.

Conclusion

This took me a lot longer than it should. Each time I struggle with a problem like this I learn to take my time and patiently work through a problem. I also learn to not make assumptions and follow rabbit trails.

Solution: https://github.com/ryan1234/codechef/tree/master/jun-2013/predict

CodeChef problem: PREDICT