CodeChef problem: PREDICT

The Problem

Each month CodeChef runs a coding competition in which I try to solve at least two problems. Usually the easiest two are simple and can be written and verified in a few hours each.

The problem that had me a bit stuck this month was one involving probability. I haven’t done much work with probability since college (over 10 years ago), so I had to brush up on some ideas.

The outline of the problem (with an example):

1. Two teams are playing each other, team A and team B.
2. The probability that team A wins is pX. (For example, 0.51, meaning a 51% chance that team A wins.)
3. The probability that team B wins is 1 – pX.
4. We will always bet 10,000 dollars between the two teams.
5. We can bet on both teams at once.
6. If a team wins, then the money made is calculated by (2 * (1 – pX)) * x, where “x” is the amount we bet. For example, we bet 6,000 on team A, who has a 51% chance of winning … if they win, we make (2 * (1 – 0.51)) * 6000, which is 5,880 dollars.
7. If a team loses, then we lose the amount we bet. If we bet 4,000 dollars on team B and they lose, we’ve only lost 4,000 dollars.

The problem is to find the optimal amount to bet on each team so that we maximize our money.

This means that if I bet 1,000 on team A and 9,000 on team B, I make X amount of money. If I change the amount for team A from 1,000 to 2,000 (and subsequently the team B amount from 9,000 to 8,000), then I’ll make Y amount of money. We just keep changing the amounts until we find one that we know is the highest.

Thought Process

Whenever I read the word “optimal” I assume the problem is going to be difficult. Specifically with this problem I thought it was going to mean I would come up with a function, f(x), where I would have to find a maximum – or a place where the slope was 0. It made me think the problem was harder than it was.

I tried to do some math based on the problem details (http://www.codechef.com/JUNE13/problems/PREDICT) and could not seem to generate the example shown.

This lead me to Google around for betting formulas which lead me to the Kelly Criterion – http://en.wikipedia.org/wiki/Kelly_criterion. Aha! Complex math and formulas! I’ve done CodeChef problems before where a formula was at the heart of a problem, so I assumed this must be my answer.

Long story short, the Kelly Criterion was a dead end. I could never make it work given my variables, so I assumed it didn’t match my problem space.

Progress…

The real break through came when I sat down and went slowly and thought through the problem. I had gone through the problem before, but this time I tried not to skip steps.

The example on the website had two inputs and one output. Eventually I worked through their example by hand so that I could re-produce their result. The next step was to generalize the formula. This is the result:

p = 0.51
x = 6000
total = 10000
result = ((total + ((2 * (1 - p)) * x) - (total - x)) * p) + ((total + (2 * (1 - (1 - p))) * (total - x) - (total - (total - x))) * (1 - p))

When I printed the result I got 10018, which matched perfectly.

I decided to use Python to solve this problem because it’s quick and easy to get going and several people in the comments on the site had mentioned that Python gave easy/accurate floating point accuracy which was important.

Surprising Results

Now that I had a formula that worked in one case, it was time to graph some general results. I just ran the formula through a loop and graphed the output. This is what I got:

chart2

It’s linear. I couldn’t believe it, so I picked some other probability for team A and ran my test again. I thought I’d see a curve somehow, but got this:

chart1

Not a curve. I suppose it shouldn’t surprise me since the formula earlier shows no possibility of being a curve.

Since it was linear, this meant that based on the percentage of winning, I would either bet 0 dollars or all 10,000 on team A. Then it is simply a matter of plugging that into the formula to find the optimal amount.

Conclusion

This took me a lot longer than it should. Each time I struggle with a problem like this I learn to take my time and patiently work through a problem. I also learn to not make assumptions and follow rabbit trails.

Solution: https://github.com/ryan1234/codechef/tree/master/jun-2013/predict

CodeChef problem: PREDICT

Dead Ends and Wasted Time

It’s easy to write about success, but not so easy to write about failure. Not just failure either, but unresolved failure.

Some quick specifics:

1. At work I tried to introduce a new library to a .NET project. Specifically the Thinktecture.IdenityModel.dll library.
2. In its assembly manifest it wants version 4.5.0.0 of the Newtonsoft.Json.dll.
3. My project references lots of other projects with really, really, really old versions of Newtonsoft.Json.dll.
4. When old meets new, I get errors that .NET can’t find the 4.5.0.0 version of Newtonsoft. The older version has won and is used across the board.

Unfortunately I lost an entire day trying to resolve this issue. I’m pretty sure some engineer out there would have a clean and clear solution for it, but I couldn’t find that guy online. =)

Things I tried:

1. Merging all DLLs Thinktecture needs into one mega DLL with ilmerge.exe. Didn’t work, but man ilmerge.exe was fun to work with. Good to know it exists.
2. Upgrading the references to the old Newtonsoft.Json.dll. Didn’t work because there are breaking changes between old and new (~2.0 and 4.5).
3. Explicitly referencing the new version in the web.config for my project. I found all sorts of ways to do this online and none of them worked.
4. Turning on assembly loading tracing. This was also really fun and interesting. Super cool to see how .NET is loading assemblies. Ultimately no help though.

When I’m stuck in a maze I tend to only go back 2-3 steps and then try a new approach. After burning through an entire day though, it’s time to take 10 steps back and try a different path.

Dead Ends and Wasted Time

Windows Azure SDK 2.0 Troubleshooting

Flashbacks

Several years ago I got the chance to work with Microsoft’s AppFabric for caching. I had a heck of a time getting it running well because when it failed, it did so silently. No stack traces, no log files, no messages in the console … just null objects and non-working software.

Recently I ran into some trouble with the Windows Azure SDK and it felt all too familiar.

The Start

Like the AppFabric experience, everything started well. I added a new worker role to an existing cloud service project and it ran fine. Data went from point A to point B, I checked in my code and forgot about it.

Later I heard groans coming from another developer at work. Turns out I had upgraded the Azure SDK from version 1.8 to 2.0 and it was forcing everyone else to download the new SDK so the code would build.

After spending a few minutes upgrading we made sure the build was good and moved on to other things.

Refactoring

Several weeks later the decision was made to move my worker role project to a different cloud service. I created the new project, added the worker role, checked in and since everything built fine I figured I was safe.

I didn’t need to test it again because I hadn’t made any code changes. As long as it built, it should run the same as it had before.

Confusion

A new feature came across my desk regarding the worker role, so it was time to fire it up and do some work.

The first thing I did was to run the project to refresh my memory as to how it worked.

Failure. The Windows Azure Emulator would start and then immediately stop. Huh?

Eventually I was able to scrape out the contents of the emulator’s console and I could see that the worker role was indeed starting, but right after starting, it would die. No descriptive errors, no activity in Visual Studio, nothing. Silence.

Stabs in the Dark

A short list of what I tried:

1. Refreshing the DLLs in the worker role project. Version 1.8? Try 2.0. Version 2.0? Maybe 1.8 again?
2. Removing all code related to dependencies. Maybe our TraceLogger class was failing in some odd way?
3. Tracing. I tried to trace the SDK and my code, but couldn’t get it working.
4. Breakpoints. I would set them all over but none of them were getting hit.
5. Event viewer. The old standby. Empty. No messages, no errors.
6. Lots and lots of Googling. I read a lot of articles about flipping this bit or that flag, but nothing worked.
7. Running another worker project in the same codebase. That worked fine, but I couldn’t see why.
8. Doing a diff between the .csproj file of a working worker role and my failing one.
9. Doing a diff between a working cloud service project and my broken one.

Solution/Conclusion

The prior steps all took place in a few hours towards the end of a work day. The next morning I came in and tried this:

1. Slowly and methodically start commenting code out. I removed one method at a time – or stubbed it out – and re-ran the worker role locally.
2. I went through all the code in this manner. The class, the base class it inherited from and all properties/fields that were in both.
3. Changed the custom base class (a wrapper around RoleEntryPoint) to RoleEntryPoint. Ah ha! That worked!

The problem ended up being that we had several projects “Common”, “Utility” that had older version of the Azure SDK. They were all using 1.8 and my worker role was trying to use 2.0.

Just inheriting from a class in one of those other projects caused the failure.

Once again though I wish there was some output somewhere that I could have easily latched on to that would have pointed me in the right direction.

Windows Azure SDK 2.0 Troubleshooting

Writing Javascript Widgets – Trial and Error

The Old

Today I’m going to look at two Javascript widgets that I’ve written. One is from 2011 from my job at PropertyRoom.com and the other is for my current job.

The widget we’ll look at for PropertyRoom.com displays paged products on the home page (http://www.propertyroom.com). It simply scrolls left and right through a list of products.

To start I want to look at function names. The function content is not as important as the actual function names.

var PANEL_PAGE_SIZE = 5;
var SLIDE_DIRECTION = 'none';
var USER_CAN_CLICK = true;

function LoadPanelPage(panel, pageNumber) { ... }
function SetPanelArrowEvents(panel, pageNumber, listingCount) { ... }
function LoadListingsAJAX(panel, categoryId, panelId, pageNumber) { ... }
function SlidePanelPage(panel, pageNumber) { ... }
function EnablePanelClicks() { ... }
function SetPanelPagingIndicators(panel, pageNumber, listingCount) { ... }

Some things to notice from these global variables and functions:

  • I am using global variables.
  • One word is all over, namely panel. This is my lazy way of making sure functions don’t collide.
  • State is passed all over. Several functions need at least two parameters to work.

These examples reflect how I’ve written Javascript widgets in the past. Long function names instead of namespaces. A few choice global variables to track state. A hierarchy of function calls where needed data is simply passed from function to function.

The New

I’ve been wanting to develop a better approach to Javascript code for a long time. I keep coming across articles and libraries where the code is pretty elegant. It looks object oriented, it’s organized well and it looks nothing like what I’ve usually written in the past.

The latest widget I’ve written went through this process:

1. Start simple to get a prototype working.
2. Quickly add features, which means I fall into old habits. Functions that act as namespaces, global variables, lots of state, etc.
3. My first refactor was to convert the code into a proper jQuery plugin. This was surprisingly easy.
4. My second refactor was to rename all functions to avoid collisions with other jQuery plugins. Especially important since our company will have many more widgets in the future.
5. My third refactor was to create objects to group functions. This was a fanstastic feeling! It went from this:

function LoadSegments() { ... }
function DrawSegments() { ... }
function BindSegments() { ... }
function LoadSubscriberSegments() { ... }
function BindSubscriberSegments() { ... }

… to this …

var Segment = { 
	load : function() { ... },
	draw : function() { ... },
	bind : function() { ... }	
}

var SubScriberSegment = {
	load : function() { ... },
	bind : function() { ... }
}

This was a revelation to me. If I want to make a call to load segments I can just do this:

Segment.load();

Notepad++ works really well with Javascript when objects are written this way. It’s easy to collapse entire objects and it makes navigating around the code easy.

6. My last refactor was to include a debug mode into my plugin/widget. Really simple, but something I’ve lacked in the past. I write to the Console (if it exists) and it allows me to trace the flow of the code and more importantly, see the data going in and out.

Conclusion

Making the transition to a more object oriented way of writing Javascript is exciting, but I know I have a long way to go.

Up next is to investigate whether or not Dart can make widget writing easier.

Writing Javascript Widgets – Trial and Error

Algorithm Practice

I’ve been trying to stay current on algorithms, math and harder computer science concepts for about a year now.

To do this I’ve come across the following sites/resources:

1. TopCoder
2. Code Chef
3. Career Cup
4. Talent Buddy

Quick thoughts on each:

1. Top Coder

This site sucks. Maybe if you spend quite a bit of time to learn it and wade through their horrific UI, it’s great fun. But from my experience it’s impossible to understand. It feels like a website from 1998 and it does everything it can to make you hate it.

Maybe one day I’ll figure it out, but then again, I doubt it. There are too many other good sites to improve you algorithm skills.

2. Code Chef

The UI is pretty good and they run regular contests with problems ranging from easy to very difficult. It’s a fun format and the only negative I can find is that for a while it was hard to understand just how they’d pipe input into my programs and the approach for producing my answers.

Once you get one or two programs running it is great fun.

They run a contest every month for a week so it gives me plenty of time to finish a few problems in a few hours. It’s been a great experience for me and up until recently, it was my favorite site for coding practice.

3. Career Cup

This site is run by the lady that wrote “Cracking the Coding Interview”. The book itself has tons of great problems and is very focused on computer science basics. The advice and the problems are priceless. I’ve worked through the first 5-6 chapters and completed each homework problem. Some of them are quite challenging and it’s fun to see how close you can come to the right answers.

Looking forward to finishing the rest of the chapters.

4. Talent Buddy

I found this site on Facebook and quickly got addicted.

The first 4-5 problems are very easy and it gets you hooked. I worked pretty fast until I got to a GPS themed problem that involved some non-trivial math. After Googling around about line segments and closest paths etc. I was able to break through.

I’ve started to realize this site has some pretty serious problems and it represents conceptually some new areas that Code Chef and Career Cup don’t cover.

Lots of geometry and math, which is something I need to get better at.

Algorithm Practice

Classic Development Mistakes

Throughout my career I’ve been involved in many re-write projects. I seem to consistently find myself in the situation where a company has a huge codebase that is 5-10 years old and they have decided to re-write it. The technologies are too old, the codebase is too messy and it’s too hard to introduce new features.

At my current position I was given the responsibility of keeping legacy codebase alive a bit longer by adding a few new features.

When I started going through the code I noticed quite a few patterns I’ve seen over the years. They all have the same motivation however.

1. Massive methods

Lots of developers seem to love to write very long methods. I saw one method recently that was 700 lines long! Each logical part of the method was separated by long comment blocks. Things like this:

/* ================================ SHIPPING PROCESSING ============================== */

I have to admit I have commited this exact sin many times early in my career. Why are there comments in the method? Why not just write a new method?

2. Enumerations everywhere

Legacy codebases always seem to have a lot of enumerations. In most cases I’ve seen the enumerations simply serve as a fancy way to replace strings so that code can branch. We want to carry around the enumeration so we can use it in an “if” statement or a “case” statement.

I see things like this quite often:

	if (StatusCode.Error && Request["..."] == "ABC") { ... }
	else {
		if (StatusCode.SemiOk) { ... }
		if (StatusCode.Ok && counted > 0) { ... }
	}

When used in a complex manner (like the one above), it becomes very tricky to debug. The developer that wrote this code just wants a “cleaner” way to reason about his objects and data. It’s easier to read in the IDE and “feels” better since there is intellisense.

3. Mega Classes

Usually when I open a legacy codebase I find a few very large classes. They are on average somewhere between 1,000 and 3,000 lines of code. They’ll have 30-60 properties. You can easily see that over time things were just added on as bug fixes and new features were introduced.

Usual suspects:

1. Some methods are static, some are not.
2. Some methods mutate other objects/classes.
3. Lots of checking of properties to see if they are null or valid. Often times this validation is repeated all over.

Conclusion

what do all these things have in common?

I believe the motivation behind these anti-patterns is organization. The developer wants to organize and structure their code and does the best they can with:

1. Comment blocks in big methods.
2. Enumerations that feel like they are categorizing things with intellisense.
3. Putting lots of code in the same place with mega classes. I’ve noticed the more object oriented a project, the more the classes grow in number, not size.

Classic Development Mistakes

App Fabric and Mongo – Separated at Birth

It occured to me the other day that the data modeling I have done in the past for AppFabric and the data modeling for Mongo were shockingly similar.

AppFabric Modeling

At PropertyRoom.com we used App Fabric to cache auction listings, prices, categories for listings, etc. We only had a few SQL Server machines on the back end and they were easily overwhelmed by the day to day traffic. App Fabric’s job was to alleviate the pain the databases felt.

The initial solution was to cache the results of all function calls in App Fabric. For example this call:

public List GetCategoryListings(int categoryId) { ... }

This would cache all auction listings for a given category, including the price and how much time was left in the auction.

I quickly found auction listings being sprayed all over the cache with additional calls like these:

public Listing GetListing(int listingId) { ... }
public List GetWatchList(int userId) { ... }
public List GetActionListings(List listingIds) { ... }

Each call held some information about the auction listing (like the listing title) and when a listing got updated I came across the problem of having to pro-actively invalidate the cache.

In other words, if listing 123 changed from “New Gold Watch” to “New Rolodex Gold Watch”, I would have to examine the cache to update everything that was suddenly old. I came up with a naive approach that tagged objects in the cache with the listing ids that they contained. For example, one function call that returned 100 listings would have a tag like this:

Tag = [ 1, 123, 1000, 1010 … ] (listing ids)

This proved to be a disaster. I quickly realized the entire architecture around caching, tagging objects, pro-actively invalidating objects was too fragile and far too complex. The following facts about software were proved true yet again:

1. It’s easier to write new code than debug old code.
2. If you code at the height of your cleverness, then because of our first assertion, you are not qualified to debug your own code.

The solution was to divide and conquer.

I divided all objects into one of two categories. It was either data that could expire on its own (which accounted for about 90% of our data), or it was an item that had to be expired manually, and thus would exist in only one place in the cache.

That last part was crucial.

Mongo Data Modeling

Fast forward a year and I ran into the same problem with Mongo, but didn’t realize it until it was too late.

Our team was struggling a bit trying to learn how to denormalize data in Mongo and made the mistake of putting data that changes often into many documents. For example, we were dealing with article data, and we embedded the article data into many different documents.

When an article would change, we would have to go track down all the places it had been written and then update those documents. Big mistake.

The lesson learned was something I had heard from a Mongo conference, and it has been proven through experience:

1. If the data changes slowly – or never – then it’s ok to copy it. (This means embedding in Mongo terms.)
2. If the data changes rapidly, then never copy it. Have one master version of it and only reference it when needed. (Linking in Mongo terms.)

App Fabric and Mongo – Separated at Birth