App Fabric and Mongo – Separated at Birth

It occurred to me the other day that the data modeling I had done in the past for AppFabric and the data modeling for Mongo were shockingly similar.

AppFabric Modeling

At PropertyRoom.com we used AppFabric to cache auction listings, prices, categories for listings, etc. We only had a few SQL Server machines on the back end, and they were easily overwhelmed by the day-to-day traffic. AppFabric’s job was to alleviate the pain the databases felt.

The initial solution was to cache the results of all function calls in AppFabric. For example, this call:

public List&lt;Listing&gt; GetCategoryListings(int categoryId) { ... }

This would cache all auction listings for a given category, including the price and how much time was left in the auction.
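The pattern is the familiar cache-aside one: key the cache on the call and its arguments, and only hit SQL Server on a miss. Here is a minimal sketch in Python, using a plain dict with expiry times as a stand-in for AppFabric; the function and parameter names are hypothetical.

```python
import time

cache = {}  # key -> (expires_at, value)
TTL_SECONDS = 60

def get_category_listings(category_id, fetch_from_db):
    """Return listings for a category, consulting the cache first."""
    key = ("GetCategoryListings", category_id)
    entry = cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]  # cache hit: skip the database entirely
    listings = fetch_from_db(category_id)  # the expensive SQL call
    cache[key] = (time.time() + TTL_SECONDS, listings)
    return listings
```

On a warm cache the database never sees the query, which was the whole point of putting AppFabric in front of it.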

I quickly found auction listings being sprayed all over the cache with additional calls like these:

public Listing GetListing(int listingId) { ... }
public List&lt;Listing&gt; GetWatchList(int userId) { ... }
public List&lt;Listing&gt; GetAuctionListings(List&lt;int&gt; listingIds) { ... }

Each call held some information about the auction listing (like the listing title), so when a listing got updated I ran into the problem of having to pro-actively invalidate the cache.

In other words, if listing 123 changed from “New Gold Watch” to “New Rolodex Gold Watch”, I would have to examine the cache to update everything that was suddenly old. I came up with a naive approach that tagged objects in the cache with the listing ids that they contained. For example, one function call that returned 100 listings would have a tag like this:

Tag = [ 1, 123, 1000, 1010 … ] (listing ids)
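The scheme can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the production code: each cached entry carries the set of listing ids it contains, and updating one listing means scanning the whole cache for entries that mention it.

```python
cache = {}  # key -> {"value": ..., "tags": set of listing ids}

def put(key, value, listing_ids):
    """Cache a result along with the listing ids it contains."""
    cache[key] = {"value": value, "tags": set(listing_ids)}

def invalidate_listing(listing_id):
    """Evict every cached entry that mentions this listing."""
    stale = [k for k, e in cache.items() if listing_id in e["tags"]]
    for k in stale:
        del cache[k]
    return stale  # everything one small update just blew away
```

One edit to listing 123 can evict a category page, a watch list, and anything else that happened to include it, which hints at why this got fragile fast.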

This proved to be a disaster. I quickly realized that the entire architecture of caching, tagging objects, and pro-actively invalidating them was too fragile and far too complex. The following facts about software were proven true yet again:

1. It’s easier to write new code than debug old code.
2. If you code at the height of your cleverness, then because of our first assertion, you are not qualified to debug your own code.

The solution was to divide and conquer.

I divided all objects into one of two categories: either data that could expire on its own (which accounted for about 90% of our data), or data that had to be expired manually, and thus would exist in only one place in the cache.

That last part was crucial.
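The resulting rule is small enough to sketch. Again a hypothetical Python stand-in for the cache: most data simply expires on its own, and the rest lives at exactly one canonical key, so a manual invalidation is a single delete rather than a cache-wide hunt.

```python
import time

cache = {}  # key -> (expires_at or None, value)

def put_expiring(key, value, ttl_seconds):
    """~90% of the data: let it age out on its own."""
    cache[key] = (time.time() + ttl_seconds, value)

def put_canonical(listing_id, value):
    """The rest: one master copy, at one well-known key."""
    cache[("listing", listing_id)] = (None, value)

def invalidate(listing_id):
    """Manual invalidation is now one delete, no tag scan."""
    cache.pop(("listing", listing_id), None)
```

Because a mutable listing exists in exactly one place, there is nothing to track down when it changes.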

Mongo Data Modeling

Fast forward a year and I ran into the same problem with Mongo, but didn’t realize it until it was too late.

Our team was struggling a bit to learn how to denormalize data in Mongo, and we made the mistake of putting data that changes often into many documents. For example, we were dealing with article data, and we embedded copies of it into many different documents.

When an article would change, we would have to go track down all the places it had been written and then update those documents. Big mistake.

The lesson learned was something I had heard at a Mongo conference, and it has since been proven through experience:

1. If the data changes slowly – or never – then it’s ok to copy it. (This means embedding in Mongo terms.)
2. If the data changes rapidly, then never copy it. Have one master version of it and only reference it when needed. (Linking in Mongo terms.)
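The two rules can be illustrated with plain Python dicts standing in for Mongo documents; the field names here are hypothetical. Slow-changing data gets embedded (copied into the parent document), while the fast-changing article lives in one master document that everything else links to by id.

```python
# Rule 1: categories rarely change, so copying them into a page is fine.
page = {"_id": "home", "categories": [{"id": 3, "name": "Watches"}]}

# Rule 2: articles change often, so other documents hold only an id.
articles = {123: {"_id": 123, "title": "New Gold Watch"}}  # master copies
feed = {"_id": "front-feed", "article_ids": [123]}         # links only

def render_feed(feed_doc):
    """Resolve each link; one edit to the master updates every view."""
    return [articles[a]["title"] for a in feed_doc["article_ids"]]
```

In real Mongo the link would be resolved with a second query (or `$lookup`), which is the price you pay for never having to track down stale copies.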
