Quick CouchDB Javascript engine replacement test (couchjs)

I had read online about a new project from a guy at IrisCouch to replace the built-in Javascript engine that comes with CouchDB.

You can see his project on GitHub here: https://github.com/iriscouch/couchjs

The instructions for replacing CouchDB’s Javascript engine are pretty simple. I spent most of my time fighting to install the right versions of Node.js and npm on my local Ubuntu VM.

Unfortunately, after switching to the new couchjs engine I didn’t notice any performance gains. I’m not sure what the motivation for a new Javascript engine for CouchDB is, if not performance. Maybe the advantages are negated because I’m running BigCouch? The GitHub page says the new couchjs engine uses V8. Do you remember the first time you ran a Javascript-heavy feature in Chrome? It was much faster, right? I was hoping for a gain of 10-20% (much like Chrome’s), but instead saw no visible or measurable improvement.

Bummer.

Still, it’s something to keep an eye on in case I need to squeeze every last drop of performance out of CouchDB.


Writing a CouchDB replication filter in Erlang

I’m working on a project that uses BigCouch (a fork of CouchDB), and replication performance from one machine to another was a little lower than I had hoped for.

Replication in CouchDB uses the _changes feed, and instead of replicating an entire database, I had a filter set up to limit the number of records going across the wire.

The filter I wrote was in Javascript and looks like this:

{
   "_id": "_design/segmenting",
   "filters": {
      "by_year_month": "function(doc, req){var month = req.query.month;var year = req.query.year;if (doc.pubYear == year && doc.pubMonth == month){return true;}else{return false;}}"
   }
}
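Escaping a whole function into a one-line JSON string by hand is error-prone. Here is a minimal sketch, assuming Node.js, that keeps the filter as an ordinary Javascript function and lets Function.prototype.toString and JSON.stringify produce the design document. The helper buildDesignDoc is hypothetical, and the filter is written as an anonymous function expression so its source text evaluates cleanly to a function. The resulting JSON could then be saved with an ordinary PUT to /database_name/_design/segmenting.

```javascript
// The filter written as ordinary, readable Javascript (anonymous function
// expression, so its source text evaluates to a function on its own).
const byYearMonth = function (doc, req) {
  var month = req.query.month;
  var year = req.query.year;
  return doc.pubYear == year && doc.pubMonth == month;
};

// Hypothetical helper: wraps filter functions into a design document,
// storing each one as its source text, which is what CouchDB expects.
function buildDesignDoc(name, filters) {
  const ddoc = { _id: "_design/" + name, filters: {} };
  for (const [filterName, fn] of Object.entries(filters)) {
    ddoc.filters[filterName] = fn.toString();
  }
  return ddoc;
}

const ddoc = buildDesignDoc("segmenting", { by_year_month: byYearMonth });

// JSON.stringify takes care of escaping quotes and newlines in the source.
console.log(JSON.stringify(ddoc, null, 3));
```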

The goal is to get documents by using a month field and a year field that exist in each document.
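The month and year come from the query string, so they reach the filter in req.query as strings, while pubMonth and pubYear are presumably stored as numbers. The filter therefore depends on Javascript’s loose == coercion. A small sketch for exercising the filter outside CouchDB; only req.query is mocked here (a real CouchDB request object carries many more fields):

```javascript
// Same logic as the by_year_month filter, as a plain function.
const byYearMonth = function (doc, req) {
  var month = req.query.month;
  var year = req.query.year;
  return doc.pubYear == year && doc.pubMonth == month;
};

// Mock request: query-string values arrive as strings.
const req = { query: { month: "4", year: "2012" } };

console.log(byYearMonth({ pubYear: 2012, pubMonth: 4 }, req)); // true ("2012" == 2012)
console.log(byYearMonth({ pubYear: 2012, pubMonth: 5 }, req)); // false
// With strict equality the first case would fail:
console.log(4 === "4"); // false
```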

Here is a sample request to check that the filter is working:

http://127.0.0.1:5984/database_name/_changes?filter=segmenting/by_year_month&month=4&year=2012
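Since the point of the filter is filtered replication, here is a sketch of a replication request that uses it. CouchDB’s _replicate endpoint accepts a filter name plus query_params, which are handed to the filter as req.query. The helper buildReplicationRequest is hypothetical, and the source and target URLs are placeholders:

```javascript
// Hypothetical helper: builds the JSON body for POST /_replicate.
// query_params values are strings, matching what the filter sees in req.query.
function buildReplicationRequest(source, target, year, month) {
  return {
    source: source,
    target: target,
    filter: "segmenting/by_year_month",
    query_params: { year: String(year), month: String(month) },
  };
}

const body = buildReplicationRequest(
  "http://127.0.0.1:5984/database_name",  // placeholder source
  "http://other-host:5984/database_name", // placeholder target
  2012,
  4
);

// To start replication, POST this JSON to http://127.0.0.1:5984/_replicate
// with a Content-Type: application/json header.
console.log(JSON.stringify(body));
```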

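While reading about something totally unrelated, I stumbled across the idea that I could write my filter function in Erlang instead of Javascript. That has the advantage of speaking CouchDB’s native language, Erlang: the filter runs inside CouchDB itself rather than in the external Javascript process, so documents don’t have to be serialized to JSON and piped back and forth.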

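After much reading, researching and testing, I ended up with the equivalent in Erlang. Note that CouchDB only runs Erlang design functions when the native Erlang query server is enabled in its configuration (the [native_query_servers] section), which it is not by default: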

{
   "_id": "_design/fast_segmenting",
   "language": "erlang",
   "filters": {
      "by_year_month": "fun({Doc}, {Req}) -> {Query} = proplists:get_value(<<\"query\">>, Req), Month = list_to_integer(binary_to_list(proplists:get_value(<<\"month\">>, Query))), Year = list_to_integer(binary_to_list(proplists:get_value(<<\"year\">>, Query))), case {proplists:get_value(<<\"pubMonth\">>, Doc), proplists:get_value(<<\"pubYear\">>, Doc)} of {Month, Year} -> true; _ -> false end end."
   }
}
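The \" escapes and the single-line layout above are only what JSON demands; the Erlang itself doesn’t need them. A sketch, assuming Node.js, that keeps the Erlang source readable in a template literal and lets JSON.stringify do the escaping when building the fast_segmenting design document:

```javascript
// Readable Erlang source for the filter; JSON escaping is applied later.
const erlangFilter = `fun({Doc}, {Req}) ->
   {Query} = proplists:get_value(<<"query">>, Req),
   Month = list_to_integer(binary_to_list(proplists:get_value(<<"month">>, Query))),
   Year = list_to_integer(binary_to_list(proplists:get_value(<<"year">>, Query))),
   case {proplists:get_value(<<"pubMonth">>, Doc), proplists:get_value(<<"pubYear">>, Doc)} of
      {Month, Year} -> true;
      _ -> false
   end
end.`;

const ddoc = {
  _id: "_design/fast_segmenting",
  language: "erlang",
  filters: { by_year_month: erlangFilter },
};

// JSON.stringify escapes the quotes and newlines inside the Erlang source.
console.log(JSON.stringify(ddoc, null, 3));
```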

This code is gross to read, so let me show the Javascript and Erlang filters with some better formatting. Keep in mind every document has a field called “pubMonth” representing the month and “pubYear” representing the year.

Javascript:

function(doc, req)
{
   var month = req.query.month;
   var year = req.query.year;

   if (doc.pubYear == year && doc.pubMonth == month)
   {
      return true;
   }
   else
   {
      return false;
   }
}

Erlang:

fun({Doc}, {Req}) ->
   {Query} = proplists:get_value(<<"query">>, Req),
   Month = list_to_integer(binary_to_list(proplists:get_value(<<"month">>, Query))),
   Year = list_to_integer(binary_to_list(proplists:get_value(<<"year">>, Query))),

   case {proplists:get_value(<<"pubMonth">>, Doc), proplists:get_value(<<"pubYear">>, Doc)} of
      {Month, Year} -> true;
      _ -> false
   end
end.
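For readers more at home in Javascript, here is a rough transliteration of the Erlang filter’s logic. It models Erlang’s decoded JSON objects, {[{Key, Value}]}, as arrays of [key, value] pairs; that representation, and the helper names getValue and toInteger, are assumptions for illustration only, not a CouchDB API. The main point it shows: the Erlang case clause is an exact match, which is why the query strings are converted to integers first.

```javascript
// Stand-in for proplists:get_value/2: first matching key wins, else undefined.
function getValue(key, proplist) {
  const pair = proplist.find(([k]) => k === key);
  return pair === undefined ? undefined : pair[1];
}

// Stand-in for list_to_integer(binary_to_list(Bin)): fails loudly on bad
// input, much like Erlang raising badarg.
function toInteger(str) {
  if (typeof str !== "string" || !/^[+-]?\d+$/.test(str)) {
    throw new Error("badarg: " + str);
  }
  return parseInt(str, 10);
}

function byYearMonth(doc, req) {
  const query = getValue("query", req);
  const month = toInteger(getValue("month", query));
  const year = toInteger(getValue("year", query));
  // Exact comparison, like the {Month, Year} pattern: the document fields
  // must be numbers, so a string "4" would not match 4.
  return getValue("pubMonth", doc) === month && getValue("pubYear", doc) === year;
}

const req = [["query", [["month", "4"], ["year", "2012"]]]];
console.log(byYearMonth([["pubMonth", 4], ["pubYear", 2012]], req));   // true
console.log(byYearMonth([["pubMonth", "4"], ["pubYear", 2012]], req)); // false
```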

To prove the Erlang version was faster, I ran a Javascript test and an Erlang test with the same parameters against each database, three times each, on a server that had no other traffic hitting it.

The largest database had 5,896 documents and the smallest had 910. I averaged the three runs for each database, then averaged those per-database results, and the final figure came to 52%. That means that across all databases and all three test runs, the Erlang filter was 52% faster!
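The "average of the averages" arithmetic can be sketched as follows. The post doesn’t say how "faster" was computed, so this sketch assumes it means the reduction in elapsed time relative to Javascript; the timings below are made up purely to show the calculation and are not the measurements from this test.

```javascript
// Percent improvement of Erlang over Javascript for one run
// (assumed definition: reduction in elapsed time relative to Javascript).
function percentFaster(jsMs, erlMs) {
  return ((jsMs - erlMs) / jsMs) * 100;
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// results: one array per database, each holding that database's runs.
// Averages the runs within each database, then averages across databases.
function averageOfAverages(results) {
  const perDatabase = results.map((runs) =>
    mean(runs.map((run) => percentFaster(run.jsMs, run.erlMs)))
  );
  return mean(perDatabase);
}

// Hypothetical timings, for illustration only.
const hypotheticalResults = [
  [{ jsMs: 100, erlMs: 50 }, { jsMs: 100, erlMs: 40 }, { jsMs: 100, erlMs: 60 }],
  [{ jsMs: 200, erlMs: 90 }, { jsMs: 200, erlMs: 90 }, { jsMs: 200, erlMs: 90 }],
];
console.log(averageOfAverages(hypotheticalResults));
```

One design consequence worth keeping in mind: averaging per-database averages gives the 910-document database the same weight as the 5,896-document one, so pooling all runs or weighting by document count could produce a different percentage.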
