Writing a CouchDB replication filter in Erlang

I’m working on a project that uses BigCouch (a fork of CouchDB), and the performance when replicating data from one machine to another was a little less than I had hoped for.

Replication with CouchDB uses the _changes feed. Instead of replicating an entire database, I had set up a filter to limit the number of records that would go across the wire.

The filter I wrote was in JavaScript and looks like this:

{
   "_id": "_design/segmenting",
   "filters": {
      "by_year_month": "function(doc, req){var month = req.query.month;var year = req.query.year;if (doc.pubYear == year && doc.pubMonth == month){return true;}else{return false;}}"
   }
}

The goal is to get documents by using a month field and a year field that exist in each document.
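For replication, the filter name and its parameters go into the request sent to CouchDB's _replicate endpoint, in the "filter" and "query_params" fields. Below is a minimal sketch of building that request body. The helper name buildReplicationRequest and the target URL are placeholders made up for illustration, not something from my actual setup.

```javascript
// Build the JSON body for a POST to /_replicate that applies the filter.
// buildReplicationRequest is a hypothetical helper; the target URL is a placeholder.
function buildReplicationRequest(source, target, month, year) {
  return {
    source: source,
    target: target,
    filter: "segmenting/by_year_month",
    // Sent as strings, since the filter receives them as query-string values.
    query_params: { month: String(month), year: String(year) }
  };
}

const replicationBody = buildReplicationRequest(
  "database_name",
  "http://target.example.com:5984/database_name",
  4,
  2012
);
console.log(JSON.stringify(replicationBody, null, 2));
```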

Here is a sample test to see if the filter is working:

http://127.0.0.1:5984/database_name/_changes?filter=segmenting/by_year_month&month=4&year=2012
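If you want to build that test URL in code rather than by hand, something like the following sketch works in Node. The function name changesUrl is my own invention. Note that URLSearchParams percent-encodes the slash in the filter name as %2F, which decodes back to the same value.

```javascript
// Build a _changes URL that applies a named filter with extra query parameters.
// changesUrl is a hypothetical helper, not part of any CouchDB client library.
function changesUrl(base, db, filter, params) {
  const url = new URL(`${base}/${encodeURIComponent(db)}/_changes`);
  url.searchParams.set("filter", filter);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const testUrl = changesUrl(
  "http://127.0.0.1:5984",
  "database_name",
  "segmenting/by_year_month",
  { month: 4, year: 2012 }
);
console.log(testUrl);
```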

While reading about something totally unrelated, I stumbled across the idea that I could write my filter function in Erlang instead of JavaScript. This had the advantage of speaking CouchDB’s native language: Erlang.

After much reading/researching and testing I ended up with the equivalent in Erlang:

{
   "_id": "_design/fast_segmenting",
   "language": "erlang",
   "filters": {
      "by_year_month": "fun({Doc}, {Req}) -> {Query} = proplists:get_value(<<\"query\">>, Req), Month = list_to_integer(binary_to_list(proplists:get_value(<<\"month\">>, Query))), Year = list_to_integer(binary_to_list(proplists:get_value(<<\"year\">>, Query))), case {proplists:get_value(<<\"pubMonth\">>, Doc), proplists:get_value(<<\"pubYear\">>, Doc)} of {Month, Year} -> true; _ -> false end end."
   }
}

This code is gross to read, so let me show the JavaScript and Erlang filters with some better formatting. Keep in mind every document has a field called “pubMonth” representing the month and “pubYear” representing the year.

JavaScript:

function(doc, req)
{
   var month = req.query.month;
   var year = req.query.year;

   if (doc.pubYear == year && doc.pubMonth == month)
   {
      return true;
   }
   else
   {
      return false;
   }
}

Erlang:

fun({Doc}, {Req}) ->
   % Query-string parameters arrive as binaries under the <<"query">> key.
   {Query} = proplists:get_value(<<"query">>, Req),
   Month = list_to_integer(binary_to_list(proplists:get_value(<<"month">>, Query))),
   Year = list_to_integer(binary_to_list(proplists:get_value(<<"year">>, Query))),

   % Month and Year are already bound, so this pattern only matches equal values.
   case {proplists:get_value(<<"pubMonth">>, Doc), proplists:get_value(<<"pubYear">>, Doc)} of
      {Month, Year} -> true;
      _ -> false
   end
end.
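One difference between the two versions is worth spelling out. The query parameters arrive as strings, while the Erlang filter converts them to integers before matching, which implies pubMonth and pubYear are stored as numbers. The JavaScript filter gets away without a conversion because == coerces types. Here is a small local check of that behavior. The sample documents are made up for illustration, and the filter body is condensed to a single return.

```javascript
// The JavaScript filter from above, condensed, run locally against sample data.
const byYearMonth = function (doc, req) {
  var month = req.query.month;
  var year = req.query.year;
  return doc.pubYear == year && doc.pubMonth == month;
};

// Query-string values reach the filter as strings.
const req = { query: { month: "4", year: "2012" } };

// Hypothetical documents with numeric pubMonth/pubYear fields.
const docs = [
  { _id: "a", pubMonth: 4, pubYear: 2012 },
  { _id: "b", pubMonth: 5, pubYear: 2012 },
  { _id: "c", pubMonth: 4, pubYear: 2011 }
];

const passed = docs.filter(function (d) { return byYearMonth(d, req); })
                   .map(function (d) { return d._id; });
console.log(passed); // [ 'a' ]

// A strict comparison would fail, which is why Erlang needs list_to_integer.
const strictMatch = docs[0].pubMonth === req.query.month;
console.log(strictMatch); // false
```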

To check whether the Erlang version was faster, I ran the JavaScript filter and the Erlang filter with the same parameters. I ran the test three times on a server that had no traffic hitting it.

The largest database had 5,896 documents and the smallest had 910. I averaged the three runs for each database and then averaged those per-database results, which came to 52%. That means the Erlang function across all databases, over three test runs, was 52% faster!
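For anyone repeating the test, the averaging can be sketched as below. This assumes "faster" means the reduction in elapsed time relative to the JavaScript run, which is one reasonable reading of the figure. The timings are placeholder numbers, not my measurements.

```javascript
// Percent reduction in elapsed time of the Erlang run relative to the JavaScript run.
function percentFaster(jsMs, erlangMs) {
  return ((jsMs - erlangMs) / jsMs) * 100;
}

function mean(values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0) / values.length;
}

// Average the runs within each database, then average across databases.
function overallImprovement(results) {
  const perDatabase = results.map(function (db) {
    return mean(db.runs.map(function (r) { return percentFaster(r.js, r.erlang); }));
  });
  return mean(perDatabase);
}

// Placeholder timings in milliseconds, purely illustrative.
const sampleResults = [
  { name: "db_one", runs: [{ js: 200, erlang: 100 }, { js: 200, erlang: 100 }, { js: 200, erlang: 100 }] },
  { name: "db_two", runs: [{ js: 100, erlang: 50 }, { js: 100, erlang: 50 }, { js: 100, erlang: 50 }] }
];

const improvement = overallImprovement(sampleResults);
console.log(improvement); // 50
```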
