I’m working on a project that uses BigCouch (a fork of CouchDB), and the performance when replicating data from one machine to another fell a little short of what I had hoped for.
Replication in CouchDB is driven by the _changes feed, so instead of replicating an entire database, I set up a filter to limit the number of documents sent over the wire.
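As an illustration of how a filter plugs into replication, a filtered replication can be requested by POSTing a JSON document to CouchDB's /_replicate endpoint. That document names the filter ("<design doc>/<filter>") and passes the filter's parameters in query_params. The sketch below only builds that body. The source and target host names are hypothetical, and it assumes your CouchDB/BigCouch version supports the query_params field. The filter name refers to the segmenting design document defined below.

```javascript
// Sketch: build the body of a filtered replication request for CouchDB's
// /_replicate endpoint. The source/target URLs used below are hypothetical.
function buildReplicationBody(source, target, year, month) {
  return {
    source: source,
    target: target,
    // "<design doc name>/<filter name>": _design/segmenting -> segmenting
    filter: "segmenting/by_year_month",
    // Handed to the filter as req.query; values are sent as strings
    query_params: { year: String(year), month: String(month) }
  };
}

const body = buildReplicationBody(
  "http://source-host:5984/database_name",
  "http://target-host:5984/database_name",
  2012,
  4
);

// This JSON would be POSTed to http://<host>:5984/_replicate
console.log(JSON.stringify(body, null, 2));
```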
The filter I wrote was in JavaScript and looks like this:
{
"_id": "_design/segmenting",
"filters": {
"by_year_month": "function(doc, req){var month = req.query.month;var year = req.query.year;if (doc.pubYear == year && doc.pubMonth == month){return true;}else{return false;}}"
}
}
The goal is to return only the documents whose month and year fields match the month and year passed in the request.
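To sanity-check that logic outside CouchDB, the same function can be called directly with a hand-built req object. This is a minimal sketch: the sample documents are hypothetical, and the req object only mimics the query part of what CouchDB passes in. It also shows why the loose == comparison matters: query-string values arrive as strings, while the document fields are numbers.

```javascript
// Sketch: exercise the by_year_month filter with a fake req object.
// The documents below are hypothetical test data.
const byYearMonth = function (doc, req) {
  var month = req.query.month;
  var year = req.query.year;
  if (doc.pubYear == year && doc.pubMonth == month) {
    return true;
  } else {
    return false;
  }
};

// Same parameters as ?month=4&year=2012; note they are strings
const req = { query: { month: "4", year: "2012" } };

const docs = [
  { _id: "a", pubYear: 2012, pubMonth: 4 }, // matches
  { _id: "b", pubYear: 2012, pubMonth: 5 }, // wrong month
  { _id: "c", pubYear: 2011, pubMonth: 4 }  // wrong year
];

// == coerces "4" to 4, so numeric fields match the string parameters
const passed = docs
  .filter(function (doc) { return byYearMonth(doc, req); })
  .map(function (doc) { return doc._id; });

console.log(passed);
```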
Here is a sample request to check that the filter is working:
http://127.0.0.1:5984/database_name/_changes?filter=segmenting/by_year_month&month=4&year=2012
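When testing several months, it can help to generate that URL rather than edit it by hand. The sketch below uses the standard URL and URLSearchParams classes; the host and database name are the placeholders from the sample request. One difference from the hand-written URL: URLSearchParams percent-encodes the slash in the filter name as %2F, which should be equivalent because query-string values are URL-decoded on the server.

```javascript
// Sketch: build the _changes test URL for a given year and month.
function changesUrl(base, db, year, month) {
  const url = new URL("/" + db + "/_changes", base);
  url.searchParams.set("filter", "segmenting/by_year_month");
  url.searchParams.set("month", String(month));
  url.searchParams.set("year", String(year));
  return url.toString();
}

console.log(changesUrl("http://127.0.0.1:5984", "database_name", 2012, 4));
```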
While reading about something totally unrelated, I stumbled across the idea that I could write my filter function in Erlang instead of JavaScript. The advantage is that the filter runs in CouchDB’s native language, Erlang, inside the server itself, instead of being handed off to a separate JavaScript process.
After a lot of reading, research, and testing, I ended up with this Erlang equivalent:
{
"_id": "_design/fast_segmenting",
"language": "erlang",
"filters": {
"by_year_month": "fun({Doc}, {Req}) -> {Query} = proplists:get_value(<<\"query\">>, Req), Month = list_to_integer(binary_to_list(proplists:get_value(<<\"month\">>, Query))), Year = list_to_integer(binary_to_list(proplists:get_value(<<\"year\">>, Query))), case {proplists:get_value(<<\"pubMonth\">>, Doc), proplists:get_value(<<\"pubYear\">>, Doc)} of {Month, Year} -> true; _ -> false end end."
}
}
This code is gross to read, so here are the JavaScript and Erlang filters again with better formatting. Keep in mind that every document has a field called “pubMonth” holding the month and “pubYear” holding the year.
JavaScript:
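function(doc, req)
{
  // Query-string parameters arrive as strings, e.g. "4" and "2012"
  var month = req.query.month;
  var year = req.query.year;
  // == (not ===) lets the string parameters match the numeric document fields
  if (doc.pubYear == year && doc.pubMonth == month)
  {
    return true;
  }
  else
  {
    return false;
  }
}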
Erlang:
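fun({Doc}, {Req}) ->
    %% The query string is a nested proplist stored under <<"query">>
    {Query} = proplists:get_value(<<"query">>, Req),
    %% Parameters arrive as binaries, so convert them to integers
    Month = list_to_integer(binary_to_list(proplists:get_value(<<"month">>, Query))),
    Year = list_to_integer(binary_to_list(proplists:get_value(<<"year">>, Query))),
    %% Month and Year are already bound, so this pattern only matches equal values
    case {proplists:get_value(<<"pubMonth">>, Doc), proplists:get_value(<<"pubYear">>, Doc)} of
        {Month, Year} -> true;
        _ -> false
    end
end.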
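One way to avoid hand-escaping the formatted Erlang back into a one-line JSON string is to let JSON.stringify do the escaping. This is a sketch, not how the design document above was produced: it keeps the Erlang source readable as a list of lines, and JSON.stringify turns each " into \" and each line break into \n. It assumes CouchDB's Erlang query server accepts the multi-line source, which should be the case since newlines are ordinary whitespace to the Erlang parser.

```javascript
// Sketch: generate the fast_segmenting design document from readable Erlang
// source, letting JSON.stringify escape the quotes and newlines.
const erlangFilter = [
  'fun({Doc}, {Req}) ->',
  '    {Query} = proplists:get_value(<<"query">>, Req),',
  '    Month = list_to_integer(binary_to_list(proplists:get_value(<<"month">>, Query))),',
  '    Year = list_to_integer(binary_to_list(proplists:get_value(<<"year">>, Query))),',
  '    case {proplists:get_value(<<"pubMonth">>, Doc), proplists:get_value(<<"pubYear">>, Doc)} of',
  '        {Month, Year} -> true;',
  '        _ -> false',
  '    end',
  'end.'
].join("\n");

const designDoc = {
  _id: "_design/fast_segmenting",
  language: "erlang",
  filters: { by_year_month: erlangFilter }
};

// The printed JSON could then be PUT to /database_name/_design/fast_segmenting
console.log(JSON.stringify(designDoc, null, 2));
```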
To check whether the Erlang version really was faster, I ran the JavaScript filter and the Erlang filter with the same parameters. I ran each test three times on a server that had no other traffic hitting it.
The largest database had 5,896 documents and the smallest had 910. I averaged the three runs for each database and then averaged those results across all databases, which came to 52%. In other words, across all databases and all three test runs, the Erlang filter was 52% faster!
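The "average of averages" arithmetic can be sketched as follows: average the three runs per database for each filter, compute a per-database speedup, then average those speedups. The input shape and the function names are hypothetical, and treating "faster" as the fraction of JavaScript time saved is an assumption about how the percentage was computed.

```javascript
// Sketch of the average-of-averages calculation.
// Assumption: "X% faster" = (jsTime - erlangTime) / jsTime.
function mean(values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0) / values.length;
}

// runs: { dbName: { js: [t1, t2, t3], erlang: [t1, t2, t3] } }, times in ms
function averageSpeedup(runs) {
  const perDb = Object.keys(runs).map(function (db) {
    const js = mean(runs[db].js);        // average of the JavaScript runs
    const erl = mean(runs[db].erlang);   // average of the Erlang runs
    return (js - erl) / js;              // fraction of JavaScript time saved
  });
  return mean(perDb) * 100;              // average across databases, in percent
}
```

If "faster" is meant instead as throughput (JavaScript time divided by Erlang time, minus one), the per-database line becomes (js - erl) / erl; the two definitions give different percentages for the same timings.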