Cross Database Joins with Groovy/Gremlin

It’s funny how things connect to each other. The other day I was watching a video about a graph database, which lead me to the Titan project, which lead me to this blog post: http://thinkaurelius.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/

Our database environment at work is definitely a polyglot. It’s a mixture of SQL Server, Postgres, CouchDB, MongoDB, MySQL, Couchbase and a few others. Most of the time these databases do not need to talk directly. Each one is usually a silo and fairly self contained.

However there are a few strong relationships between databases. MySQL talks to Mongo, and SQL Server talks to Mongo. They have a lot of data in common.

The punchline of the post is that Gremlin (https://github.com/tinkerpop/gremlin/wiki) is a Groovy based REPL that allows us to easily talk to heterogenous databases.

Obviously we could write pure Java to do this, but Groovy with Gremlin makes it a joy, not a chore.

For example, here is some code to connect to SQL Server (assuming you have the SQL JDBC jar file):

import com.microsoft.sqlserver.jdbc.SQLServerDriver
import groovy.sql.Sql

def getArticles = {
        sql = Sql.newInstance("jdbc:sqlserver://127.0.0.1;databaseName=cms", "login","password", "com.microsoft.sqlserver.jdbc.SQLServerDriver")

        def map = new HashMap()

        sql.eachRow("select article_id, title from article") {
                map[it.article_id] = title
        };

        return map
}

Very simple. Here is some code in the same script to connect to Mongo and get some data:

def getMongoDatabase = {
        def mongo = new GMongo("127.0.0.1")
        return mongo.getDB("articles")
}

def getExistingMongoArticleIds = { db ->
        def cursor = db.articles.find([:], [_id : 1]).toArray()
        def article_ids = []

        cursor.each { item ->
                article_ids.add((int)item._id)
        }

        return article_ids
}

The beauty of this is that as our company moves towards more and more JSON (we use it in our services, our code and our databases), Groovy provides a concise syntax and is easy to use.

Granted all the data has to come back to the client to do cross-system joins, but with some creativity, the performance impacts can be minimized.

Cross Database Joins with Groovy/Gremlin

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s