Cross Database Joins with Groovy/Gremlin

It’s funny how things connect to each other. The other day I was watching a video about a graph database, which lead me to the Titan project, which lead me to this blog post:

Our database environment at work is definitely a polyglot. It’s a mixture of SQL Server, Postgres, CouchDB, MongoDB, MySQL, Couchbase and a few others. Most of the time these databases do not need to talk directly. Each one is usually a silo and fairly self contained.

However there are a few strong relationships between databases. MySQL talks to Mongo, and SQL Server talks to Mongo. They have a lot of data in common.

The punchline of the post is that Gremlin ( is a Groovy based REPL that allows us to easily talk to heterogenous databases.

Obviously we could write pure Java to do this, but Groovy with Gremlin makes it a joy, not a chore.

For example, here is some code to connect to SQL Server (assuming you have the SQL JDBC jar file):

import groovy.sql.Sql

def getArticles = {
        sql = Sql.newInstance("jdbc:sqlserver://;databaseName=cms", "login","password", "")

        def map = new HashMap()

        sql.eachRow("select article_id, title from article") {
                map[it.article_id] = title

        return map

Very simple. Here is some code in the same script to connect to Mongo and get some data:

def getMongoDatabase = {
        def mongo = new GMongo("")
        return mongo.getDB("articles")

def getExistingMongoArticleIds = { db ->
        def cursor = db.articles.find([:], [_id : 1]).toArray()
        def article_ids = []

        cursor.each { item ->

        return article_ids

The beauty of this is that as our company moves towards more and more JSON (we use it in our services, our code and our databases), Groovy provides a concise syntax and is easy to use.

Granted all the data has to come back to the client to do cross-system joins, but with some creativity, the performance impacts can be minimized.

Cross Database Joins with Groovy/Gremlin