It’s funny how things connect to each other. The other day I was watching a video about a graph database, which lead me to the Titan project, which lead me to this blog post: http://thinkaurelius.com/2013/02/04/polyglot-persistence-and-query-with-gremlin/
Our database environment at work is definitely a polyglot. It’s a mixture of SQL Server, Postgres, CouchDB, MongoDB, MySQL, Couchbase and a few others. Most of the time these databases do not need to talk directly. Each one is usually a silo and fairly self contained.
However there are a few strong relationships between databases. MySQL talks to Mongo, and SQL Server talks to Mongo. They have a lot of data in common.
The punchline of the post is that Gremlin (https://github.com/tinkerpop/gremlin/wiki) is a Groovy based REPL that allows us to easily talk to heterogenous databases.
Obviously we could write pure Java to do this, but Groovy with Gremlin makes it a joy, not a chore.
For example, here is some code to connect to SQL Server (assuming you have the SQL JDBC jar file):
import com.microsoft.sqlserver.jdbc.SQLServerDriver import groovy.sql.Sql def getArticles = { sql = Sql.newInstance("jdbc:sqlserver://127.0.0.1;databaseName=cms", "login","password", "com.microsoft.sqlserver.jdbc.SQLServerDriver") def map = new HashMap() sql.eachRow("select article_id, title from article") { map[it.article_id] = title }; return map }
Very simple. Here is some code in the same script to connect to Mongo and get some data:
def getMongoDatabase = { def mongo = new GMongo("127.0.0.1") return mongo.getDB("articles") } def getExistingMongoArticleIds = { db -> def cursor = db.articles.find([:], [_id : 1]).toArray() def article_ids = [] cursor.each { item -> article_ids.add((int)item._id) } return article_ids }
The beauty of this is that as our company moves towards more and more JSON (we use it in our services, our code and our databases), Groovy provides a concise syntax and is easy to use.
Granted all the data has to come back to the client to do cross-system joins, but with some creativity, the performance impacts can be minimized.