Integration Testing with Entity Framework

At work we have a .NET codebase that is largely uncovered by unit tests. Currently we have about 20% coverage, and unfortunately a large chunk of the untested code is tightly integrated with Entity Framework.

After searching the web for a while I realized that it was going to be very tough to unit test anything related to Entity Framework. The best I could hope for was writing integration tests.

I’ve come across the problem of doing integration testing against databases before and each time I’ve failed miserably. Two examples stand out in my mind:

Youthful attempts at MySQL integration testing

I tried to do integration testing with MySQL for a SaaS company many years ago. The testing steps went something like this:

a. Restore a production backup for MySQL to a new instance.
b. Run a test against the database.
c. Check the diffs to make sure we wrote data correctly, updated data, etc.
d. Kill the instance.
e. Repeat from (a) for the next test.

The problem was that restoring the database for each test took too long. Also, the schema and data changed often enough that tests depending on a static database kept breaking.

Overall the approach was just too complex to manage. I was in a bit over my head at that time in my career.

More mature integration testing with Mongo and MySQL

Fast forward a few years and I found myself needing to do database integration tests with Mongo and MySQL.

This time, instead of trying to stand up a live instance of MySQL, I came across the idea of pointing my tests at an embedded, in-memory database. For MySQL I ended up using H2, since the data types were fairly similar. I tried other databases, but things like Derby had compatibility issues when trying to map data and schemas from MySQL.

For Mongo we used a project called “Embedded Mongo” and it worked pretty well.

Even though I successfully got both integration tests off the ground and working, I would consider the effort a failure. Why?

After several months of heavy development around the databases, I realized I had neglected the integration tests. I simply did not keep up with maintaining them, and by the time I revisited them, they were all broken.

The effort was a failure because I made the decision to abandon the integration tests instead of update them. The gap between reality and the tests had grown so large that I didn’t have the mental energy to fix the issues.

Entity Framework Integration Testing

Back to present day, and I find myself in a familiar place. I want to do integration tests against a database. I remember all the lessons I’ve learned from the past:

1. Make sure your tests can be updated easily in the future.
2. Try to use an embedded/in memory database.
3. Speed matters. You don’t want to create an entire database from scratch for each test.
4. Complexity kills. It’s easy to let complexity spiral out of control, making the testing work 10x harder than the actual implementation.

After spending a few days waist deep in thought and neck deep in code, I’ve given up for the time being on doing integration testing with Entity Framework.

In a way I’m proud of myself, because in the past I would easily spend (or waste, I suppose) weeks trying to nail a solution. The older I get, the better I’ve become at cutting my losses. Better to abandon an approach entirely and come back to it later than to be stubborn and let other projects suffer.

Some things I noticed from my attempts to test Entity Framework:

1. It’s confusing. The documentation, generated code, the articles online … nothing was intuitive to me.
2. Lots of hacks. The people that want to do integration tests around EF are getting really creative. I always get nervous when I don’t sense a community consensus around an approach.
3. SQL Server Compact Edition (CE) is a poor choice to mimic SQL Server. Lots of people suggested replacing a live SQL Server instance with the embedded version. While I was able to get this working, the schema discrepancies were too dramatic to make it feasible. Plus, it was never clear to me how to alternate between SQL Server and SQL Server CE in EF (a rough sketch of the swap follows this list).
4. Complexity. It was just too complex. Yes, there are tools to generate a schema that you can use to construct a database in SQL Server CE. Yes, there are tools to generate data for SQL Server CE. Yes, you can automate all of this. No, it cannot be easy.
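
To illustrate that swap: below is a minimal, hypothetical sketch of a code-first DbContext that can point at either a live SQL Server or an embedded SQL Server CE file. The ShopContext and Listing names, the connection string, and the test.sdf file are made up for illustration, and it assumes the SQL Server Compact provider for EF is installed.

// Hypothetical sketch: the same code-first context can run against a live
// SQL Server (named connection string) or an embedded SQL Server CE file
// handed in by a test. Names and connection strings are illustrative.
using System.Data.Common;
using System.Data.Entity;
using System.Data.SqlServerCe;   // SQL Server Compact ADO.NET provider

public class ShopContext : DbContext
{
    // Production: resolve the "ShopDb" connection string from config.
    public ShopContext() : base("name=ShopDb") { }

    // Tests: use whatever connection is passed in, e.g. a SqlCeConnection.
    // The second argument tells EF the context owns (and disposes) it.
    public ShopContext(DbConnection connection) : base(connection, true) { }

    public DbSet<Listing> Listings { get; set; }
}

public class Listing
{
    public int Id { get; set; }
    public string Title { get; set; }
}

// In a test:
//   var connection = new SqlCeConnection("Data Source=test.sdf");
//   using (var context = new ShopContext(connection)) { /* run queries */ }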

Conclusion

The real issue at work might be that we’ve hidden a ton of LINQ that ties directly to Entity Framework inside one database access class. The design didn’t have unit testing in mind from the start, and with project deadlines and so on it might be too late to refactor.

I would agree with a statement I read online that testing against LINQ to Objects is a poor substitute, because EF generates actual SQL from the LINQ you’ve written. It’s the SQL generation itself that needs to be exercised to find potential bugs.
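
To make that concrete, here is a hypothetical query, reusing the ShopContext sketched above, where LINQ to Objects and the SQL that EF generates can quietly disagree:

// Hypothetical example reusing the ShopContext sketched earlier.
using System.Linq;

public static class WhyLinqToObjectsIsNotEnough
{
    public static int CountWatchListings(ShopContext context)
    {
        // LINQ to Objects (e.g. a fake in-memory DbSet): Contains is an
        // ordinal, case-sensitive string check, so a title like
        // "New Gold Watch" would NOT match "watch".
        //
        // LINQ to Entities against SQL Server: EF translates Contains into
        // SQL along the lines of WHERE Title LIKE N'%watch%', which follows
        // the column's collation and is typically case-insensitive, so
        // "New Gold Watch" WOULD match.
        //
        // Same LINQ, different answers: exactly the class of bug an
        // in-memory substitute can't surface.
        return context.Listings.Count(l => l.Title.Contains("watch"));
    }
}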

One of these days I hope to find a solid approach that will span technologies for doing integration testing with databases.


App Fabric and Mongo – Separated at Birth

It occurred to me the other day that the data modeling I have done in the past for AppFabric and the data modeling for Mongo were shockingly similar.

AppFabric Modeling

At PropertyRoom.com we used App Fabric to cache auction listings, prices, categories for listings, etc. We only had a few SQL Server machines on the back end, and they were easily overwhelmed by the day-to-day traffic. App Fabric’s job was to alleviate the pain the databases felt.

The initial solution was to cache the results of all function calls in App Fabric. For example this call:

public List<Listing> GetCategoryListings(int categoryId) { ... }

This would cache all auction listings for a given category, including the price and how much time was left in the auction.
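
A minimal cache-aside sketch of that first approach, using the AppFabric DataCache API; the cache name, key format, timeout, and the database helper are illustrative stand-ins rather than our production code:

// Cache-aside sketch with AppFabric caching. Everything here (cache name,
// key format, 10-minute timeout, database helper) is illustrative.
using System;
using System.Collections.Generic;
using Microsoft.ApplicationServer.Caching;

public class ListingService
{
    private readonly DataCache _cache = new DataCacheFactory().GetCache("listings");

    public List<Listing> GetCategoryListings(int categoryId)
    {
        string key = "category-listings:" + categoryId;

        // Serve from AppFabric if the result is already cached.
        var cached = (List<Listing>)_cache.Get(key);
        if (cached != null)
            return cached;

        // Otherwise hit SQL Server and cache the whole result set.
        List<Listing> listings = LoadCategoryListingsFromDatabase(categoryId);
        _cache.Put(key, listings, TimeSpan.FromMinutes(10));
        return listings;
    }

    private List<Listing> LoadCategoryListingsFromDatabase(int categoryId)
    {
        // Placeholder for the real SQL Server query.
        return new List<Listing>();
    }
}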

I quickly found auction listings being sprayed all over the cache with additional calls like these:

public Listing GetListing(int listingId) { ... }
public List<Listing> GetWatchList(int userId) { ... }
public List<Listing> GetAuctionListings(List<int> listingIds) { ... }

Each call’s cached result held some information about the auction listing (like the listing title), so when a listing got updated I ran into the problem of having to proactively invalidate the cache.

In other words, if listing 123 changed from “New Gold Watch” to “New Rolex Gold Watch”, I would have to examine the cache and update everything that was suddenly stale. I came up with a naive approach that tagged objects in the cache with the listing ids that they contained. For example, one function call that returned 100 listings would have a tag like this:

Tag = [ 1, 123, 1000, 1010 … ] (listing ids)
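
Here is a sketch of that tagging idea using AppFabric’s built-in tag support. Tags only work inside a named region (which is itself a constraint, since a region lives on a single cache host), and the class, region, and key names are illustrative:

// Sketch of tagging cached objects with the listing ids they contain, then
// evicting by tag when a listing changes. Names are illustrative.
using System.Collections.Generic;
using System.Linq;
using Microsoft.ApplicationServer.Caching;

public class TaggedListingCache
{
    private const string Region = "listings-region";
    private readonly DataCache _cache;

    public TaggedListingCache(DataCache cache)
    {
        _cache = cache;
        _cache.CreateRegion(Region);   // returns false if the region already exists
    }

    // Store a cached result tagged with every listing id it contains.
    public void Put(string key, object value, IEnumerable<int> listingIds)
    {
        var tags = listingIds.Select(id => new DataCacheTag("listing:" + id)).ToList();
        _cache.Put(key, value, tags, Region);
    }

    // When listing 123 changes, evict every cached object tagged with it.
    public void InvalidateListing(int listingId)
    {
        var tag = new DataCacheTag("listing:" + listingId);
        foreach (KeyValuePair<string, object> item in _cache.GetObjectsByTag(tag, Region))
        {
            _cache.Remove(item.Key, Region);
        }
    }
}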

This proved to be a disaster. I quickly realized the entire architecture of caching, tagging objects, and proactively invalidating them was too fragile and far too complex. The following facts about software were proven true yet again:

1. It’s easier to write new code than debug old code.
2. If you code at the height of your cleverness, then because of our first assertion, you are not qualified to debug your own code.

The solution was to divide and conquer.

I divided all objects into one of two categories: either data that could expire on its own (which accounted for about 90% of our data), or an item that had to be expired manually and therefore existed in only one place in the cache.

That last part was crucial.
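
Sketched with the same DataCache API, the divide-and-conquer version looks something like this; the key formats and the one-hour TTL are illustrative:

// The two categories, side by side. Key formats and the TTL are illustrative.
using System;
using Microsoft.ApplicationServer.Caching;

public class ListingCache
{
    private readonly DataCache _cache;

    public ListingCache(DataCache cache)
    {
        _cache = cache;
    }

    // Category 1 (~90% of the data): let it expire on its own with a TTL.
    // No invalidation code anywhere.
    public void PutCategoryListings(int categoryId, object listings)
    {
        _cache.Put("category-listings:" + categoryId, listings, TimeSpan.FromHours(1));
    }

    // Category 2: data that must be expired manually. It lives under exactly
    // one key, so invalidation is a single Remove instead of a cache-wide hunt.
    public void PutListing(int listingId, object listing)
    {
        _cache.Put("listing:" + listingId, listing);
    }

    public void InvalidateListing(int listingId)
    {
        _cache.Remove("listing:" + listingId);
    }
}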

Mongo Data Modeling

Fast forward a year and I ran into the same problem with Mongo, but didn’t realize it until it was too late.

Our team was struggling a bit to learn how to denormalize data in Mongo, and we made the mistake of putting data that changes often into many documents. For example, we were dealing with article data, and we embedded it into many different documents.

When an article would change, we would have to go track down all the places it had been written and then update those documents. Big mistake.

The lesson learned was something I had heard at a Mongo conference, and it has since been proven through experience:

1. If the data changes slowly – or never – then it’s ok to copy it. (This means embedding in Mongo terms.)
2. If the data changes rapidly, then never copy it. Have one master version of it and only reference it when needed. (Linking in Mongo terms.) Both shapes are sketched below.
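
Here is a rough sketch of the two shapes as C# document classes for the MongoDB .NET driver; the class and property names are illustrative:

// Rule 1: slow-changing data gets embedded (copied) right into the document.
// Rule 2: fast-changing data is linked; the document only stores a reference
// and the single master copy lives in its own collection.
using System.Collections.Generic;
using MongoDB.Bson;

public class ArticleDocument
{
    public ObjectId Id { get; set; }
    public string Title { get; set; }

    // Categories rarely change, so copying them into every article is fine.
    public List<string> Categories { get; set; }
}

public class PageDocument
{
    public ObjectId Id { get; set; }

    // Article content changes often, so the page only stores a reference
    // and the article is looked up when needed.
    public ObjectId ArticleId { get; set; }
}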


Moving data from SQL Server to MongoDB/CouchDB/Couchbase

I must have rewritten the first sentence of this post 20 times, because I realize the audience for it must be incredibly small. You have to work somewhere that is using SQL Server. You also have to work somewhere that is using MongoDB or a Couch product (CouchDB, Couchbase).

Even further you have to want to move data from SQL Server to a document database that stores its data as JSON. But who knows, maybe someone will find this approach useful. =)

The goal is to take data from SQL Server (relational) to MongoDB (document). I’m assuming since you are running SQL Server that you have some .NET languages around the office.

To start, make a new Visual Studio solution and create a Visual C# SQL CLR Database Project. You can find it under the Visual C# > Database grouping of project types.

The basic idea is that you want to write code, deploy it to SQL Server and then run the code as a stored procedure that produces JSON.

Example of code:

using System.Collections.Generic;
using System.Data;
using Microsoft.SqlServer.Server;
using POC;
using POC.PageElements;


public partial class StoredProcedures
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void StoredProcedure1()
    {
        // Load the top-level articles, then hydrate each one with child data
        // through additional queries (contributors, categories, pages, etc.).
        Builder builder = new Builder();
        var articles = builder.Build<Article>("exec usp_get_articles");

        foreach (var article in articles)
        {
            article.Contributors = builder.Build<Contributor>("exec usp_get_article_contributors " + article.ArticleId);
            foreach (var contributor in article.Contributors)
            {
                contributor.SocialLinks = builder.Build<SocialLink>("select * from contributor_social_link where contributor_id = " + contributor.Id);
            }

            article.Categories = builder.Build<Category>("exec usp_get_article_categories " + article.ArticleId);

            article.Pages = builder.Build<Page>("exec usp_get_article_pages " + article.ArticleId);
            foreach (var page in article.Pages)
            {
                var paragraphs = builder.Build<Paragraph>("exec usp_get_paragraphs " + page.Id);
                foreach (var paragraph in paragraphs)
                {
                    var images = builder.Build<Image>("exec usp_get_paragraph_image " + paragraph.Id);
                    paragraph.Image = (images.Count > 0) ? images.ToArray()[0] : null;
                    page.PageElements.Add(paragraph);
                }


                var tables = builder.Build<Table>("exec usp_get_tables " + page.Id);
                foreach (var table in tables)
                {
                    page.PageElements.Add(table);
                }


                var imageGroups = builder.Build<ImageGroup>("exec usp_get_image_groups " + page.Id);
                foreach (var imageGroup in imageGroups)
                {
                    var images = builder.Build<Image>("exec usp_get_image_group_images " + imageGroup.Id);
                    imageGroup.Images = (images.Count > 0) ? images : new List<Image>();
                    page.PageElements.Add(imageGroup);
                }


                var inlineImages = builder.Build<Image>("exec usp_get_inline_images " + page.Id);
                foreach (var inlineImage in inlineImages)
                {
                    page.PageElements.Add(inlineImage);
                }


                var videos = builder.Build<Video>("exec usp_get_videos " + page.Id);
                foreach (var video in videos)
                {
                    page.PageElements.Add(video);
                }


                var blockQuotes = builder.Build<BlockQuote>("exec usp_get_block_quotes " + page.Id);
                foreach (var blockQuote in blockQuotes)
                {
                    page.PageElements.Add(blockQuote);
                }
            }
        }



        // Stream the results back to the caller as (id, json) rows through
        // the SQL CLR pipe.
        SqlPipe sp = SqlContext.Pipe;

        var dataRecord = new SqlDataRecord(
            new SqlMetaData("id", SqlDbType.Int),
            new SqlMetaData("json", SqlDbType.Text));
        sp.SendResultsStart(dataRecord);

        foreach (var article in articles)
        {
            dataRecord.SetInt32(0, article.Id);
            dataRecord.SetString(1, article.ToString());
            sp.SendResultsRow(dataRecord);
        }

        sp.SendResultsEnd();
    }
};

Some of the code calls functions that I haven’t included, but you can use your imagination to understand what they do. They simply run queries that populate a list of objects. We keep constructing a very rich Article object by making one database call after another.
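
For reference, here is a guess at the minimal shape of that Builder helper. This is not the real implementation, just an illustration of a Build<T> method that runs a query over the SQL CLR context connection and maps columns to properties by name:

// Guessed-at sketch of the Builder used above: run a query over the special
// SQL CLR "context connection" and map each row to a T by matching column
// names to writable properties. Assumes column and property types line up.
using System.Collections.Generic;
using System.Data.SqlClient;

public class Builder
{
    public List<T> Build<T>(string sql) where T : new()
    {
        var results = new List<T>();

        using (var connection = new SqlConnection("context connection=true"))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    var item = new T();
                    for (int i = 0; i < reader.FieldCount; i++)
                    {
                        // Skip columns that don't match a settable property.
                        var property = typeof(T).GetProperty(reader.GetName(i));
                        if (property != null && property.CanWrite && !reader.IsDBNull(i))
                        {
                            property.SetValue(item, reader.GetValue(i), null);
                        }
                    }
                    results.Add(item);
                }
            }
        }

        return results;
    }
}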

Once we’re done, we have a list of articles that have been populated from the database. Remember the code above is actually deployed as executable code in SQL Server. Inside of Management Studio you would navigate to your database and then Programmability->Assemblies. If deployment goes well, you’ll see your code there.

The end of the code above shows that we call article.ToString(). Here are the contents of that call:

        public override string ToString()
        {
            return JsonConvert.SerializeObject(this);
        }

As you can see we’re simply serializing the article to JSON using the Newtonsoft JSON library for C#.

I will say that deploying code to SQL Server (especially third-party libraries) is not easy. It’s possible, but it will take some Googling and investigation.

The benefit of being able to produce JSON from SQL Server is massive, however. You can quickly change the C# objects that get serialized to JSON, which helps development velocity. In my use case each article runs roughly 13 queries, which means the total number of queries for 1,000 articles would be 13,000!

Even so, in my testing I was able to produce around 100,000 articles in less than 10 minutes. The secret sauce is that the code runs on the database machine itself, so we never pay the price of network latency.

Hopefully this approach is helpful in sending non-trivially formed data from SQL Server to MongoDB!
