Several years ago I got the chance to work with Microsoft’s AppFabric for caching. I had a heck of a time getting it running well because when it failed, it did so silently. No stack traces, no log files, no messages in the console … just null objects and non-working software.
Recently I ran into some trouble with the Windows Azure SDK and it felt all too familiar.
Like the AppFabric experience, everything started well. I added a new worker role to an existing cloud service project and it ran fine. Data went from point A to point B, I checked in my code and forgot about it.
Later I heard groans coming from another developer at work. It turned out I had upgraded the Azure SDK from version 1.8 to 2.0, which forced everyone else to download the new SDK before the code would build.
After spending a few minutes upgrading we made sure the build was good and moved on to other things.
Several weeks later the decision was made to move my worker role project to a different cloud service. I created the new project, added the worker role and checked in. Since everything built fine, I figured I was safe.
I didn’t need to test it again because I hadn’t made any code changes. As long as it built, it should run the same as it had before.
A new feature came across my desk regarding the worker role, so it was time to fire it up and do some work.
The first thing I did was to run the project to refresh my memory as to how it worked.
Failure. The Windows Azure Emulator would start and then immediately stop. Huh?
Eventually I was able to scrape out the contents of the emulator’s console and I could see that the worker role was indeed starting, but right after starting, it would die. No descriptive errors, no activity in Visual Studio, nothing. Silence.
Stabs in the Dark
A short list of what I tried:
1. Refreshing the DLLs in the worker role project. Version 1.8? Try 2.0. Version 2.0? Maybe 1.8 again?
2. Removing all code related to dependencies. Maybe our TraceLogger class was failing in some odd way?
3. Tracing. I tried to trace the SDK and my code, but couldn’t get it working.
4. Breakpoints. I would set them all over but none of them were getting hit.
5. Event viewer. The old standby. Empty. No messages, no errors.
6. Lots and lots of Googling. I read a lot of articles about flipping this bit or that flag, but nothing worked.
7. Running another worker project in the same codebase. That worked fine, but I couldn’t see why.
8. Doing a diff between the .csproj file of a working worker role and my failing one.
9. Doing a diff between a working cloud service project and my broken one.
The prior steps all took place in a few hours towards the end of a work day. The next morning I came in and tried this:
1. Slowly and methodically started commenting code out. I removed one method at a time, or stubbed it out, and re-ran the worker role locally.
2. I went through all the code in this manner. The class, the base class it inherited from and all properties/fields that were in both.
3. Changed the worker role to inherit directly from RoleEntryPoint instead of from our custom base class (a wrapper around RoleEntryPoint). Ah ha! That worked!
The problem ended up being that several of our shared projects, including “Common” and “Utility”, still referenced an older version of the Azure SDK. They were all using 1.8 and my worker role was trying to use 2.0.
Just inheriting from a class in one of those other projects caused the failure.
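In hindsight, a small script could have flagged the mismatch long before the emulator went quiet. What follows is only a rough sketch of that idea, not something we actually ran. It assumes classic .csproj files where each reference carries its version in the Include attribute (for example Include="Microsoft.WindowsAzure.ServiceRuntime, Version=2.0.0.0, ..."). The function names are made up for illustration, and references that only specify a HintPath would slip past it.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: references look like
#   Include="Microsoft.WindowsAzure.ServiceRuntime, Version=2.0.0.0, ..."
REFERENCE_PATTERN = re.compile(
    r'Include="(Microsoft\.WindowsAzure[^,"]*),\s*Version=([\d.]+)'
)


def find_sdk_versions(csproj_text):
    """Return (assembly, version) pairs found in one project file's text."""
    return REFERENCE_PATTERN.findall(csproj_text)


def find_mismatches(projects):
    """Find Azure assemblies referenced at more than one version.

    `projects` maps a project name to the text of its .csproj file.
    Returns {assembly: {version: [project names]}} for mismatches only.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for name, text in projects.items():
        for assembly, version in find_sdk_versions(text):
            seen[assembly][version].append(name)
    return {
        assembly: {v: sorted(names) for v, names in versions.items()}
        for assembly, versions in seen.items()
        if len(versions) > 1
    }


def scan_solution(root):
    """Read every .csproj under `root` and report version mismatches."""
    projects = {
        str(path): path.read_text(encoding="utf-8-sig", errors="replace")
        for path in Path(root).rglob("*.csproj")
    }
    return find_mismatches(projects)
```

Calling scan_solution on the solution folder returns an empty dictionary when every project agrees. In a case like mine it would list the affected assembly along with which projects reference which version.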
Once again, though, I wish there had been some output somewhere that I could have latched on to, something that would have pointed me in the right direction.