Debugging War Story – 8 hours with iTunes

The other day I got an email saying that a new production bug had come up and it was my turn to take it.

The Problem

We have a bunch of podcasts at work and each is powered by an RSS feed. An example URL looks something like this:

…/mc/PodcastStream.aspx?p=yDi0V4EwP58=&iid=Hc0ccBgnSdU=&site=yDi0V4EwP58=

For some reason iTunes dropped our podcast and when we tried to re-submit it, we got this error:

itunes podcast error parsing feed: invalid xml: error on line 1: content is not allowed in prolog.

Thus the podcast is dead in the water and even though the URL looks ok, iTunes rejects our RSS feed.

Initial Investigation

The technology stack here is C# (.NET 4.0) and IIS. The code is a straightforward .aspx page that generates XML and writes it to the HTTP response stream.

Things I tried quickly:

1. I checked the byte order marker for the body of the HTTP response. I removed the BOM that I found and it still failed.
2. I tried encoding the HTTP response as ASCII instead of UTF8. Nope.
3. Messed around with the ContentType HTTP header. Tried “text/xml” (which apparently should default to ASCII), “text/xml; charset-utf8” and “text/html”. Interesting work, but none of these worked.
4. Copied the IIS logs to my local machine so I could see iTunes requesting the RSS feed page. I found it easily and it seemed we were serving the content fine. Lots of 200 responses. (I noted that iTunes was doing a HEAD and then a GET. Something to look at later?)
5. Created a static, good working version of the RSS feed. iTunes accepted this, but it wasn’t dynamic, so newer podcasts wouldn’t be seen.
6. Ran my good and bad RSS feeds through the RSS feed validator website. (http://feedvalidator.org/)

Having failed to hit low hanging fruit I knew I had to dig deeper.

A deeper dive

I try not to get too complex too fast when troubleshooting, but sometimes I give in to the “hero” work. I want to find a really nasty bug so I can say that I solved something difficult. I once read an article about a guy that proved a cosmic ray flipped a bit on his box! (https://blogs.oracle.com/ksplice/entry/attack_of_the_cosmic_rays1).

A more serious approach:

1. I compared the HTTP headers of the valid static request/response to the failed ASPX request/response. There were some minor differences, but nothing that stood out.
2. I moved the successful base case on to the production web server where the failing RSS feed was. This would help eliminate IIS as the issue. Sure enough, the static RSS page was fine and the dynamic one was bad.
3. I needed to iterate quickly with changes so I moved all the C# code from a .cs file to the .aspx file. This was fantastic because I could make changes without having to recompile and deploy a new DLL. Since I was working in production I made a /temp/ directory where I could work without risking some sloppy error.
4. Now I had two files on production and one was failing and one was ok. The bad one was an .aspx file and the good one was an .html file. Next up was to change the .html file to an .aspx, so I did that and had the C# simply read a text file from disk and serve that.
5. Loaded up a hex editor to compare the output from the bad feed and the good feed. They were the same!
6. Went to my Ubuntu VM and did a bunch of curl tests. Who knows what environment iTunes was running to make the calls. curl reported nothing interesting. It seemed to be the same.

Desperation

At this point I’ve taken out a hex editor. I’ve moved to Linux to get more serious with curl. I’ve begun working on production to do quick tests. Things have gone from serious to desperate.

To summarize, at this point, this is what the good feed and the bad feed had in common:

1. Same HTTP headers for requests and responses. Byte for byte, the same!
2. Same content in the HTTP response body. The same even down at the byte by byte level!
3. Same URL and querystring.
4. On the same server in the same directory.
5. Both running through IIS.

Now my brain was spinning. Same headers? Same responses? Ugh.

Out came Wireshark on my Ubuntu VM. What I found:

1. All TCP traffic was the same between the good and bad feed.
2. The only difference was that the bad feeds consistently seemed to hit one extra domain controller on our network. The good feeds never hit this machine. It turns out this was a major dead end and I’m stoked I didn’t fall for it.
3. All TCP packets were the same, meaning each byte was the same between the two requests. (At least the same for things that mattered.)

Resolution!

The big break came when I started to take the code apart for the broken feed line by line. I commented out large sections and quickly narrowed it down to one section of code.

            if ((!(Request.QueryString["s"] == null)
                        && ((Request.QueryString["s"].Length > 0)
                        && !(enc.Decrypt(Request.QueryString["s"]) == "Error"))))
            {
		...
            }
            else if ((!(Request.QueryString["p"] == null)
                        && ((Request.QueryString["p"].Length > 0)
                        && !(enc.Decrypt(Request.QueryString["p"]) == "Error"))))
            {
		...
            }
            else
            {
                Response.Write("No S or P in querystring");
                Response.End();
            }	

The code looks at the query string and checked to see if certain values are there. If they are, it tries to decrypt the values and then parses the values into integers.

I quickly realized that for some reason we were always hitting the Response.Write(…) line only when iTunes hit the URL.

If I hit the URL in Chrome, Firefox, via curl … all these returned fine. But when iTunes took the URL, it failed.

The reason ended up being that iTunes lower cases the URL before checking it. I pasted in the following:

…/mc/PodcastStream.aspx?p=yDi0V4EwP58=&iid=Hc0ccBgnSdU=&site=yDi0V4EwP58=

which iTunes turned into:

…/mc/podcaststream.aspx?p=ydi0c4ewp58=&iid=hc0ccbgnsdu=&site=ydi0c4ewp58=

Of course trying to decrypt a case sensitive string that has been lower cased fails.

End, end, end

The fix in the end was to remove the encrypted values from the query string (they weren’t sensitive anyways) and resubmit to iTunes.

All is well now and eight hours has been almost entirely wasted.

Debugging War Story – 8 hours with iTunes

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s