Yet another email hyperlink bug

It’s often useful to swap stories about bugs, so I thought I’d share how I isolated a problem with Gmail that affected my readers.

Have you ever run into this bug? You’re reading a plain text email with a URL at the end of a sentence, like http://example.com. Your mail program conveniently makes the link clickable, but it includes the “.” in the URL, so the link doesn’t work. I can appreciate that it might be difficult for an application to apply enough heuristics to get a URL in ASCII text to work as intended every time. The bug I’m going to describe, though, is more of a head-scratcher than that.

I sent out the Tejas Software Consulting Newsletter to my email distribution list, and within a few minutes, two different people reported that the link to my home page near the top of the email didn’t work. The contact information in the message looked like this, in plain ASCII text:

-Danny R. Faught, Software Alchemist
Tejas Software Consulting
faught@tejasconsulting.com
http://tejasconsulting.com/
+1-817-294-3998

The http://tejasconsulting.com/ link worked fine for me, but the report from these two readers was that their browser tried to go to http://tejasconsulting.com/+1-817-294-3998 instead, which yielded a 404 error. I double-check the raw source of the newsletter as I received it, even using the “od” utility to verify that the only data between the URL and the phone number was a carriage return and a linefeed. It must be a problem with their software. I ask the two people to give me more information about their operating system and mail program. One replies quickly - Gmail on IE 7 and Windows Vista. We both concur that Vista or IE is probably to blame.
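
The byte-level check can be sketched in Python as a rough stand-in for “od”. The sample bytes below are an assumption reconstructed from the contact block shown earlier, not the actual raw message, and the helper names are made up for illustration.

```python
def bytes_between(raw: bytes, before: bytes, after: bytes) -> bytes:
    """Return the bytes between the first `before` and the next `after`."""
    start = raw.index(before) + len(before)
    end = raw.index(after, start)
    return raw[start:end]


def hex_dump(data: bytes) -> str:
    """Render bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


# Assumed sample: the URL and phone number lines joined by CRLF.
sample = b"http://tejasconsulting.com/\r\n+1-817-294-3998\r\n"

gap = bytes_between(sample, b"http://tejasconsulting.com/", b"+1-817-294-3998")
print(hex_dump(gap))  # 0d 0a
```

If anything other than a carriage return (0d) and a linefeed (0a) showed up in the gap, the message itself would be suspect rather than the reader’s software.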

I don’t have Windows Vista, but I fire up a Windows XP Pro PC that has IE 7. Using my webmail interface, the URL works fine. Dang, maybe the problem is with Vista. But just to be sure, I forward the email to my Gmail account and I do reproduce the bug there. Yep, same problem with Gmail on Firefox on another XP box. Just about that time, the second complainant reports that he is also using Gmail.

Okay, now we’re getting somewhere. I hypothesize that Gmail is interpreting the + at the beginning of the line as a line continuation, and I start to wonder how I can work around it. I add a page named “+1-817-294-3998” to the root of my web server, with a request to contact me about configuration details to help track down the problem. Along the way, vim tells me “E16: Invalid range: 1-817-294-3998” when I try to create the file. I have to work around the problem, because I can’t figure out how to tell vim not to think the “+” is a command line option. <sigh> This happens to me all the time - I find some tricky test data for one application, and it breaks other things too.

Now let’s try to isolate the problem. I create this test data in the body of an email that I send to my Gmail account -

http://tejasconsulting.com/
+x

http://tejasconsulting.com/
a+b

http://tejasconsulting.com/ +1-817-294-3998

http://tejasconsulting.com/
+1
+2
+3
+4

+1
http://tejasconsulting.com/
+2

Because it could take a few minutes between each test case while I wait for the mail to arrive, I decided to send several subtests at the same time. I wait impatiently for the mail to show up on Gmail, and it doesn’t help that when I click the refresh link, I get no indication that it tried to do anything. The message shows up back at tejasconsulting.com because of a forwarding rule I have in place. Then a minute or so later, I see the email on my Gmail account. Strange, you would think that I would see it first on the machine it was forwarded from.

While I wait, I glance at another message, which includes an email address in <> angle brackets in the middle of a line, followed by a “- Show quoted text -” link on the next line, with some text not shown unless I click the link. Strange - the hidden text is not a quote. I bet the “>” character is confusing it.

Anyway, when the test email arrives, none of these test cases reproduce the problem. Dang, it’s more complicated than I thought. So, as I instruct my students, I back up and start to take a more systematic top-down approach. I take all five lines of contact information plus the blank line that follows and paste them several times into a new email. I then progressively chop off small bits of the phone number from one to the next, giving six subtests. Then I go back to the full phone number and progressively chop off the lines of text above it, giving four more subtests. I set the subject to “test 2” and send it off.

Aha, now I’m getting somewhere. All subtests that use the full phone number reproduce the bug. In the chopped phone number tests, “+1-817” reproduces the bug, but “+1-” does not. I set up test number 3, which determines that “+1-8” is the smallest substring that reproduces the problem.

Hmmm, strange. With test number four, I learn that the “+” and “-” can be swapped, or I can use “+” or “-” for both. I can use letters in place of the numbers, and I still see the bug, but I can’t use spaces and I can’t add a space to the front of the string. Multiple lines under the URL that match the pattern are all concatenated.

With test five, I play with replacing the “+” and “-” characters with different symbols, finding some that trigger the bug and some that don’t. I see that I can vary the position of the symbols on the line, and add extra symbols and still reproduce the bug.

Okay, I’m starting to face a combinatorial explosion here to get this isolated any further. But I decide that I can try all printable ASCII characters from hex 20 to hex 7e in one position in the string - that’s 95 tests. I go to my trusty Bash shell and whip out a Perl one-liner to generate the data. I copy it from the shell, paste into an email, and that’s test six. I wait for it to arrive. And wait some more. It shows up on tejasconsulting.com, but it’s nowhere to be seen on Gmail. Cool! This is a much bigger bug. I try one more time, test six-a, same results.
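
The original Perl one-liner isn’t shown, so here is a rough Python equivalent of what the data generation for test six might have looked like. It assumes the “+a?b” template mentioned in the write-up of tests seven and eight, with the “?” position varied over hex 20 through hex 7e, which is the full set of 95 printable ASCII characters. The function name and block layout are my own guesses.

```python
URL = "http://tejasconsulting.com/"
TEMPLATE = "+a{}b"  # assumed template; only the third character varies


def build_subtests(url: str = URL, template: str = TEMPLATE) -> list[str]:
    """Build one URL-plus-candidate-line block per printable ASCII character."""
    blocks = []
    for code in range(0x20, 0x7F):  # 0x20 (space) through 0x7e (~), inclusive
        blocks.append(f"{url}\n{template.format(chr(code))}\n")
    return blocks


subtests = build_subtests()
print(len(subtests))  # 95
print("\n".join(subtests))  # blocks separated by blank lines, ready to paste
```

Putting each candidate line directly under its own copy of the URL keeps the subtests independent, so a blue, underlined second line points straight at the character that triggered it.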

For test seven and eight, I chop the test six data roughly in half, putting half in each email. Strange, both arrive just fine - so it’s not as simple as a single character causing the delivery problem. Wait a minute, six-a arrives on Gmail, but the original six is still not there. Okay, forget the delivery failure for now, let’s get back on track.

Looking at emails number seven and eight, I find that all of these characters trigger the problem, if they replace the third character in “+a?b” - “#&,+-/=?0123456789”. I’m glad I took a systematic approach; I probably wouldn’t have bothered to try numbers otherwise.

Okay, test nine - with 18 problem characters, I could try all combinations of these characters in two positions with 324 subtests. Hmmm, how about I just use 0 and 9 and leave out the rest of the numbers. That gives me 81 test cases, which I send on their way. All of them reproduce the bug, which I can tell simply by noticing that the second line in each block is blue and underlined. (I notice later that I left “+” out of the test data, so I should have actually had 100 test cases. Oops. This step was really overkill anyway.)
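
The arithmetic for test nine can be checked with a short Python sketch. It only enumerates the character pairs; how each pair was embedded in a test line isn’t stated, so that part is left out. The variable and function names are illustrative.

```python
from itertools import product

SYMBOLS = "#&,+-/=?"   # the 8 non-digit trigger characters
DIGITS = "0123456789"  # the 10 digits

full_set = SYMBOLS + DIGITS      # all 18 problem characters
reduced = SYMBOLS + "09"         # digits trimmed to just 0 and 9
sent = reduced.replace("+", "")  # what was actually sent, with "+" left out


def pairs(chars: str) -> list[str]:
    """All ordered two-character combinations, repeats included."""
    return ["".join(p) for p in product(chars, repeat=2)]


print(len(pairs(full_set)))  # 324
print(len(pairs(reduced)))   # 100
print(len(pairs(sent)))      # 81
```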

Test ten - I’m running out of ideas for how to characterize this. I try these strings: “##a”, “#a#”, “a##”, “xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx#a#”, “xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx#a#b”, and “ #a#”. All reproduce the bug except the case with the leading space - “ #a#”, as expected, but I’m confused when “a##” also does not reproduce it.

All right, one more test. For test eleven, I find that “a##b” and “x#a#” reproduce the bug, and “##”, “###”, and “+-=” do not. That didn’t do me much good. I decide to fire up the customer service form on Gmail and describe this as best as I can:

I included these two lines of contact information in a newsletter I sent out:

http://tejasconsulting.com/
+1-817-294-3998

Users who read the message in Gmail complained that the URL was “http://tejasconsulting.com/+1-817-294-3998” when they clicked it, and it didn’t work. I expected that they would go to http://tejasconsulting.com/ when they click the link.

I have isolated the problem somewhat - it occurs under these conditions:

* Reading a plain text email in Gmail
* A URL is at the end of a line
* One or more lines directly below the URL contain any two or more of these characters - “#&,+-/=?0123456789” plus one or more characters not in this set, like “1” or “a”.
* The lines below the URL do not start with a space character.

Examples of text that exhibits the URL concatenation problem:

http://tejasconsulting.com/
##a

http://tejasconsulting.com/
#a#

http://tejasconsulting.com/
a##b

There is at least one exception to the conditions above, where the problem does not occur:

http://tejasconsulting.com/
a##
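
One way to sanity-check the summary above is to encode its wording literally and compare it against the results from tests three, ten, and eleven. This is only a sketch of the conditions as written in the report, not of whatever Gmail actually does. The function name is hypothetical.

```python
TRIGGER_CHARS = set("#&,+-/=?0123456789")


def report_predicts_bug(line: str) -> bool:
    """Literal reading of the reported conditions for a line below a URL."""
    if line.startswith(" "):
        return False
    in_set = sum(1 for ch in line if ch in TRIGGER_CHARS)
    outside = sum(1 for ch in line if ch not in TRIGGER_CHARS)
    return in_set >= 2 and outside >= 1


# Observed results from the narrative: True means the URL was concatenated.
observed = {
    "##a": True,
    "#a#": True,
    "a##b": True,
    "x#a#": True,
    "a##": False,
    " #a#": False,
    "##": False,
    "###": False,
    "+-=": False,
    "+1-8": True,
}

mismatches = [
    line for line, saw_bug in observed.items()
    if report_predicts_bug(line) != saw_bug
]
print(mismatches)  # ['a##', '+1-8']
```

Under this literal reading, two observations disagree with the summary: “a##”, the exception already noted, and “+1-8”, whose characters all fall inside the set, so the “one or more characters not in this set” condition is never met.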

What do you think - is that a reasonable summary of the issue?

By the way, in the narrative above, I see symptoms of four additional Gmail problems to investigate, plus one in vim. It never ends, does it?
