Toughest Bug

by john on November 2, 2003

Don Park asks, “What is the toughest bug you had to fix?”

I’ve worked on my share of bugs over the years, mostly ones of my own doing. For me, the most difficult ones to debug were the work of others.

It wasn’t a bug per se, but the most difficult problem I ever worked on while in Tech Support lasted six months. Fortunately for the customer, the problem was only happening on one machine, so it wasn’t as though he was stopped dead for half a year. Still, it was a strange problem and one I really wanted to solve. The problem should have been a simple one: he was using an ODBC-compliant application to connect to an AS/400. There are many layers to analyze in such a client-server configuration, but it wasn’t anything I hadn’t done a thousand times before. This time, though, after a week of working on the problem, I escalated to development because something definitely seemed wrong.

After another week with our development staff, we were convinced the problem was in the network transport layer, so we involved the tech support group for that vendor. After some time with them, they escalated to their development staff, and still there was no resolution. By this time a couple of months had gone by, and the problem had been escalated through two technical support organizations and two development organizations.

Now, like I said, this was not a show-stopping issue because the customer was still able to work, but nobody likes to have a problem drag on like this. I had cordoned off a section of my desk for all the FAX communication we had, which included .INI files, etc. The pile was about 6 inches high.

Then one day, when the problem was about six months old, I decided to take another look at the ODBC configuration file for, like, the thousandth time. For some reason the problem jumped out at me. One of the settings was the AS/400 serial number, which typically looked something like “S1234567”. In the .INI file I was looking at, the value was “1234567”. I called the customer, asked him to insert the letter “S”, and sure enough, it worked.

Six months, a tech-savvy customer, two different tech support groups, two different development groups, and the problem boils down to the letter S. To say I was relieved, yet pissed, would be an understatement.

1 comment

Joe Grossberg November 3, 2003 at 2:25 pm

Oh, that’s an easy one. I didn’t work on this particular problem; I witnessed a coworker suffer.

We had a small client whose website was driven by an Access database. They also had FTP access, so they could make minor changes without buying a whole content management system or having to pay us to do it.

But we had a recurring problem. Namely, on a regular basis, the database would appear to lose all data it had collected after a certain date.

We specifically asked the client whether he had been doing anything to overwrite the database. He vehemently insisted otherwise and was quite angry that we would even consider blaming him.

One of our programmers spent hours looking at everything (scheduled tasks, SQL statements, security, log files, etc.), to no avail.

Then the client called back to let us know that, yes, an old copy of the Access database was in his FTP folder, and when he did his weekly FTP upload of files, that copy was among them.

The hardest one I personally worked on had to do with a misunderstanding about the details of how the Python programming language was implemented: http://www.joegrossberg.com/archives/000434.html
