Frustration
4 January, 2002
In a previous job, the hardware specs were notoriously incomplete. It had gotten to the point where I had a six-month-old hardcopy spec that was the "gospel" copy. This was because I had filled in all of the missing or incompletely defined registers for every piece of hardware that I had touched. (And thus had also reverse engineered and debugged at the same time.) This is the story of how I added yet another bit to yet another register in the Gospel According to Brian.
Our board was based around a MIPS R4650, which has exactly one development/debugging tool. We had an "ICE" from EPI. I say ICE in quotes because it wasn't exactly an "in circuit emulator". It was more like a large paperweight or maybe a boat anchor. That's what it was: a very expensive boat anchor.
Working on the assumption that it was a software bug -- it was still early in the life of the project, we didn't know better yet -- I dragged out the logic analyzer. Yup, an old HP logic analyzer was a better tool than the "ICE" from EPI. With a little investigation into the symptoms I managed to set it up to trigger when the processor went out to lunch. Then I filtered through the backtrace to find out what instructions had been fetched just before it died. Armed with this address, I went through a disassembly listing of the program. The address was in the middle of a small, simple chunk of code: the LED driver.
Now, I don't want to overglorify this piece of code. To call it a driver is a bit of an overstatement. I'll be generous by saying it was about 100 lines of code. It was really just a few convenience functions for accessing that hardware register that toggled the LEDs on the front panel of the board. This explained why the bug didn't happen that often: we were still early in development and the LED driver was hardly used anywhere. Looking at the one place where it was used didn't turn up anything. This "driver" was solid. You couldn't use it incorrectly and break something.
Time to look elsewhere for problems. I grabbed the Gospel According to Brian and checked out that register. The driver was pointed at the correct address. I was using the right bits. What could be wrong? Hmm. Why don't I try writing directly to that address from the shell? I bashed in the address and wham: locked up.
Now at this point in the life of the project, none of the Verilog code for the programmable logic on the board was accessible to anyone but the hardware engineer who wrote it. So I couldn't check out the register myself. Not that I would have been able to -- we realized later that his code was horrendous. So I had to ask him.
"Hey Jack, is there anything tricky about the LED register?"
"What do you mean?"
"I write this value to the LED register and the processor locks up." He wanted to see for himself. I demonstrated. He looked at my annotated version of the spec.
"Oh yeah. You need to write bit 15 high. That's an old version of the spec."
Grrrrr. I closed my eyes. Breathed deeply. Counted to ten. "So where's the new version?"
"I haven't updated it yet."
Sigh.
I added bit 15 to my annotations. Turns out that writing this bit low flushed the FPGA that controlled the SDRAM. So every time I went to write the LEDs, I would flush the FPGA, disable the SDRAM, and the processor went out to lunch waiting for the next instruction to come from memory. I sent out an email to the other driver guys to let them know, so that they could fill in their own copies of the spec...
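For the curious, the fix amounts to something like the sketch below. This is not the original driver. The macro names, the 16-bit register width, and the assumption that the low 8 bits drive the LEDs are all made up for illustration; the only fact taken from the story is that bit 15 has to be written high on every write.

```c
#include <stdint.h>

/* Hypothetical names and layout. The only thing known from the spec
 * annotation is that bit 15 must be written high; writing it low
 * flushed the FPGA that controlled the SDRAM. */
#define LED_MUST_BE_HIGH  (1u << 15)
#define LED_MASK          0x00FFu   /* assumed: low 8 bits drive the LEDs */

/* Build the value to write: the requested LED bits with bit 15 forced high. */
static inline uint16_t led_reg_value(uint16_t leds)
{
    return (uint16_t)((leds & LED_MASK) | LED_MUST_BE_HIGH);
}

/* The caller passes the register address, so the same code can be
 * pointed at a plain variable when testing on a host machine. */
static inline void led_write(volatile uint16_t *reg, uint16_t leds)
{
    *reg = led_reg_value(leds);
}
```

Funneling every write through one helper means nobody can forget the magic bit, which is about the only defense available when the spec won't tell you it exists.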
This company has, of course, shut down this project. They've also closed all of their East Coast operations. And they were delisted from the Nasdaq. Oh yeah, and they also filed for Chapter 11 last summer.
There are other amusing stories from this place. It would make a great case study in organizational behavior. Or project management. I'll write some more stories here when I get a chance...