2011-05-26

Four Monitors, Anyone? A PCIe Card for X1, X4, or X8 Lane Slots

After several dead ends (PCI cards with unresolved PCI-to-PCI bridge issues, x16 extender cards that didn't fit the form factor of the chassis, interrupt conflicts with NIC hardware), one solution has been found that works in my Dell PowerEdge 2900. This is a low-end server being used as a workstation. It wasn't designed to support an x16 PCIe card, so it's been difficult to find a substitute. The AMD ATI FirePro 2270 x1 version nicely supported dual monitors and installed without a hitch. Others with older machines that offer only PCIe x1, x4 or x8 slots may wish to consider this alternative. Note: this is a "pro"-class, not a "gamer"-class, device, but you can use two or more to support more than two monitors. Very tempting.

p.s. Another viable option for Nvidia users may be the PNY Quadro NVS 440 x1.

2011-05-14

Update on Elitebook Windows Event 17: HP HW Repair Process Insights

Update 19 May 2011: The laptop was repaired after 2 working days. The fan problem was corrected, but the Event 17 errors have returned. The rep from the team handling repairs in progress insisted on dialing into the machine and reading the event log messages herself, then issued another RMA number. She indicated this would be "priority" (what is that?). Another box was shipped and the laptop has been returned to the depot.

Update 14 May 2011: As reported earlier, an HP Elitebook (c. 2010) was returned for repair after much investigation. The laptop came back by overnight service from HP Depot repair with a new Nvidia NVS 5100M graphics card. Impressively, a technician phoned from the depot facility to verify the nature of the complaints and to understand whether WLAN problems were also being experienced (Answer: Unclear). Unfortunately, the machine turns itself off a few minutes into the HP run-in test. Maybe a fan or fan sensor was disconnected or has failed? The machine was returned to the depot the next day.

Internal Repair Processes: Interestingly, HP hands off duties for repairs in progress to a different team. The chat channel used for triage is seemingly abandoned, though it can be assumed that transcripts are available for downstream technicians to read. Also, HP sends a new box for shipping rather than simply having customers print a new prepaid shipping label. Odd, considering that a day is lost while the box is shipped, and there's another box to be disposed of (the packing material used is not recyclable).

2011-05-06

Gotcha! Different Notch Offset for Fully Buffered DIMMs (FB-DIMMs)

An aging Dell 2900 server needed more RAM to extend its life. The used marketplace for RAM seemed like the right place to go. The specification called for ECC buffered, 533 or 667 MHz PC2-5300 DIMMs, deployed in pairs. Straightforward enough. Or so it seemed. After I read the Wikipedia entry on DIMMs to refresh my memory of the standards in effect when this machine was new, I purchased a pair of 4GB DIMMs.

What was immediately apparent, but which presales research had failed to discover, is that PC2-5300 FB-DIMMs have their single notch in a different location than otherwise identical PC2-DIMMs. To avoid this mistake in the future, refer to the photo above, which shows the offset position of the FB-DIMM notch.
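As an aside on the PC2-5300 designation: the number is the module's approximate peak bandwidth in MB/s. The sketch below works through that arithmetic, assuming a 64-bit data bus and a label formed by rounding the result down to a multiple of 100. The function names are illustrative only, and none of this helps with the notch position, which has to be checked by eye.

```python
def peak_bandwidth_mb_s(megatransfers_per_s, bus_width_bits=64):
    """Peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return megatransfers_per_s * bus_width_bits / 8


def pc2_label(megatransfers_per_s):
    """Approximate the PC2-xxxx label (assumed rounding: down to a multiple of 100)."""
    bandwidth = peak_bandwidth_mb_s(megatransfers_per_s)
    return f"PC2-{int(bandwidth // 100) * 100}"


if __name__ == "__main__":
    # DDR2-533 and DDR2-667 run at 533.33 and 666.67 million transfers per second.
    for name, rate in (("DDR2-533", 1600 / 3), ("DDR2-667", 2000 / 3)):
        print(name, round(peak_bandwidth_mb_s(rate)), "MB/s", pc2_label(rate))
```

By this reckoning the 667 MHz parts are the ones labeled PC2-5300, while 533 MHz parts work out to PC2-4200.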

When a PC is not Point-and-Shoot: WHEA-Logger EventID 17

When everything's working smoothly, a PC seems a simple device, like a toaster, or, to use a better example, like a digital camera. Turn it on, launch an application, do work, and turn it off. A smoothly working PC offers its own quietly efficient version of Point and Shoot. 

Beneath the covers, with both devices, but more so with today's PC, a lot needs to go right to produce the appearance of a carefree toaster-like experience. Some problems are comparatively common (if not frequent) but straightforward to diagnose and correct. Disk drives still fail, and although SMART doesn't generally provide warning of a pending failure, it's usually obvious when one happens, and so is the corrective action. But once in a while, problems arise that are a reminder of how much complexity lies beneath that ever-thinner chassis.

An incident in this latter category has been occurring with an HP Elitebook 8540p laptop. Several times a minute, Windows 7 writes this error to the event log:

Event 17, WHEA-Logger
Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)
Bus:Device:Function: 0x0:0x3:0x0
Vendor ID:Device ID: 0x8086:0xd138
Class Code: 0x30400
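
For anyone wanting to tally or compare these events, here is a minimal sketch that pulls the numeric fields out of the summary text above. It is not an HP or Microsoft tool: the function name, the regular expressions, and the one-entry vendor table are assumptions made for illustration. The only lookup it performs is the well-known PCI vendor ID 0x8086, which belongs to Intel.

```python
import re

# Sample text mirrors the event summary quoted above.
EVENT_TEXT = """\
Event 17, WHEA-Logger
Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)
Bus:Device:Function: 0x0:0x3:0x0
Vendor ID:Device ID: 0x8086:0xd138
Class Code: 0x30400
"""

HEX = r"(0x[0-9a-fA-F]+)"
# \s* tolerates stray spaces after the colons in copied log text.
BDF_RE = re.compile(rf"Bus:\s*Device:\s*Function:\s*{HEX}:{HEX}:{HEX}")
IDS_RE = re.compile(rf"Vendor ID:\s*Device ID:\s*{HEX}:{HEX}")

# Hypothetical one-entry lookup table; 0x8086 is Intel's PCI vendor ID.
KNOWN_VENDORS = {0x8086: "Intel"}


def parse_whea_summary(text):
    """Extract the PCI address and IDs from a WHEA-Logger Event 17 summary."""
    bdf = BDF_RE.search(text)
    ids = IDS_RE.search(text)
    if bdf is None or ids is None:
        raise ValueError("not a recognizable WHEA PCI Express summary")
    bus, device, function = (int(g, 16) for g in bdf.groups())
    vendor_id, device_id = (int(g, 16) for g in ids.groups())
    return {
        # Same bus:device.function layout used by tools such as lspci.
        "address": f"{bus:02x}:{device:02x}.{function:x}",
        "vendor_id": vendor_id,
        "device_id": device_id,
        "vendor": KNOWN_VENDORS.get(vendor_id, "unknown"),
    }


if __name__ == "__main__":
    print(parse_whea_summary(EVENT_TEXT))
```

Run against the event above, this identifies an Intel device at bus 0, device 3, function 0, which matches the "PCI Express Root Port" component named in the message.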

More details are provided by Windows in the full WMI-interfaced Windows Hardware Error Architecture (WHEA) message. There were several problems with the laptop that could have been related to this error, which was thrown at least several times a minute, and sometimes more often depending on what processing was occurring. These included infrequent blue screens, fairly frequent glitching in pro audio devices (e.g., Echo AudioFire Pre8 and Ableton Live, as well as various USB audio devices), and an unsatisfactory rating by the Echo-recommended DPC Latency Checker.

Over several months, HP support and I worked on this issue. New device drivers were tried, USB device drivers were removed and added, self-tests and diagnostics were run. Nothing turned up. Finally, the hated "reinstall Windows" suggestion reared its head. Since everything else had been tried, the partition was erased and Windows 7 x64 was reinstalled. Result? Even before all the HP utilities were reinstalled, the problem recurred.

The last HP technician suggested it could be an issue with the Nvidia NVS 5100M discrete graphics card. Two related forum threads illustrate the scope of the problem and the diligence of some users in trying to resolve this error. One is in NotebookReview (more related to Asus motherboard problems) and the other is in an Intel Community forum (where most of the blame was laid at the feet of AMD ATI graphics cards).

The laptop is off to HP for a repair attempt. Stay tuned. In the meantime, should this post have appeared in TechnologyHead.com or in ErrorProcessing.com? Arguing in favor of the latter, for instance, were several remarks by hardware technicians consulted by worried forum posters that "these problems didn't used to be reported anyway, and can easily be ignored." Arguing in favor of TechnologyHead was the aspect of complexity; after all, the error message was close to what passes for best practice these days in error reporting (if not recovery).