
MSM2 Lecture #9 - Troubleshooting a "Dead" PC

Materials:
Working complete PC
Several PCs prepared by the instructor to fail to boot
Student bootable floppy diskette - "New Boot A Version 2"
Student bootable CD-ROM
Objectives:
The student should become familiar with:
The basic troubleshooting tree,
Entry points to the troubleshooting tree,
Typical error messages and their meanings including,
All BIOS error messages and error codes.
Competency:
The student will be able to progress through the basic troubleshooting tree including where to start the tree based on the given situation. The student will be able to recognize the basic BIOS messages and codes and be able to successfully diagnose the problem with the system.
Procedures
  1. Lecture module #8, "Troubleshooting Start Up," discusses the procedures, the common error messages, and their causes for start-up problems, which are among the most common reasons a user calls the PC technician. However, troubleshooting start-up is relatively easy because the system issues error messages, combined with behaviors, that indicate exactly where the problem lies. The problem with fixing a PC is finding the problem. Once it is found, fixing it is a relatively simple matter: replace the part, or replace the corrupted file or sector data.

  2. Aside from simple buggy Windows issues (corrupted files, and/or programs and drivers that won't behave in each other's company), the second most common problem is the "dead" PC. This term deserves a formal definition:

    "dead" PC: Absolutely no output to the screen.

  3. The problem, remember, is finding the problem. The "dead" PC is by far the most difficult PC to work with because it does not display the error messages that lead you straight to the problem. The "dead" PC troubleshooting tree presented here is a work in progress.

  4. Once you have fully booted the PC, shut it down, open the case, and begin observing proper ESD procedures. Carefully detach the monitor cable from the video card port and then remove the video card. Set the card aside on the work table and turn on the PC.

  5. You will notice (if the speaker is attached and functional) a series of unequal-length beeps and, obviously, nothing on screen. This is the typical BIOS error message indicating that the POST failed to locate a functioning video controller. Replace the video card and reattach the monitor cable to it. The technician should keep a book listing the beep codes of a variety of BIOSes as an onsite/workbench reference manual. One of the better ones is the "Pocket PCRef" by Thomas Glover and Millie Young.

  6. Now carefully remove all system RAM modules and set them aside. Reboot the PC. Different systems respond differently to this problem. In the total absence of working memory, the system can produce either no beeps at all or a continuous series of even-length beeps. Some system BIOSes use the continuous series of even beeps to indicate video controller failure as well as memory failure. If the lowest 64KB of memory is still functional, some systems can indicate a bad memory module by displaying a standard BIOS error code on screen. Some systems have this memory built into the motherboard and can display the error even in the total absence of memory. It will be a 200-series error, of the form 2xx. Often a subcode is displayed as well, giving the error the form 2xxyy, where xx is the major error code and yy is the minor error subcode. If the majority of the RAM is actually working, BIOSes can even indicate the memory address where the error occurred, in the form 2xxyy ssss:oooo, where ssss is the memory segment and oooo is the memory offset of the error. Replace the RAM modules.
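    To make the 2xxyy ssss:oooo format concrete, here is a minimal Python sketch that splits such a display into its parts. It assumes the layout described above (five decimal digits, optionally followed by a hexadecimal segment:offset pair) and the usual real-mode rule that a physical address is segment x 16 + offset. The function name parse_memory_error is hypothetical, and real BIOSes vary in exactly what they print.

```python
def parse_memory_error(display: str) -> dict:
    """Parse a POST memory error shown as '2xxyy' or '2xxyy ssss:oooo'.

    Hypothetical helper. Assumes xx = major error code, yy = minor subcode,
    and ssss:oooo = hexadecimal real-mode segment:offset of the failure.
    """
    fields = display.split()
    code = fields[0]
    if len(code) != 5 or not code.isdigit() or not code.startswith("2"):
        raise ValueError(f"not a 2xxyy memory error code: {code!r}")
    result = {"major": code[1:3], "minor": code[3:5], "address": None}
    if len(fields) > 1:
        segment_text, offset_text = fields[1].split(":")
        segment = int(segment_text, 16)
        offset = int(offset_text, 16)
        # Real-mode addressing: the segment counts in 16-byte paragraphs.
        result["address"] = segment * 16 + offset
    return result


error = parse_memory_error("20102 1A00:0F3C")
print(error["major"], error["minor"], hex(error["address"]))  # prints: 01 02 0x1af3c
```

    The physical address is only a starting point: which module holds it depends on how the board maps its slots, so check the motherboard manual before pulling a specific module.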

  7. Because of the possible damage that can be caused by attaching the IDE data cable backwards, we will NOT perform this experiment, but be aware that the system can act even more "dead" under this condition than if the video card or RAM modules were bad or missing entirely.

  8. We have now seen examples of the most difficult problem to diagnose in a PC: the "dead" PC. Here "dead" simply means that there is no output to the screen. Without the ability to display an error message telling the technician what is wrong, the technician must proceed through the troubleshooting "tree", starting with the most obvious and easily checked causes and proceeding further into the system, trying to eliminate each possible cause of the problem.

  9. When working with a "dead" PC, the technician needs to start with a general assessment of the nature of the problem. A "dead" PC falls into basically 3 categories: "No LEDs, no sounds", "Some LEDs, some sounds", and "Full LEDs, full sounds". In the first category, power is the primary suspect, anywhere from the wall outlet to the motherboard. In the third category, the system appears to be booting normally, so the suspect is the path of the video signal from the video controller to the display. The second category has the most possible causes and is the most difficult to troubleshoot. Here is the basic "dead" PC troubleshooting hierarchy:
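    The three-way assessment above can be written down as a tiny decision function. This is a minimal sketch, assuming the technician records LED and sound activity as "none", "some" or "full"; the Activity enum and dead_pc_category name are hypothetical, and treating every mixed observation (for example, no LEDs but fans running) as category 2 is an assumption, not a rule from this lecture.

```python
from enum import Enum


class Activity(Enum):
    """How much front panel or audible activity the technician observes."""
    NONE = "none"
    SOME = "some"
    FULL = "full"


def dead_pc_category(leds: Activity, sounds: Activity) -> str:
    """Map first observations to one of the three "dead" PC categories."""
    if leds is Activity.NONE and sounds is Activity.NONE:
        return "Category 1: suspect power, from the wall outlet to the motherboard"
    if leds is Activity.FULL and sounds is Activity.FULL:
        return "Category 3: suspect the video path, from the video controller to the display"
    # Everything else, including mixed observations, is treated as category 2.
    return "Category 2: most possible causes; work through the whole tree"


print(dead_pc_category(Activity.SOME, Activity.FULL))
```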

    EXTERNAL SECTION
    1. Observation. 
      A. Front panel LEDs?
        i. Yes: it is getting power
        ii. No: possibly not receiving power
      B. Monitor front panel LEDs?
        i. Yes: it is getting power
        ii. No: it is definitely not getting power: Fix this - DONE?
      C. Sounds (fans, drives)?
        i. Yes: it is getting power
        ii. No: possibly not getting power
    2. Check the power supply fan.
      A. Yes: it is getting power
      B. No: Possibly...
        i. not getting power: Check this - DONE?
        ii. power supply fan has failed: Replace power supply - DONE?
    3. Check Power Cable to System and Monitor
      A. Plugged into system/monitor and outlet?
        i. Yes: go to next check
        ii. No: Correct this - DONE?
      B. Plugged into a powerline appliance?
        i. No: go to next check
        ii. Yes: plug PC/display straight to wall outlet - DONE?
      C. Wall Outlets working and wired correctly?
        i. Yes: go to next check
        ii. No: Correct this - DONE?
    4. Check Video Cable: Connected securely to video controller output
    and the display?
      A. Yes: go to next check
      B. No: Correct this - DONE?
    5. Monitor functioning?
      A. Yes: go to next check
      B. No: Replace - DONE?
    INTERNAL SECTION
    6. Processor fan(s) functioning?
      A. Yes: go to next check
      B. No: replace fan (possibly CPU and motherboard as well!) - DONE?
    7. Internal case fan(s) functioning?
      A. Yes: go to next check
      B. No: correct this, but heat damage to other systems...NOT DONE
    8. General inspection of interior
      A. Poor cooling (absence of additional case fans)?
        i. No: go to next check
        ii. Yes: Correct this, but heat damage to other systems...NOT DONE
      B. Heavy dirt/dust?
        i. No: go to next check
        ii. Yes: Correct this - DONE?
      C. Chip creep (socketed DIP chips raised in the sockets)?
        i. No: go to next check
        ii. Yes: Press them back down firmly - DONE?
      D. Loose, broken, missing, or backwards cables, cards, connectors, screws, etc.
        i. No: go to next check
        ii. Yes: Correct this - DONE?
    9. Strip-down test - Leave the keyboard, video card, and RAM. Remove all other external and
       internal peripherals, including data cable attachments to the motherboard. Turn on...
      A. Displays POST screens: reinstall all other parts one by one until the problem part is
         found - DONE?
      B. Still "dead": Leave it stripped down and continue to the next check
    10. Video controller functioning (replace with a KGC - Known Good Component)?
      A. Yes: go to next check
      B. No: Replace - DONE?
    11. Memory functioning (replace with a KGC)?
      A. Yes: go to next check
      B. No: Replace - DONE?
    12. Power supply functioning (test with a meter, or better, replace with a KGC)?
      A. Yes: go to next check
      B. No: Replace - DONE?
    13. Motherboard and CPU functioning (difficult to test, try a POST CARD)?
      A. POST card identifies a low-level POST error: Replace motherboard/CPU - DONE
      B. POST card does not initialize (stays at 00): either the BIOS EEPROM, the CPU, or the
         motherboard is bad. Replace in that order - DONE
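    The hierarchy above is essentially an ordered list of yes/no checks in which a "no" triggers a fix and a retest. As a minimal sketch (not a replacement for the tree itself), the checks can be condensed into a Python list and walked in order. CHECKS, FINAL_STEP, walk_tree and the question wording are hypothetical; the observation steps are folded into the fan and power checks, and the answer callback stands in for the technician's own observations.

```python
from typing import Callable, List, Tuple

# Condensed from the hierarchy. Each entry is (section, question, action if the
# answer is "no"); questions are phrased so that "yes" means "go to next check".
CHECKS: List[Tuple[str, str, str]] = [
    ("EXTERNAL", "Power supply fan turning?", "Check for power; replace the power supply if its fan failed"),
    ("EXTERNAL", "Power cords plugged into system, monitor and outlet?", "Plug them in"),
    ("EXTERNAL", "Plugged straight into the wall (no powerline appliance)?", "Bypass the appliance"),
    ("EXTERNAL", "Wall outlet working and wired correctly?", "Correct the outlet or use another one"),
    ("EXTERNAL", "Video cable secure at controller and display?", "Reseat the video cable"),
    ("EXTERNAL", "Monitor functioning?", "Replace the monitor"),
    ("INTERNAL", "Processor fan(s) functioning?", "Replace the fan; check CPU and motherboard for heat damage"),
    ("INTERNAL", "Case fan(s) functioning?", "Repair the fans; check for heat damage"),
    ("INTERNAL", "Interior clean, cool, chips seated, cables and cards correct?", "Clean, reseat, or correct"),
    ("INTERNAL", "Still 'dead' after the strip-down test?", "Reinstall parts one by one to find the bad one"),
    ("INTERNAL", "Video controller functioning (swap in a KGC)?", "Replace the video controller"),
    ("INTERNAL", "Memory functioning (swap in a KGC)?", "Replace the memory"),
    ("INTERNAL", "Power supply functioning (swap in a KGC)?", "Replace the power supply"),
]

FINAL_STEP = ("Motherboard/CPU: try a POST card. A low-level POST error means replace the "
              "motherboard/CPU; stuck at 00 means BIOS EEPROM, CPU, motherboard, in that order")


def walk_tree(answer: Callable[[str], bool], start: int = 0) -> Tuple[int, str]:
    """Return (index, advice) for the first failing check at or after `start`.

    Mirrors the "DONE?" notes: after applying the fix and retesting, a PC
    that is still "dead" resumes the walk with start=index + 1.
    """
    for index in range(start, len(CHECKS)):
        section, question, action = CHECKS[index]
        if not answer(question):
            return index, f"{section}: {question} No -> {action}"
    return len(CHECKS), FINAL_STEP


# Example session: only the monitor check fails.
index, advice = walk_tree(lambda question: question != "Monitor functioning?")
print(index, advice)
```

    Keeping the checks as data rather than nested if statements makes the outside-to-inside order explicit and easy to compare against the written tree.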
    
  10. When a PC is acting "dead", it is important to remember some of the potential causes of this behavior, including, but certainly not limited to, a failed processor fan. These days AMD Athlons can reach a temperature of almost 700° F within seconds after the fan fails. This is enough to devastate not only the processor but also the socket and the motherboard circuitry beneath it. For this reason, PCs should be powered on for only very short test periods, not in the hope of saving a processor whose cooling fan has failed, but to save other components from damage.

  11. The first step is to note front panel activity. Does the power LED come on? Do the keyboard LEDs flash? Does the HDD LED flicker? Does the floppy drive LED flash, with or without a simple head seek? Each of these indicates where to continue into the troubleshooting tree.

  12. Since checking the power supply fan is so easy, it should always be done while carefully observing the front panel for activity. In depressurizing systems (AT power supplies blow air out of the case), check it by reaching around the back and feeling for airflow. In pressurizing systems (ATX power supplies pull air into the case), the fan is difficult to feel by hand, so if necessary shine a flashlight through the rear vents of the power supply to see whether it is turning.

  13. Of course, the front panel LEDs could be disconnected, but this will be determined later if the system has to be opened. Until then, assume that they and the system speaker are working. Here is the major breakdown of "first glance" front panel activity:

    Indicator     | Active                   | Inactive
    Power LED     | System is getting power  | Not getting power (should be no activity at all)
    Keyboard LEDs | Early POST               | Not reaching early POST
    FDD LED*      | Later POST/boot sequence | Not reaching late POST or bootstrap loader
    HDD LED       | Later POST/boot sequence | Not reaching late POST or bootstrap loader

  14. Based on the first indicator, the power LED, you can decide whether to check the AC power cord connection to the system. If there is no activity whatsoever, you should immediately suspect that AC power is not reaching the system at all. If there is any LED or drive activity, then AC power is getting to the system (although it may not be good power). If the power LED is on but the keyboard LEDs do not flash, then the BIOS is not getting past the motherboard in the POST sequence, and the trouble could be on the motherboard. If they do flicker, then some portion of the POST is succeeding. If the power LED is on and the keyboard LEDs flicker but the FDD LED does NOT, then the POST is failing late and the boot sequence is never reached. The same can be said for the FDD head seek, since the LED and head-seek activity usually accompany each other. The asterisk on these entries indicates that in some systems the FDD may be intentionally disabled and removed from the boot sequence for security reasons, so you cannot always assume the worst because there is no FDD activity (though in such systems the FDD is usually removed completely). If there is power LED and keyboard activity but no HDD LED activity, then the strongest suspicions are a late POST failure, improperly connected drives, or a power supply that is not providing steady voltages and/or is too weak for the load it must deliver. The main thing we determine at this stage is whether or not to check the AC power cord.
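    The reasoning in this step reads naturally as an ordered chain of tests. Below is a minimal Python sketch of it, assuming each indicator is recorded as seen or not seen; the function name front_panel_reading is hypothetical, and using None for the floppy drive to mean "absent or disabled in the boot sequence" is an assumed convention that captures the asterisk caveat.

```python
from typing import Optional


def front_panel_reading(power_led: bool, keyboard_leds: bool,
                        fdd_activity: Optional[bool], hdd_led: bool) -> str:
    """Turn first-glance front panel activity into the next place to look.

    fdd_activity=None is an assumed convention meaning "no floppy drive, or
    it is disabled in the boot sequence", so its silence proves nothing.
    """
    any_activity = power_led or keyboard_leds or bool(fdd_activity) or hdd_led
    if not any_activity:
        return "No activity at all: check that AC power reaches the system"
    if not keyboard_leds:
        return "Not reaching early POST: trouble could be on the motherboard"
    if fdd_activity is False:
        return "POST failing late: boot sequence never reached"
    if not hdd_led:
        return "Late POST failure, improperly connected drives, or a weak/unsteady power supply"
    return "Front panel looks normal: follow the video signal path"


print(front_panel_reading(power_led=True, keyboard_leds=True, fdd_activity=None, hdd_led=False))
```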

  15. Obvious as it may be, the external cabling should always be checked, especially if there is no sign of life at all on the front panel. It is easy to forget to plug in a system's power cord after moving or working on it, and it is also the easiest problem to fix. If there is activity at the front panel, such as the power LED and other LEDs or even drive activity, then you know that the system is plugged in. Here is the monitor check sub-branch of the tree:

    Condition              | Solution
    Monitor AC Power       | Get power to the monitor
    Monitor VGA/SVGA Cable | Properly attach the cable
    Monitor Settings       | Adjust the settings
    Monitor Malfunctioning | Swap with a KGC

    Check the monitor: be sure it is switched on, that an LED indicator is lit, and that its AC cord is plugged in. Check its settings by trying the brightness and contrast controls at both ends of their range.

  16. Working with the AC power to the system:

    Condition                                  | Solution
    AC power cord unplugged                    | Reattach it
    Power cord attaches to powerline appliance | Bypass the appliance
    Wall outlet not providing power            | Test with an outlet checker; plug the unit in elsewhere

    Determine whether either the monitor or the system unit is connected to a powerline appliance such as a power strip, surge protector, line conditioner, or uninterruptible power supply. If so, remove the system and/or monitor connections from the device and plug them directly into the wall outlet. This eliminates a faulty $5 power strip from the situation and concludes the technician's visit if it was the problem. Since a wall outlet tester is so inexpensive and easy to use, test the outlet at this point to be sure that it is wired correctly and providing good AC power.

  17. Other activity to monitor before opening the case:

    Condition                    | Meaning
    Single speaker beep          | Usually means a successful POST
    Speaker beeps twice          | Usually means a failed POST
    Complex beep code            | Use a reference manual to determine the POST failure
    Continuous steady beeps      | Usually means failed memory or video controller
    No speaker beeps             | Usually means failed early POST or speaker not working
    No drive activity sounds     | Not reaching late POST, or power supply failure
    Power supply fan not working | Replace the power supply (not just the fan)
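    If beep lengths are jotted down (or timed from a recording), the table above can be applied mechanically. This is a minimal sketch only: the function name interpret_beeps is hypothetical, and the thresholds (five or more beeps within 20% of each other count as "continuous steady") are assumptions, not values from any BIOS vendor. Anything classed as a complex code still has to be looked up in the reference manual for that specific BIOS.

```python
from typing import List


def interpret_beeps(durations: List[float], tolerance: float = 0.2) -> str:
    """Rough reading of a beep pattern, given each beep's length in seconds.

    Assumed thresholds (not from any BIOS vendor): five or more beeps whose
    lengths differ by at most `tolerance` of the longest count as "continuous
    steady beeps"; any other pattern of three or more is a complex code.
    """
    if not durations:
        return "Usually means failed early POST or speaker not working"
    if len(durations) == 1:
        return "Usually means a successful POST"
    if len(durations) == 2:
        return "Usually means a failed POST"
    shortest, longest = min(durations), max(durations)
    if len(durations) >= 5 and longest - shortest <= tolerance * longest:
        return "Usually means failed memory or video controller"
    return "Complex beep code: use a reference manual to determine the POST failure"


print(interpret_beeps([1.0, 0.25, 0.25]))  # input is one long beep and two short ones
```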

  18. At this point the technician has exhausted all possible problems that can be resolved without opening the case. This is an important point: you must work from the outside in. You will look foolish tearing apart the entire system before realizing that the surge protector was not plugged in. Now you must prepare to open the case. If the PC is in a poor working environment, such as poor lighting, cramped quarters, or insufficient desktop space, move it to another location. If the new location does not have a monitor, move the monitor there as well.

  19. Be sure that the system is completely connected and set up in the new location and try it once more briefly. If it works, then take it back to its original location and try it again. If it now works then you can suspect the following:

    Problem                                            | Determination                 | Solution
    Loose components, cables, connectors, screws, etc. | Visual and tactile inspection | Unseat, then firmly reseat connectors/cards
    Poor AC source                                     | Use an outlet tester          | Try another outlet or location
    Overheating                                        | Can be difficult to determine | Replace a heatsink with a fan; add case fans if absent
    Excessive dust                                     | Visual inspection             | Clean using computer-grade products and equipment
    EMI/RFI                                            | Difficult to detect           | Attach a line conditioner
    Chip "creep"                                       | Often difficult to see        | Press down firmly and evenly on all socket-mounted DIP ICs

  20. With the PC off but plugged in (did you check the new outlet?), remove the case cover and immediately check for everything listed above. This is a visual inspection of the general appearance of the system's internal layout. Is it clean? Does it seem warm inside? Does everything appear tightly attached? Now that the case is off, you should also immediately determine whether the system is AT, ATX, or uses a proprietary power supply. You should be able to recognize a standard AT power connection to the motherboard and know what signal to expect on each wire. You should also know the standard ATX power connection to the motherboard and what signal each wire carries, or at least have diagrams in a professional technician's (yours!) notebook to compare the system against. Some name-brand PCs have proprietary power supplies. You must determine whether the system is AT-like or ATX-like before touching anything in it.
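    As an example of the kind of notebook diagram this step calls for, here is a minimal Python sketch of the common 20-pin ATX main connector, paired with a check of DVM readings against the commonly quoted ATX regulation limits (plus or minus 5% on the positive rails, plus or minus 10% on -12 V and -5 V). Assumptions: this covers only the 20-pin ATX layout (24-pin connectors add four pins, and AT P8/P9 connectors and proprietary supplies are laid out differently, so do not use it for them); the limits should be verified against the ATX specification revision that applies; ATX20 and check_pin are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# Common 20-pin ATX main power connector: pin -> (signal, wire colour, nominal volts).
# Nominal is None for grounds and logic signals, which this sketch does not check.
# PS_ON# (green) is active low: the supply turns on when it is pulled to ground.
ATX20: Dict[int, Tuple[str, str, Optional[float]]] = {
    1: ("+3.3V", "orange", 3.3),   2: ("+3.3V", "orange", 3.3),
    3: ("COM", "black", None),     4: ("+5V", "red", 5.0),
    5: ("COM", "black", None),     6: ("+5V", "red", 5.0),
    7: ("COM", "black", None),     8: ("PWR_OK", "gray", None),
    9: ("+5VSB", "purple", 5.0),   10: ("+12V", "yellow", 12.0),
    11: ("+3.3V", "orange", 3.3),  12: ("-12V", "blue", -12.0),
    13: ("COM", "black", None),    14: ("PS_ON#", "green", None),
    15: ("COM", "black", None),    16: ("COM", "black", None),
    17: ("COM", "black", None),    18: ("-5V", "white", -5.0),
    19: ("+5V", "red", 5.0),       20: ("+5V", "red", 5.0),
}


def check_pin(pin: int, measured: float) -> Tuple[bool, str]:
    """Compare a DVM reading (probe on `pin`, common lead on COM) to its rail."""
    signal, colour, nominal = ATX20[pin]
    if nominal is None:
        raise ValueError(f"pin {pin} ({signal}) is not a supply rail")
    # Commonly quoted ATX limits: +/-5% on positive rails, +/-10% on -12V and -5V.
    tolerance = 0.10 if nominal < 0 else 0.05
    low, high = sorted((nominal * (1 - tolerance), nominal * (1 + tolerance)))
    ok = low <= measured <= high
    status = "OK" if ok else "OUT OF TOLERANCE"
    return ok, f"pin {pin} {signal} ({colour}): {measured:.2f} V {status} [{low:.2f}..{high:.2f}]"


print(check_pin(10, 11.2)[1])
```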

  21. Begin practicing proper ESD procedures, which differ between AT and ATX systems. Check by hand for loose cable connections or expansion cards. Remove them one at a time, document each one, and reattach it. Check for DIP chips that have "crept" up in their sockets by pressing down firmly on each one; they will make crunching sounds as they reseat. Remove a DIP chip only if it has been determined to be bad and is being replaced.

  22. At this point in the internal inspection, check cables not only for snug, straight connections but also to be sure they are not accidentally reversed. With ESD procedures in effect, you may remove cables, check for bent pins, and inspect each connector for the location of pin #1 to verify that the cable was attached correctly. A reversed IDE/ATAPI drive cable is a primary cause of a "dead" PC, especially if a new IDE/ATAPI device such as a burner or CD-ROM/DVD-ROM drive has recently been installed. After checking all cable connections, be sure to detach and reattach all power cable connectors in the system as well.

  23. Whether or not any of these miscellaneous problems have been uncovered, at this point we need to try turning the system on. If suspicious miscellaneous troubles have been uncovered, perform the corrective procedure first. If it is an ATX system, be sure to unplug it (or turn off the power supply's physical switch), reattach the power supply connector to the motherboard, then plug it back in (or turn the physical switch back on) and try to turn the PC on again.

  24. This time keep your eyes on the processor fan(s) and the power supply/case fan(s). Now that the unit is much closer, listen for any unusual sounds from fans or drives. Here is the internal fan hierarchy:

    Problem             | Determination                         | Solution
    CPU Fan(s)          | Watch for activity, listen for noises | If bad, replace the fan and be prepared to replace a burnt CPU and motherboard as well
    Power Supply Fan(s) | Watch for activity, listen for noises | If bad, replace the entire power supply, not just the fan
    Case Fan(s)         | Watch for activity, listen for noises | If absent, install at least one; if bad, replace the fan(s)

  25. At this point all of the fairly obvious internal problems have been addressed, and the less overt problems must be sought out. If the machine has more than one expansion card, the problems could stem from resource conflicts. Up to now you have kept the unit fairly intact. If you remove a component, you must remember exactly where it was installed in order to put it back correctly. Remember that PCI auto-configuration in the BIOS depends on slot locations, and moving a card can cause any of the PnP operating systems to lose their driver configuration and literally have to "discover" new devices when you get the PC running again. This can cause major problems at the operating system, driver, and software level. Document and diagram the system, since you may have to perform a complete disassembly; never guess with someone else's property.

  26. The first thing to try now is to remove all IDE/ATAPI data ribbon cables and attempt to boot the PC. If it works, double-check the pin #1 alignment of the cable connectors on the motherboard. Then reconnect each device separately and try to boot the PC. If the machine boots with each device alone, then try:

    1) Attach one device to each controller and try to boot. Change one device at a time, always keeping one device per cable, working through the devices in pairs. If failures occur at this point, the most likely cause is current draw on the power supply. Set up a digital voltmeter (DVM) to monitor the output of an unused power connector. Measure the voltage during a successful configuration, then during a failing one. Even if the DVM does not confirm your suspicions directly, this is the most likely cause.

    2) If the system passes the individual device/IDE controller combinations, now try each pair of devices on a single channel at a time. If a failure occurs at this point, then the two devices are conflicting. Check the device manufacturer's website for jumper setting details. If the system still completely fails, move to the next branch.
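    The order of boot attempts in step 26 can be generated rather than kept in your head. This is a minimal Python sketch, assuming the "each device alone" attempts are made on the primary channel (the lecture does not say which); ide_test_plan, Attempt and LIKELY_CAUSE are hypothetical names, and the cause strings only restate what this step says a failure at each stage points to.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# One boot attempt: (stage, devices on the primary channel, devices on the secondary).
Attempt = Tuple[str, Tuple[str, ...], Tuple[str, ...]]

# What a failure at each stage points to, as described in step 26.
LIKELY_CAUSE: Dict[str, str] = {
    "one per channel": "current draw on the power supply (watch an unused connector with a DVM)",
    "same channel": "the two devices conflict (check the manufacturer's jumper settings)",
}


def ide_test_plan(devices: List[str]) -> List[Attempt]:
    """List boot attempts in the order step 26 describes.

    Assumption: the "each device alone" attempts use the primary channel.
    """
    plan: List[Attempt] = [("alone", (device,), ()) for device in devices]
    pairs = list(combinations(devices, 2))
    plan += [("one per channel", (a,), (b,)) for a, b in pairs]
    for a, b in pairs:
        plan.append(("same channel", (a, b), ()))  # both on the primary channel
        plan.append(("same channel", (), (a, b)))  # both on the secondary channel
    return plan


for stage, primary, secondary in ide_test_plan(["HDD", "CD-ROM", "Burner"]):
    print(f"{stage:16} primary={list(primary)} secondary={list(secondary)}")
```

    Writing the plan out first also gives you the documentation the lecture asks for: each line is one configuration to record as "boots" or "dead".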

  27. If the video controller is integrated into the motherboard, you must determine how to disable it so that you can try booting the PC with a different video controller. If the motherboard manual is not available at this point, you must get it in order to continue testing the system. Alternatively, you can purchase a POST card diagnostic tool. This card, available in ISA, PCI, parallel-port, and dual ISA/PCI versions, attaches to an open expansion slot (or to the parallel port; unfortunately, the parallel-port versions generally work only on IBM name-brand units) and displays the POST code last output by the POST program. This indicates either the last subsystem tested successfully or the subsystem that is currently failing. With this information you can accurately determine what to check or replace. These cards now cost less than $100 (they used to cost several thousand dollars each) and are a highly recommended tool. If the video controller is on an expansion card (AGP, PCI, or ISA), remove it and try another controller. You should always carry a simple jumperless ISA video card and a simple PCI video card in your "big" tool box.

  28. If the system still fails with the "known good" video card and there are other expansion cards in the system, remove all other cards (record their locations) and try again. If the system still fails, you can reasonably conclude that the video controller is not the problem.

  29. Replace the RAM modules with known good ones. This is getting harder than it used to be. Eight years ago you had to carry 4 x 8-bit SIMMs, 4 x 9-bit SIMMs, 2 x 32-bit SIMMs, and 2 x 36-bit SIMMs at the very least. Now you should carry 2 x 32-bit SIMMs, 1 x PC66/100 SDRAM, 1 x PC133 SDRAM, 2 x PC1600 DDR, 2 x PC2100 DDR, 2 x PC2700 DDR, 2 x DDR2-400, 2 x DDR2-533, 2 x DDR2-800, 2 x DDR2-1066, 2 x RAMBUS modules, and 4 blanks (continuity RIMMs, or C-RIMMs)! RAMBUS will not work unless every slot is occupied to close the circuit, and since the banks usually have 3 slots, both modules and blanks are needed to fill a dual-channel RAMBUS system. Be sure to determine the exact RAM type necessary before attempting to boot. If the system still fails, we are down to the most difficult component to deal with.
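    The kit list mixes chip speed grades (DDR2-800) with module bandwidth names (PC2100), which is worth untangling when identifying the exact RAM type. A minimal sketch: for a 64-bit DDR or DDR2 module, the module's PC number is roughly the transfer rate in MT/s times 8 bytes, rounded for marketing (DDR-266 works out to 2128 MB/s but is sold as PC2100), and DDR2 modules carry a "PC2-" prefix. The SDRAM names PC66/100/133 do not follow this rule; they give the bus clock in MHz. MARKETED and peak_mb_per_s are hypothetical names.

```python
# Chip speed grade -> module name as marketed (rounded figures, as printed on labels).
MARKETED = {
    "DDR-200": "PC1600", "DDR-266": "PC2100", "DDR-333": "PC2700",
    "DDR2-400": "PC2-3200", "DDR2-533": "PC2-4200",
    "DDR2-800": "PC2-6400", "DDR2-1066": "PC2-8500",
}


def peak_mb_per_s(speed_grade: str) -> int:
    """Approximate peak bandwidth of a 64-bit (8-byte) module: MT/s x 8 bytes."""
    transfers_per_second = int(speed_grade.split("-")[1])  # e.g. "DDR2-800" -> 800
    return transfers_per_second * 8


for grade, module in MARKETED.items():
    print(f"{grade:10} = {module:9} ~{peak_mb_per_s(grade)} MB/s")
```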

  30. You should already have determined the exact type of power supply the system is using. If you are sure that the power supply is exactly equivalent to a generic AT or ATX unit, you may replace it with one and try to boot. Being sure requires reading the system's manual, visiting the manufacturer's website, and calling their technical support staff if necessary. Replacing a proprietary unit with a generic one can destroy the motherboard and ruin the entire system when the problem might only have been a $50 power supply that most companies will gladly ship overnight. If the system still fails with a "known good" power supply, we are down to the two worst components.

  31. In the case of a processor, you must first determine whether the motherboard supports autodetection and configuration of the processor. If it does not (P5 generation and older), you will have to read the motherboard manual carefully to determine how to set the jumpers before installing a test processor. Most later P6/P7 generation systems support CPU autodetection in the early phase of the boot process, and you can safely install another CPU as long as the motherboard is jumpered for autodetection. Be very sure not to run a boot attempt with a "known good" processor for more than a few seconds. Even an old system will get to the video card splash screen within seconds. At this point you should have all data cables removed and all cards out except your own video card, to remove unnecessary complications from the system. You don't need the system to boot up, only to show signs of life again. (Note that it is far easier to test modern systems with a POST card than by installing a known good CPU.)

  32. If the system still fails with your "known good" processor then by process of elimination the problem lies on the motherboard. Without a POST card you can only assert this by virtue of the fact that you have either eliminated or replaced all other subsystems with "known good" units and the system still fails. The motherboard is a FRU (Field Replaceable Unit) and at this point the customer is basically buying a whole new computer on which the OS would have to be reinstalled. If they have important data on the hard drive, be sure to lend them another hard drive to install the OS onto so you can drag and drop folders onto the new installation. In the case of OEM systems their original Restore CD will not work because the system has a different chipset and possibly even a new processor (they might as well upgrade especially if equal replacements are no longer available). You cannot pirate the OS or the software. The customer will have to get new copies from the OEMs but more likely the customer will have to get the replacement component from the OEM as well.

  33. A complete tool kit with a variety of standard KGCs (Known Good Components) is a must, along with a good reference manual that has a comprehensive collection of BIOS error codes. The third, and possibly most important, diagnostic tool in this arsenal is a BIOS POST code diagnostic card. These used to be extremely expensive but can now be found for as little as $21. A POST card will reduce the "hit and miss" diagnostic efforts outlined above to minutes. Simply install the card, turn the system on, and watch for the last code shown on the card's built-in digital display. Look up the code in the manual that came with the card, in other reference manuals, or online to determine the cause of the problem. The POST card in the lab has quickly identified loose or skewed power-supply-to-motherboard connectors on many occasions when "hit and miss" diagnosis had turned up nothing.

  34. If the POST card stays at 00, the CPU is not doing anything: a strong indicator that it is damaged and should be replaced. With most systems now using LGA775, a $30 Celeron can be a simple and effective KGC to carry and try in "dead" PCs in which you suspect a bad CPU.

Review Questions
  1. List all checks that should be performed prior to opening the system unit:








  2. List all checks that should be performed concerning the AC power source to the computer system unit:








  3. List all checks that should be performed concerning the monitor:








  4. What is the first component that should be checked once the system unit is opened?


  5. What three internal components in particular (listed in the main troubleshooting outline by name) are responsible for most "dead" PC conditions?


  6. List the four general interior inspections:











  7. What is the easiest way to test the video card and RAM modules?


  8. Despite having a very sensitive digital voltmeter, what is probably the best way to test the power supply?


  9. If everything else has been eliminated then the "dead" condition is being caused by one of what three components?








  10. What is the easiest way to verify which one has failed?


  11. You replace a suspect CPU on an LGA775-based motherboard with a never-before-opened Celeron, and the POST card still reads "00". What component must be causing this?


Copyright©2000-2004 Brian Robinson ALL RIGHTS RESERVED