A+: Thorough Troubleshooting

Virtually every subsystem in the computer has hardware, software, and firmware components. A thorough troubleshooting process will take into account both the subsystem and all of its components. The following steps are involved in the troubleshooting cycle:
Step 1. Back up customer data (if possible). Before you do anything to a customer’s system, you should ensure that the system’s data has been backed up. The easiest way to ensure that you can restore the system to its “as-was” configuration is to use a disk-imaging program such as Symantec Norton Ghost or Acronis True Image. The current versions of these programs perform disk-imaging to preserve the contents of the system drive (and other specified drives) at both a data and operating system level. However, if you need to restore specified files only, the current versions of these programs also permit file-level restoration. For speed and convenience, use an external hard disk connected to a USB 2.0, FireWire, or eSATA port as the destination for the image (note that eSATA ports might not be supported by some disk-imaging programs).
Step 2. Find the most likely cause. Based on the client interview and the information from the prior post, determine the subsystem that is the most likely cause of the problem.
Step 3. Record the current configuration of the subsystem. Before you change anything, record items such as the driver version, BIOS settings, cable type and length, and hardware settings. Depending on the item, this might include recording jumper or DIP switch settings, printing the complete report from Windows Device Manager, recording BIOS configurations, and backing up the Windows Registry. If you performed an image backup as recommended in Step 1, the Windows Registry is included as part of the backup. If you don’t record the current configuration of the system’s hardware and software before you start the troubleshooting cycle, you will not be able to reset the system to its previous condition if your first change doesn’t solve the problem.
Step 4. Change one component or setting at a time. Change a single hardware component or hardware/software/firmware setting you suspect is the cause of the problem. If you replace hardware, use a replacement that you know to be working. No matter how concerned your client is and no matter how heavy your workload, change only one component before you retest the system. Examples of changing a single component or configuration setting include swapping a data or power cable, removing the device from Windows Device Manager, changing a device’s IRQ or other hardware resource setting, reinstalling a device’s driver software, and reinstalling or repairing an application. Performing two or more of these types of tasks before you retest the system can make matters worse, and if you fix the problem you won’t know which change was the correct change to make.
Step 5. Retest after a single change and evaluate the results.
Step 6. Reconfigure or reinstall. If the problem persists, return the device or hardware/software/firmware setting to its original condition by reconfiguring or reinstalling it, and then repeat Steps 4 and 5 with another component in the same subsystem.
Step 7. Continue until all subsystem components have been tested. Repeat Steps 4–6 until the subsystem performs normally or until you have tested all components in the subsystem. If the problem stops occurring after a change, that item is the cause of the problem. Repair, replace, or reload it as appropriate to solve the problem.
Step 8. Move on to another subsystem. If changing all components or settings in a particular subsystem does not solve the problem, move on to another subsystem that you think might be the culprit. Choose one of the subsystems listed in the prior post. You will find that some problems can be deceiving; they will appear to be caused by one subsystem when in reality they are caused by another.
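The change-one-thing, retest, restore loop in Steps 4 through 8 can be sketched in a few lines of Python. This is only an illustration of the logic, not a real diagnostic tool: the `Candidate` class, the `isolate_fault` function, and the simulated `state` dictionary are all hypothetical names invented for this example, and the "fault" is simply a value in that dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Candidate:
    """One component or setting that can be changed and later restored (hypothetical)."""
    name: str
    apply_change: Callable[[], None]   # e.g. swap in a known-working cable
    revert_change: Callable[[], None]  # put the item back as recorded in Step 3


def isolate_fault(candidates: Iterable[Candidate],
                  problem_persists: Callable[[], bool]) -> Optional[str]:
    """Change one item at a time and retest after each change."""
    for candidate in candidates:
        candidate.apply_change()        # Step 4: change a single item
        if not problem_persists():      # Step 5: retest and evaluate
            return candidate.name       # Step 7: this item was the cause
        candidate.revert_change()       # Step 6: restore the original condition
    return None                         # Step 8: move on to another subsystem


# Simulated subsystem (assumption for the demo): the fault is a bad data cable.
state = {"driver": "v1", "cable": "suspect", "irq": 5}


def problem_persists() -> bool:
    return state["cable"] == "suspect"


def setter(key, value):
    """Build a function that sets one entry in the simulated configuration."""
    def _set() -> None:
        state[key] = value
    return _set


candidates = [
    Candidate("driver", setter("driver", "v2"), setter("driver", "v1")),
    Candidate("cable", setter("cable", "known-good"), setter("cable", "suspect")),
    Candidate("irq", setter("irq", 7), setter("irq", 5)),
]

culprit = isolate_fault(candidates, problem_persists)
print(culprit)
```

Because each unsuccessful change is reverted before the next one is tried, the driver ends the run back at its recorded version, and only the change that actually cleared the problem stays in place. That is the reason for changing one item at a time: the result tells you exactly which change mattered.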
There are a few other techniques to consider when troubleshooting: knowing which components to check first, knowing the common points of failure, remembering that a known-working device is not necessarily a new one, and keeping track of your solutions.

