Troubleshooting Processes

Submitted by coleen.yan@edd… on Mon, 04/15/2024 - 16:53
Explore
Explore Lessons 3, 8 and 9 of the CompTIA A+ resource.

Often, computer technicians spend a large percentage of their time troubleshooting the various software and hardware components used in computers, networks and printers.

Before you can even begin to troubleshoot a physical problem with a piece of hardware or diagnose an error thrown up by a software program, you need to understand the basics of troubleshooting and some best practices used.

The most elaborate toolkit and expensive diagnostic software can be useless if you do not have a consistent plan for solving problems.

Even experienced technicians can sometimes overlook obvious problems or solutions. Troubleshooting can be extremely challenging, but if you follow common best practices and basic troubleshooting procedures, you will often be able to determine the specific cause of a problem, as well as possible solutions to the problem.

Troubleshooting basics

To some extent, being an effective troubleshooter simply involves having a detailed knowledge of:

  1. how something is supposed to work
  2. the sort of things that typically go wrong.

However, the more complex a system is, the less likely it is that this sort of information will be at hand, so it is important to develop general troubleshooting skills to confidently approach new and unexpected situations.

As troubleshooting is a process of problem-solving, it is important to realise that problems have causes, symptoms and consequences.

  • A computer system has a fault in the hard disk drive (cause).
  • Because the disk drive is faulty, the computer is displaying a 'blue screen' (symptom).
  • Because of the fault, the user cannot do any work (consequence).

From a business point of view, resolving the consequences of the problem is often more important than solving the original cause, so the most effective solution might be to give the user another workstation first and replace the drive afterwards.

It is also important to realise that the cause of a specific problem might be the symptom of a larger problem.

This is particularly true if the same problem recurs. For example, you might ask why the disk drive is faulty—is it a one-off error or are there problems in the environment, supply chain and so on?

Here are the steps in CompTIA's A+ troubleshooting model:

  1. Identify the problem
    • Question user and identify user changes to computer
    • Perform backups before making changes
    • Inquire about environmental or infrastructure changes
    • Review system and application logs
  2. Establish a theory of probable cause (question the obvious and if necessary conduct internal or external research based on symptoms)
  3. Test the theory to determine the cause
    • Once the theory is confirmed, determine the next steps to resolve the problem
    • If the theory is not confirmed, re-establish a new theory or escalate
  4. Establish a plan of action to resolve the problem and implement the solution
  5. Verify full system functionality, and if applicable, implement preventative measures
  6. Document findings, actions and outcomes
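
Steps 2, 3 and 6 of the model can be sketched in code. The following is a minimal illustration rather than part of CompTIA's material: the `Theory` and `Outcome` classes, the `work_through_theories` function and the blank-monitor example are all hypothetical names and data chosen for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Theory:
    """A probable cause paired with a check that confirms or rejects it."""
    description: str
    check: Callable[[], bool]  # returns True when the theory is confirmed


@dataclass
class Outcome:
    confirmed: Optional[str]  # description of the confirmed cause, if any
    escalated: bool           # True when every theory was rejected
    notes: List[str] = field(default_factory=list)  # step 6: documentation


def work_through_theories(theories: List[Theory]) -> Outcome:
    """Check each theory in order (most obvious first) and record the result."""
    notes: List[str] = []
    for theory in theories:
        if theory.check():
            notes.append(f"Confirmed: {theory.description}")
            return Outcome(theory.description, False, notes)
        notes.append(f"Rejected: {theory.description}")
    # No theory was confirmed, so the problem is escalated.
    notes.append("No theory confirmed; escalating")
    return Outcome(None, True, notes)


# Hypothetical example: a blank monitor, with stand-in checks.
result = work_through_theories([
    Theory("Monitor is switched off", lambda: False),
    Theory("Video cable is unplugged", lambda: True),
])
print(result.confirmed)
print(result.notes)
```

Keeping the notes alongside the result means the documentation step is not left until the end, when details are easily forgotten.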

While applying this model, other key components of the troubleshooting process are your approach and attitude.

Troubleshooting process

The troubleshooting process has been broken down into 4 aspects. Each aspect and the explanation (included below) can be used to develop your own troubleshooting steps.

  1. Customer service and communication skills
  2. Open and closed questioning
  3. Developing a troubleshooting mindset
  4. Problem identification

Customer service and communication skills

Employers value 'soft skills', such as communicating effectively and using questioning, as highly as technical skills. Troubleshooting is one area where soft skills are vital, as a user may:

  • be upset or angry—perhaps they have lost work or cannot get an expensive, new computer to work
  • not be technically knowledgeable.

It is your job to calm the user and help them give you the information you need to diagnose and solve the problem. You need to ask questions that the user can answer simply, without having to know anything about the computer or its software and guide them through basic troubleshooting steps.

When speaking with a user:

  • be calm and polite
  • do not interrupt when the user is speaking
  • do not use technical language (jargon) or abbreviations that are likely to confuse them.

Open and closed questioning

The basis of drawing troubleshooting information from users is asking good questions.

Questions are commonly divided into 2 types:

  1. Open questions invite someone to explain in their own words.
    Examples:
    'What is the problem?'
    'What happens when you try to switch the computer on?'
    'Were you able to complete this task before, and if so, when did you notice there was an issue?'
    'What types of changes have you noticed since the last time you completed this task?'
  2. Closed questions invite a Yes/No answer or a fixed response.
    Example:
    'Can you see any text on the screen?'
    OR
    'What does the error message say?'

Open questions are good to start with as they help avoid making assumptions about what is wrong and encourage the user to give you all the information they are able to.

However, you should not trust the user's judgement completely. The user may be inexperienced or have formed a false impression of what is going wrong.

Try to establish factual information rather than asking for the user's opinion. Closed questions can be used to 'drill down' into the nature of the problem and guide a user toward giving you useful information.

Developing a troubleshooting mindset

Troubleshooting is not just the application of technical knowledge and expertise; your approach matters just as much.

  • Be calm: If you are calm, it instils confidence in the customer and prevents you from making rash decisions.
  • Take breaks: Troubleshooting requires a great deal of concentration. After working on the same task, the mind can become fatigued and concentration is reduced. Consider taking breaks or leaving the problem for a while to refresh your mind.
  • Challenge assumptions: A problem may be reported similar to one you have experienced before, but you should not assume that the problem is identical. Although the symptoms may be similar, the problem and its solution could be completely different. Always treat each problem as a new challenge. Be prepared to try something different. If you have decided what the problem is, but cannot seem to solve it, it may be that you are incorrect. Be prepared to start again from the beginning with an open mind.
  • Assess costs and impacts: Remember, you must account for corporate policies and evaluate the impact of a problem. A particular problem might be a stimulating challenge, but if resolving it is not the highest priority in terms of business needs, you need to give up on finding The Answer and use a shortcut to get to a solution. When assessing whether to repair a part, consider the cost of the part and the cost of your time to perform the repair. In many circumstances, replacement is the most effective option.
  • Know when to give up! You will not always be able to fix the problem yourself. Be prepared to pass the problem on rather than wasting the customer's time!
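
The repair-or-replace comparison described under 'Assess costs and impacts' is simple arithmetic. The sketch below assumes that each option costs its parts plus the technician's time; the function name `cheapest_fix` and every figure in the example are hypothetical.

```python
def cheapest_fix(part_cost: float, repair_hours: float,
                 replacement_cost: float, replace_hours: float,
                 hourly_rate: float) -> str:
    """Return 'repair' or 'replace', whichever has the lower total cost."""
    repair_total = part_cost + repair_hours * hourly_rate
    replace_total = replacement_cost + replace_hours * hourly_rate
    return "repair" if repair_total < replace_total else "replace"


# Hypothetical figures: a 40-unit part that takes 3 hours to fit,
# versus a 150-unit replacement that takes half an hour to swap,
# with technician time charged at 60 units per hour.
decision = cheapest_fix(part_cost=40, repair_hours=3,
                        replacement_cost=150, replace_hours=0.5,
                        hourly_rate=60)
print(decision)  # repair totals 220, replacement totals 180
```

A real assessment would also weigh downtime, warranty terms and company policy, which this sketch leaves out.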

Problem identification

When troubleshooting, unless the problem is very simple, it is best to work methodically to ensure that you diagnose the correct problem and choose the best way to resolve it.

The traditional method for problem-solving is to find the cause of the problem and then seek to remove or resolve that cause.

Consider: If you switch your television on and the screen remains dark, you seek out the cause and remove it. Finding the actual cause may involve identifying several possible causes and then checking them out one by one to determine a probable cause.

If your television stops working, you could ask yourself:

  • Is the problem with the television?
  • Has the fuse blown?
  • Is there a problem at the broadcasting station rather than with my television?

With all problems, we run through a list of possibilities before making a decision. The trick is to do this methodically (so that possible causes are not overlooked) and efficiently (so that the problem can be solved quickly).

Troubleshooting is not just a problem-solving process, though. It requires effective decision-making. Sometimes there is no simple solution to a problem:

  • there may be several solutions and which is best might not be obvious
  • an apparent solution might solve the symptoms of the problem but not the cause
  • a solution might be impractical or too costly
  • a solution might be the cause of further problems, which could be even worse than the original problem.

A diagram showing 8 troubleshooting steps

Expanding the 8 troubleshooting steps

  1. Perform backups
    Consider the importance of data stored on the local computer when performing troubleshooting or maintenance. Check when a backup was last made. If a backup has not been made, perform one before changing the system configuration, if at all possible. The simplest way of making a backup before troubleshooting is to use drive imaging software.
  2. Question the user
    The first report of a problem will typically come from a user or another technician, and they will be one of the best sources of information if you can ask the right questions. Before you begin examining settings in Windows or taking the PC apart, spend some time gathering as much information as you can about the problem. Ensure you ask the user to describe all of the circumstances.
    Good questions to ask include: 'What are the exact error messages appearing on the screen or coming from the speaker?' (Remember that the error could reveal a symptom, not a cause.)
    'Has anyone else experienced the same problem?'
    'How long has the problem been occurring?'
    'What changes have been made recently to the system? Were these changes initiated by you, or via another support request?'
    'Has anything been tried to solve the problem?'
  3. Sources of information
    Of course, you cannot always rely on the user to let you know everything that has happened. To diagnose a problem, you may also need to use the following techniques:
    • Make a physical inspection—look and listen. You may be able to see or hear a fault (scorched motherboard, 'sick-sounding' disk drive, no fan noise and so on).
    If the symptoms of the problem are no longer apparent, a basic technique is to reproduce the problem—i.e. repeat the exact circumstances that produced the failure or error.
    Some problems are intermittent, though, which means that they cannot be repeated reliably. Issues that are transitory or difficult to reproduce are often the hardest to troubleshoot.
    • Check system and application logs or diagnostic software for information.
    • Check the system documentation, such as installation or maintenance logs, for useful information.
    • Consult any other technicians that might have worked on the system recently or might be working at the moment on some related issue.
    Consider that environmental or infrastructure changes might have been instigated by a different group within the company. Perhaps you are responsible for application support and the network infrastructure group has made some changes without issuing proper notice.
  4. Researching the problem
    One of the most useful troubleshooting skills is being able to do research; to find information quickly. Learn to use web and database search tools so that you can locate information that is relevant and useful.
    Identify different knowledge sources available to you.
    Consider both internally available documentation and problem logs and external support resources, such as vendor support or forums.
  5. Re-establishing a new theory
    If your theory is not proven by the tests you make or the research you undertake, you must establish a new theory. If one does not suggest itself from what you have discovered so far, there may be more lengthy procedures you can use to diagnose a cause—remember to assess business needs before embarking on very lengthy and possibly disruptive tests. Is there a simpler workaround that you are overlooking? If a problem is particularly intractable, you can do the reverse of the above process and take the system down to its base configuration (the minimum needed to run). When (if) this is working, you can add peripherals and devices or software subsystems one-by-one, testing after each, until the problem is eventually located.
    This is time-consuming but may be necessary if nothing else is providing a solution.
  6. Problem escalation
    Be aware that you may not have all the answers all the time. Consider consulting your colleagues, internet discussion groups, or manufacturer help lines. This will not only help you to solve the problem more quickly or identify a better solution than you had considered yourself, but will also increase your knowledge and experience. If you cannot solve a problem yourself, it is better to escalate it than to waste a lot of time trying to come up with an answer.
    Formal escalation routes depend on the type of support service you are operating and the terms of any warranties or service contracts that apply. Some obvious escalation routes include:
    • senior staff, knowledge experts, subject matter experts, technical staff, developers, programmers, administrators within your company
    • suppliers and manufacturers—warranty and support contracts and helplines or web contact portals
    • other support contractors/consultants, websites and social media.
    Choosing whether to escalate a problem is complex as you have to balance the need to resolve a problem in a timely fashion with what might be additional costs and the burdens/priorities that senior staff members are already coping with.
    You should be guided by policies and practices in the company you work for. When you escalate a problem, make sure that what you have found out or attempted so far is documented.
    Failing that, describe the problem clearly to whoever is taking over or providing you with assistance.
    If you are completing troubleshooting steps under instruction from another technician—the vendor's support service, for instance—make sure you properly understand the steps you are being asked to take, especially if it requires disassembly of a component or reconfiguration of software that you are not familiar with.
  7. Solution implementation and testing
    If you have established and tested a theory of cause, it should be apparent what steps are required to resolve the issue.
    There are typically 3 solutions to an IT problem:
    • Repair: you need to determine whether the repair cost makes this the best option.
    • Replace: often more expensive, and may be time-consuming if a part is not available. There may also be an opportunity to upgrade the part or software.
    • Workaround: as any software developer will tell you, not all problems are critical.
    If neither repair nor replacement is cost-effective, it may be best to find a workaround or just document the issue and move on.
    You have to assess the cost and time required when you consider solutions. Another consideration involves the potential effects on the rest of the system that your plan of action may have and whether you have the authorisation to proceed. A typical example is applying a software patch, which might fix a given problem but cause other programs not to work. This is where an effective change and configuration management system comes into play. It should help you understand how different systems are interconnected and cause you to seek the proper authorisation for your plan.
  8. Implement the solution
    If you do not have authorisation to implement a solution, you will need to escalate the problem to more senior personnel. If applying the solution is disruptive to the wider network or business, you also need to consider the most appropriate time to schedule the reconfiguration work and plan how to notify other network users.
    When you make a change to the system as part of implementing a solution, test after each change. If the change does not fix the problem, reverse it and then try something else.
    If you make a series of changes without recording what you have done, you could find yourself in a tricky position.
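
The advice in step 8 (test after each change, reverse a change that does not help, and record what was done) can be expressed as a short loop. This is a sketch under stated assumptions: each change is modelled as a hypothetical (name, apply, revert) tuple, and the `state` dictionary stands in for a real system.

```python
from typing import Callable, Dict, List, Tuple

# (name, apply, revert): hypothetical representation of one change
Change = Tuple[str, Callable[[], None], Callable[[], None]]


def try_changes(changes: List[Change],
                problem_fixed: Callable[[], bool]) -> List[str]:
    """Apply one change at a time, test, and revert any change that fails."""
    log: List[str] = []
    for name, apply, revert in changes:
        apply()
        log.append(f"applied: {name}")
        if problem_fixed():
            log.append(f"fixed by: {name}")
            return log
        revert()  # the change did not help, so undo it before trying another
        log.append(f"reverted: {name}")
    log.append("not fixed: escalate")
    return log


# Stand-in system state: the real fault is a loose cable.
state: Dict[str, str] = {"driver": "v2", "cable": "loose"}

change_log = try_changes(
    [
        ("roll back driver",
         lambda: state.update(driver="v1"),
         lambda: state.update(driver="v2")),
        ("reseat cable",
         lambda: state.update(cable="seated"),
         lambda: state.update(cable="loose")),
    ],
    problem_fixed=lambda: state["cable"] == "seated",
)
print(change_log)
```

Because the driver rollback is reverted before the cable is reseated, the system ends up with only the one change that actually fixed the problem, and the log records every action taken.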

View the following video for more troubleshooting tips.

Troubleshooting flow charts provide a visual representation of the steps required to solve a problem. Troubleshooting must be conducted in a systematic way, and the flow chart breaks the process down into a sequence of decisions. At each step, the user is presented with a yes/no question. Answering yes or no leads them to another question, a resolution or a return to an earlier step. An example is shown below.

A flow chart showing how to isolate a printer problem
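
A flow chart of yes/no questions maps naturally onto a small decision tree. The sketch below is illustrative only: the questions and resolutions are hypothetical and are not taken from the printer flow chart above, and for brevity the tree does not model a return to an earlier step.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Question:
    """A yes/no decision; each branch is another question or a resolution."""
    text: str
    yes: Union["Question", str]
    no: Union["Question", str]


def walk(node: Union[Question, str], answers: Dict[str, bool]) -> str:
    """Follow the answers through the chart until a resolution is reached."""
    while isinstance(node, Question):
        node = node.yes if answers[node.text] else node.no
    return node


# Hypothetical printer chart, built from the last question upwards.
test_page = Question(
    "Does a test page print?",
    yes="Check the application's print settings",
    no="Escalate to a hardware technician",
)
connected = Question(
    "Is the printer connected to the computer or network?",
    yes=test_page,
    no="Reconnect the cable or rejoin the network",
)
printer_chart = Question(
    "Is the printer powered on?",
    yes=connected,
    no="Switch the printer on",
)

resolution = walk(printer_chart, {
    "Is the printer powered on?": True,
    "Is the printer connected to the computer or network?": False,
})
print(resolution)
```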

Activity

Select a common IT issue from the topics below and create a troubleshooting flowchart you would utilise:

  • Software problems
  • Hardware failures
  • Connectivity issues
  • Security breaches or virus infections
  • Data backup and recovery issues

POST (power-on self-test) is the diagnostic sequence that the system firmware runs at start-up to check that the hardware is working as expected. Once POST has determined that all hardware is working correctly, the boot process moves on to the next stage, which loads the operating system.

The main duties of POST are to:

  • check and verify CPU registers
  • validate basic input and output integrity
  • check peripherals
  • confirm main memory 
  • identify and organise devices available for booting.

Watch the video below to learn more about:

  1. what POST is
  2. what beep codes are
  3. how to identify errors in the boot process
  4. what a POST card is.

General process for POST:

  1. The POST starts by locating video card firmware at the address C000 in memory. If found, the video card is initialised from its own firmware. Information from the card manufacturer may also be displayed at this point.
  2. A start-up screen is displayed. More tests on the system, including counting through system RAM, are performed. If any errors are found, a text error message is displayed. Explanations of these messages are usually found in the system guide. Once numeric codes, these messages now tend to be descriptive, such as 'key stuck'.
  3. You should be able to access the system setup routine from this point. This allows you to reconfigure the settings stored in the system setup. The key used to invoke system setup varies according to the firmware, but is usually Delete, F2, Esc, F10 or F1.
  4. Some PCs indicate that system checks have been successfully completed at this point with a single short beep, but the trend for modern computers is to boot silently.
  5. A search is made for further interfaces that may have firmware chips on them. This could include storage adapters and network cards. Further information about these cards may be displayed at this point and their memory addresses reserved.
  6. The firmware may display a summary screen about the system configuration. This may scroll by quite quickly. Use the Pause key if you want to analyse it.
  7. The operating system load sequence starts. If everything is okay, booting begins. If there is a problem, POST sounds a pattern of beeps (a beep code) that identifies the fault.
    You must decode the pattern of beeps and take the appropriate action. Use resources such as the manufacturer's website to determine the meaning of the beep code.

The example below lists the beep codes for the original IBM PC.

Code                               | Meaning
One short beep                     | Normal POST – system is OK.
Two short beeps                    | POST error – error code shown on screen.
No beep                            | Power supply or motherboard problem (use a multimeter to check the onboard speaker is functioning).
Continuous beep                    | Power supply, motherboard, or system memory problem.
Repeating short beep               | Power supply, motherboard, or keyboard problem.
One long, one short beep           | Motherboard problem.
One long, two or three short beeps | Display adapter error.
Three long beeps                   | 3270 keyboard card.
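
Decoding a beep pattern is a table lookup. The sketch below encodes the original IBM PC codes listed above as a dictionary; the key format (for example "1 long, 2 short") and the function name `decode_beeps` are assumptions made for the illustration, and other manufacturers use different codes.

```python
# Beep codes for the original IBM PC, as listed in the table above.
# The key format is an assumption made for this sketch.
BEEP_CODES = {
    "1 short": "Normal POST: system is OK",
    "2 short": "POST error: error code shown on screen",
    "none": "Power supply or motherboard problem",
    "continuous": "Power supply, motherboard, or system memory problem",
    "repeating short": "Power supply, motherboard, or keyboard problem",
    "1 long, 1 short": "Motherboard problem",
    "1 long, 2 short": "Display adapter error",
    "1 long, 3 short": "Display adapter error",
    "3 long": "3270 keyboard card",
}


def decode_beeps(pattern: str) -> str:
    """Look up a beep pattern, ignoring case and surrounding spaces."""
    key = pattern.strip().lower()
    return BEEP_CODES.get(
        key, "Unknown pattern: check the manufacturer's documentation"
    )


print(decode_beeps("1 long, 2 short"))
print(decode_beeps("5 short"))
```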

POST not running

If power is present but the computer does not start or the screen is blank and there are no beeps from the speaker, it is likely that the POST procedure is not executing.

If the screen is blank, check that the monitor cable is connected and undamaged and that the monitor is powered on. If the monitor has separate inputs (e.g. HDMI and DVI), make sure it is switched to the correct one. If possible, test with another monitor to confirm that there is no problem with the display. If you can rule out a problem with the display itself, other likely causes are faulty cabling or a damaged or mis-seated CPU or other motherboard component.

To troubleshoot, try the following tests and solutions.

  • Ask what has changed—if the system firmware has been flashed and the PC has not booted since, the system firmware update may have failed. Use the reset procedure.
  • Check cabling and connections, especially if maintenance work has just been performed on the PC. An incorrectly oriented storage adapter cable or a badly seated adapter card can stop the POST from running. Correct any errors, reseat adapter cards, and then reboot the PC.
  • Check for faulty interfaces and devices—it is possible that a faulty adapter card or device is halting the POST. Try removing one device at a time to see if this solves the problem (or remove all non-essential devices then add them back one-by-one).
  • Check the PSU—even though the fans are receiving power, there may be a fault that is preventing the Power Good signal from being sent to the CPU, preventing POST.
  • Check for a faulty CPU or system firmware. If possible, replace the CPU chip with a known good one or update the system firmware.
  • Some motherboards have 'jumpers' to configure modes (such as firmware recovery) or processor settings. If the jumpers are set incorrectly, it could cause the computer not to boot. If a computer will not work after being serviced, check that the jumpers have not been changed.
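
The 'remove all non-essential devices then add them back one-by-one' technique can be sketched as a loop. This illustration assumes a single faulty device and a fault that shows up reliably; the function name `find_faulty_device`, the device names and the stand-in `boots` check are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def find_faulty_device(base: Sequence[str], optional: Sequence[str],
                       boots: Callable[[List[str]], bool]) -> Optional[str]:
    """Start from the base configuration and add devices back one at a time.

    Returns the first device whose addition stops the system booting,
    or None if the system boots with everything installed.
    """
    installed = list(base)
    if not boots(installed):
        raise ValueError("base configuration does not boot")
    for device in optional:
        installed.append(device)
        if not boots(installed):  # test after each addition
            return device
    return None


# Stand-in for a real boot test: the sound card is the faulty device.
def boots(devices: List[str]) -> bool:
    return "sound card" not in devices


culprit = find_faulty_device(
    base=["motherboard", "CPU", "RAM", "PSU"],
    optional=["graphics card", "sound card", "network card"],
    boots=boots,
)
print(culprit)
```

An intermittent fault would need each configuration to be tested several times before a device is ruled in or out.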

Operating system search/boots to incorrect device

Once the POST tests are complete, the firmware searches the devices as specified in the boot sequence. If the first drive in the sequence is not found, it then moves on to the next.

For example:

  • If there is no fixed disk > the boot sequence checks for a USB attached drive.
  • If no disk-based boot device is found > the system might attempt to boot from the network.
  • If no boot device is found > the system displays an error message and halts the boot process.

If the system attempts to boot to an incorrect device, check that the removable drives do not contain media that are interfering with the boot process and that the boot device order is correctly configured.
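
The boot sequence search amounts to taking the first bootable device in a configured order. The sketch below is a simplified model, not real firmware behaviour: the device names and the `select_boot_device` function are hypothetical.

```python
from typing import Dict, List, Optional


def select_boot_device(boot_order: List[str],
                       bootable: Dict[str, bool]) -> Optional[str]:
    """Return the first device in the boot order that can be booted from.

    None means no boot device was found, at which point the firmware
    would display an error message and halt.
    """
    for device in boot_order:
        if bootable.get(device, False):
            return device
    return None


# No fixed disk is present, so the search falls through to the USB drive.
first = select_boot_device(
    ["fixed disk", "usb", "network"],
    {"fixed disk": False, "usb": True},
)
print(first)

# A USB stick left in the machine, with USB listed ahead of the fixed disk,
# causes a boot to the wrong device.
wrong = select_boot_device(
    ["usb", "fixed disk"],
    {"usb": True, "fixed disk": True},
)
print(wrong)
```

The second call shows why both checks in the paragraph above matter: removing the media or correcting the boot order would each make the fixed disk the selected device.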

For more information on POST, check out the following video.


'Software diagnostics' refers to programs that are used to identify the causes of problems in computers. Diagnostic tools are categorised into 2 groups: software and hardware.

These programs test the onboard systems for issues and help to alert users of potential problems and breakdowns.

  • Hardware diagnostic tools: Physical devices that are designed to determine the conditions of various computer hardware components.
  • Software diagnostic tools: Computer programs that are used, either in or outside of a system, to determine the causes of software and hardware malfunctions within a computer.

Software diagnostic tools can be used to test and regulate various components of a computer system. Some, for example, monitor temperature, while others check Wi-Fi speed, network and RAM, or monitor other devices connected to your computer.

Once these tools have identified the problems that may be disrupting your computer, the repair process can begin.

Windows Performance Monitor

An example of a software diagnostic tool is Windows Performance Monitor, which is built into the operating system. This program monitors the CPU, fan speed, voltage, memory, disk and Ethernet information.

The following memory usage is displayed:

  • Physical Memory—usage of system RAM (not including the pagefile)
  • Kernel Memory—physical and paged memory used by Windows core files
  • System summary—showing handles, threads and processes (software objects being managed by the CPU), system uptime and the commit charge (overall memory usage, including physical memory and pagefile).

High peak values are nothing to worry about, but consistently high utilisation means that you should consider adding more resources to the system (or run fewer processes!).
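
The difference between an occasional peak and consistently high utilisation can be made concrete with a small check over a series of readings. The 90% threshold and the 80% 'sustained' fraction below are assumptions chosen for illustration, not values taken from Windows or from this lesson.

```python
from typing import Sequence


def classify_utilisation(samples: Sequence[float],
                         threshold: float = 90.0,
                         sustained_fraction: float = 0.8) -> str:
    """Classify percentage readings as 'normal', 'peak' or 'sustained'.

    'peak' means some readings reach the threshold; 'sustained' means at
    least sustained_fraction of all readings do.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    high = sum(1 for value in samples if value >= threshold)
    if high == 0:
        return "normal"
    if high / len(samples) >= sustained_fraction:
        return "sustained"
    return "peak"


# Hypothetical CPU readings taken at regular intervals.
print(classify_utilisation([20, 35, 98, 30, 25]))   # one brief spike
print(classify_utilisation([95, 97, 92, 99, 96]))   # high throughout
```

Only the 'sustained' result suggests adding resources or running fewer processes; a 'peak' result on its own is nothing to worry about.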

CPU and physical memory obviously require physical upgrades. Windows will normally change the pagefile dynamically if it is running out of space. If it has been set manually, you should increase it using the Performance Settings button on the Advanced page of System Properties.

Note: Monitoring the temperature is an important aspect as it can impact the overall performance of your computer. Heat damage can create many problems for your hardware components, including the motherboard and hard disks.

Software diagnostics for Wi-Fi networks

Another example is software diagnostics for Wi-Fi networks. Most users need Wi-Fi or Ethernet for everyday personal and business activities, so there is a chance that every once in a while you will run into a problem. A software tool helps you identify and rectify Wi-Fi issues relating to your computer.

Wifi Analyzer is another good example of a software diagnostic tool. It helps you monitor your Wi-Fi network setup and check whether surrounding networks are disrupting your connection. With this information, you can identify issues and potentially boost your Wi-Fi speed.

Software diagnostics for network monitoring

Routers, servers, CPUs and switches also need to be monitored to ensure the network is running efficiently.

Many software programs support this; one example is PRTG Network Monitor. PRTG provides an overview of CPU load, RAM load, disk usage and network traffic. This helps prevent server overloads, increase productivity and decrease the risk of network failures.

Software diagnostics are crucial to the running of your computer as they help identify and pinpoint specific issues. It is normal for computers to malfunction, slow down and demonstrate other issues that may be hard to identify by physical checks.
