Windows Troubleshooting Series - Part 1 - Basic Troubleshooting
This is the first post in a series on Windows troubleshooting.
The idea came after I spent a few weeks troubleshooting a very weird issue with the Windows Spooler. I started from the basic troubleshooting steps we all go through and ended up reverse engineering the service itself, going through many videos and books on the topic along the way.
Before we start, a small disclaimer: while the steps and tools in this blog series should serve as useful guidelines, the actual path you take in a troubleshooting session can vary quite a lot depending on how you approach an issue. It also depends heavily on your experience in the field and your knowledge of how things actually work in the background. As you do this more and more and gain experience, you’ll find yourself making leaps of logic that jump from A to J to C to Z without any apparent reason, but which still bring you to the solution.
So don’t take this as gospel: train your troubleshooting “muscle” by actually solving issues and looking at how others have done so. Try to learn from mentors if you have any (sadly that’s very rare in this field, and it’s something I greatly appreciated from my time as an electrician).
The first and most important step is always the same: what’s the actual issue? Take the time to clearly understand and define the issue that is currently being faced. Users tend to be very vague in the way they describe their problems (“it doesn’t work”), and often once you take the time to actually reproduce the issue you realize that it’s either non-existent (ex: the application has been updated and is showing a new menu, or a menu that was there has been moved), user error (ex: the user is not typing their password correctly or not following the procedure for a certain application) or trivial (ex: Chrome notifications are being used by malicious websites to tell the user that their computer is infected and that they should click on the notification to download a “virus cleaner”). Also check whether the issue is limited to one machine or is impacting several, and whether you can find common points between the impacted machines.
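Finding common points between impacted machines is essentially a set intersection. Here is a minimal sketch in Python, assuming you have noted a few attributes per machine by hand; the machine names and attributes below are made up for illustration.

```python
# Hypothetical inventory: machine name -> attributes noted during triage.
affected = {
    "PC-01": {"Windows 11 23H2", "Printer driver v3", "Site A", "VPN client 5.1"},
    "PC-07": {"Windows 10 22H2", "Printer driver v3", "Site A", "VPN client 5.0"},
    "PC-12": {"Windows 11 23H2", "Printer driver v3", "Site B", "VPN client 5.1"},
}


def common_points(machines):
    """Return the attributes shared by every impacted machine."""
    attribute_sets = list(machines.values())
    if not attribute_sets:
        return set()
    return set.intersection(*attribute_sets)


print(common_points(affected))
```

Whatever survives the intersection is only a lead, not a diagnosis, but it tells you where to look first.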
Once you have a clear idea of what the issue is, it’s time to start searching: is there an error? If yes, read the error. What does it say? Does it clearly state what the issue is and how to solve it?
Try restarting the application and/or the operating system and see if that solves the issue (“Have you tried turning it off and on again?”).
If that doesn’t work, search for the error on Google. Chances are one of the first results is going to be your solution.
Warning: do not blindly apply a solution you found on the internet, especially if it’s a registry change or a PowerShell/bat/vbs script that needs to be run. Take the time to understand what the solution does and what its impact could be. When it concerns the registry, DO NOT delete a key: always rename it instead, so the change can be undone! And if possible, take a backup before making those changes.
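As a rough illustration of the "take a backup first" advice, the sketch below builds a `reg export` command that saves a key to a timestamped .reg file. The key path and backup folder are hypothetical placeholders, and the function only builds the command; it does not run it.

```python
from datetime import datetime


def backup_command(key_path, backup_dir, now=None):
    """Build a `reg export` command that saves key_path to a timestamped .reg file."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    # Turn the key path into something usable as a file name.
    safe_name = key_path.replace("\\", "_").replace(" ", "_")
    target = f"{backup_dir}\\{safe_name}-{stamp}.reg"
    # /y overwrites an existing file with the same name without prompting.
    return ["reg", "export", key_path, target, "/y"]


# Hypothetical key and folder, for illustration only.
print(backup_command(r"HKLM\SOFTWARE\ExampleVendor\ExampleApp", r"C:\Temp"))
```

On a Windows machine the resulting list can be passed to `subprocess.run(..., check=True)` before touching the key. The export only covers the key you name and its subkeys, so it complements the rename rather than replacing it.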
If googling the error didn’t help, try clearly defining the issue and googling that instead. You can either go with a natural-language description (“Outlook shows error when starting”) or with the old style of searching, where each keyword is quoted ("outlook" "error" "startup").
Note: wrapping a word or phrase in double quotes forces the search engine to return only results that contain that exact word or phrase. It’s very useful when searching for specific errors like 0x0000011B. You can find additional advanced search operators here: https://moz.com/learn/seo/search-operators
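If you end up searching for error codes often, the quoting can be scripted. This is a small sketch: the helper names are my own, and the Google search URL format is an assumption that may not apply to other search engines.

```python
from urllib.parse import quote_plus


def exact_query(*terms):
    """Wrap each term in double quotes so it must appear verbatim in the results."""
    return " ".join(f'"{term}"' for term in terms)


def search_url(query, base="https://www.google.com/search?q="):
    """URL-encode the query and append it to the search engine's base URL."""
    return base + quote_plus(query)


query = exact_query("outlook", "error", "startup")
print(query)
print(search_url(exact_query("0x0000011B")))
```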
Now, say neither of those has worked. Time to delve a little deeper. Tools like the Sysinternals Suite or WinDbg will be covered in the next posts, but there are a few more things you can do first:
- Does the application or the OS write logs to the Event Log? Take the time to check the event logs, both the “Windows Logs” tree and the “Applications and Services Logs” tree, to see if additional information is present there. You can also enable Debug logs in Event Viewer, but I’ve found them of limited use unless WMI debugging is needed.
- Does the application write logs to a specific location? Check whether additional information or errors are present in those logs.
- Run the standard DISM /Online /Cleanup-Image /RestoreHealth and sfc /scannow that are always recommended on the Microsoft Support forums. I’ve personally found them of limited utility, but they have worked a couple of times.
- Run the Windows Troubleshooters. The networking one especially has been surprisingly useful in the past for identifying networking issues, though as usual they tend to be less useful when the issue is more insidious.
- Is the application/operating system up to date? In enterprise environments the application or the operating system is sometimes outdated, so if the server side expects a certain client version, the client will throw an error on startup or some functions will not work correctly until the client-side software is updated. So, if your policy allows it, try updating the application to the latest version. Operating systems should be kept up to date as well, both for security reasons and because of the constant rate of change on Domain Controllers lately, changes which expect the OS to be up to date. Be very careful when doing that, though: check with more senior colleagues first in case it could cause an issue.
- Are the drivers up to date? Or have they been updated via Windows Update? Sometimes after updating the operating system the system can become unstable if the drivers are not updated as well, and Windows Update tends to install older versions than the ones on the manufacturer’s website.
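The first three checks in the list above can also be done from a script or a command prompt. The sketch below is one possible way to do it, not the only one: it builds a `wevtutil` query for recent Critical and Error events, scans application log lines for error-looking keywords, and lists the two repair commands in the order given above. The function names and the keyword pattern are my own assumptions, so adjust them to what your application actually logs.

```python
import re


def recent_errors_command(log_name="System", count=20):
    """Build a wevtutil query for the newest Critical (1) and Error (2) events."""
    xpath = "*[System[(Level=1 or Level=2)]]"
    # /rd:true reads newest events first, /f:text prints them as readable text.
    return ["wevtutil", "qe", log_name, f"/q:{xpath}", f"/c:{count}", "/rd:true", "/f:text"]


# Keywords that usually mark a problem; this list is an assumption.
ERROR_PATTERN = re.compile(r"\b(error|exception|fail(ed|ure)?|fatal)\b", re.IGNORECASE)


def find_error_lines(lines):
    """Return (line_number, text) pairs for lines that look like errors."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]


# The two repair commands, in the order they appear in the list above.
REPAIR_COMMANDS = [
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],
    ["sfc", "/scannow"],
]

sample_log = [
    "2024-01-01 10:00:00 INFO Service started",
    "2024-01-01 10:00:05 ERROR Could not open port",
    "2024-01-01 10:00:06 WARN Retrying",
    "Connection failed after 3 attempts",
]
print(find_error_lines(sample_log))
```

On Windows, each command list can be passed to `subprocess.run`. DISM and sfc need an elevated prompt, and `find_error_lines` accepts any iterable of lines, including an open log file.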
And that’s it for the basic troubleshooting steps that come to mind.
I’ve surely missed a few so don’t hesitate to comment on the post with additional recommendations.
Now for additional reading content. As this post concerns the initial steps, I won’t be recommending books, courses or tools, but rather practice.
Take a look at the following communities, read through the posts, see how people solved issues and try to help some people yourself.
Train that troubleshooting muscle!
And if you see a difficult issue at work, try to solve it yourself and document the process.
The next post will be about how to use the Sysinternals tools to do more advanced troubleshooting. Stay tuned!