Working in the Support organization at Novell, I work with some of the best troubleshooters in the world. The companies whose products integrate with Novell’s give us even more people from whom we can learn. Many of the steps used in troubleshooting are common sense and are commonly laughed about online (“Is it plugged in?”), and while those specifics are not that useful most of the time, there are general practices that I have learned and feel need to be shared. The purpose of this article is to group everything that may help with troubleshooting generally (and in some cases, specifically) for the benefit of those who have not spent several years in a support organization. This is by no means the end-all, be-all of troubleshooting and I am by no means the best, but from what I have seen working in Novell’s Support Forums (http://forums.novell.com/), applying some of these skills could speed up resolution times for those seeking help or even prevent the need for outside help altogether.
The idea for this started around 2008-12-25 (Christmas, 2008) when a friend sent me a link on how to ask (technical) questions in a smart way: http://www.catb.org/~esr/faqs/smart-questions.html. I’m going to include a lot of the information from that document below, so for younger readers or those more sensitive to coarse language I would recommend avoiding the link altogether.
This article will focus on troubleshooting in general and not just asking questions of others so hopefully this will be all you need and a one-stop shop of sorts. In online forums a lot of the same questions are asked over, and over, and over, and it becomes a bit tedious to continually ask if pinging works, or if Domain Name Service (DNS) requests resolve properly, or whatever. In the web interface to the forums there are often “stickies” that are threads with common issues and starting places to help with those types of questions and while those help they are specific to the issues for which the forum exists and are not accessible to Network News Transfer Protocol (NNTP) users. To address this, and in the interest of sharing and giving back to the community what I have gleaned from the community, this article is now being written.
So let’s get into this with some of my favorite lessons from Dr. Parker back in college. He used to tell us in programming classes that programmers were lazy, in a good way. Reinventing wheels isn’t interesting, fun, or productive, so programmers reuse wheels (code) instead of rewriting them. Programmers also write programs to do things that they (or others) do over and over again but which can be done more quickly with a little program. A better way to word it may be that programmers are focused on “efficiency”, but the word “lazy” managed to get the attention of a bunch of sleeping college students.
This leads into the first points I’d like to bring up in troubleshooting. Chances are you are not the first to see a problem no matter what it is. If you are the very first to see a problem ever then you are probably a developer writing the problem, er, program, and you know how to work it out better than anybody. If you are a user or administrator of some system then you are probably not the first, so use the wheels of those who have come before you. “Those who do not learn from the past are doomed to repeat it,” after all, so let’s learn from it. This may also be my own little plug for the forums which are a great resource for all things technical. There are forums devoted to every topic under the sun. Before you post to them asking a question be sure to search them either via their own search tools or Google (http://www.google.com/), to see if the question has come up before. Try a few different queries. Other places to check early on include product documentation (no seriously, it’s not there for the health of the people writing it), other websites (Google again), friends/coworkers, and even the source code. You can’t know enough about your problem so learn as much as you can so you can give the best description once it is time to ask a question publicly. If too little information is provided then the first replies may just be asking for more information which means lost time and attention to your issue until details can be provided. Help others help you; they want to, really they do, but no matter what Madam Cleo says crystal balls just don’t work (and if they did do you really think anybody would be in the business of selling that information out to others?).
In order to properly search on an issue you are experiencing the issue must be known. This is the first big way to save time and frustration in your life and in the lives of those who will read what you ask and try to help you. General understanding of the issue is always a starting point but stopping there and throwing up alarms without a better understanding will just end up wasting time so understand the issue first. Part of the process of troubleshooting is isolating the issue (or issues) to the least common denominator. For example the most-common question causing frustration that comes to mind is, “I can’t get online!!! Can you help!!!!”. This is frustrating for a few reasons. First, it’s a lie; you are obviously online if you can post the question. Second, it is so general that the first response will probably be a suggestion to turn on the computer, or an answer to the question regarding the possibility of help with a simple affirmative (“Yes, I can help.”). The third reason this is a terrible start to a question (or a subject to a forum post) is that it implies no work has been done to resolve or isolate the issue. It’s your problem so in the end you need to fix it either on your own or with others’ help. Asking somebody to use their resources before you have even given it a couple seconds’ thought is a way to lose respect quickly. A fourth, less-obvious reason on the forum side is that this is often the subject line of a thread and will be seen by everybody in the forum and anybody searching later. As the subject line is meaningless the content contained therein, no matter how valuable, may be found to be meaningless. The thread may be ignored by those who would help or by those seeking help later on so the data accumulated during the thread’s resolution are lost.
To prevent all of these issues good data need to be in hand which is a large part of the purpose of this article. How do we get good data for troubleshooting? A friend succinctly described this as being able to think critically. Some people have this knack and apply it naturally and others do not, but to a large degree breaking down a problem is natural and, where less natural, can be learned.
First, let’s get some details that are non-trivial and non-generic. Knowing which details to gather may be tricky at the start, but remember that the troubleshooting is not being done in an intellectual vacuum. Finding out commands to run, files to check, and other relevant points of data is fairly straightforward. The most obvious places to start learning what to check may be the official documentation, Frequently Asked Questions (FAQ) documents, ‘man’ (manual) pages, a product knowledge base, online forums, friends/coworkers, Google search, or anywhere else that is focused on the product being used. As an example, for a networking issue in Linux, Novell has SUSE Linux Enterprise Server (SLES) documentation, a troubleshooting section within that documentation, man pages for the various commands that may be involved (ip, dig, ping, etc.), forums on http://forums.novell.com/ dedicated to SLES and, more specifically, to networking issues in SLES, and Support engineers who can ultimately help you if needed. Outside of these, finding another Linux user may be as simple as looking around.
These various resources will help start to isolate the issue. Even if the issue is not resolved with the skills and tricks learned from them, at least we are armed with valuable data to resolve it eventually. When first venturing into the online world for a product or technology, consider starting with areas labeled for ‘newbies’. They are often there to help you at least get started, will often give very fast responses, and won’t have overly high expectations for a first post. Still, do your best so you can avoid wasting the time of the people providing the support.
So in theory we now have a bit of an idea about the first steps in troubleshooting. At the very least we understand what we think the product or technology should do and we know what it’s doing, or what it’s not doing, or what we are observing about it that is causing us grief. These are points to describe, specifying which point is which. To do this properly several data points should be included. While these types of things could easily apply to non-technical fields, we’re going to stick to the Information Technology (IT) realm for now. First, which product and product version are you using? Which platform and version are you running it on if it isn’t a platform itself? Which steps are you taking to (try to) do something? Is there documentation or another article that you are following? Does it work partially or not at all and how, specifically, does it not work? What should it do if working (what do you expect)? What do the log/trace/debug files show for it when reproducing the issue (find these if you do not have them; they are there)? When running from the command line (if possible) what output is given when reproducing the issue? Has it ever worked or is this a first attempt? Does it work on other systems (is it reproducible)? If this worked previously what has changed since then? What steps have been taken by you or others to resolve the issue and what were the results?
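One way to make sure none of the questions above gets skipped is to treat them as a fill-in-the-blanks template. The following is only a sketch of that idea in Python: the class name, the field names, and the sample values in the usage note are made up for illustration and do not come from any forum or product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProblemReport:
    """Answers to the questions above. All field names are illustrative."""
    product: str                      # product name and version
    platform: str                     # platform and version it runs on
    steps: List[str]                  # what you did, in order
    expected: str                     # what should happen if it worked
    actual: str                       # what happened instead
    log_excerpt: str = ""             # log/trace/debug output, copied verbatim
    ever_worked: str = "unknown"      # "yes", "no", or "unknown"
    works_elsewhere: str = "unknown"  # is it reproducible on other systems?
    recent_changes: List[str] = field(default_factory=list)
    attempts: List[str] = field(default_factory=list)  # fixes tried, with results


def numbered(items: List[str]) -> List[str]:
    """Number a list of items, or say plainly that nothing was listed."""
    if not items:
        return ["  (none listed)"]
    return [f"  {i}. {item}" for i, item in enumerate(items, start=1)]


def format_report(report: ProblemReport) -> str:
    """Render the report as plain text suitable for pasting into a post."""
    lines = [
        f"Product: {report.product}",
        f"Platform: {report.platform}",
        "Steps taken:",
        *numbered(report.steps),
        f"Expected: {report.expected}",
        f"Actual: {report.actual}",
        f"Has it ever worked: {report.ever_worked}",
        f"Works on other systems: {report.works_elsewhere}",
        "Recent changes:",
        *numbered(report.recent_changes),
        "Already tried:",
        *numbered(report.attempts),
        "Log excerpt:",
        report.log_excerpt or "  (none attached)",
    ]
    return "\n".join(lines)
```

Calling format_report(ProblemReport(product="ExampleApp 1.2", platform="ExampleOS 10", steps=["Started the service", "Opened the web page"], expected="Login page loads", actual="Connection refused")) yields a short plain-text summary in which any unanswered question shows up as “(none listed)” or “unknown”, which is a quick hint about what is still worth finding out before posting.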
The questions above are meant to be vague and open ended to encourage thought and expansion on the symptoms being seen. Remember: Who, What, When, Where, Why, and How. Keep in mind that, whenever possible, providing EXACT output from the system instead of your own interpretation is encouraged for a number of reasons. A simple example for a network issue: “Can you reach Google?” “Yes.” What does this mean? Does it mean pinging a known Google IP address worked? Does it mean that resolution of Google’s IP address worked while neglecting real browser tests? Does it mean that a browser properly pulled up the main Google homepage? If instead the results from the ‘ping’, ‘dig’, and ‘curl’ commands were posted, the answer would be much clearer to the person asking the question. Hopefully the question can be asked in a way to get the desired data back properly (“Please run this command and post the output.”) but doing this from the start speeds up the time to resolution for you. The questions above, in stimulating thought on your part, are meant to get you to find out how you broke things. Often (too often) the answer to the question of “What happened?” is “Nothing.” While it is possible that nothing truly happened, more often than not that is not true. At the very least, hardware failed, Internet Service Provider (ISP) access changed, or something else changed. Computers do not typically just stop working (Windows excepted of course) because they are programmed to just go and go and go. They stop when they must (power button, overheating, bad connection to another device upstream) or when you tell them to. While you may not have pushed the power button and caused disk corruption, answering “I had a power outage.” is better than saying “Nothing happened the other night during the big lightning storm.” For interesting tall tales check out the library, or the local political scene, but for troubleshooting it’s best to voyage into the area of truth and facts.
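To illustrate posting exact output rather than an interpretation, here is a minimal Python sketch that runs the three commands named above and records exactly what each one printed. The function names and the particular flags are my own choices for illustration, and it assumes a Linux system with ping, dig, and curl installed.

```python
import subprocess
from typing import List


def connectivity_commands(host: str) -> List[List[str]]:
    """The three checks named above; the flags are one reasonable choice."""
    return [
        ["ping", "-c", "3", host],                 # basic reachability
        ["dig", "+short", host],                   # DNS resolution
        ["curl", "-sS", "-I", f"http://{host}/"],  # an actual HTTP request
    ]


def capture(command: List[str], timeout: float = 30.0) -> str:
    """Run one command and return what it printed, labeled for posting."""
    header = "$ " + " ".join(command)
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError:
        return f"{header}\n(command not found)\n"
    except subprocess.TimeoutExpired:
        return f"{header}\n(no result after {timeout} seconds)\n"
    # Trailing newlines are trimmed only so the sections line up neatly.
    parts = [
        header,
        f"exit code: {result.returncode}",
        "--- stdout ---",
        result.stdout.rstrip("\n"),
        "--- stderr ---",
        result.stderr.rstrip("\n"),
    ]
    return "\n".join(parts) + "\n"


def connectivity_report(host: str) -> str:
    """Run every check against one host and join the labeled results."""
    return "\n".join(capture(cmd) for cmd in connectivity_commands(host))
```

Running print(connectivity_report("www.google.com")) on the affected machine produces one labeled section per command, so “Can you reach Google?” is answered with what ping, dig, and curl actually said, including the exit codes and any error text.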
Going along with the vein of communication it is important that both senders and receivers of data are using the same protocols for communication, which typically means grammatically-correct, readable English. Languages may be variable in some locales though typically IT information is communicated in English (for example, ever written C++ in Japanese?). Regardless of which language you use questions and answers are typically not done in l33t-sp3@k or in the abbreviated language often used in instant messages or e-mails. Along with the “facts” required in the previous section using shorthand when communicating them can throw other parties off. Similarly if you ever use abbreviations be sure you call out what those mean before using them. There are only so many combinations of letters that can be used in the world to abbreviate other things so provide some context. Another point along these lines is to send data in standards-compliant formats. Use image formats for screenshots that are accessible to anybody (PNG, JPG) or text formats that are the same (TXT, PDF, ODT). All of the data in the world are useless if they cannot be shared, and the use of HTML e-mails, random office document formats, and proprietary image types just slows down resolution of the issue by the people whose assistance you are seeking.
Another point which I believe deserves special emphasis is to clearly specify why you are doing what you are doing. This is especially important when you are trying to add or change functionality more than trying to resolve a problem. A problem typically has a straightforward solution, but an enhancement may have any number of solutions. In these cases describing your desired endpoint more than your desired path to reach that endpoint can be especially important. Another networking example for this recently came up: “How can I set up a static route to Google on my system?” The first response was, “Why do you want to do this?” which was appropriate since the actual desire was just to access Google and setting up a static route was the completely incorrect way to go about it. The default gateway was not set up properly (the user had set up the network information statically and forgotten that detail). The answer to the question as asked would perhaps have ended up with a static route, but that was hardly the correct fix. A valid description of “Cannot access websites after configuring network settings manually” would have been much closer to the real issue, and combined with network output from the ‘ip’ command would have led to a very quick fix. In the end, describe what you ultimately want to accomplish because there may be a much better way than the one you are proposing. Provide the big picture and get the big answer.
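To show why the output from the ‘ip’ command would have settled that example quickly, here is a small Python sketch that looks for a default route in the text printed by `ip route show`. The function name is hypothetical and the two sample outputs use made-up addresses; they only imitate the general shape of what the command prints.

```python
from typing import List


def default_routes(ip_route_output: str) -> List[str]:
    """Return the lines of `ip route show` output that describe a default route."""
    return [
        line.strip()
        for line in ip_route_output.splitlines()
        if line.split()[:1] == ["default"]
    ]


# Illustrative output only: the addresses and interface name are made up.
# A statically configured host whose default gateway was forgotten:
BROKEN = "192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.50\n"
# The same host once a default gateway has been configured:
WORKING = "default via 192.168.1.1 dev eth0\n" + BROKEN

if not default_routes(BROKEN):
    print("No default route: traffic for other networks has nowhere to go.")
for route in default_routes(WORKING):
    print("Default route found:", route)
```

With the first sample, the missing default route is visible at a glance, which is exactly the detail that the question about static routes hid from the people trying to help.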
Once you have had a problem, researched/poked/prodded, and are ready to ask the world for help, keep a few other things in mind. Despite the amount of information you could get and the potential verbosity of the query, a lot of text doesn’t always help keep things clear in the minds of outsiders. Use bullets and succinct sentences to help clarify the issue. Spell out what has happened in chronological order unless some other order really, really makes more sense. The document I cited on asking smart questions also mentions leaving out pointless queries like, “Can anyone help me?” and the like. It also mentions that courtesy can help while unnecessary grovelling can hinder. Once a response comes, keep in mind that it was given when the alternative was to skip over your question, so chances are good that the person responding is really trying to help. This could be when you find out your idea of verbose data was far too brief, or your analysis of the problem was completely wrong, and depending on a million variables the received response could be full of sugar or it could be very coarse. Either way you received a response and either it helps you or it does not. Try what is suggested, and afterward ask follow-up questions if needed, describing what you tried from the previous response and posting any requested information. Once a solution is found, indicate as much, as this makes warm and fuzzy feelings abound on the Information Superhighway (decreasing Information Superhighway road rage).
The very last step, and one that is a bit beyond your initial issue, is to see if you can contribute back as well. The “community” around any product or technology grows as those who are interested help one another. As we receive assistance from others it is useful and appropriate to help those who may not be at the same point we are. Chances are there is always somebody newer at a technology than you are and, if not, there will be after you use it for a day or two. If you find neat new things to do with a given technology or product then write up an article on it. If you find somebody with an issue you had in the past then give them the same help that was previously rendered to you. This may seem like a purely unselfish thing to do but I do not want to dissuade you just because of that; consider the potential for yourself as you regurgitate what you know to another. It reinforces your own knowledge, may help you round out, refine, or even expand that knowledge, and could also be something you can point to in the future as a reference of your abilities. Being able to provide instructions that will help others and knowing a technology well enough to resolve others’ problems can be as valuable to an employer as made-up jargon and references on your resume or Curriculum Vitae (CV).
For now that is the end of the general steps I wanted to outline. If there are points I have missed please feel free to comment directly on this article below. The more steps that are pinned down the better we can all become and the faster we can get the resolutions we all desire.