Problem Management extends the process of Incident Management. An Incident is a non-standard operational event that has the potential to degrade the quality of an IT service. Incidents are reported by end Users, encountered by technicians and system or database Administrators, or detected automatically by system management tools. In all cases, Incidents should be reported to the Service Desk.
A Problem describes the underlying cause of one or more Incidents that are under investigation. However, not every Incident is investigated as a Problem. For example, if the power-supply in a desktop computer burns out, it should be treated as an Incident and the power-supply replaced. (If the power-supply is controlled by Change Management, however, the replacement would need to be treated as a minor change request.) If, on the other hand, there is a spate of burnt-out power-supplies in the same desktop model, an underlying Problem with that model may exist, and further investigation into the cause and potential solutions may be required.
For Problems to be raised from Incidents consistently, organizations must define the evaluation criteria. For example, raise a Problem when more than 10 Incidents referring to the same Configuration Item are logged within three hours.
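A criterion like this can be checked with a sliding time window per Configuration Item. The sketch below is illustrative only: `ProblemTrigger` and `log_incident` are hypothetical names, the threshold and window are taken from the example criterion, and it assumes Incidents arrive in chronological order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Example criterion: more than 10 Incidents against the same CI within 3 hours.
THRESHOLD = 10
WINDOW = timedelta(hours=3)


class ProblemTrigger:
    """Hypothetical helper that tracks Incident times per Configuration Item
    and reports when the evaluation criterion for raising a Problem is met."""

    def __init__(self, threshold: int = THRESHOLD, window: timedelta = WINDOW):
        self.threshold = threshold
        self.window = window
        self._recent = defaultdict(deque)  # CI id -> Incident times inside the window

    def log_incident(self, ci_id: str, logged_at: datetime) -> bool:
        """Record an Incident; return True if a Problem should be raised.

        Assumes Incidents are logged in chronological order."""
        times = self._recent[ci_id]
        times.append(logged_at)
        # Discard Incidents older than the window, measured from the newest one.
        while logged_at - times[0] > self.window:
            times.popleft()
        return len(times) > self.threshold


# Example: 11 Incidents against the same CI, ten minutes apart.
trigger = ProblemTrigger()
start = datetime(2009, 7, 1, 9, 0)
flags = [
    trigger.log_incident("brand-x-desktop", start + timedelta(minutes=10 * i))
    for i in range(11)
]
print(flags[-1])  # True: the 11th Incident exceeds the threshold of 10
```

As written, every further Incident inside the window also returns True; a real tool would presumably link those Incidents to the Problem already raised rather than open a new one.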
Once the underlying cause of a Problem has been diagnosed, the Problem is referred to as a Known Error. At this point the root cause is known, and the most appropriate course of action must be determined. This may be a structural resolution, implemented by raising a Request for Change (RFC). Alternatively, after consultation with Users and Customers, it may be decided to implement a workaround or recovery action.
Applied to the power-supply example above:
Problem: Brand X desktops no longer operating
Root Cause: Faulty power-supply in July ’09 models
Known Error: Faulty power-supply; replace it with a new power-supply under warranty
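The lifecycle described above (a Problem under investigation becomes a Known Error once its root cause is diagnosed, and only then is a course of action chosen) can be sketched as a small record type. `ProblemRecord`, `ProblemStatus` and `Action` are hypothetical names for illustration, not part of any particular tool. Treating the warranty replacement as an RFC is also an assumption, based on the note that a Change-Management-controlled power-supply needs a change request.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProblemStatus(Enum):
    UNDER_INVESTIGATION = "under investigation"
    KNOWN_ERROR = "known error"


class Action(Enum):
    RFC = "request for change"                      # structural resolution
    WORKAROUND = "workaround or recovery action"    # agreed with Users and Customers


@dataclass
class ProblemRecord:
    summary: str
    root_cause: Optional[str] = None
    action: Optional[Action] = None
    action_detail: Optional[str] = None

    @property
    def status(self) -> ProblemStatus:
        # A diagnosed root cause turns the Problem into a Known Error.
        if self.root_cause is None:
            return ProblemStatus.UNDER_INVESTIGATION
        return ProblemStatus.KNOWN_ERROR

    def diagnose(self, root_cause: str) -> None:
        self.root_cause = root_cause

    def decide(self, action: Action, detail: str) -> None:
        # A course of action is only chosen once the root cause is known.
        if self.status is not ProblemStatus.KNOWN_ERROR:
            raise ValueError("diagnose the root cause before choosing an action")
        self.action = action
        self.action_detail = detail


problem = ProblemRecord("Brand X desktops no longer operating")
problem.diagnose("Faulty power-supply in July '09 models")
# Assumption: the replacement is handled as a change request.
problem.decide(Action.RFC, "Replace the power-supply under warranty")
print(problem.status.value)  # known error
```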
As part of the Problem Management process, if a Problem is related to a Change Request and that Change Request is closed, the Problem is closed automatically. The system ranks request types from low to high as Service Request, Incident, Problem and Change Request; when a request is closed, all related requests of a lower type are closed automatically.
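A minimal sketch of this closure cascade follows. It assumes that related requests are linked in both directions and that automatic closure propagates transitively (a closed Change Request closes its related Problem, whose closure in turn closes that Problem's related Incidents); the source does not state whether the real system cascades this way. `Request`, `RequestType`, `relate` and `close` are hypothetical names, not the tool's API.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class RequestType(IntEnum):
    # Request hierarchy from low to high.
    SERVICE_REQUEST = 1
    INCIDENT = 2
    PROBLEM = 3
    CHANGE_REQUEST = 4


@dataclass(eq=False)
class Request:
    request_id: str
    kind: RequestType
    closed: bool = False
    related: List["Request"] = field(default_factory=list, repr=False)


def relate(a: Request, b: Request) -> None:
    """Link two requests in both directions."""
    a.related.append(b)
    b.related.append(a)


def close(request: Request) -> None:
    """Close a request and cascade to related requests of a lower type."""
    stack = [request]
    while stack:
        current = stack.pop()
        if current.closed:
            continue
        current.closed = True
        for other in current.related:
            # Only lower-ranked requests are closed automatically.
            if other.kind < current.kind and not other.closed:
                stack.append(other)


change = Request("CR-1", RequestType.CHANGE_REQUEST)
problem = Request("PRB-1", RequestType.PROBLEM)
incident = Request("INC-1", RequestType.INCIDENT)
other_change = Request("CR-2", RequestType.CHANGE_REQUEST)
relate(change, problem)
relate(problem, incident)
relate(problem, other_change)

close(change)
print([r.request_id for r in (change, problem, incident, other_change) if r.closed])
# ['CR-1', 'PRB-1', 'INC-1']
```

The second Change Request stays open because it ranks above the Problem that was closed automatically.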