Incident and problem management

INTRODUCTION

IT Service Management (ITSM) includes the handling of incidents and problems. As the role of IT in a company grows, so does the need to provide a good level of service and to maximize the availability of IT services. Business users should be able to work at any time and, when issues arise, to have them resolved as quickly as possible. Incident and problem management processes are implemented to achieve exactly that. In this article, we describe how the work of an IT service can be organized within the framework of incident and problem management. The description is based on ITIL recommendations and the experience of our customers.

LANGUAGE OF INCIDENTS AND PROBLEMS

ITIL Service Support is a globally recognized model. It is based on best practices and is used as a guide by IT organizations in developing their service management approaches. It also defines additional elements necessary for an IT organization to function successfully as a service business. It provides a technical vocabulary for discussing helpdesk work, defines concepts, and highlights the differences between activities. For example, the activity required to respond to a service interruption and restore the service is different from the activity required to find and eliminate the causes of the interruption.

INCIDENTS

An incident is any event that is not part of the standard operations of a service and causes, or may cause, an interruption of service or a reduction in the quality of the service.

Examples of incidents are:

A user cannot receive email
A network monitoring tool indicates that a communication channel will soon be saturated
A user experiences a slowdown in an application

PROBLEMS

A problem is the unknown underlying cause of one or more incidents. One problem can give rise to several incidents.

ERRORS

A known error is an incident or problem for which the cause has been identified and a workaround or solution has been developed. Errors can be identified by analyzing user complaints or by analyzing systems.

Examples of errors include:

Incorrect computer network configuration
The monitoring tool incorrectly determines the status of the link when the router is busy

The relationship between incident and problem management is shown in Figure 1. Incidents, problems and known errors are linked in a kind of life cycle: incidents are often indicators of problems ⇒ identifying the cause of the problem determines the error ⇒ errors are then systematically corrected.
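The life cycle above can be illustrated with a small data model. This is only a sketch under stated assumptions: the class names, field names and identifiers (Incident, Problem, root_cause, workaround_or_fix, "P-1") are hypothetical and do not come from ITIL or from any helpdesk product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    incident_id: str
    description: str


@dataclass
class Problem:
    """Hypothetical record: the unknown cause behind one or more incidents."""
    problem_id: str
    incidents: List[Incident] = field(default_factory=list)
    root_cause: Optional[str] = None         # unknown until diagnosed
    workaround_or_fix: Optional[str] = None  # developed after diagnosis

    def is_known_error(self) -> bool:
        # A problem counts as a known error once its cause is identified
        # and a workaround or solution has been developed.
        return self.root_cause is not None and self.workaround_or_fix is not None


# Two incidents point at the same underlying problem.
problem = Problem("P-1")
problem.incidents.append(Incident("I-1", "User cannot receive email"))
problem.incidents.append(Incident("I-2", "Another user cannot receive email"))

# Diagnosis identifies the cause; a workaround turns it into a known error.
problem.root_cause = "Incorrect computer network configuration"
problem.workaround_or_fix = "Illustrative workaround text"
```

Keeping the incidents attached to the problem record preserves the link from symptoms to cause, so that closing the known error can later be traced back to the incidents it explains.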

INCIDENT MANAGEMENT

Incident management is the activity of restoring normal service with minimal delay and minimal impact on business operations. It is a reactive recovery activity with a short-term focus.

It includes:

Identification and registration of incidents
Classification and initial support
Investigation and diagnosis
Resolution and recovery
Closure
Ownership, monitoring, tracking and communication
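The incident management activities listed above can be read as a simple sequence of states, with ownership and tracking running alongside all of them. The following is a minimal sketch of that reading, not a prescribed implementation; the state names, the strictly linear transitions, and the IncidentRecord class are simplifying assumptions made for illustration.

```python
from enum import Enum


class IncidentState(Enum):
    REGISTERED = 1
    CLASSIFIED = 2
    UNDER_INVESTIGATION = 3
    RESOLVED = 4
    CLOSED = 5


# Simplifying assumption: each state advances only to the next one.
NEXT_STATE = {
    IncidentState.REGISTERED: IncidentState.CLASSIFIED,
    IncidentState.CLASSIFIED: IncidentState.UNDER_INVESTIGATION,
    IncidentState.UNDER_INVESTIGATION: IncidentState.RESOLVED,
    IncidentState.RESOLVED: IncidentState.CLOSED,
}


class IncidentRecord:
    """Hypothetical incident record with an owner and a tracked history."""

    def __init__(self, summary, owner):
        self.summary = summary
        self.owner = owner                         # ownership
        self.state = IncidentState.REGISTERED
        self.history = [IncidentState.REGISTERED]  # tracking

    def advance(self):
        """Move the incident to the next state and record the change."""
        if self.state is IncidentState.CLOSED:
            raise ValueError("incident is already closed")
        self.state = NEXT_STATE[self.state]
        self.history.append(self.state)
        return self.state


record = IncidentRecord("User cannot receive email", owner="service desk")
record.advance()  # classification and initial support
record.advance()  # investigation and diagnosis
```

A real helpdesk tool would allow more transitions, such as reopening or escalating an incident; the linear table is kept only to make the order of the activities explicit.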

PROBLEM MANAGEMENT

Problem management is the activity of minimizing the business impact of problems caused by errors in the IT infrastructure and of preventing the recurrence of incidents associated with such errors. Problem management identifies the causes of problems and finds workarounds or permanent solutions for them.

Problem management includes:

Problem control
Error control
Problem prevention
Review of major problems

PROBLEM CONTROL

The purpose of problem control is to find the cause of the problem by following these steps:

Problem identification and logging
Problem classification and prioritization
Investigation and diagnosis of causes
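One way to support the identification step is to look for incidents that recur in the same category, since repeated incidents often indicate a shared underlying problem. The sketch below assumes incidents arrive as (id, category) pairs and uses a hypothetical threshold of two occurrences; both the function name and the threshold are illustrative choices, not part of ITIL.

```python
from collections import defaultdict


def find_problem_candidates(incidents, threshold=2):
    """Group incident ids by category and keep categories seen often enough.

    incidents: iterable of (incident_id, category) pairs (assumed shape).
    threshold: minimum number of incidents that suggests a problem (assumed).
    """
    by_category = defaultdict(list)
    for incident_id, category in incidents:
        by_category[category].append(incident_id)
    return {
        category: ids
        for category, ids in by_category.items()
        if len(ids) >= threshold
    }


logged = [("I-1", "email"), ("I-2", "network"), ("I-3", "email")]
candidates = find_problem_candidates(logged)
# candidates == {"email": ["I-1", "I-3"]}
```

Grouping by category is a deliberately crude heuristic; it only nominates candidates, and the classification and diagnosis steps still have to confirm whether a real problem exists.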

ERROR CONTROL

Error control ensures that problems are corrected by:

Identifying and logging known errors
Evaluating remedies and prioritizing them
Recording temporary workarounds for the error in the helpdesk tool
Closing known errors by implementing fixes
Monitoring known errors to determine whether reprioritization is needed
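The error control steps above could be backed by a small known-error register. The sketch below is illustrative only: the KnownError fields, the numeric priority scale (1 is most urgent), and the helper functions are assumptions, not a description of any particular helpdesk tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KnownError:
    error_id: str
    cause: str
    workaround: str     # temporary workaround recorded for the helpdesk
    priority: int       # assumed scale: 1 = most urgent
    fixed: bool = False


def open_errors_by_priority(errors: List[KnownError]) -> List[KnownError]:
    """Return known errors that still await a fix, most urgent first."""
    return sorted((e for e in errors if not e.fixed), key=lambda e: e.priority)


def close_error(error: KnownError) -> None:
    """Mark a known error as closed once its fix has been implemented."""
    error.fixed = True


register = [
    KnownError("KE-1", "Incorrect computer network configuration",
               "Illustrative workaround A", priority=2),
    KnownError("KE-2", "Monitoring tool misreports link status when the router is busy",
               "Illustrative workaround B", priority=1),
]

close_error(register[0])                      # fix implemented for KE-1
remaining = open_errors_by_priority(register)
# remaining now contains only KE-2
```

Reprioritization in this model is just a change to the priority field followed by another call to open_errors_by_priority, which keeps the monitoring step cheap to repeat.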