What is IT Incident Management? (Best Practice and Processes)

IT incident management is an ITSM process used to resolve IT service disruptions and get interrupted services up and running again as quickly as possible. This practice helps reduce the impact of incidents on critical services, enabling operations to carry on as usual.

As much as we’d like it to be, not everything is smooth sailing. When it comes to IT services, disruptions can come out of nowhere, and sometimes things simply happen outside of our control. An employee’s computer mouse might suddenly break, or the company VPN might become unreliable. Such events can hinder people’s work and, if left unaddressed, can cost the business significantly.

Under the ITIL framework by Axelos, incident management is recognized as one of the key processes in the Service Operation stage of the service lifecycle. Good IT incident management not only helps better predict the likelihood of incidents happening but also minimizes the impact they cause should they happen.

Let’s take a look at the various benefits and what’s involved in implementing an effective incident management program.

Benefits of IT incident management

Having an IT incident management strategy provides a whole range of advantages to the entire organization, not just the IT team.

• Reduces negative business impact

With so much of your workforce’s daily operations relying on technology, any incident can have an adverse impact on your organization. For example, a widely cited Gartner estimate puts the cost of network downtime at around $5,600 per minute. Having a process in place to tackle incidents enables the IT team to resolve these issues quickly and return things to normal working order.

• Boosts productivity

Fewer interruptions mean that people don’t have to deal with the frustrations that come with them and can focus on their work. And should incidents occur, a good IT incident management process keeps the disruption to a minimum. The IT department can also work more productively when there is a clear set of actions in place to tackle incidents.

• Enhances end-user satisfaction

Having explicit guidelines and processes in place means that when an end-user reports an incident, your team can get on the case quickly. Prompt response and effective resolution give the end-user a positive service experience and, in turn, higher satisfaction.

• Identifies potential improvements

A defined IT incident management strategy enables you to pinpoint areas of improvement – whether that’s reviewing policies or workflows, assessing if your team requires additional training, or determining if your tech products are fit for purpose.

• Helps demonstrate compliance

With better incident monitoring and reporting, you can centralize the key information and data that help show you are keeping in line with your legislative and regulatory obligations.

• Gives better insights into the reasons behind incidents

Good incident management provides a clear overview of what’s happening in your IT ecosystem, giving you a better understanding of why and how incidents are occurring. This, in turn, can help you improve your incident prevention strategies.
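To put the downtime figure mentioned above into perspective, here is a quick back-of-the-envelope calculation. It is only a rough sketch: it assumes the cost scales linearly with outage length, and the function name and sample durations are made up for illustration. Real costs will vary from one organization to another.

```python
# Rough downtime cost estimate using the per-minute figure quoted in this article.
# Assumption: cost grows linearly with outage length, which is a simplification.
COST_PER_MINUTE_USD = 5_600


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return the estimated cost in USD of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    # Sample outage lengths chosen purely for illustration.
    for minutes in (15, 60, 240):
        print(f"{minutes:>4} min outage: about ${downtime_cost(minutes):,.0f}")
```

At that rate, a single hour of downtime would come to roughly $336,000, which is why shaving even a few minutes off resolution time matters.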

Incident vs problem vs change vs service request

One of the more challenging aspects of embracing ITSM best practices is telling key terms apart. Incident, problem, change, and service request are commonly seen terms, each relating to an important ITSM process specified in the ITIL framework. But how do they differ?

An incident is an unplanned disruption that negatively impacts an IT service. In other words, something within the infrastructure broke and needs to be fixed. For example, when a laptop is not turning on or when software keeps crashing.

A problem is the underlying cause of one or more incidents. A single problem may be the reason for several incidents and, equally, a single incident can be caused by multiple problems.

A change is any modification you make that can affect your IT services. This can include any additions or removals you implement and covers any alterations to documentation, processes, software/hardware, etc.

A service request is a request for a predefined service, whether that’s access to information, equipment, or a standard IT service. Unlike incidents, which are about ‘needs’, service requests are the ‘wants’ of the end-user that further support or enhance their way of working.

Defining what’s considered an incident, problem, change, or service request helps align your goals and create clear ITSM processes.

Types of incidents

Incidents can come from anywhere, at any time. Usually, they can be split into two types: faults and technical incidents.

A fault is an incident that affects the end-user experience, with the end-user usually being the only one affected by it. For example, this could be their software crashing or a printer not working. Faults are often spotted and raised by the user.

A technical incident is an incident that is not as visible to the end-user and is often spotted by the IT team. This can be something like corrupted data or application performance issues. Automated monitoring makes incidents like these easier to detect.

IT incident management processes

[Figure: incident management lifecycle]

IT incident management involves several steps when dealing with an incident.

1. Incident identification and logging

This is when the incident is first spotted and recorded. This step can be done through various channels: email, phone call, live chat, or a self-service portal.

2. Incident categorization

There are many kinds of incidents that can occur. Categorizing them makes it easier to pinpoint what type of expertise is needed to resolve the issue. At this stage, the service desk specifies which area is affected, e.g., software, hardware, or network. Depending on your organization’s requirements, you can create more specific sub-categories as well.

3. Incident prioritization

The next step is to determine how critical the incident is and where it sits among other incoming incidents. This is done by assessing it against the IT incident priority matrix (covered in a later section).

4. Incident investigation and diagnosis

Once the incident is categorized and prioritized, a service desk agent (Level 1) takes it on and starts an initial diagnosis. Most incidents are resolved at this stage, but if not…

5. Incident assignment or escalation

The incident can be re-assigned or escalated to another person or team depending on its complexity or if more specific expertise is required.

6. Incident resolution

This is the stage at which the incident is solved, and the necessary steps have been taken to fix the issue.

7. Incident closure

With the fix implemented, the service desk team confirms with the end-user that the fix is satisfactory and then closes the incident.

8. User satisfaction survey

To further improve your incident management processes, it’s important to get feedback from the end-user about their experience during the different stages.

Each step is a crucial aspect of incident management and the various teams involved will need to work together to ensure every action is followed through.
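The eight steps above can be thought of as a simple state machine. The sketch below is one possible way to model them, not a prescribed implementation: the stage names and the transition rules (for example, that escalation is optional and can happen more than once) are assumptions made for illustration.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages, named after the eight steps described above."""
    LOGGED = 1
    CATEGORIZED = 2
    PRIORITIZED = 3
    DIAGNOSIS = 4
    ESCALATED = 5
    RESOLVED = 6
    CLOSED = 7
    SURVEYED = 8


# Assumed transition rules. Escalation is optional: an incident fixed at
# Level 1 goes straight from DIAGNOSIS to RESOLVED. An escalated incident
# may be escalated again (e.g., from Level 2 to Level 3).
TRANSITIONS = {
    Stage.LOGGED: {Stage.CATEGORIZED},
    Stage.CATEGORIZED: {Stage.PRIORITIZED},
    Stage.PRIORITIZED: {Stage.DIAGNOSIS},
    Stage.DIAGNOSIS: {Stage.ESCALATED, Stage.RESOLVED},
    Stage.ESCALATED: {Stage.ESCALATED, Stage.RESOLVED},
    Stage.RESOLVED: {Stage.CLOSED},
    Stage.CLOSED: {Stage.SURVEYED},
    Stage.SURVEYED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move an incident to `target`, rejecting moves that skip a step."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


if __name__ == "__main__":
    stage = Stage.LOGGED
    for nxt in (Stage.CATEGORIZED, Stage.PRIORITIZED, Stage.DIAGNOSIS, Stage.RESOLVED):
        stage = advance(stage, nxt)
    print("Incident is now:", stage.name)
```

Encoding the allowed moves in one table makes it harder for an incident to be closed before it has been categorized, prioritized, and resolved.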

IT incident priority matrix

The IT incident priority matrix is a guideline used to determine an incident’s priority. As mentioned in the previous section, incident prioritization is an essential step in incident management. With the matrix, incident priority is determined by weighing the level of potential impact against the level of urgency.

The level of impact is dependent on the effect of the incident on the business. For example, this could be how many users will be affected, revenue that will be lost, or how many services are involved.

The level of urgency is about time. How long can your organization carry on operations without resolving this issue? Will the incident escalate over time? An urgent incident can cause increasing damage the longer it’s left alone.

[Figure: IT incident priority matrix]

A major incident is one with high levels of both impact and urgency. This is usually the case for outages or loss of service that are critical to key business operations. Major incidents require immediate attention, with some organizations having dedicated teams to handle these.

How the different levels of impact and urgency are defined differs for every organization, but using the matrix as a guideline can help you allocate important resources where necessary.
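As a rough illustration of how a matrix like this can be applied, the sketch below combines impact and urgency into a single priority. The three-level scale and the P1 to P5 scoring rule are assumptions chosen for the example, not values taken from this article or from ITIL, so adjust them to match your own matrix.

```python
from enum import IntEnum


class Level(IntEnum):
    """Assumed three-level scale used for both impact and urgency."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def priority(impact: Level, urgency: Level) -> int:
    """Return a priority from 1 (most critical) to 5 (least critical).

    Hypothetical rule: add the two ranks and subtract one, so that
    high impact plus high urgency gives priority 1.
    """
    return impact + urgency - 1


def is_major_incident(impact: Level, urgency: Level) -> bool:
    """A major incident has high levels of both impact and urgency."""
    return impact is Level.HIGH and urgency is Level.HIGH


if __name__ == "__main__":
    # Print the full matrix: rows are impact, columns are urgency (HIGH to LOW).
    for impact in Level:
        row = [f"P{priority(impact, urgency)}" for urgency in Level]
        print(f"{impact.name:<6} impact:", " ".join(row))
```

Keeping the rule in one small function means the whole team prioritizes incidents the same way, rather than relying on individual judgment.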

IT incident response plan

An IT incident response plan is the documentation detailing the systems and processes IT teams have in place to effectively handle incidents, especially those relating to cybersecurity. This plan is created so IT teams and employees will know what to do should they encounter incidents.

Incident response plans usually consist of the following:

• Key roles and responsibilities

This tells everyone who does what and, if needed, who to contact when an incident occurs.

• Escalation criteria

At what point do you escalate an incident after attempting diagnosis at Level 1? Setting out escalation criteria helps avoid confusion and provides clear definitions for escalation.

• Process workflow

This specifies all the necessary steps, helping ensure that nothing is missed or falls through the cracks.

• Information on legal or regulatory requirements

To ensure compliance, it’s important that everyone is aware of their responsibilities and how they contribute to it. Documenting the key information on these requirements in the plan makes it accessible to the people involved.
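Escalation criteria in particular lend themselves to being written down as explicit rules. The sketch below shows one way this might look. Every threshold, category name, and field in it is hypothetical and would need to be replaced with the criteria your own plan defines.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical criteria for illustration only.
L1_TIME_LIMIT = timedelta(minutes=30)            # maximum time spent at Level 1
SPECIALIST_CATEGORIES = {"network", "security"}  # categories needing specific expertise


@dataclass
class Ticket:
    category: str
    priority: int               # assumed scale where 1 is the most critical
    time_at_level_1: timedelta


def should_escalate(ticket: Ticket) -> bool:
    """Apply the assumed escalation criteria to a ticket held at Level 1."""
    if ticket.priority == 1:
        return True   # most critical incidents go straight to specialists
    if ticket.category in SPECIALIST_CATEGORIES:
        return True   # requires expertise beyond the service desk
    return ticket.time_at_level_1 > L1_TIME_LIMIT


if __name__ == "__main__":
    ticket = Ticket(category="software", priority=3, time_at_level_1=timedelta(minutes=45))
    print("Escalate?", should_escalate(ticket))
```

Writing the criteria this explicitly, whether in code or in the plan itself, removes the guesswork about when a Level 1 agent should hand an incident over.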

IT incident report template

An IT incident report template is a form that makes it easier for users to record incidents. It enables them to provide a summary of when it occurred, what was affected, and the potential severity of the issue. Some ITSM tools like IFS assyst have easy-to-use templates available in their system.

A typical incident report can contain the following sections:

• Initial incident details

This can include information on who reported the incident, their contact details, the date it was reported, and which service was disrupted.

• Detailed incident information

This section goes into the specifics of the incident, such as when the incident occurred or was spotted, its category, and how long it’s been going on. This may also include an ID or reference number that makes it easier for the IT team to refer back to later.

• Incident description

This is where any further qualitative data can be provided to add context to the incident. This is especially useful as it gives the support team more information to help them diagnose the incident and pinpoint its root cause.

• Impact

This is where the user can specify the effects of the incident and give an indication of the level of priority.

It’s essential that incidents are reported promptly to ensure a smooth service experience. Using an IT incident report template helps simplify this task.
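If you are building your own form rather than using a ready-made one, the sections above map naturally onto a simple record structure. The sketch below is a minimal example under that assumption. The field names are illustrative and do not reflect the template of any particular ITSM tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class IncidentReport:
    # Initial incident details
    reported_by: str
    contact: str
    reported_at: datetime
    affected_service: str
    # Detailed incident information
    occurred_at: datetime
    category: str
    # Incident description (free-text context for the support team)
    description: str
    # Impact, as described by the user
    impact: str
    # Reference number, typically assigned once the incident is logged
    reference_id: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of text fields the user left blank."""
        text_fields = ("reported_by", "contact", "affected_service",
                       "category", "description", "impact")
        return [name for name in text_fields if not getattr(self, name).strip()]

    def ongoing_for(self, now: datetime) -> timedelta:
        """How long the incident has been going on as of `now`."""
        return now - self.occurred_at


if __name__ == "__main__":
    report = IncidentReport(
        reported_by="A. User",
        contact="a.user@example.com",
        reported_at=datetime(2024, 1, 8, 9, 30),
        affected_service="VPN",
        occurred_at=datetime(2024, 1, 8, 9, 0),
        category="network",
        description="",
        impact="Remote staff cannot reach internal systems",
    )
    print("Missing:", report.missing_fields())
    print("Ongoing for:", report.ongoing_for(datetime(2024, 1, 8, 11, 0)))
```

A check like `missing_fields` can prompt the user for anything they skipped before the report reaches the service desk.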

IT incident management roles and responsibilities

To make incident management work, you need to specify key roles and responsibilities in your process. Here’s a quick summary of the different members that can be involved in resolving incidents.

1. Service desk (Level 1)

When an end-user logs an incident, the L1 service desk team will be their first point of contact. Some of their responsibilities include:

Logging and categorizing incidents
Providing quick resolutions at the most immediate opportunity
Escalating unresolved incidents to Level 2 support
Keeping end-users informed of the progress of the incident
Closing the incident once the fix has been implemented

2. Tech lead (Level 2)

When an incident requires some specific expertise, this role will support the service desk member. Some of their responsibilities include:

Investigating the cause of the failed IT service
Giving specialized input in finding resolutions for the incident
Sharing knowledge with Level 1 staff on the issue at hand
Escalating unresolved incidents to Level 3/Incident manager

3. External experts or vendor (Level 3)

In the case of bigger and more complex incidents, it might be that outside expertise is required. This is usually the case when you need product/system support directly from the supplier, or if the issue is outside your staff’s area of expertise and needs external consultation. Some of their responsibilities include:

Supporting L1 and L2 staff with resolving incidents
Sharing expert insights and recommendations for the issue at hand
Providing relevant services that can help with incident resolution

Depending on an organization’s IT department structure, there may be additional support levels, e.g., Level 4/Level 5, with each one providing technical support as incidents become more complex.

4. Incident manager

The incident manager oversees the end-to-end process of resolving all incidents and ensures all interrupted IT services are up and running again as soon as possible. Some of their responsibilities include:

Coordinating activities related to the process
Allocating the necessary resources to resolve the incidents
Tracking the progress of incidents and ensuring they meet SLA requirements
Overseeing resolution of major incidents
Monitoring the workload of the service desk support staff
Providing insights on areas of improvement to the incident process owner

5. Incident process owner

This role ensures that the incident management process aligns with the organization’s business goals. Some of their responsibilities include:

Creating documentation of the process
Ensuring the procedures, technology, and level of expertise follow legislative and regulatory requirements
Implementing organization-wide policies regarding the process
Planning and making continuous improvements on different aspects of the process
Setting and monitoring KPIs

Selecting incident management tools

Technology plays a huge part in helping your team carry out key activities and obtain useful insights for effective incident management. But choosing the right tech tools can be challenging with the wide range of options available in the market.

Whether you opt for an incident management-specific solution or a system that provides full coverage of various ITSM processes like IFS assyst, there are some key elements that you should look out for from your preferred incident management tool.

• Easy-to-use interface

Effective incident management requires involvement from across the organization. So, from the incident manager to the end-user, your tool should be simple to use and navigate.

• Smart automation

Modern incident management tools should take advantage of the latest technologies, such as AI or machine learning, to help automate routine tasks and enable real-time monitoring of your IT infrastructure.

• Comprehensive knowledge database

A good knowledge database where you can record previous incidents and their solutions enables faster response/resolution time should a similar incident occur again.

• Mobile access

As the workplace can differ for every employee, especially as the workforce becomes more distributed, being able to log incidents or access information from anywhere at any time is paramount.

If you’re searching for a system that covers not just incident management but also supports other major ITIL/ITSM processes, have a look at our buyer’s guide taking you through key considerations when selecting an ITSM solution.
