Enterprise Service Management Office, Incident Management Standards

This document is for all university organizations operating or managing IT services and defines the minimum standard for managing IT Incidents at the University of Illinois Urbana-Champaign.

Purpose

This document is for all university organizations operating or managing IT infrastructure and defines the minimum standard for managing IT Incidents.

Goal of Incident Management

The goal of Incident Management is to minimize the negative impact of IT Incidents by restoring normal service operations as quickly as possible. Resolving disruptions efficiently protects the university’s core operations, including teaching, learning, research, and essential administrative functions, by maintaining continuity and stability across the institution.

Incident Management Standards

Discovery, Logging, and Prioritization
  • Technical Support teams must log all IT Incidents in TeamDynamix (TDX) upon discovery.
  • Incident records must contain the following minimum information (as defined in the Incident Record Requirements) when initially logged:
    • Title
    • Affected Service
    • Description
    • Incident Priority
  • Technical Support teams must prioritize all incidents using the campus prioritization scale.
  • When multiple users report the same incident, Service Desk or user support teams must create a separate Incident Ticket in TDX for each user report, then link all related tickets to the primary Incident Record using TDX’s parent/child relationship feature.
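
The minimum-information rule above can be sketched as a simple check run before a record is saved. This is only an illustration: the field names, the `validate_new_incident` helper, and the dict-based record are assumptions for the example and do not reflect how TDX stores or validates tickets.

```python
# Hypothetical field names for the four items required at initial logging.
REQUIRED_AT_LOGGING = ("title", "affected_service", "description", "incident_priority")

# Priority labels taken from the campus prioritization scale.
VALID_PRIORITIES = {"Critical", "High", "Medium", "Low"}


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in `record`."""
    return [
        name for name in REQUIRED_AT_LOGGING
        if not str(record.get(name) or "").strip()
    ]


def validate_new_incident(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record can be logged."""
    problems = [f"missing: {name}" for name in missing_fields(record)]
    priority = record.get("incident_priority")
    if priority and priority not in VALID_PRIORITIES:
        problems.append(f"unknown priority: {priority}")
    return problems
```

A record with all four fields filled in and a priority from the campus scale returns an empty list; anything else returns the specific gaps to fix before logging.
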
Escalation and Work Progress
  • Technical support teams must document escalation protocols for service offerings in a location accessible to the service desk.
    • Service desk must follow the documented escalation protocols when handling reports of a potential incident.
  • Technical support teams must document all incident resolution steps taken in the incident record.
    • Update the Incident status on the Incident Record as it changes throughout the life of the incident (e.g., In Progress, Resolved, Closed).
  • If an incident involves suspected security compromise, it must be categorized as a cybersecurity incident and immediately escalated per security procedures.
    • Cybersecurity incidents are a subset of incidents, but they require different handling, escalation, and governance (often via a Security or SOC function). 
Incident Communications
  • Unit Communication Plans Required. Each campus IT unit must maintain a documented Incident Communication Plan that identifies communication roles, approval paths (if applicable), and internal workflows. Unit plans must align with campus Incident Management standards and support timely updates to Status at Illinois or alternative communications channels.
  • Initial Notification: For incidents affecting Enterprise IT Service Offerings, the responsible technical team must post an initial notification to Status at Illinois as soon as the incident is detected, even if full details are not yet known. Early awareness is required to set user expectations and reduce duplicate reporting.
  • Priority-Based Update Cadence: Technical teams must provide ongoing updates in Status at Illinois according to the established communication and resolution targets for the incident priority level. Updates should continue at the defined intervals until the incident is resolved.
  • Resolution Communication: When service is restored, the responsible team must publish a resolution update in Status at Illinois. The resolution notice should clearly state that service has been restored and provide any relevant user guidance (for example, whether any user action is needed to use the affected service).
  • Required Content and Audience Clarity: All incident communications must:
    • Identify the affected service by its official service name
    • Describe the issue using clear, non-technical language appropriate for a broad campus audience
    • Provide the expected timing of the next update
    • Avoid internal troubleshooting details that do not aid user understanding
  • Communications should be concise, transparent, and written for end users rather than technical staff.
  • Incidents that do not affect Enterprise IT Service Offerings or that impact only a small, defined user group must still be communicated through a predetermined, unit-approved channel (for example: unit email lists, Teams channels, local signage, or department websites), rather than posting to Status at Illinois. This standard ensures affected users receive timely information while preserving Status at Illinois for incidents with broader campus impact.
  • Cybersecurity incidents may override standard SLAs and workflows due to legal, regulatory, and risk considerations. See “Report a Cybersecurity Incident” (Privacy & Cybersecurity, Office of the Chief Information Officer).
Resolution
  • Once service has been restored, update the incident record status and describe how the incident was resolved. Include links to relevant Knowledge Base articles and related records (changes, problems, incidents).
  • Communicate that the incident has been resolved according to the incident communications standards.
  • Ensure that any user tickets associated with the incident are closed.
Post-Incident Reviews (PIR)
  • Conduct PIRs for all critical and high-priority incidents. PIRs support continual improvement.
  • Technical Support teams must document the actions taken to resolve the incident, the root cause of the incident (if known), and any lessons learned or improvement opportunities identified during the PIR.
  • Track follow-up actions arising from PIRs, including assigned owners and target completion dates, to ensure they are completed.

Incident Record Requirements

The following information must be included in every Incident Record. Units may choose to collect additional information if needed. 

Time and Lifecycle Tracking
This table outlines the Incident Record Requirements for Time and Lifecycle Tracking
Data Point | Definition | Purpose
Start Timestamp | The time the incident began. | Average incident duration
Discovery Timestamp | The time the incident was discovered. | Average discovery time
Response Timestamp | The time of the first action taken. | Average response time
Resolution Timestamp | The time that the incident was resolved. | Average resolution time
Status | Current lifecycle state (e.g., In Progress, Resolved, Closed). | Tracks incident progression

Roles and People
This table outlines the Incident Record Requirements for Roles and People
Data Point | Definition | Purpose
Affected Users | The users impacted by the incident. | User impact metrics and prioritization
Responsible | The group or technician currently assigned to the incident. | Identifies responsible parties throughout incident lifecycle

Context and Description
This table outlines the Incident Record Requirements for Context and Description
Data Point | Definition | Purpose
Title | Short name to refer to the incident. | Meaningful way to refer to incident
Description | Clear summary of symptoms, impact, and relevant context. | Captures vital information about the incident and response
Affected Service | The IT service offering experiencing the unplanned interruption or degradation. | Enables service-based reporting and trend analysis
Incident Priority | Priority based on the campus Prioritization Scale. | Incident response context and baseline metrics
Actions Taken | High-level summary of actions taken to troubleshoot and resolve the incident. | Assists with escalations and incident review

Incident Prioritization Scale

Incidents are prioritized using the campus scale, which allows campus-wide incident management reporting. Incident priority is determined by impact and urgency, not solely by the number of users affected.

Incident Prioritization Scale 
This table outlines the Incident Prioritization Scale
Priority | User Impact | Academic/Business Impact | Examples
Critical | Campus-wide impact | Incident prevents teaching, research, or essential business operations | Loss of internet connectivity across campus; Canvas unavailable; campus-wide inability to send/receive email
High | Broad impact (multiple users, departments, or buildings) | Incident significantly interferes with teaching, research, or essential business operations, but work may continue in a degraded state | Building network outage; Canvas grading unavailable; email delivery delayed
Medium | Limited impact (single user or small group) | Teaching, research, or business operations are impacted but can continue with a reasonable workaround | Slow Wi-Fi in the union; Canvas video playback issues; Outlook desktop client calendar not updating
Low | Minimal or no immediate user impact | No significant disruption to operations; issue is cosmetic, informational, or deferred maintenance | Failed generator for a communications node (risk, not current outage); Teams not displaying user avatars (cosmetic)

Major Incident Trigger

  • Critical incidents will be evaluated for Major Incident activation upon identification.
  • The Major Incident Response Plan is initiated when an incident is determined to meet defined criteria for significant business impact, widespread service disruption, or coordinated cross-team response.
  • Activation of the Major Incident Response Plan follows the procedures and designated roles outlined in the Major Incident Response Plan documentation.

Key points:

  • Single-user incidents are typically Medium unless the issue is minor, cosmetic, or has no meaningful impact.
  • Low priority incidents may not impact users immediately but represent minor issues, cosmetic defects, or risks that should be addressed before they cause disruption.

Communication and Resolution Targets

Targets
This table outlines the Communication and Resolution Targets
Incident Priority | Communication Target (communicate as soon as you have credible confirmation, even if details are minimal) | Resolution Target
Critical | Initial communication: within 10 minutes of discovery. Update frequency: every 60 minutes or as communicated in prior communication. | 4 hours
High | Initial communication: within 10 minutes of discovery. Update frequency: every 2 hours or as communicated in prior communication. | 8 hours
Medium | Initial communication: within 30 minutes of discovery if the incident impacts multiple users, teams, or critical functions. For incidents with limited impact, communication may be targeted to affected users only. Update frequency: at least once per business day or as communicated in prior communication. | 2 business days
Low | Initial communication: not required/targeted. Update frequency: not required; provide updates upon request or at resolution. | 5 business days

Key Point: Resolution targets apply to time under the control of the assigned support team. Time awaiting vendor or third-party action may be excluded when appropriately documented.
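
The exclusion of vendor or third-party wait time can be sketched as a small calculation. This is a minimal illustration under stated assumptions: the standard does not say which timestamp the resolution target is measured from, so the sketch assumes the start timestamp; the function names and the `vendor_waits` structure are hypothetical; and only the hour-based targets are modeled, because the business-day targets need a business calendar.

```python
from datetime import datetime, timedelta

# Hour-based resolution targets transcribed from the targets table.
# Medium and Low use business-day targets and are omitted from this sketch.
RESOLUTION_TARGETS = {
    "Critical": timedelta(hours=4),
    "High": timedelta(hours=8),
}


def time_under_team_control(start, resolved, vendor_waits=()):
    """Elapsed time minus documented vendor or third-party wait intervals.

    `vendor_waits` is a hypothetical sequence of (wait_began, wait_ended)
    datetime pairs, assumed not to overlap one another.
    """
    waiting = sum((ended - began for began, ended in vendor_waits), timedelta())
    return (resolved - start) - waiting


def met_resolution_target(priority, start, resolved, vendor_waits=()):
    """True if the time counted against the support team is within target."""
    counted = time_under_team_control(start, resolved, vendor_waits)
    return counted <= RESOLUTION_TARGETS[priority]
```

For example, a Critical incident that runs from 9:00 to 14:30 with a documented vendor wait from 10:00 to 12:00 counts 3 hours 30 minutes against the team and meets the 4-hour target. Without the documented wait, the same incident counts 5 hours 30 minutes and misses it.
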

Measurement and Reporting

The following metrics are required for Incident reporting. Note: In these reports, an Incident refers to the actual service-affecting event, while an Incident Record refers to the ticket created in the ITSM system to track that event. Metrics are calculated based on Incident Records, unless otherwise specified.

Measurement and Reporting 
This table outlines the Measurement and Reporting Requirements
Metric | Data | Reporting Requirement | Reporting Breakdown
Number of Incidents/Incident Volume | Count of incident records created to track incidents. | Total count of incident records. | By month; by priority; by service
Average Response Time | Time elapsed between incident record start timestamp and first response timestamp. | Sum of response times across incident records divided by number of incident records. |
Average Resolution Time | Time elapsed between incident record start timestamp and resolution timestamp. | Sum of resolution times across incident records divided by number of incident records. |
First Contact Resolution Percentage | Count of incident records resolved without escalation. | Incident records resolved without escalation divided by total incident records (percentage). |
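
The metric definitions above can be sketched as straightforward calculations over a set of incident records. This is an illustration only: the record keys (`start`, `response`, `resolution`, `escalated`) are hypothetical names, not TDX fields, and the sketch assumes every record passed in has already been resolved.

```python
from datetime import datetime, timedelta


def average(durations):
    """Mean of a sequence of timedeltas, or None when the sequence is empty."""
    durations = list(durations)
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


def incident_metrics(records):
    """Compute the required metrics from a list of resolved incident records.

    Each record is a dict with hypothetical keys: "start", "response", and
    "resolution" (datetimes) plus "escalated" (bool).
    """
    total = len(records)
    resolved_without_escalation = sum(1 for r in records if not r["escalated"])
    return {
        "incident_volume": total,
        "average_response_time": average(r["response"] - r["start"] for r in records),
        "average_resolution_time": average(r["resolution"] - r["start"] for r in records),
        "first_contact_resolution_pct": (
            100 * resolved_without_escalation / total if total else None
        ),
    }
```

To produce the breakdowns by month, priority, or service, the same function could be applied to each group of records after filtering.
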

Information Technology Infrastructure Library (ITIL) Maturity Model

The ITIL Maturity Model is a tool that organizations can use to objectively and comprehensively assess their service management capabilities and the maturity of their Service Value System. 

ITIL Maturity Model
This table outlines the ITIL Maturity Model

Level 1 | The practice is not well organized; it is performed as initial/intuitive. It may occasionally or partially achieve its purpose through an incomplete set of activities.
Level 2 | The practice systematically achieves its purpose through a basic set of activities supported by specialized resources.
Level 3 | The practice is well defined and achieves its purpose in an organized way, using dedicated resources and relying on inputs from other practices that are integrated into a service management system.
Level 4 | The practice achieves its purpose in a highly organized way, and its performance is continually measured and assessed in the context of the service management system.
Level 5 | The practice is continually improving organizational capabilities associated with its purpose.

Terminology

Incident

Any unplanned interruption to an IT Service offering or a reduction in the quality of an IT Service. If a service is not operating at the level of performance agreed upon in the service level agreement (SLA) or as defined by the service provider, it constitutes an incident and must be logged.

Incident Record

A documented set of data containing all details and the history of an incident—from initial reporting to resolution and closure—used to manage the incident lifecycle, track progress, and provide data for future improvement. 

User

An individual who uses the IT services provided.

Incident Manager

The Incident Manager coordinates the end-to-end handling of Incidents, ensuring effective collaboration, clear communication, and timely resolution. They monitor Incident activity, lead reviews, and drive continual improvement of Incident processes, models, and practices.

Service Desk

First point of contact for users reporting incidents. Responsible for logging, categorizing, and prioritizing incidents, and providing initial diagnosis or resolution where possible.

Technical Support

Provides specialized investigation and resolution for incidents escalated from the Service Desk or detected by technical teams. Responsible for implementing fixes and collaborating with other support groups as needed.

Campus Unit ITSM Liaison

Ensures that campus-wide Incident Management processes are understood, adopted, and consistently followed within their unit, and serves as the point of coordination between the unit and the central ITSM governance group.

Post-Incident Review (PIR)

A post-incident activity that analyzes past incidents to identify trends, root causes, and process improvements.

Incident Prioritization

The process of determining the relative importance of an incident by assessing its impact (scope of damage) and urgency (speed required for resolution).

Business Impact

Evaluates the incident's effect on business processes, revenue, and compliance. It determines how critical the incident is to overall organizational operations.

User Impact

Evaluates the number of users affected by an incident or the severity of the functional limitation for an individual.

Knowledge Base

A central repository containing documentation like FAQs, how-to articles, and troubleshooting guides.

Service Value System (SVS)

The ITIL SVS describes how all the components and activities of the organization unite as a system to enable value co-creation. 



Keywords: outage, service event, degradation, unplanned, down, unavailable, incident, incidents, AAR, SEL, process, Incident Record, report, service offering, SMO, ESMO
Doc ID: 86165
Owned by: ESMO G. in University of Illinois Technology Services
Created: 2018-10-01
Updated: 2026-06-15
Sites: University of Illinois Technology Services