Enterprise Service Management Office, Incident Management Standards

This document is for all university organizations operating or managing IT services and defines the minimum standard for managing IT Incidents at the University of Illinois Urbana-Champaign.

Purpose

This document is for all university organizations operating or managing IT infrastructure and defines the minimum standard for managing IT Incidents.

Goal of Incident Management

Incident Management minimizes the negative impact of IT Incidents by restoring normal service operations as quickly as possible. Resolving disruptions efficiently protects the university’s core operations, including teaching, learning, research, and essential administrative functions, by maintaining continuity and stability across the institution.

Incident Management Standards

Discovery, Logging, and Prioritization

Technical Support teams must record all incidents in the University-approved IT Service Management (ITSM) system upon discovery.
Incident records must contain the following minimum information (as defined in the Incident Documentation Requirements) when initially logged:
- Title
- Affected Service
- Description
- Incident Priority
Technical Support teams must prioritize all incidents using the campus prioritization scale.
When multiple users report the same incident, Service Desk or user support teams must create a separate Incident Ticket in TDX for each user report, then link all related tickets to the primary Incident Record using TDX’s parent/child relationship feature.
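The minimum-information rule above can be checked before a record is saved. The following is a minimal sketch, not part of the ITSM system: the field names (`title`, `affected_service`, `description`, `priority`) and the function `missing_minimum_fields` are hypothetical and chosen only for illustration.

```python
# Hypothetical field names that mirror the four minimum data points above.
REQUIRED_FIELDS = ("title", "affected_service", "description", "priority")

# The four levels of the campus prioritization scale.
PRIORITIES = ("Critical", "High", "Medium", "Low")


def missing_minimum_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in a draft record."""
    missing = [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]
    # A priority that is not on the campus scale is treated as missing.
    if "priority" not in missing and record["priority"] not in PRIORITIES:
        missing.append("priority")
    return missing


draft = {
    "title": "Canvas grading unavailable",
    "affected_service": "Canvas",
    "description": "",
    "priority": "High",
}
print(missing_minimum_fields(draft))  # ['description']
```

An empty list means the draft meets the minimum logging requirement; units that collect additional information could extend `REQUIRED_FIELDS`.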

Escalation and Work Progress

Technical support teams must document escalation protocols for service offerings in a location accessible to the service desk.
- The Service Desk must follow the documented escalation protocols when handling reports of a potential incident.
Technical support teams must document all incident resolution steps taken in the incident record.
- Update the Incident status on the Incident Record as it changes throughout the life of the incident (e.g., In Process, Resolved, Closed).
If an incident involves a suspected security compromise, it must be categorized as a cybersecurity incident and immediately escalated per security procedures.
- Cybersecurity incidents are a subset of incidents, but they require different handling, escalation, and governance (often via a Security or SOC function).

Incident Communications

Unit Communication Plans Required. Each campus IT unit must maintain a documented Incident Communication Plan that identifies communication roles, approval paths (if applicable), and internal workflows. Unit plans must align with campus Incident Management standards and support timely updates to StatusHub or alternative communications channels.
Initial Notification: For incidents affecting Enterprise IT Service Offerings, the responsible technical team must post an initial notification to StatusHub as soon as the incident is detected, even if full details are not yet known. Early awareness is required to set user expectations and reduce duplicate reporting.
Priority-Based Update Cadence: Technical teams must provide ongoing updates in StatusHub according to the established communication and resolution targets for the incident priority level. Updates should continue at the defined intervals until the incident is resolved.
Resolution Communication: When service is restored, the responsible team must publish a resolution update in Status at Illinois. The resolution notice should clearly state that service has been restored and provide any relevant user guidance (for example, whether any user action is needed to use the affected service).
Required Content and Audience Clarity: All incident communications must:
- Identify the affected service by its official service name
- Describe the issue using clear, non-technical language appropriate for a broad campus audience
- Provide the expected timing of the next update
- Avoid internal troubleshooting details that do not aid user understanding
Communications should be concise, transparent, and written for end users rather than technical staff.

Incidents that do not affect Enterprise IT Service Offerings or that impact only a small, defined user group must still be communicated through a predetermined, unit-approved channel (for example: unit email lists, Teams channels, local signage, or department websites), rather than posting to Status at Illinois. This standard ensures affected users receive timely information while preserving StatusHub for incidents with broader campus impact.
Cybersecurity incidents may override standard SLAs and workflows due to legal, regulatory, and risk considerations. See "Report a Cybersecurity Incident" (Privacy & Cybersecurity, Office of the Chief Information Officer).

Resolution

Once service has been restored, update the incident record status and describe how the incident was resolved. Include links to relevant Knowledge Base articles and related records (changes, problems, incidents).
Communicate that the incident has been resolved according to the incident communications standards.
Ensure that any user tickets associated with the incident are closed.

Post-Incident Reviews (PIR)

Conduct PIRs for all critical and high-priority incidents. PIRs support continual improvement.
Technical Support teams must document the actions taken to resolve the incident, the root cause of the incident (if known), and any lessons learned or improvement opportunities identified during the PIR.
Track follow-up actions arising from PIRs, including assigned owners and target completion dates, to ensure they are completed.

Incident Record Requirements

The following information must be included in every Incident Record. Units may choose to collect additional information if needed.

Time and Lifecycle Tracking

This table outlines the Incident Record Requirements for Time and Lifecycle Tracking
Data Point	Definition	Purpose
Start Timestamp	The time the incident began.	Average incident duration
Discovery Timestamp	The time the incident was discovered.	Average discovery time
Response Timestamp	The time of the first action taken.	Average response time
Resolution Timestamp	The time that the incident was resolved.	Average resolution time
Status	Current lifecycle state (e.g., In Progress, Resolved, Closed).	Tracks incident progression

Roles and People

This table outlines the Incident Record Requirements for Roles and People
Data Point	Definition	Purpose
Affected Users	The users impacted by the incident.	User impact metrics and prioritization
Responsible	The group or technician currently assigned to the incident.	Identifies responsible parties throughout incident lifecycle

Context and Description

This table outlines the Incident Record Requirements for Context and Description
Data Point	Definition	Purpose
Title	Short name used to refer to the incident.	Provides a meaningful way to refer to the incident
Description	Clear summary of symptoms, impact, and relevant context.	Captures vital information about the incident and response
Affected Service	The IT service offering experiencing the unplanned interruption or degradation.	Enables service-based reporting and trend analysis
Incident Priority	Priority based on the campus Prioritization Scale.	Incident response context and baseline metrics
Actions Taken	High-level summary of actions taken to troubleshoot and resolve the incident.	Assists with escalations and incident review

Incident Prioritization Scale

Incidents are prioritized using the campus scale, which allows campus-wide incident management reporting. Incident priority is determined by impact and urgency, not solely by the number of users affected.

Incident Prioritization Scale

This table outlines the Incident Prioritization Scale
Priority	User Impact	Academic/Business Impact	Examples
Critical	Campus-wide impact	Incident prevents teaching, research, or essential business operations	Loss of internet connectivity across campus; Canvas unavailable; campuswide inability to send/receive email
High	Broad impact (multiple users, departments, or buildings)	Incident significantly interferes with teaching, research, or essential business operations, but work may continue in a degraded state	Building network outage; Canvas grading unavailable; email delivery delayed
Medium	Limited impact (single user or small group)	Teaching, research, or business operations are impacted but can continue with a reasonable workaround	Slow Wi-Fi in the union; Canvas video playback issues; Outlook desktop client calendar not updating
Low	Minimal or no immediate user impact	No significant disruption to operations; issue is cosmetic, informational, or deferred maintenance	Failed generator for a communications node (risk, not current outage); Teams not displaying user avatars (cosmetic)

Communication and Resolution Targets

Targets

This table outlines the Communication and Resolution Targets
Incident Priority	Communication Target (communicate as soon as you have credible confirmation, even if details are minimal)	Resolution Target
Critical	Initial communication: within 10 minutes of discovery. Update frequency: every 60 minutes or as communicated in prior communication.	4 hours
High	Initial communication: within 10 minutes of discovery. Update frequency: every 2 hours or as communicated in prior communication.	8 hours
Medium	Initial communication: within 30 minutes of discovery if the incident impacts multiple users, teams, or critical functions. For incidents with limited impact, communication may be targeted to affected users only. Update frequency: at least once per business day or as communicated in prior communication.	2 business days
Low	Initial communication: not required/targeted. Update frequency: not required; provide updates upon request or at resolution.	5 business days

Key Point: Resolution targets apply to time under the control of the assigned support team. Time awaiting vendor or third-party action may be excluded when appropriately documented.
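The clock-based targets and the Key Point above can be sketched in code. This is an illustrative sketch only, under several assumptions: the names `TARGETS`, `communication_due`, `time_under_team_control`, and `met_resolution_target` are hypothetical; resolution time is measured from the start timestamp; the "or as communicated in prior communication" option is not modeled; and the Medium and Low targets are omitted because business-day arithmetic needs a campus calendar.

```python
from datetime import datetime, timedelta

# Critical and High targets taken from the table above.
# Medium and Low use business-day targets and are not modeled here.
TARGETS = {
    "Critical": {
        "initial": timedelta(minutes=10),
        "update": timedelta(minutes=60),
        "resolve": timedelta(hours=4),
    },
    "High": {
        "initial": timedelta(minutes=10),
        "update": timedelta(hours=2),
        "resolve": timedelta(hours=8),
    },
}


def communication_due(priority, discovered_at, last_update_at=None):
    """Return when the next communication is due for a Critical or High incident."""
    target = TARGETS[priority]
    if last_update_at is None:
        # Nothing has been posted yet: the initial communication target applies.
        return discovered_at + target["initial"]
    return last_update_at + target["update"]


def time_under_team_control(start, resolved, vendor_waits=()):
    """Elapsed time minus documented vendor or third-party waiting periods."""
    waiting = sum((end - begin for begin, end in vendor_waits), timedelta())
    return (resolved - start) - waiting


def met_resolution_target(priority, start, resolved, vendor_waits=()):
    """True when the time under the team's control is within the target."""
    elapsed = time_under_team_control(start, resolved, vendor_waits)
    return elapsed <= TARGETS[priority]["resolve"]


start = datetime(2026, 1, 12, 9, 0)
resolved = datetime(2026, 1, 12, 14, 0)
vendor_waits = [(datetime(2026, 1, 12, 10, 0), datetime(2026, 1, 12, 11, 30))]

print(communication_due("Critical", start))  # 2026-01-12 09:10:00
print(met_resolution_target("Critical", start, resolved, vendor_waits))  # True
```

In the usage example, the incident lasted five hours, but 90 minutes were spent waiting on a vendor, so the 3.5 hours under the team's control fall within the 4-hour Critical target.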

Measurement and Reporting

The following metrics are required for Incident reporting. Note: In these reports, an Incident refers to the actual service-affecting event, while an Incident Record refers to the ticket created in the ITSM system to track that event. Metrics are calculated based on Incident Records, unless otherwise specified.

Measurement and Reporting

This table outlines the Measurement and Reporting Requirements
Metric	Data	Reporting Requirement	Reporting Breakdown
Number of Incidents/Incident Volume	Count of incident records created to track incidents.	Total count of incident records.	By month; by priority; by service
Average Response Time	Time elapsed between incident record start timestamp and first response timestamp.	Sum of response times across incident records divided by number of incident records.
Average Resolution Time	Time elapsed between incident record start timestamp and resolution timestamp.	Sum of resolution times across incident records divided by number of incident records.
First Contact Resolution Percentage	Count of incident records resolved without escalation.	Incident records resolved without escalation divided by total incident records (percentage).
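The calculations in the table above can be sketched as follows. This is a minimal illustration rather than a reporting implementation: the record keys (`start`, `response`, `resolved`, `escalated`, `priority`) and the function names are hypothetical, and every record is assumed to be resolved already.

```python
from collections import Counter
from datetime import datetime, timedelta


def average(durations):
    """Mean of a list of timedeltas, or None when the list is empty."""
    durations = list(durations)
    return sum(durations, timedelta()) / len(durations) if durations else None


def incident_metrics(records):
    """Compute the required metrics from a list of resolved incident records."""
    responses = [r["response"] - r["start"] for r in records]
    resolutions = [r["resolved"] - r["start"] for r in records]
    first_contact = [r for r in records if not r["escalated"]]
    return {
        "volume": len(records),
        "volume_by_priority": Counter(r["priority"] for r in records),
        "avg_response": average(responses),
        "avg_resolution": average(resolutions),
        "fcr_pct": 100 * len(first_contact) / len(records) if records else None,
    }


records = [
    {
        "priority": "High",
        "start": datetime(2026, 1, 12, 9, 0),
        "response": datetime(2026, 1, 12, 9, 10),
        "resolved": datetime(2026, 1, 12, 10, 0),
        "escalated": False,
    },
    {
        "priority": "Medium",
        "start": datetime(2026, 1, 12, 13, 0),
        "response": datetime(2026, 1, 12, 13, 30),
        "resolved": datetime(2026, 1, 12, 16, 0),
        "escalated": True,
    },
]

metrics = incident_metrics(records)
print(metrics["avg_response"])    # 0:20:00
print(metrics["avg_resolution"])  # 2:00:00
print(metrics["fcr_pct"])         # 50.0
```

Grouping by month or by service would follow the same pattern as `volume_by_priority`, using a different key.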

Information Technology Infrastructure Library (ITIL) Maturity Model

The ITIL Maturity Model is a tool that organizations can use to objectively and comprehensively assess their service management capabilities and the maturity of their Service Value System.

ITIL Maturity Model

This table outlines the ITIL Maturity Model
Level	Description
Level 1	The practice is not well organized; it is performed as initial/intuitive. It may occasionally or partially achieve its purpose through an incomplete set of activities.
Level 2	The practice systematically achieves its purpose through a basic set of activities supported by specialized resources.
Level 3	The practice is well defined and achieves its purpose in an organized way, using dedicated resources and relying on inputs from other practices that are integrated into a service management system.
Level 4	The practice achieves its purpose in a highly organized way, and its performance is continually measured and assessed in the context of the service management system.
Level 5	The practice is continually improving organizational capabilities associated with its purpose.

Terminology

Incident

Any unplanned interruption to an IT Service offering or a reduction in the quality of an IT Service. If a service is not operating at the level of performance agreed upon in the service level agreement (SLA) or as defined by the service provider, it constitutes an incident and must be logged.

Incident Record

A documented set of data containing all details and the history of an incident—from initial reporting to resolution and closure—used to manage the incident lifecycle, track progress, and provide data for future improvement.

User

An individual who uses the IT services provided.

Incident Manager

The Incident Manager coordinates the end-to-end handling of Incidents, ensuring effective collaboration, clear communication, and timely resolution. They monitor Incident activity, lead reviews, and drive continual improvement of Incident processes, models, and practices.

Service Desk

First point of contact for users reporting incidents. Responsible for logging, categorizing, and prioritizing incidents, and providing initial diagnosis or resolution where possible.

Technical Support

Provides specialized investigation and resolution for incidents escalated from the Service Desk or detected by technical teams. Responsible for implementing fixes and collaborating with other support groups as needed.

Campus Unit ITSM Liaison

Ensures that campus-wide Incident Management processes are understood, adopted, and consistently followed within their unit, and serves as the point of coordination between the unit and the central ITSM governance group.

Post-Incident Review (PIR)

A post-incident activity that analyzes past incidents to identify trends, root causes, and process improvements.

Incident Prioritization

The process of determining the relative importance of an incident by assessing its impact (scope of damage) and urgency (speed required for resolution).

Business Impact

Evaluates the incident's effect on business processes, revenue, and compliance. It determines how critical the incident is to overall organizational operations.

User Impact

Evaluates the number of users affected by an incident or the severity of the functional limitation for an individual.

Knowledge Base

A central repository containing documentation like FAQs, how-to articles, and troubleshooting guides.

Service Value System (SVS)

The ITIL SVS describes how all the components and activities of the organization unite as a system to enable value co-creation.

Keywords:

outage, service event, degradation, unplanned, down, unavailable, incident, incidents, AAR, SEL, process, Incident Record, report, service offering, SMO, ESMO, status, StatusHub

Doc ID:

86165

Owned by:

ESMO G. in University of Illinois Technology Services

Created:

2018-10-01

Updated:

2026-06-29

Sites:

University of Illinois Technology Services
