Escalation Procedure (rev October 27, 2003)

Jump to Severity Three crisis management
Definitions
Assignment and Aging
Sample problems and their severity levels
Back to Main On Call Page

Escalation Definitions

Following are the definitions for the four severity levels referenced in this escalation procedure.
There is also a flow picture of how severity levels might change for a given problem.

Severity 0: lowest impact. Default for unassigned problems. Requires no active
assignment in a database. Definition: "Minor Failure [Single user, non-critical facility,
not clustered patient care]"

Severity 1: May be assigned by agent upon report. Definition: "Multiple Users, or
Single in Critical Area".

Severity 2: Equivalent of severity 1, with limited direct assignment by agent. This
level is typically a severity 1 problem that has been escalated because the expected
repair time exceeds 12 business hours. Definition: Severity 1 definition with repair
expected to take more than 12 business hours from the time of the report.

Severity 3: Major failure; CBX or Voicemail node down. Strategic business unit
network down. Backbone down. Internet connection down.
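
The four definitions above can be read as a small decision rule. The following is a minimal sketch only, not part of the procedure: the function name, parameter names and enum member names are hypothetical, "major_failure" stands in for any of the severity 3 conditions (CBX or Voicemail node down, strategic business unit network down, backbone down, Internet connection down), and "critical_area" is assumed to cover clustered patient care.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels as numbered in this procedure (member names are hypothetical)."""
    MINOR = 0               # single user, non-critical facility
    MULTI_OR_CRITICAL = 1   # multiple users, or single user in a critical area
    EXTENDED_REPAIR = 2     # severity 1 with repair expected to exceed 12 business hours
    MAJOR = 3               # major failure


def initial_severity(users_affected, critical_area, major_failure,
                     expected_repair_business_hours):
    """Suggest a starting severity from the definitions above (illustrative only)."""
    if major_failure:
        return Severity.MAJOR
    if users_affected > 1 or critical_area:
        # Severity 2 is the severity 1 definition plus a long expected repair time.
        if expected_repair_business_hours > 12:
            return Severity.EXTENDED_REPAIR
        return Severity.MULTI_OR_CRITICAL
    return Severity.MINOR
```

For example, a single analog line with no dial tone in a non-critical office would come out as severity 0, while the same fault in a critical area would come out as severity 1. The agent's judgment still governs the actual assignment.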

There are two e-mail lists used to communicate and escalate problems based on their severity. These messages are sent from the NTOC. One of the email lists sends the 33999 page and the other list is used to send a text message to a VIP list.

Escalation Flows

[Figure: an external report's initial assignment and ageing, shown as a flow between severity levels 0, 1, 2 and 3.]


Severity 0 : Minor Failure [Single user, non-critical facility, not clustered patient care]
Voice Examples:
NEC phone, Analog line, Metered Business Line (no dial tone, features not working, etc.);
VoiceMail box (can't access, lost messages);

A single individual's pager is not working.

A single student authorization code is not functioning.
Network Examples:
Soft failure degradation of departmental network performance (network functioning but
not efficiently), or intermittent component failure. Network problem clearly identified to be
within the department's internal network.

Patient Care Examples:
Single patient unit phone, or multiple patient unit phones where there is remaining service,
such that patient care has minimal impact.
Response time:
Average 4 elapsed hours, up to 12 business hours. Discretionary negotiation between
Manager On Call and patient unit leader. Staff phones serviced same evening. Patient
phones escalated through the patient phones telephone # x50143.

Notification / Business Hours:
Network and Telecommunications Operations Center (NTOC) > technician.
Problem owner is responsible for communication with customer and Network and Telecommunications Operations Center (NTOC).

Notification / After hours:
Network and Telecommunications Operations Center (NTOC) via Answering Service >
Manager On Call >
customer >
Network and Telecommunications Operations Center (NTOC) evening reporting number @ 3-1159.
tech >
customer >
Manager On Call >
Closure report to Network and Telecommunications Operations Center (NTOC) @ 3-1159.

Escalation:
At any point, if trouble is determined to have propagated to more than one user, trouble
will be immediately escalated to a "severity 2 or 1" as appropriate by the technician
dispatched. Technician and Manager On Call discuss escalation to telecommunications
engineer on call.

Response > 12 business hours? = escalation to severity 1
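
The two severity 0 escalation triggers above (propagation to more than one user, and response beyond 12 business hours) can be sketched as follows. This is an illustration under stated assumptions: the function and parameter names are hypothetical, and because the choice between severity 1 and 2 is left to the dispatched technician, the sketch returns the candidate levels rather than deciding.

```python
def severity_zero_candidates(users_affected, elapsed_business_hours):
    """Return the severity levels a severity 0 ticket may move to (illustrative only)."""
    if users_affected > 1:
        # Propagated to more than one user: technician picks 1 or 2 as appropriate.
        return (1, 2)
    if elapsed_business_hours > 12:
        # Response has exceeded 12 business hours: escalate to severity 1.
        return (1,)
    # No trigger met: the ticket stays at severity 0.
    return (0,)
```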

Comments:
Customer and Network and Telecommunications Operations Center (NTOC) will be advised of status if problem is not projected to be
resolved within 12 business hours. Long standing problem status will be provided to
Network and Telecommunications Operations Center (NTOC) Triage position and customer by EOB day by the problem owner**1


Severity 1: Multiple Users, or Single in Critical Area
Voice Examples:
Can't access area code (software config); several phones down in an area (CBX group);
T line down to medical off site location; any problem, single critical user or area. Multiple
critical phones out of service in given area;

A single pager is not operational for a person who is on-call.

Network Examples:
Hard failure of important network components (department LAN, network router interfaces,
SLA devices, single switch, internet connectivity or single modem pool server).

Security Examples:
Limited display of individualized security concerns on individual, isolated machines. See Incident Response Team or other actions following the generalized response time section.

See response section below.
Patient Care Examples:
Three or more patient unit phones, or multiple staff phones in patient unit where business
is impacted and patient care may degrade as a result.
Response time:
Immediate dispatch for escalated trouble ticket (>12 business hours). On site within 1.5 to 2 hours (for non-escalated trouble tickets). Average response in 4 elapsed hours, up to 6 hours before escalation to severity two or three, as appropriate. Discretionary negotiation between Manager On Call and patient unit leader. Staff phones serviced same evening. Patient phones escalated through the patient phones telephone #. Critical phones on the Emergency Preparedness list require response with two way radios (or cellular, if the customer prefers).
Immediate response by technician who troubleshoots the system suspected. Contacts the customer if required for further information to assess the problem. Determine problem, advise Network and Telecommunications Operations Center (NTOC), resolve. Technician/engineer assigned will be dedicated to problem until it is resolved. Network and Telecommunications Operations Center (NTOC) immediately advises customers.
If after one hour from start of problem determination, problem cause not identified, consult with switch tech and/or engineering. Switch tech assumes ownership of the problem (even if engineering consulted). Switch Tech and Engineer assess the problem, advise the Network and Telecommunications Operations Center (NTOC), fix the problem. If appropriate, trouble escalated to vendor.
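
The time checkpoints in this response section (consult after one hour without an identified cause, escalation after up to 6 hours) can be sketched as a simple check. This is a hedged illustration, not the NTOC's tooling: the function name, parameter names and returned strings are hypothetical, and it assumes the 6 hour limit is measured from the report while the one hour limit is measured from the start of problem determination.

```python
from datetime import timedelta

CONSULT_AFTER = timedelta(hours=1)    # cause not identified after one hour
ESCALATE_AFTER = timedelta(hours=6)   # upper bound before escalation


def next_step(since_report, since_determination_start, cause_identified):
    """Suggest the next action for an open severity 1 ticket (illustrative only)."""
    if since_report >= ESCALATE_AFTER:
        return "escalate to severity two or three, as appropriate"
    if not cause_identified and since_determination_start >= CONSULT_AFTER:
        return "consult switch tech and/or engineering"
    return "continue troubleshooting"
```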

Security Incident Response Team Actions:
Upon receiving notification of a serious security vulnerability, the Engineer On Call e-mails notification to mailto: and works with the Incident Response Team as well as any desktop support staff to address the localized problem and keep it localized.
Notification / Business Hours:
Network and Telecommunications Operations Center (NTOC) > technician
Problem owner is responsible for communication with customer, Manager On Call,
and Triage at Network and Telecommunications Operations Center (NTOC).
Notification / After hours:
Network and Telecommunications Operations Center (NTOC) via Answering Service >
Manager On Call >
customer >
------assessment of severity two, potential escalation
------and notification per severity one guidelines
----- if affecting patient care, potential consult with
----- AOC to validate impact and whether a need to
----- escalate
Network and Telecommunications Operations Center (NTOC) evening reporting number @ 3-1159.
tech and engineer >
customer >
Manager On Call >
Closure report to Network and Telecommunications Operations Center (NTOC) @ 3-1159.
If severity two, the Network and Telecommunications Operations Center (NTOC) or Manager On Call may opt to initiate a severity page following the NTOC response process (at the end of this section). After-hours the Manager On Call may choose to initiate a call-tree.
Escalation:
At any point, if trouble is determined to have propagated to strategic business
units (e.g. Emergency or event locations) or to multiple users, trouble will be
immediately escalated to a severity two or three, as appropriate, as assessed by the
technician and/or engineer on site. Technician and Manager On Call discuss escalation
to telecommunications engineer on call. Consultation with AOC may occur to assess.
Comments:
Customer and Network and Telecommunications Operations Center (NTOC) will be advised of status if problem is not projected to be
resolved within 12 business hours. Long standing problem status will be provided to
Network and Telecommunications Operations Center (NTOC) Triage position and customer by EOB day by the problem owner*1


Severity 2: Multiple Users, or Single in Critical Area with response time likely to exceed the initial 12 hours from time of report.

Voice Examples:
Can't access area code (software config); several phones down in an area (CBX
group); T line down to medical off site location; any problem, single critical user
or area. Multiple critical phones out of service in given area.

Network Examples:
Hard failure of important network components (department LAN, network router
interfaces, SLA devices, single switch, internet connectivity or single modem
pool server).
Security Examples:
Display of security concerns on machines with somewhat limited impact to the business environment. Limited danger to University assets. Presents a level of inconvenience. See Incident Response Team or other actions following the generalized response time section. Specific examples include port scans, large numbers of virus infected e-mail messages, reports of attempted exploit originating from the UofR. See security response section below.
Patient Care Examples:
Three or more patient unit phones, or multiple staff phones in patient unit where
business is impacted and patient care may degrade as a result.

Response time:
Immediate dispatch. On site within 1.5 to 2 hours with sniffers or other complex diagnostic devices. Average response in 4 elapsed hours, up to 6 hours before escalation to severity three. Discretionary negotiation between Manager On Call and patient unit leader. Staff phones serviced same evening. Patient phones escalated through the patient phones telephone #. Critical phones on the Emergency Preparedness list require response with two way radios (or cellular, if the customer prefers).
Immediate response by technician who troubleshoots the system suspected. Contacts the customer if required for further information to assess the problem. Determine problem, advise Network and Telecommunications Operations Center (NTOC), resolve. Technician/engineer assigned will be dedicated to problem until it is resolved. Network and Telecommunications Operations Center (NTOC) immediately advises customers. ITS web site is updated.
If after one hour from start of problem determination, problem cause not identified, consult with switch tech and/or engineering. Switch tech assumes ownership of the problem (even if engineering consulted). Switch Tech and Engineer assess the problem, advise the Network and Telecommunications Operations Center (NTOC), fix the problem. If appropriate, trouble escalated to vendor.

Security Incident Response Team Actions:
Upon receiving notification of a serious security vulnerability as previously described, the Engineer On Call e-mails notification to mailto: and works with the Incident Response Team as well as any desktop support staff to address the problem and reduce further spread. Consider whether escalation to severity three is called for. If so, determine whether we communicate via and mail lists.

Notification for other than a Security Issue / Business Hours:
Network and Telecommunications Operations Center (NTOC) > technician
Problem owner is responsible for communication with customer, Manager On
Call, and Triage at Network and Telecommunications Operations Center (NTOC).
Notification for other than a Security Issue / After hours:
Network and Telecommunications Operations Center (NTOC) via Answering Service >
Manager On Call >
customer >
------assessment of severity two, potential escalation
------and notification per severity one guidelines
----- if affecting patient care, potential consult with
----- AOC to validate impact and whether a need to
----- escalate
Network and Telecommunications Operations Center (NTOC) evening reporting number @ 3-1159.
tech and engineer >
customer >
Manager On Call >
Closure report to Network and Telecommunications Operations Center (NTOC) @ 3-1159.
If severity two, the Network and Telecommunications Operations Center (NTOC) or Manager On Call may opt to initiate a severity page following the NTOC response process (at the end of this section). After-hours the Manager On Call may choose to initiate a call-tree.

Escalation:
At any point, if trouble is determined to have propagated to strategic business units (e.g. Emergency or event locations) or to multiple users, trouble will be immediately escalated to a "severity 3", as appropriate, by the technician and/or engineer on site. Technician and Manager On Call discuss escalation to telecommunications engineer on call. Consultation with AOC may occur to assess.

Comments:
Customer and Network and Telecommunications Operations Center (NTOC) will be advised of status if problem is not projected to be resolved within 12 business hours. Long standing problem status will be provided to Network and Telecommunications Operations Center (NTOC) Triage position and customer by EOB day by the problem owner*1.


Severity 3 : Major Failure; CBX or VoiceMail Node down. Strategic (critical) business unit network down. Backbone down. Internet connection down. Serious information security problem affects multiple clients and has non-trivial impact creating outages versus inconvenience. Information security problems that include serious disruption to business activities, exemplified by problems such as worms and security vulnerability exploits, especially those launched from the UofR community.

IMMEDIATE Organizational Actions, if voice services impacted.

  1. Upon receipt of a severity three condition, during daytime business hours, the Network and Telecommunications Operations Center (NTOC) notifies a Senior Manager of the severity three and assigns a scribe to record events in a chronology. During non-business hours, the Manager On Call notifies a Senior Manager. Notifier clearly states "severity three emergency condition".
  2. Chain of command defined.
  3. The Senior Manager determines a designated location for the response team.
  4. Manager presence in the affected area, as appropriate.
  5. Lead Technical Engineer assigned.
  6. Two way radios are distributed as follows:
    a) Network and Telecommunications Operations Center (NTOC);
    b) Manager;
    c) Emergency Operations Center (EOC) rep;
    d) Lead technical engineer;
    e) Comm Center (as appropriate);
    and / or
    f) Area runner and walkabout. Two-way radios will be available locked in the Network and Telecommunications Operations Center (NTOC) for this purpose.
    g) Each holder of a radio must be familiar with two-way radio guidelines. Note that the key to the cabinet holding the radios is in the Network and Telecommunications Operations Center (NTOC) Team Leader's desk drawer. It is labeled radios.
  7. Emergency Operations Center phone number list distributed to individuals (either x50500, or two-way radio). Non-University phone service at this location.
  8. 15 minute updates initiated.
  9. Debrief document available within 72 hours of re-institution of service.

Response Time:
Immediate and continuous effort. Immediate technician dispatch and engineering involvement. Immediate response by technician to Network and Telecommunications Operations Center (NTOC) who advises status every 15 minutes. Work begins immediately. Identify, report and resolve problem. Technician/engineer assigned to this problem will be dedicated to its resolution until fixed. Complex diagnostic gear is immediately brought to both ends of communications points.
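
The two fixed intervals in the severity three procedure (status updates every 15 minutes, debrief document within 72 hours of re-institution of service) can be worked out mechanically. The following is a minimal sketch only: the function names are hypothetical, and it assumes the first update falls 15 minutes after the start of the incident and that the 72 hours are elapsed hours, not business hours.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=15)
DEBRIEF_WINDOW = timedelta(hours=72)


def update_times(incident_start, until):
    """Return the 15 minute status update times between two datetimes."""
    times = []
    current = incident_start + UPDATE_INTERVAL
    while current <= until:
        times.append(current)
        current += UPDATE_INTERVAL
    return times


def debrief_due(service_restored):
    """Return the latest time the debrief document should be available."""
    return service_restored + DEBRIEF_WINDOW
```

For an incident starting at 09:00 with service restored at 10:00, this gives four update times (09:15, 09:30, 09:45, 10:00) and a debrief deadline at 10:00 three days later.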