8/14/2019 Service Desk Incident Triage Matrix
1/23
Incident Management Process:
24x7 Response and Control
April 6, 2005
V1.12
Revision History
Version | Date | Author | Notes
1.08 | 23 Feb 2005 | Nan McKenna | (Initial tracked version)
1.09 | 15 Mar 2005 | Erik Cummings | Extract return to work as Appendix C; add proposed 15/30 minute response times. Add Revision History page.
1.10 | 22 March 2005 | Erik Cummings | Differentiate between Initial PCG Incident Classification and Final Incident Classification. Added PCG Process Flowchart.
1.11 | 23 March 2005 | Bruce Campbell | Updated Revision History Table. From header, removed Draft. In header, body of document, moved Operations Excellence to top left margin, placed Incident Management Process top, right margin. Re-applied styles, numbering, and organization. Added On-Call to Appendix E. Added Management On-Call to Appendix F. Turned off numbering in Appendixes E & F. Re-organized appendixes so that process flow diagrams were one-after-the-other. Updated references to the various appendixes throughout document. Reworded Section 2.3 a Note.
1.12 | 04 April 2005 | Erik Cummings | Removed Appendix C (PCG Process). Renumbered Appendix D to C and any references to it. Changed Appendix D (now C! - Communications Matrix): removed Contact Action, 1st and 2nd Level Notification columns; added Client Comm Interval and SME Work Started columns. Added new Appendix D Priority and Internal Response Time Commitments. Added new definitions Priority, Impact, Urgency.
Table 1 Revision History
8/8/2008 v1.12 Page ii
List of Tables
Table 1 Revision History
Table 2 Detailed Incident Control Process
Table 3 Explanation of High-Level Incident Management Process Flow
Table 4 Incident Level Classification Matrix
Table 5 Return-To-Work Guidelines
Operations Excellence Incident Management Process
1.0 Executive Summary
1.1 Document Contents
1.a. This document contains the processes that the new Production Control Group will use to respond to, manage, and resolve incidents quickly and efficiently. The documentation includes on-call definitions and guidelines, escalation processes, process flow diagrams, and data tables. It also sets general expectations, defines roles and responsibilities, and provides general guidelines.
1.2 Intended Audience
1.a. This document is intended for executive-level and management personnel and for ITSS personnel, including everyone involved in this process, such as Subject Matter Experts (SMEs), Technical Leads, Line Managers, Systems Administrators, DBAs, project leaders, and facilities personnel.
2.0 Background
It is expected that most services supported by ITSS are available 24x7. As a result of this expectation, it is in the best interest of ITSS Shared Services workgroups and ITSS as a whole to develop and establish a combined staff, the Production Control Group (PCG), dedicated to proactively managing and responding to events as they occur.
Eventually, the role of the PCG will include evaluating incidents and, depending on the severity of the event, escalating to upper management. In some situations, the more experienced technical personnel will take action to effect repairs and/or restore services.
As the PCG acquires experience, and as ITSS adds monitoring and troubleshooting capability, the PCG will assume additional incident response responsibilities.
2.1 Primary Responsibilities of the Production Control Group
1.a. Managing and controlling a widespread service outage, including incident reporting and escalation.
2.2 Incident reporting and escalation techniques will:
1.a. Specify a point of contact (owner) for all issues and ensure that services are restored through the prudent use of departmental resources, including documentation of the incident from its beginning to its resolution.
1.b. Effectively manage the communication of information within ITSS when there are issues that actually or potentially impact ITSS-supported services or facilities.
1.c. Proactively respond to issues that impact ITSS-supported services and facilities; evaluate, classify, escalate, and manage service restoration efforts as efficiently and expeditiously as possible, up through incident resolution.
2.3 Additional Responsibilities of the Production Control Group
1.a. Note: It is anticipated that any single shift of the PCG will NOT be consumed by continuously resolving issues. Because of this, the supplemental duties and tasks detailed below will be assigned.
1 Assist offsite Subject Matter Experts by performing requested tasks, such as visual inspections of hardware and cycling the power on equipment as instructed.
2 Manage and prepare magnetic media for rotation, offsite shipment, and storage, including organizing and filing transmittal logs.
3 Control building and facility access; escort vendors to restricted areas for the purposes of inspection, maintenance, and repair of equipment.
4 Monitor building/facility/data center environmentals, such as air conditioning, fire suppression system, lighting, and so on; log times and results of the monitoring activity.
5 After normal working hours, perform 1st tier triage of reported issues; classify and escalate as necessary.
6 Receive and log calls from end users and generate Remedy tickets; escalate as necessary.
7 Set up video/telephone conferences.
8 Accept and sign for emergency delivery of replacement parts from vendors.
9 Perform other tasks deemed necessary by department supervision.
3.0 Roles and Definitions
Account Manager: A member of the ITSS Account Management team in Client Support who is responsible for the relationship with one or several key clients (e.g. GSB, H&S, Libraries).
Client: A primary paying customer of ITSS services and support.
End User: Person who directly uses a service. An end user could be internal or external to ITSS. End users are directly impacted during an outage, and generally have an established relationship with the Client or Service Owner.
Impact: Level of effect or impact on the Stanford Campus. This is relative to the Campus as a whole, not specifically to the client. (Values = Campus-Wide, Major School or Dept wide, Minor Group or Single User, and Non-Service Affecting)
Incident Manager: The Shared Services Line Manager who is designated as responsible for a specific incident.
Incident/Event/Problem/Issue: For the purposes of this document, these terms are intended to mean a failure of any component of any system or service, and are used interchangeably throughout this document.
ITSS Client Support: Group which does client relations, account management, functional analysis, sales & marketing, documentation, software licensing, end user training, and Help Desk and CRC support.
ITSS Engineering and Projects: Group which does technology R&D, service enhancements, and new product and service projects.
ITSS Shared Services: Group which does operations.
ITSS Strategic Planning: Includes technology strategy & architecture and finance groups.
Line Manager: Workgroup managers in ITSS Shared Services.
On-Call Subject Matter Expert (SME): SME (see below) who is designated to be available to respond to reported outages, triage the incident, perform the needed tasks to restore services, assist other workgroups in the restoration process, or determine which other members within their own workgroup are needed to assist in service restoration.
Operations Owner: The ITSS staff person who has the ultimate authority for a service, including its functionality and approval for any changes to the service.
Priority: Level of response and effort directed towards resolving an incident. It is determined by the inherent service level commitment of the service, as well as a combination of Urgency and Impact. Priority is sometimes referred to as severity. (Values = Urgent, High, Medium, Low)
Product Manager: Owns product quality and client satisfaction for a service.
Production Control Group (PCG): Group which will perform monitoring and basic problem determination and evaluation, escalation, communication and, in some cases, incident resolution.
Subject Matter Expert (SME): Any technical ITSS staff person whose job requires extensive technical knowledge of network and service components and their related requirements. SMEs are considered experts and possess a detailed knowledge of service functionality, restoration, and component/service repair.
Satellite Operations Center (SOC): The SOC is a partner with the University Emergency Operations Center (EOC) during Level 2 (major building fire, extended power outage) or Level 3 (major earthquake or extensive flooding) emergencies. The ITSS SOC team provides real-time field information to the EOC as well as coordinating and directing emergency responses.
Urgency: End user's or client's assessment of the importance and/or urgency of the issue as it affects their ability to perform their work. This value is provided by the customer. (Values = Urgent, High, Medium, Low)
4.0 Process Review
4.1 Process Outline
1.a. Note: There are six major steps in this process, from the time of incident detection through root cause analysis and implementing preventative measures.
4.2 Incident Detection and Reporting
1.a. An incident can be detected by, or reported from:
1 An end user
2 A client
3 An SME
4 Automated monitoring
1.b. It is important that information be shared between and among groups.
1.c. The process for reporting problems differs between normal working hours (8:00 A.M. to 5:00 P.M., M-F) and after hours.
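As an illustration only, the working-hours distinction above could be expressed as a small check. This is a minimal sketch, not part of the ITSS process: the function names are hypothetical, holidays are ignored, and treating exactly 5:00 P.M. as after hours is an assumption. The routing result reflects section 2.3, which assigns after-hours first-tier triage to the PCG.

```python
from datetime import datetime, time

# Normal working hours as stated in section 4.2: 8:00 A.M. to 5:00 P.M., M-F.
WORK_START = time(8, 0)
WORK_END = time(17, 0)


def is_normal_working_hours(moment: datetime) -> bool:
    """Return True when `moment` falls inside normal working hours.

    Assumptions: holidays are not considered, and 5:00 P.M. sharp
    counts as after hours.
    """
    is_weekday = moment.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and WORK_START <= moment.time() < WORK_END


def first_tier_triage_owner(moment: datetime) -> str:
    """Name the group that performs first-tier triage at `moment`."""
    return "Help Desk" if is_normal_working_hours(moment) else "PCG"
```

For example, a call placed on a Wednesday at 9:00 A.M. would go to the Help Desk, while the same call on a Saturday would be triaged by the PCG.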
4.3 Incident Level Classification: See Appendices C and D
1.a. This includes assigning a severity level to the incident and entering the incident into the Remedy incident tracking system.
4.4 Incident Notification
1.a. This includes notification to an ITSS Incident Manager and clients. It includes outage information posted on the SU Web site and Cable TV, informational messages left on the designated voice mail box, email sent to designated personnel, and other client notification as deemed appropriate.
4.5 Incident Escalation
1.a. This includes escalation to the ITSS Incident Manager and any subsequent escalation calls deemed necessary. Note that the severity level will dictate whom in the management chain of command to contact, and when to provide them status reports. Additionally, the PCG will determine whether or not the incident needs to be escalated to the SOC.
4.6 Incident Resolution
1.a. This covers work performed during the incident itself, with responsibilities as follows:
1.b. The Incident Manager is responsible and accountable for the overall recovery effort, performing the following functions:
1 Establishing recovery priorities
2 Coordinating and delegating responsibilities as they relate to the recovery effort
3 Issuing requests for additional resources
4 Ensuring the participation of critical internal and external support groups and vendors, such as the recall of media from the off-site storage vendor, or the purchase of replacement parts and equipment
5 Reviewing and approving tactical plans
6 Communicating incident status to ITSS management/executives as needed
7 Working with Client Support to approve and authorize the release of information to other schools and departments
1.c. SMEs and Line Managers are responsible for analyzing technical problems and making technical decisions, implementing tactical plans, and communicating to other SMEs as well as the Incident Manager.
1.d. The PCG is responsible for coordination of the incident resolution effort and for communication as deemed necessary.
4.7 Post-Incident Activities
1.a. This covers the activities after the incident is resolved.
1 Ensure that any post-incident cleanup is completed.
2 Perform root cause analysis of the incident.
3 To avoid similar future incidents, determine what process improvements and preventative measures can be put into place.
4 Implement changes in process or technical support as appropriate.
5 Ensure that the PCG receives feedback and input from the user community.
6 Perform client follow-up and ensure that an incident response quality survey form is available for end-user and client feedback.
5.0 Detailed Incident Control Process
5.1 Detailed Process Flow Explanation Table. Reference Appendix A

Process # | Process Name | Detailed Description | Action By
Incident Detection and Reporting
1 | Problem Reporting: End Users | End users will call 5-HELP or use the web at http://helpsu.stanford.edu/. Telephone calls are directed to the ITSS Help Desk, where the problem is evaluated. If the Help Desk (any tier) determines that this is an urgent incident, the call/ticket should be directly escalated to the PCG. | End User
2 | Problem Reporting: Clients | In most cases, clients should call 5-HELP or use the web at http://helpsu.stanford.edu/. In some special cases, clients may have direct access to the PCG for reporting problems and receiving updates. In this case, skip to step 12. | Client
3 | Problem Reporting: End Users After Hours | If an end user calls 5-HELP after hours, the user will get the recorded phone tree. Users can choose to get through to the PCG directly or leave a recorded message. For after-hours calls, the PCG will determine whether the call is urgent. If the issue is not urgent, the PCG will enter a ticket in Remedy for review the following business day. | End User, PCG
4 | Problem Reporting: Monitoring to SMEs | In some cases, monitoring may notify an SME of a problem before a user, client, or the PCG. If the issue is urgent, escalate directly to the PCG for coordination and entry into Remedy. | SME
5 | Problem Reporting: Monitoring to PCG | Monitoring reports information directly to the PCG. | PCG
6 | Resolve? | Help Desk assesses whether the ticket can be resolved at this point. If so, the Help Desk will resolve and close. | Help Desk
7 | Urgent? | If the ticket cannot be resolved, Help Desk to determine whether the ticket should be forwarded to SME/Help Desk Tier 2 or to the PCG. | Help Desk
8 | Forward To SME | If the case does not appear to be severity Urgent/High, forward to SME. | Help Desk
9 | Resolve Quickly? | Can the case be resolved by the SME, and is it Severity Level Medium/Low? | SME
10 | Enter Solution In Remedy | If the SME can quickly resolve the case, enter solution in Remedy and close ticket. | SME
11 | Forward To PCG | If the SME determines that there is impact beyond a simple fix and the Severity Level is Urgent/High, notify the PCG. | SME/PCG
Classification
12 | Assign Severity Level | Assign a severity level to the incident, using the standard ITSS categories (see Appendix C and D). The severity levels govern: level of action to be taken by the Production Control Group; notification and escalation guidelines; time intervals in which to provide status reports; time intervals in which to initiate escalation and management decision processes. | PCG
13 | Enter In Remedy | Enter a ticket for the incident into the Remedy Help Desk application. | PCG
Table 2 Detailed Incident Control Process
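The Help Desk and SME decision points in steps 6 through 11 can be sketched as two routing functions. This is one possible reading of the table, not an ITSS tool: the function names and return strings are hypothetical, and sending every "no" answer at step 9 on to the PCG (step 11) is an assumption drawn from the flow.

```python
from enum import Enum


class Severity(Enum):
    URGENT = "Urgent"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


# Severities that the table routes to the PCG.
PCG_SEVERITIES = {Severity.URGENT, Severity.HIGH}


def help_desk_route(can_resolve: bool, severity: Severity) -> str:
    """Steps 6-8: decide what the Help Desk does with a ticket."""
    if can_resolve:
        return "resolve and close"  # step 6
    if severity in PCG_SEVERITIES:
        return "forward to PCG"  # step 7, urgent branch
    return "forward to SME"  # step 8


def sme_route(can_resolve_quickly: bool, severity: Severity) -> str:
    """Steps 9-11: decide what the SME does with a forwarded ticket."""
    if can_resolve_quickly and severity not in PCG_SEVERITIES:
        return "enter solution in Remedy and close"  # steps 9 and 10
    # Assumption: any "no" at step 9 leads to step 11.
    return "notify PCG"
```

Under this reading, a Medium ticket that the Help Desk cannot resolve goes to an SME, and an Urgent or High ticket always ends up with the PCG.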
6.0 High-Level Incident Process Explanation
6.1 Detailed Process Explanation: See Appendix B

Process Name | Detailed Description | Action By
Notification
SME | Notify appropriate SME(s) if necessary, using the AMCOM on-call system. | PCG
Update itss-service-alerts@lists | Send a message to [email protected] | PCG
Post Messages To Web, Phone, TV | Message information will include: the date and time, a brief description of the problem, and, if available, the estimated time of resolution/restoration. Web: Update status on down.stanford.edu. Telephone: In the event of a major network failure, update the designated voicemail box: 7-DOWN. SU Cable TV: ITSS can have pre-worded messages set for broadcast, where the group can just fill in the blanks. | PCG
Escalation
Notify Line Manager | Contact the Shared Services Line Manager of the affected system. If a Line Manager is unavailable, use the AMCOM system to determine the backup. | PCG
Determine Incident Manager | If the incident falls into the area of a single Line Manager, that Line Manager will contact the Incident Manager. If multiple Line Managers are involved, they must determine a single Incident Manager. | Shared Services Line Managers
Send Email | Send first email to appropriate lists/clients, based on Service Level Agreements. Use the [email protected] list for campus-wide outages; the Incident Manager should approve any messages which go to this list. | PCG, Incident Manager
Escalate To Senior Management | The Severity Level (see Appendix C and D) will determine the escalation to management. | PCG
Resolution
Incident Management | The Incident Manager will take ownership of the problem and manage the incident. Responsibilities: establish priorities; coordinate and delegate responsibilities in regard to the recovery effort; request additional internal or external resources; ensure and manage the participation of critical internal and external support groups and vendors; review and approve tactical plans; communicate incident status to ITSS management/executives as needed; work with Client Support to release information as needed to clients/users across campus. |
Resolve Incident | SMEs are responsible for analyzing technical problems, implementing tactical plans, and communicating to other SMEs and with the PCG. | SMEs
Post Resolution Information To Web, Phone, TV | Message information will include: the date and time, a brief description of the problem, and, if available, the estimated time of resolution/restoration. Web: Update status on down.stanford.edu. Telephone: In the event of a major network failure, update the designated voicemail box: 7-DOWN. SU Cable TV: ITSS can have pre-worded messages set for broadcast, where the group can just fill in the blanks. | PCG
Post Incident Analysis
Complete Cleanup Tasks | Determine whether cleanup is required, and identify who will own and perform the additional clean-up tasks. | SME, PCG
Root Cause Analysis | It is the responsibility of the manager of the PCG to initiate root cause analysis, collecting as much information as possible, and to ensure that any information which will help in resolving future incidents is entered into the related Remedy ticket for future use. | PCG Manager
Incident Prevention | Determine processes which can be implemented to prevent a repeat of the incident. | Shared Services Managers, SMEs
Client/User Follow-up | Ensure selected members of the recovery team make follow-up calls to the affected users to solicit their constructive comments. Share results of the analysis with workgroups and clients where appropriate. | PCG
Quality Survey | ITSS will make an on-line survey available for user/client feedback and for ITSS staff. The PCG is responsible for tallying survey results and making them available to the appropriate ITSS staff and managers. | PCG
Table 3 Explanation of High-Level Incident Management Process Flow
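The message content described in the table (date and time, brief description, and an estimated restoration time when one is available) lends itself to a fill-in-the-blanks template. The sketch below is illustrative only; the function name, field labels, and timestamp format are assumptions and do not reflect any existing ITSS message format.

```python
from datetime import datetime
from typing import Optional


def format_status_message(
    reported_at: datetime,
    description: str,
    estimated_restoration: Optional[datetime] = None,
) -> str:
    """Build a status message with the three elements named in Table 3.

    The estimated restoration line is included only when a value is
    available, matching the "if available" wording of the table.
    """
    lines = [
        f"Date/time: {reported_at:%Y-%m-%d %H:%M}",
        f"Problem: {description}",
    ]
    if estimated_restoration is not None:
        lines.append(
            f"Estimated restoration: {estimated_restoration:%Y-%m-%d %H:%M}"
        )
    return "\n".join(lines)
```

The same text could then be reused for the web status page, the voicemail script, and the cable TV message, so that all three channels carry consistent information.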
7.0 Outstanding Issues
7.1 A common paging system is required
1.a. AMCOM for manual paging
1.b. What to use for automated paging from monitoring systems?
7.2 Definition of Service Hours
7.3 Definition of availability, outage, and service degradation
7.4 Service-level procedures for client notification
Appendix A Incident Management Process Flowchart
Reference: Table 2 Detailed Incident Control Process
1.a. Note that the circled numbers in the flowchart correspond to the process numbers in Table 2.
[Flowchart: swimlanes for End User, Client, Help Desk, SME, Automated Monitoring, and PCG, showing steps 1 through 13 of Table 2.]
Figure 1 Incident Detection and Reporting
Appendix B High-Level Incident Management Process Flow
[Diagram: stages Detection/Reporting, Classification, Notification, Escalation, Resolution, and Post Incident Activities; participants include End User, Client, HelpSU/5-HELP, Help Desk Tier 1, Monitoring, Subject Matter Expert, Production Control Group, Account Manager, Line Manager, Duty Manager, PCG Manager, SOC/EOC Liaison, System Status (7-DOWN), and the Remedy Database.]
Figure 2 High-Level Incident Management Process Flow
Appendix C Incident Level Communications Matrix

Level | Description | Incident Examples | Client Update Interval | SME Work Started w/in:*
Urgent | A major service outage with significant and immediate business impact and no workaround. Large number of users; outage of significant length; no available workaround; mission/business critical. | Fire suppression system activation in data center; loss of electrical power; entire network switch, closet and/or building outages; failure of 1 or more high priority services, e.g. Exchange, Oracle Financials, HRMS, PeopleSoft; large denial of service attacks; successful hacking; loss or altering of data; theft of data; simultaneous virus infections; SU telephony systems. | Initial: Immediate. Notification on-going: hour | 30 minutes
High | A major service outage or degradation with significant business impact and an unsustainable workaround. Multiple users; work performance reduced; mission/business critical. | Failure of storage system (storage area network, SAN); failure of a server of a sensitive client or user; severely degraded performance; smaller denial of service attacks. | Initial: Immediate. Notification on-going: 1 hour | 1 hour
Medium | A service outage or degradation with an acceptable workaround. Service-affecting; minimal performance degradation; affects non-critical business function. | Cannot connect to the internet, send or receive email; hardware failure, cannot access data, cannot print; degraded performance. | As applicable, by the SME working the issue. | 4 business hours
Low | Non service-affecting. Cosmetic problem; system enhancement. | Previously requested enhancements to a system. | Upon issue resolution or as applicable, by the SME working the issue. | 1 business day
Table 4 Incident Level Classification Matrix
* Note: This column indicates the maximum amount of time that will transpire before a technician begins working on an Incident. Times will generally be much faster for all severities.
Appendix D Priorities and Internal Response Times
Note: The following table refers to Priority, not to Urgency or Impact. Priority is a combination of the Urgency, the Impact, and the existing Service Level Commitments for the service in question. This is an important distinction: Urgency is offered by the customer; Priority is assigned by the Helpdesk, PCG, and/or SME involved, from a system-wide perspective.
Usage: These Priority levels (and the associated Urgency and Impact values) are used to track incidents as they are reported and worked on. Priority, Urgency, and Impact each correspond directly to a Remedy ticket field.
Priority | Description | Committed Service Hours | PCG Call Initiate | SME Call Response | Escalation Interval | SME Work Started
Urgent | A major service outage with significant and immediate business impact and no workaround. Large number of users; outage of significant length; no available workaround; mission/business critical. | 24x7 | Immediate | 15 minutes | 10 minutes | 30 minutes
High | A major service outage or degradation with significant business impact and an unsustainable workaround. Multiple users; work performance reduced; mission/business critical. | 24x7 | Immediate | 15 minutes | 10 minutes | 1 hour
Medium | A service outage or degradation with an acceptable workaround. Service-affecting; minimal performance degradation; affects non-critical business function. | 8-5, M-F | Ticket Assignment/eMail | As appropriate (work begins, work update, work completed) | Standard SME Group Remedy settings | 4 business hours
Low | Non service-affecting. Cosmetic problem; system enhancement. | 8-5, M-F | Ticket Assignment/eMail | As appropriate (work begins, information required, work completed) | Standard SME Group Remedy settings | 1 business day
Appendix E On-Call Guidelines
Guideline Purpose
To generally define and standardize:
On-call duties and responsibilities
A methodology for communications and engagement of problem determination and resolution
On-call scheduling
Response expectations/guidelines and general escalation processes in the event the 24x7 on-site group is engaged in an on-going event or incident.
System-generated notifications will continue to be handled within the required time frames by the individual SME groups.
Duties
Requirements for on-call responsibility must be identified in the appropriate job descriptions, including: carrying a pager, cell phone, availability of the employee's home phone number, and email.
Responsibilities
Share on-call responsibilities with other members of the work group
Begin working on the event as soon as notified
This may require working from home or traveling to work. The decision to make a physical appearance at work depends on the circumstances of the event, such as swapping hardware components or an on-site appearance by a vendor.
Communications
Teleconference Phone Bridge: Telecom will have a teleconference number available to technical personnel and the PCG. This will be used when the expertise of multiple SMEs is required to resolve an incident. It will also permit the technical staff to communicate as a group. Additionally, the PCG will be able to determine the status of the incident first-hand and keep management informed without management actually being involved in the conference call.
The AMCOM system will be the primary contact information/procedures lookup and paging tool for the 24x7 on-site groups.
Staff will provide and track individual work group on-call schedules.
The work group establishes the rotation.
Members of the work groups are responsible for maintaining and keeping current the contact and coverage information in the on-call database.
Communications Elements
Required communications devices: pager or cell phone, personal phone.
Additional communications devices as recommended by the SME groups: DSL, Treo, wireless laptop, email.
Notification Protocol
Initial outgoing page
Re-page in 10 minutes
If a call-back is NOT received from the designated on-call SME within 15 minutes, begin escalation to the next on-call person, including re-contacting the primary on-call person and the on-call Shared Services manager on all subsequent pages.
Recipient to confirm garbled pages, follow call-back protocol.
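The notification protocol above amounts to a simple timeline: page, re-page at 10 minutes, and escalate at 15 minutes if no call-back has arrived. A minimal sketch of that timeline follows. The function name and return strings are hypothetical, and treating the 10- and 15-minute marks themselves as the trigger points is an assumption.

```python
# Thresholds taken from the Notification Protocol, in minutes.
REPAGE_AFTER_MINUTES = 10
ESCALATE_AFTER_MINUTES = 15


def paging_action(minutes_since_initial_page: float, callback_received: bool) -> str:
    """Return the next step for the person paging the on-call SME."""
    if callback_received:
        return "no further paging needed"
    if minutes_since_initial_page >= ESCALATE_AFTER_MINUTES:
        # Escalation also re-contacts the primary on-call person and the
        # on-call Shared Services manager, per the protocol.
        return "escalate to next on-call person"
    if minutes_since_initial_page >= REPAGE_AFTER_MINUTES:
        return "re-page"
    return "wait for call-back"
```

A check like this could be run each minute against an open page to prompt the PCG operator at the right moments.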
Initial Communications Tracking
Use AMCOM system for initial communications tracking
Response Protocol
15 minute call-back
Within 30 minutes, be actively engaged in problemdetermination and resolution
Actively engaged via:
Home system
Wireless laptop
On-site
SME groups may establish accelerated response profilesbased upon their response criticality
Scheduling
By SME group design
SME schedule to be established and published in AMCOM system
SME contact instructions to be included
Appendix F Management On-Call Guidelines
Return-To-Work Guidelines
These guidelines are for Management to consider if an on-call representative has worked extended hours due to an outage/issue.
These guidelines should be used to ensure there is always an effective on-call representative, while protecting the on-call SME from overly extensive work time.
If the primary on-call SME has already worked consecutive extended hours, or multiple shifts, and a new event has occurred:
Either the manager will provide a backup and notify the backup of their modified on-call status, or the entire group of SMEs will decide on an alternate SME to be used in this situation.
To allow staff members who are involved with an after-hours call-out on Sunday through Thursday to obtain adequate rest, the following is provided as a sample set of guidelines for a return-to-work policy:

On-Call SME works until | Report to work no later than
0200 | 1100
0300 | 1200
0400 | 1300
0500 | Take rest of day off
Table 5 Return-To-Work Guidelines
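The sample return-to-work guidelines in Table 5 can be read as a lookup from the time the on-call SME stopped working to the latest report time. The sketch below is one interpretation, not policy: the function name is hypothetical, rounding an in-between time such as 0330 up to the next listed row is an assumption, and times outside the 0200 to 0500 range are rejected because the table does not cover them.

```python
from datetime import time
from typing import Optional

# Table 5: "works until" -> "report to work no later than".
# None stands for "Take rest of day off".
RETURN_TO_WORK = {
    time(2, 0): time(11, 0),
    time(3, 0): time(12, 0),
    time(4, 0): time(13, 0),
    time(5, 0): None,
}


def report_no_later_than(worked_until: time) -> Optional[time]:
    """Look up the latest report time for an SME who worked until `worked_until`.

    Assumption: times between two rows use the later row.
    """
    cutoffs = sorted(RETURN_TO_WORK)
    if not cutoffs[0] <= worked_until <= cutoffs[-1]:
        raise ValueError("Table 5 only covers 0200 through 0500")
    for cutoff in cutoffs:
        if worked_until <= cutoff:
            return RETURN_TO_WORK[cutoff]
```

An SME who finished at 0300 would be expected by 1200, and one who worked until 0500 would take the rest of the day off.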