OCIO Leader On-Call Training


Agenda
• Definitions, Roles, Key Timelines
• IT Incident Process
• Communication Tools
• Helpful Hints
• Incident Reporting
• Review Scenarios

Definitions
From MUHA C170 Information Technology Incident Management Policy

• Information Technology Incident: A period of time when an information system or service is either unavailable or its functionality is slowed or severely restricted. Incidents include Downtime (system not available) and Service Interruptions (functionality is slowed or severely restricted).

• Service interruption: Information system functionality is slowed or severely restricted due to system or network failure even though the system may not technically be “down”.

• Service Interruption, Part 2: Any incident or event that results in a user’s inability to reasonably perform their normal duties. The system may not be down, but performance is poor or functionality is reduced.

IT Incident Roles
From MUHA C170 Information Technology Incident Management Policy

• OCIO Leader On Call: (Formerly OCIO Director on Call). The OCIO Leader On Call is responsible for leading communication with operational leaders and ensuring technical coordination until the incident is resolved.

• OCIO Technical Leader: The OCIO director or manager for the technical team that is the primary owner of the incident. The OCIO Technical Leader is responsible for incident resolution. “Ownership” of the incident may transition from one leader to another with agreement from the OCIO Leader On Call. The OCIO Technical Leader will engage resources from other teams as needed to analyze and resolve the incident.

• Administrator-on-Call: Designated MUHA on-call leader who is the senior operational point of contact to facilitate assessment of the impact or potential impact of information technology incidents or other disruptions to essential services and spearhead the notification / escalation processes.

Key Timelines
• 5-15 minutes from incident start
  – Need to send desktop alert or notify Operations to send the alert.
• 30 minutes from incident start
  – If system is still impacted, need to initiate the conference call.
• 45 minutes from incident start
  – Should have conference call initiated by this time.
• Within one hour of incident start
  – Clear instructions sent to end users with downtime procedures and specific incident information.
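
The timeline above can be sketched as a small lookup, for example as the basis of a checklist or timer. This is only an illustration and is not part of the MUHA C170 policy: the names `MILESTONES`, `actions_due`, and `next_milestone` are hypothetical, and the 5-15 minute window is assumed to mean a deadline of 15 minutes.

```python
# Deadlines in minutes from incident start, taken from the Key Timelines slide.
# Assumption: the "5-15 minutes" window is modeled by its upper bound (15).
# MILESTONES, actions_due, and next_milestone are hypothetical names.
MILESTONES = [
    (15, "Send desktop alert, or notify Operations to send it"),
    (30, "If the system is still impacted, initiate the conference call"),
    (45, "Conference call should be initiated by now"),
    (60, "Send end users clear downtime procedures and incident details"),
]


def actions_due(minutes_elapsed):
    """Return the actions whose deadline has been reached."""
    return [action for deadline, action in MILESTONES if minutes_elapsed >= deadline]


def next_milestone(minutes_elapsed):
    """Return the (deadline, action) pair for the next deadline, or None."""
    for deadline, action in MILESTONES:
        if minutes_elapsed < deadline:
            return deadline, action
    return None
```

For example, 20 minutes into an incident, `actions_due(20)` lists only the desktop alert, and `next_milestone(20)` points at the 30-minute conference call decision.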

OCIO Service Interruption Process
[Flowchart, Rel: 1/21/15. Elements:]
• Potential serious interruption (data center facility, hardware, network, interface; symptoms persist): monitor alert or OCIO Help Desk “notifies OCIO on-call”.
• Potential serious interruption (application function, interface; symptoms intermittent): monitor alert or user(s) call OCIO Help Desk; OCIO Help Desk “notifies OCIO on-call”.
• Escalation process: Team Leader, Manager, Director, OCIO Leader* (*OCIO Leader On-Call).
• Decision: Fix complete in 15 mins? (yes/no)
• Decision: Further diagnosis needed? (yes/no)
• Internal OCIO conference call: 876-9699.
• Desktop alert: call Operations at 876-5000 and dictate message.
• OCIO Leader contacts impacted business unit.
• 30 to 45 minutes into interruption, by affected entity (MUSC or MUHA/MUSC-P):
  – OCIO Leader calls 792-2123 and asks to page Admin On-Call.
  – OCIO Leader calls 792-2123 and asks to page I/S Downtime Group.
• OCIO Leader leads the 1st Service Interruption Conference Call.
• Admin On-Call or business unit assumes ownership of conference calls.
• Send All Clear status to Service Interruption Group.
• Send email to [email protected]. Make sure entry put into downtime log.
• End.

IT Incident Conference Call
• The conference call can be initiated by calling Hospital Communications at 2-2123 and asking for the Supervisor. Tell the supervisor that you need to convene an IS Downtime Conference Call.

• If you have not consulted with the MUHA Administrator on Call, give them a courtesy call to let that person know what is going on and why you are having the conference call.

• Contact with the MUHA Admin on Call (AOC) should be initiated by the Call Center. The Call Center Operator will page the MUHA AOC to call the Call Center. The operator will then conference the AOC with the Leader On Call (LOC). This is needed to ‘verify’ that the Admin on Call responded and when.

Conference Call Participants

MSSG ID   NAME
17870     ADMIT-TRANSFER CTR
11523     Burke, Kay
20602     OCIO/DIRECTOR-ON-CALL
12029     Deweese, Duane
12951     Becker, Robin
13105     Forinash, Melissa
17812     HS-ADULT- UH
12854     Hoffman, Jack
20545     PATHOLOGY & LABORATORY MEDICINE/LABORATORY INFORMATION SERVICES (LIS)
20650     ADMINISTRATOR ON CALL/MUHA ADMIN
11599     Garrison, Kelli
18091     OCIO DATA CENTER OPERATIONS
20701     PHARMACY and PHARM D/PHARMACY INFORMATION SYSTEMS
18229     RADI PACS EMERGENCY
12884     Seyfried, Brett
11734     Walsh, Tasia
14799     Warren, Robert

OCIO LOC Conference Call Responsibilities
• The LOC needs to participate and make sure there are OCIO resources on the call that can clearly explain what is down, the impact to the end user, and the estimated time for full functionality to be restored.

• Perform roll call: check the roll call sheet. If anyone is missing, call them directly.

• Help determine what communication needs to be sent to the users.

• Determine whether a full conference call is needed again, or whether a smaller group of individuals will monitor the situation and keep others informed.

• Determine timing of next conference call.
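
The roll call step amounts to comparing the roll call sheet against who has answered. A minimal sketch, assuming names are matched case-insensitively; the function name `missing_from_roll_call` is hypothetical, and the example call state is made up for illustration:

```python
def missing_from_roll_call(expected, present):
    """Return expected participants who have not answered, in sheet order."""
    answered = {name.strip().lower() for name in present}
    return [name for name in expected if name.strip().lower() not in answered]


# Entries borrowed from the participant list; who answered is invented here.
expected = [
    "OCIO DATA CENTER OPERATIONS",
    "RADI PACS EMERGENCY",
    "ADMINISTRATOR ON CALL/MUHA ADMIN",
]
present = ["radi pacs emergency", "OCIO Data Center Operations"]

print(missing_from_roll_call(expected, present))
# prints ['ADMINISTRATOR ON CALL/MUHA ADMIN']
```

Anyone returned by the function is a participant the LOC would call directly.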

Communication Tools

• Desktop Alerts (also known as eAlerts)
• Email
• Epic Message of the Day
• Simon Paging
• Downtime website: http://www.musc.edu/downtime
• Rave Alert
• Microsoft Lync

Using Desktop Alerts

• Go to https://iapps.muschealth.com/muscalertsadmin/Login.aspx

Headline: What the users will see. Keep it brief. 100 character limit.

Link URL: Can put a link to a website. When a user clicks on the desktop alert, the website will pop up on their desktop.

Active for: How long this desktop message will stay active on the desktop. If Auto Close is checked, the alert will be removed automatically after the time specified.

Comments: More detailed info can be put here. If the user clicks on the alert, this is what they will read.

This is the last message that was sent out.
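
Before submitting or dictating an alert, it can help to check the draft against the field rules described above. The following is a minimal sketch only: it does not talk to the alert admin site, the class and function names are hypothetical, and the assumption that "Active for" is expressed in minutes is not confirmed by the slide.

```python
from dataclasses import dataclass

HEADLINE_LIMIT = 100  # character limit stated on the slide


@dataclass
class DesktopAlert:
    """Hypothetical draft of an alert; field names mirror the form labels."""
    headline: str
    comments: str = ""
    link_url: str = ""
    active_minutes: int = 60  # assumed unit and default; the slide gives neither
    auto_close: bool = False


def validate(alert):
    """Return a list of problems; an empty list means the draft looks sendable."""
    problems = []
    if not alert.headline.strip():
        problems.append("headline is empty")
    if len(alert.headline) > HEADLINE_LIMIT:
        problems.append(
            f"headline is {len(alert.headline)} characters; limit is {HEADLINE_LIMIT}"
        )
    if alert.auto_close and alert.active_minutes <= 0:
        problems.append("auto close needs a positive active time")
    return problems
```

A short headline with the detail moved into Comments passes the check, which matches the guidance to keep the headline brief.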

MUHA Daily Check In
• OCIO LOC should attend the Daily Check In call for all days in his/her week on call.
  – The calls are at 7:45 a.m. on non-holiday weekdays.
  – The calls are held at 8:45 a.m. on weekends and holidays and typically last 5-10 minutes.
  – Call in for audio: On campus: 6-8002. Off campus: 876-8002.
  – Conference collaboration code: 293366

• Summarize for the leaders on the call the items in the Daily Operations Report: current system/application issues and notable issues during the previous 24 hours.
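
The call schedule above reduces to a simple rule, sketched here for use in a reminder script. The function name `check_in_time` is hypothetical, and the set of holidays is an assumption the caller must supply, since the slide does not list them.

```python
from datetime import date, time


def check_in_time(day, holidays=frozenset()):
    """Return the Daily Check In start time for the given date."""
    is_weekend = day.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
    if is_weekend or day in holidays:
        return time(8, 45)
    return time(7, 45)
```

For example, `check_in_time(date(2015, 1, 21))` (a Wednesday) gives 7:45, while a Saturday or any date passed in `holidays` gives 8:45.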

Planned Downtimes/Scheduled Changes
• OCIO LOC should check which scheduled changes and planned downtimes fall within their on-call week.
  – Change Control notifications
  – Downtime website (http://www.musc.edu/downtime)
  – Courtesy notification from the team responsible

• Follow unplanned downtime procedures for planned downtimes that go over the scheduled window. Use best judgment.

Helpful Points
• Handoff: To ensure a good handoff, the LOC is asked to contact the oncoming LOC prior to 8:00 a.m. on Friday and brief them on any ongoing situations or anything that needs to be monitored.

• Contact the MUHA Administrator on Call. Let that person know you are the OCIO LOC and make sure you both have each other’s contact info. Go over any ongoing situations or anything that needs to be monitored.

• The call rotation for the OCIO LOC will be kept in Simon. Verify that Simon is correct.

• If someone is covering for you, make sure the MUHA AOC knows and that Simon reflects the change.

Incident Reporting

• Verify that an email describing the incident is sent to [email protected].

• Verify that an entry was put in the OCIO Downtime log in Remedy (http://remedyprod.musc.edu). This should be done after a root cause analysis has been completed.

Next Steps

• Determine schedule and update Simon, Remedy(?)
• Send email to OCIO internal
• Develop CATTS training module for OCIO on downtime procedure?
• Training with AOC.
• Meet once a quarter to discuss previous downtimes and discuss what went right and what could have been done better?