Lessons Learned from disaster recovery
Jinny Chien
April 20, 2009
6th APGridPMA in Taipei
Motivation
• ASGCCA had an accident in February. How can we avoid the same situation happening again?
• Do we have a sufficient backup procedure?
• Is our CA server in a safe place?
• Do we have a standard policy or incident response procedure to guide the recovery?
ASGCCA Event
• Time : 9:00 Feb-25-2009 UTC
• Event description : an unexpected accident in part of the data center at ASGC. All on-line services were shut down, including the ASGCCA web server.
• Result : ASGCCA certificate activities were down. The CRL was not published in time.
Process
9:00 Feb 25 UTC : Sent an EGEE broadcast to all ROC managers, VO managers, WLCG users, APGridPMA members, ASGCCA users
2:00 Feb 26 UTC : Sent an announcement to the IGTF-RAT and IGTF-general lists. Tried to recover the ASGCCA web page
12:00 Feb 26 UTC : Moved the ASGCCA web server and the CA server (offline) to a safe place and connected to the Internet.
16:00 Feb 26 UTC : ASGCCA web site was up. Sent the announcement to IGTF-RAT, IGTF-general and ASGCCA user lists
Review the process
Feb-25-2009 UTC : The ASGCCA web site went down. An announcement was sent to APGridPMA, ASGCCA users, IGTF-RAT and IGTF-general
Feb 26 UTC : The ASGCCA web site was recovered and back up. Sent the final announcement and checked that all CA activities were working.
The total process took two days
Basic Recovery procedure
• Evaluate the scope of the disaster and how many days the recovery will take
• Send a notification to IGTF-RAT, IGTF-general, APGridPMA members and your end entities. The message should describe the disaster and the recovery schedule
• Carry out the recovery activities
• Check that all CA activities work and that the CRL is published regularly
• Re-work and send the final announcement to IGTF-RAT, APGridPMA members, IGTF and your end entities.
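The check that the CRL is being published regularly can be illustrated with a small sketch. It assumes the CRL's nextUpdate time has already been read out by some other tool; the function name crl_status and the one-day warning margin are hypothetical choices for illustration, not ASGCCA practice.

```python
from datetime import datetime, timedelta, timezone


def crl_status(next_update, now, warn_margin=timedelta(days=1)):
    """Classify a CRL by comparing its nextUpdate time with the current time.

    next_update and now are timezone-aware datetimes.
    warn_margin is a hypothetical threshold, not a value from the slides.
    """
    if now >= next_update:
        return "expired"   # relying parties may already reject this CRL
    if next_update - now <= warn_margin:
        return "expiring"  # a new CRL should be published soon
    return "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Example value only: a CRL whose nextUpdate is six hours away.
    print(crl_status(now + timedelta(hours=6), now))
```

Running such a check from a host outside the data center would mean the alert still fires when the data center itself is the thing that failed.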
IGTF-RAT
• The International Grid Trust Federation (IGTF) Risk Assessment Team (RAT) is responsible for assessing risk and setting time and deadlines for response and action for concerns and vulnerabilities.
• Email address: [email protected]
• Members:
  • APGridPMA: Yoshio Tanaka, Jinny Chien
  • EUGridPMA: Jens Jensen, Willy Weisz, David Groep, Sajjad Asghar
  • TAGPMA: Jim Basney, Vinod Rebello, Jim Marsteller
• Public webpage: http://tagpma.es.net/wiki/bin/view/IGTF-RAT
Conclusion
• Please back up the CA server and web site regularly
• The backup archive should be kept in a safe place
• Write down the recovery procedure for your CA activities
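As one possible illustration of a regular backup, the sketch below packs a directory into a compressed archive and records a SHA-256 digest so the copy kept in the safe place can be verified before it is needed. The function names, the directory name "ca-web" and the archive name are hypothetical. Private key material of an offline CA needs separate, stricter handling that this sketch does not cover.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_backup(source_dir, archive_path):
    """Pack source_dir into a gzip-compressed tar archive and return its digest."""
    source_dir = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name)
    return sha256_of(archive_path)


def verify_backup(archive_path, expected_digest):
    """Return True if the digest matches and the archive lists at least one entry."""
    if sha256_of(archive_path) != expected_digest:
        return False
    with tarfile.open(archive_path, "r:gz") as tar:
        return len(tar.getnames()) > 0


if __name__ == "__main__":
    # Demonstration on throwaway data; real paths would be site-specific.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "ca-web"
        data.mkdir()
        (data / "index.html").write_text("placeholder page")
        archive = Path(tmp) / "ca-web-backup.tar.gz"
        recorded = make_backup(data, archive)
        print(verify_backup(archive, recorded))
```

Storing the recorded digest separately from the archive makes it possible to detect a corrupted or truncated backup during a recovery drill rather than during an actual disaster.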
Discussion
• What if the CA server and web server are destroyed at the same time?
  • Evaluate the disaster and plan a schedule
  • Ask IGTF-RAT for help
• Should we have an incident response procedure?
• What is the time range if a CA encounters an accident?
Thanks for listening