Lessons Learned from disaster recovery Jinny Chien April 20, 2009 6th APGridPMA in Taipei

Lessons Learned from disaster recovery

Jinny Chien

April 20, 2009

6th APGridPMA in Taipei

Motivation

• ASGCCA suffered an accident in February. How can we avoid the same situation happening again?

• Do we have a sufficient backup procedure?

• Is our CA server in a safe place?

• Do we have a standard policy or incident response procedure to guide the recovery?

ASGCCA Event

• Time: 9:00 Feb 25, 2009 UTC

• Event description: an unexpected accident in part of the data center at ASGC. All online services were shut down, including the ASGCCA web server.

• Result: ASGCCA certificate activities were down. The CRL was not published on time.

Process

9:00 Feb 25 UTC: Sent an EGEE broadcast to all ROC managers, VO managers, WLCG users, APGridPMA members and ASGCCA users.

2:00 Feb 26 UTC: Sent an announcement to the IGTF-RAT and IGTF-general lists. Tried to recover the ASGCCA web page.

12:00 Feb 26 UTC: Moved the ASGCCA web server and CA server (offline) to a safe place and connected to the Internet.

16:00 Feb 26 UTC: The ASGCCA web site was up. Sent an announcement to the IGTF-RAT, IGTF-general and ASGCCA user lists.
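As a small worked example, the elapsed time at each step of the timeline above can be computed directly. This is only an illustrative sketch: it assumes the first entry (9:00 Feb 25 UTC) marks the start of the outage, and the names `EVENTS` and `hours_since_start` are hypothetical.

```python
from datetime import datetime, timezone

# Timestamps taken from the timeline above (all UTC, year 2009).
EVENTS = [
    (datetime(2009, 2, 25, 9, 0, tzinfo=timezone.utc), "EGEE broadcast sent"),
    (datetime(2009, 2, 26, 2, 0, tzinfo=timezone.utc), "IGTF lists notified"),
    (datetime(2009, 2, 26, 12, 0, tzinfo=timezone.utc), "servers moved to a safe place"),
    (datetime(2009, 2, 26, 16, 0, tzinfo=timezone.utc), "web site up again"),
]


def hours_since_start(events):
    """Return the hours elapsed between the first event and each event."""
    start = events[0][0]
    return [(when - start).total_seconds() / 3600 for when, _ in events]


if __name__ == "__main__":
    for hours, (_, label) in zip(hours_since_start(EVENTS), EVENTS):
        print(f"+{hours:4.0f} h  {label}")
```

Under that assumption the web site was back 31 hours after the first broadcast, which spans two calendar days.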

Review the process

Feb 25, 2009 UTC: The ASGCCA web site went down. Sent an announcement to APGridPMA, ASGCCA users, IGTF-RAT and IGTF-general.

Feb 26 UTC: Recovery was completed and the ASGCCA web site was up. Sent the final announcement and verified that all CA activities were working.

The whole process took two days.

Basic Recovery procedure

• Evaluate the scope of the disaster and estimate how many days the recovery will take

• Send a notification to IGTF-RAT, IGTF-general, APGridPMA members and your end entities. The message should describe the disaster and the recovery schedule

• Carry out the recovery activities

• Check that all CA activities are working and that the CRL will be published regularly

• Resume operation and send the final announcement to IGTF-RAT, APGridPMA members, IGTF and your end entities
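The CRL check in the procedure above could be partly automated. The following is a minimal sketch, not the ASGCCA procedure: it assumes the `nextUpdate` line has already been obtained with `openssl crl -noout -nextupdate -in <file>`, and the function names, the three-day warning margin and the sample dates are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Month abbreviations as printed by OpenSSL, mapped to month numbers.
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_next_update(line: str) -> datetime:
    """Parse a line like 'nextUpdate=Mar 27 09:00:00 2009 GMT' into a UTC datetime."""
    _, _, value = line.strip().partition("=")
    month, day, clock, year, _tz = value.split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(int(year), MONTHS[month], int(day),
                    hour, minute, second, tzinfo=timezone.utc)


def crl_status(next_update: datetime, now: datetime,
               warn_margin: timedelta = timedelta(days=3)) -> str:
    """Classify a CRL as 'ok', 'expiring' or 'expired' relative to `now`."""
    if now >= next_update:
        return "expired"
    if next_update - now <= warn_margin:
        return "expiring"
    return "ok"


if __name__ == "__main__":
    # Hypothetical sample line; a real one would come from the openssl command.
    sample = "nextUpdate=Mar  7 09:00:00 2009 GMT"
    print(crl_status(parse_next_update(sample), datetime.now(timezone.utc)))
```

Running such a check on a schedule, from a machine outside the affected data center, would flag a CRL that is about to lapse while the web server is still being recovered.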

IGTF-RAT

• The International Grid Trust Federation (IGTF) Risk Assessment Team (RAT) is responsible for assessing risk and for setting timelines and deadlines for response and action on concerns and vulnerabilities.

• Email address: [email protected]

• Members:
  • APGridPMA: Yoshio Tanaka, Jinny Chien
  • EUGridPMA: Jens Jensen, Willy Weisz, David Groep, Sajjad Asghar
  • TAGPMA: Jim Basney, Vinod Rebello, Jim Marsteller

• Public webpage: http://tagpma.es.net/wiki/bin/view/IGTF-RAT

Conclusion

• Please back up the CA server and web server regularly

• The backup archive should be kept in a safe place

• Write down the recovery procedure for your CA activities
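A regular backup is only useful if the archive can be trusted at restore time. The sketch below shows one possible approach, assuming the data to protect is an ordinary directory (for example the CA web content): it writes a timestamped archive plus a SHA-256 checksum file and verifies the pair later. The function names and directory layout are hypothetical, and protection of CA private key material is outside the scope of this example.

```python
import hashlib
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_backup(source_dir: Path, dest_dir: Path) -> Path:
    """Archive source_dir into dest_dir and store a checksum file next to it."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = dest_dir / f"{source_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name)
    checksum_file = archive.with_name(archive.name + ".sha256")
    checksum_file.write_text(sha256_of(archive) + "\n")
    return archive


def verify_backup(archive: Path) -> bool:
    """Check that the archive still matches its recorded checksum."""
    checksum_file = archive.with_name(archive.name + ".sha256")
    return checksum_file.read_text().strip() == sha256_of(archive)
```

In this sketch `dest_dir` stands for the safe place mentioned above, such as storage in a different building from the CA itself, and `verify_backup` would be run both after copying and before any restore.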

Discussion

• What if the CA server and web server are destroyed at the same time?
  • Evaluate the disaster and plan a schedule
  • Ask IGTF-RAT for help

• Should we have an incident response procedure?

• What is the acceptable time frame for recovery if a CA encounters an accident?

Thank you for listening