

SharePoint 2010 Disaster Recovery

This document covers the backup and recovery scenarios in a typical SharePoint 2010 environment.

Document Information

Status Release

Release Date 27.05.2011

Version 1.0

Filename TVD SharePoint 2010 Disaster Recovery.docx

Category Trivadis Best Practices

Product SharePoint 2010

Author [email protected];[email protected]

Customer Trivadis AG

Document Versioning

Version Date Author Comments

1.0 27.5.2011 Stephan Hurni, Stephan Jola

Contents

Conventions and Features used in this document
Text conventions
Responsible
Referenced Documents
1 Introduction
2 Key concepts and terms
2.1 Disaster Recovery (DR)
2.2 Recovery time objective (RTO)
2.3 Recovery point objective (RPO)
2.4 Architectural overview
2.5 Disaster Recovery Manager
3 Backup objective
3.1 Resources
3.1.1 Documentation
3.2 Backup Targets
3.2.1 Database Backup and Maintenance
3.2.2 SharePoint Databases
3.2.3 Service Applications
4 Restore objective
4.1 Restore Targets general
4.2 Priority List
4.2.1 Windows Servers
4.2.2 SQL Server databases
4.2.3 SharePoint Servers
4.3 Restore Targets SharePoint
4.3.1 SharePoint configuration databases
4.3.2 SharePoint content databases
4.3.3 SharePoint Configuration
4.3.4 Service Applications
5 Testing objective
5.1 System Tests
5.2 Recovery complete
Appendix

Conventions and Features used in this document

This document uses special text and design conventions to make it easier for you to find the information you need.

Text conventions


Convention | Feature

Italicized type | Indicates a reference to an object in the document (figure, chapter)

Italicized type with underline | Indicates a reference to an external source (document)

“Apostrophe” | Used for references to companies, organizations, clubs and institutions

Responsible

Name | Job Title / Company | Responsibilities

Stephan Hurni | Principal Consultant, Trivadis | SQL Server

Stephan Jola | Consultant, Trivadis | SharePoint

Referenced Documents

ID Document Title Owner Status

Chapter 1: Introduction


1 Introduction

When a SharePoint 2010 farm is installed, configured and operated, the IT operations team is typically responsible for the SharePoint servers deployed in the corporate IT infrastructure. But the mission is much more than monitoring and optimizing the platform. What happens if a disaster strikes a SharePoint server or, even worse, the entire farm? Do you have a rock-solid disaster recovery plan? Are the servers, services, applications and configurations well documented, and are the documents up to date?

This document covers the essential points of a solid backup and recovery plan for an unexpected failure or data corruption that requires the intervention of the responsible IT administrators.

If you follow this guide, you are prepared. But this also takes serious testing in your own environment! If you are uncertain about these recommendations, do not hesitate to contact the authors.

Chapter 2: Key concepts and terms


2 Key concepts and terms

2.1 Disaster Recovery (DR)

Disaster recovery means that the SharePoint farm is in a failure state and cannot be brought back online in the expected amount of time. As part of the business continuity specifications, the business owners must define in the DR plan how much unplanned downtime the organization tolerates before experiencing significant negative business impact.

2.2 Recovery time objective (RTO)

The RTO defines the maximum acceptable time to get the system or data back to operational status after a data corruption or disaster. During the RTO, all required restore or recovery steps, including the corresponding actions, are performed by the responsible persons.

2.3 Recovery point objective (RPO)

If a disaster happens (data corruption, including unintentional data deletion or manipulation), what is the maximum acceptable amount of data loss? The RPO defines the maximum acceptable time between the last data backup and the disaster.

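[Figure 1: timeline with the events "Last Backup", "KABOOM!!!" (the disaster), "Identified and declared disaster" and "Full Recovery done, back online"; the RPO and RTO intervals are marked along the time axis]

Figure 1 RPO and RTO for a SharePoint farm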

Implementing disaster recovery as illustrated in Figure 1 requires the RTO and RPO to be specified as part of the SharePoint business continuity plan (BCP).
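The relationship between the timeline in Figure 1 and the two objectives can be sketched in a few lines of Python. This is only an illustration: the function names, the timestamps and the objectives are hypothetical, and the downtime is assumed to be measured from the disaster itself, not from the moment the disaster is declared.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup, disaster, back_online):
    """Return (data loss window, downtime) for one incident."""
    data_loss = disaster - last_backup   # compared against the RPO
    downtime = back_online - disaster    # compared against the RTO
    return data_loss, downtime


def meets_objectives(last_backup, disaster, back_online, rpo, rto):
    """True if both the RPO and the RTO were met."""
    data_loss, downtime = recovery_metrics(last_backup, disaster, back_online)
    return data_loss <= rpo and downtime <= rto


# Hypothetical incident and objectives
last_backup = datetime(2011, 5, 27, 9, 50)
disaster = datetime(2011, 5, 27, 10, 0)
back_online = datetime(2011, 5, 27, 13, 30)

print(recovery_metrics(last_backup, disaster, back_online))
print(meets_objectives(last_backup, disaster, back_online,
                       rpo=timedelta(minutes=15), rto=timedelta(hours=4)))
```

Running the same check against the timings of every restore test shows whether the objectives written into the BCP are realistic.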


2.4 Architectural overview

Professional server environments are deployed in different stages, typically Development, Integration/Testing and Production. For SharePoint environments, these stages can be used to regularly restore production data to the Integration/Testing stage. This helps to measure how long the individual recovery steps take and gives a better estimate of the RTO in a disaster recovery case. A further benefit is that staff are trained and have the knowledge required to restore the SharePoint farm successfully after a system incident.
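One possible way to turn the timings of such regular test restores into an RTO estimate is sketched below. The step names and durations are hypothetical, and the sketch makes two assumptions that are not part of this document: the steps run strictly one after another, and the slowest observed duration of each step is used as a conservative value.

```python
from datetime import timedelta


def estimate_rto(drills):
    """Sum the slowest observed duration (in minutes) of every step."""
    worst = {}
    for drill in drills:
        for step, minutes in drill.items():
            worst[step] = max(worst.get(step, 0), minutes)
    return timedelta(minutes=sum(worst.values())), worst


# Hypothetical measurements from two restore drills (minutes per step)
drills = [
    {"rebuild servers": 120, "restore databases": 90, "restore farm configuration": 45},
    {"rebuild servers": 100, "restore databases": 110, "restore farm configuration": 40},
]

estimate, worst_per_step = estimate_rto(drills)
print(estimate, worst_per_step)
```

If steps are run in parallel during a real recovery, the actual time will be shorter than this estimate.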

If a disaster strikes the SharePoint environment, it is crucial to identify which parts of the whole system are affected. This paper assumes that all tiers of the SharePoint system are affected. If only parts of the system are in a nonfunctional state, an exact analysis is essential to bring the system back to a flawless state. The different layers are presentation, application, service and data.

2.5 Disaster Recovery Manager

When the disaster recovery is initiated, we recommend identifying a person who takes over the Recovery Manager role during the recovery process. If the SharePoint farm is damaged or offline, management puts high pressure on the team to bring the farm back online as soon as possible. Things can go wrong, especially when the process is not professionally coordinated.

The Disaster Recovery Manager must be defined before a disaster strikes. He is the owner of the process. Depending on the farm size or the number of parties involved in a successful recovery of the farm, we do not recommend assigning this role to a technical person.

He is responsible for the following tasks:

Regularly checks the availability of the technically responsible persons in the company (they may be on holiday or may have left the company).

Checks whether the system documentation is up to date.

Specifies the communication matrix (to whom, how and at which interval).

Coordinates the resources in the recovery process.

Collects the issues and monitors the ongoing steps to estimate the progress.

Acts as the single point of contact and protects the technical staff from interruptions to their tasks.

Chapter 3: Backup objective


3 Backup objective

3.1 Resources

It is in the nature of a failure event that things can go “head over heels” if they are not properly planned and defined. Therefore, all resources have to be named and periodically confirmed. These are basically:

System documentation (logical and physical), including product versions

Software and product keys

Certificates and passphrases needed to build the farm connectivity

Decision makers: Domain Administrators, Server Administrators, Backup Administrators, Database Administrators, SharePoint Farm Administrators, Site Collection owners for the business-critical sites, Firewall and Network Administrators

Well-documented test scenarios

3.1.1 Documentation

Logical Architecture

Ensure that the logical architecture of the farm is well documented and up to date. The logical architecture model of a system describes the logical components of the system, the role of each component and how they interact with each other. At least the following architectural aspects should be included:

IIS application pools

SharePoint web applications

SharePoint service applications

Zones and associated alternate access mappings

Web application policies

Content databases

Site Collections (incl. host-named site collections)

Sites

My Sites


Physical Deployment

Like the logical architecture documentation, the documentation of the physical deployment is another important part of the game. This documentation primarily focuses on the hardware used in the farm. The most commonly documented elements are:

Physical servers that both SharePoint and SQL Server use

Storage equipment

Networking equipment

Firewalls that sit between SharePoint servers

Hardware load balancers or similar specialty equipment

Windows Active Directory Controllers


3.2 Backup Targets

To be able to restore a SharePoint system effectively, safe procedures to back up all items in the system are essential!

3.2.1 Database Backup and Maintenance

To be able to restore SQL Server databases to a point in time, database backups and transaction log backups are essential. This applies to all SharePoint databases as well as to all SQL Server system databases. A properly deployed maintenance plan contains all the steps and tasks necessary to achieve secure database operation for all databases in the specific instance.

Trivadis has developed maintenance scripts that provide trouble-free, automated SQL Server maintenance containing all necessary tasks.

Table 1 SQL Server Job List

Task/Job | Description | Schedule and interval

Full backup | Full database backup | Daily

Transaction log backup | Transaction log backup | Every 5-15 min

Index maintenance | Reorganize or rebuild indexes based on different parameters | Daily

Statistics update | For existing statistics, but auto create statistics should be off for SharePoint databases | Daily

Clean up | Clean up history logs and old backup files | Daily

To fulfill the point-in-time restore requirements, the complete transaction log backup chain of every database must be available. If any TX log backup file is missing, the chain is broken and a restore is only possible up to the last TX log backup before the gap.

Figure 2 Transaction log chain (where subsequent Diff-Backup or Full-Backup can be missing but no TX Log Backup)
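The effect of a broken chain can be illustrated with a simplified model in Python. The sketch assumes that every log backup is described by the log sequence numbers (LSNs) at which it starts and ends, and that a backup continues the chain only if it starts at or before the position already reached. The class, the function and all LSN values are hypothetical and are not taken from the Trivadis maintenance scripts.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogBackup:
    first_lsn: int
    last_lsn: int
    finished: datetime


def last_restorable_point(full_lsn, full_finished, log_backups):
    """Walk the log chain from the full backup and stop at the first gap."""
    position, point = full_lsn, full_finished
    for backup in sorted(log_backups, key=lambda b: b.first_lsn):
        if backup.last_lsn <= position:
            continue  # already covered by the full backup
        if backup.first_lsn > position:
            break     # a log backup is missing: the chain is broken here
        position, point = backup.last_lsn, backup.finished
    return point


day = datetime(2011, 5, 27)
logs = [
    LogBackup(90, 120, day.replace(hour=2, minute=15)),
    LogBackup(120, 150, day.replace(hour=2, minute=30)),
    # the backup covering LSN 150 to 180 is missing
    LogBackup(180, 210, day.replace(hour=3, minute=0)),
]

print(last_restorable_point(100, day.replace(hour=2), logs))
```

In this example the backup taken at 03:00 cannot be used because the file before it is missing, so the restore ends with the 02:30 log backup.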


3.2.2 SharePoint Databases

SharePoint has several SQL Server databases containing configuration data and user content from SharePoint web applications, site collections, sites, libraries and lists. The backup approach differs between configuration databases, service application databases and content databases.

Configuration Databases

Each SharePoint farm has one configuration database and one Central Administration content database. Back up those databases regularly. This ensures that the configuration settings can be retrieved in case of a disaster.

Create configuration database backups within SharePoint Central Administration or, better, with PowerShell scripts. PowerShell scripts can be integrated into the SQL Server backup jobs mentioned above. Doing so ensures that the backed-up items reflect the same state in time.

But never perform a SQL Server based restore of the configuration databases. This will damage your SharePoint farm for sure!

Content Databases

SharePoint content databases are backed up with regular SQL Server database backups as described before. From a SQL Server point of view, these databases are user databases like any other user database.

Therefore, the backup plan and duration depend heavily on the SQL Server backup and restore environment. Typically, a well-architected SharePoint farm limits content databases to a size of up to 500 GB. This ensures a well-performing backup/restore procedure. A larger database automatically requires longer backup/restore time windows.
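How the database size drives the backup/restore window can be estimated roughly as shown below. The throughput figure and the database names are hypothetical; the real throughput depends on the storage and backup infrastructure and has to be measured in the actual environment.

```python
SIZE_LIMIT_GB = 500  # size guideline for content databases from the text above


def backup_window_hours(size_gb, throughput_mb_per_s):
    """Rough duration of one full backup or restore pass, in hours."""
    return size_gb * 1024 / throughput_mb_per_s / 3600


def oversized(databases):
    """Names of the content databases that exceed the size guideline."""
    return sorted(name for name, size_gb in databases.items()
                  if size_gb > SIZE_LIMIT_GB)


# Hypothetical content databases and their sizes in GB
databases = {"WSS_Content_Intranet": 320, "WSS_Content_Archive": 750}

for name, size_gb in databases.items():
    # 100 MB/s is an assumed throughput, not a measured value
    print(name, round(backup_window_hours(size_gb, 100), 2), "h")
print(oversized(databases))
```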

Service Application Databases

Some, but not all, of the service applications depend on their own databases. Because service applications do not change frequently, we recommend backing up the service application databases within the regular SQL Server backup job, and/or using PowerShell scripts for service applications that do not have a database in place.

Important note: Some of the service applications require passwords. The Secure Store Service application, for example, requires a password during configuration. This password must be available to recover the service application.

Table 2 Recommended SharePoint database backup schedule from the SQL server perspective

Database | Backup Type | Schedule and interval | Tools

SharePoint configuration | Full | Daily | SQL / PowerShell

SharePoint Central Administration Content | Full | Daily | SQL / PowerShell

SharePoint Service Application Databases | Full | Daily | SQL / PowerShell

SharePoint Content Databases | Full | Daily | SQL

for every Database | Transaction log | Every 5 to 15 min¹ | SQL

¹ Must reflect the required RPO


3.2.3 Service Applications

User Profile Service Application

The User Profile Service Application is one of the trickiest service applications in SharePoint. One of the most important steps is to document the initial configuration steps used to get the service application successfully up and running.

Next, the backup of this service application differs from that of the other service applications. We recommend using a PowerShell script to back up the two required components of the User Profile Service Application:

Service Application Name

Service Application Proxy Name

The associated databases of the Service Application can also be backed up with SQL Server Management Studio. This requires additional steps to ensure that the backup is restore aware¹.

Search Service Application

The Search Service Application can be backed up with either PowerShell or SharePoint Central

Administration.

SharePoint Server 2010 starts a SQL Server backup of the Search administration database,

crawl databases, and property databases, and also backs up the index partition files in parallel.

Secure Store Service

When the Secure Store Service is configured for the first time, a passphrase is entered. It is important to keep the passphrase in a secure location.

Every time you change or manipulate the Secure Store Service, the Secure Store Service Application database is automatically re-encrypted. Therefore, backing up the Secure Store Service Application after such a change ensures that the master key and the database stay synchronized.

¹ http://technet.microsoft.com/en-us/library/gg576965.aspx

Chapter 4: Restore objective


4 Restore objective

4.1 Restore Targets general

The restore of the backed-up targets is listed below in chronological order. Tasks can be parallelized where appropriate.

4.2 Priority List

Each SharePoint WebApp or Site Collection has different business continuity requirements. For the disaster task force, it is important to have a list of WebApps and Site Collections in which the priority of each item is specified. This ensures the right, or rather the optimized, order of the recovery steps, so that business-critical items are back online sooner than non-critical items.
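Such a priority list can be kept in any form. As a minimal sketch, assuming a numeric priority where 1 marks the most business-critical item, the recovery order can be derived as follows. The URLs and priorities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RestoreItem:
    url: str
    kind: str      # "WebApp" or "Site Collection"
    priority: int  # 1 = most business critical


def restore_order(items):
    """Most critical items first; ties are broken by URL for a stable plan."""
    return sorted(items, key=lambda item: (item.priority, item.url))


# Hypothetical priority list
items = [
    RestoreItem("http://teams.example.com", "WebApp", 3),
    RestoreItem("http://intranet.example.com/sites/finance", "Site Collection", 1),
    RestoreItem("http://intranet.example.com", "WebApp", 2),
]

for item in restore_order(items):
    print(item.priority, item.kind, item.url)
```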

4.2.1 Windows Servers

To restore the different Server Roles, every Server in the SharePoint system should be built or

deployed as a new, clean setup. Using imaging or bare-metal solutions can lead to inconsistent

situations and server states. Therefore it is recommended to use proper Server deployment

scenarios, add the needed roles and features and install the desired software on top. Then the

specific configuration of the services and applications can be applied based on the

requirements and documentation.

4.2.2 SQL Server databases

After the Windows server that holds the database role is back online and connected to the Active Directory domain, the SQL Server instance has to be installed. Set up and configure the SQL Server instance with exactly the same service pack and cumulative update level as the former server.

This is followed by restoring the SQL Server system databases to get all system objects,

principals and permissions back to the state of the selected point in time.
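The requirement of an identical service pack and cumulative update level can be checked by comparing the documented build numbers with the builds installed on the rebuilt servers. The following sketch assumes that both are available as dotted version strings; the component names and build strings are illustrative values only.

```python
def parse_build(build):
    """Turn a dotted build string into a tuple of integers."""
    return tuple(int(part) for part in build.split("."))


def mismatches(documented, installed):
    """Components whose installed build differs from the documented one."""
    result = {}
    for name, build in documented.items():
        found = installed.get(name)
        if found is None or parse_build(found) != parse_build(build):
            result[name] = (build, found)
    return result


# Illustrative build strings: taken from the system documentation (documented)
# and read from the freshly installed servers (installed)
documented = {"SQL Server": "10.50.1600.1", "SharePoint": "14.0.4763.1000"}
installed = {"SQL Server": "10.50.2500.0", "SharePoint": "14.0.4763.1000"}

print(mismatches(documented, installed))
```

An empty result means that the rebuilt servers match the documented level and the restore can continue.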

4.2.3 SharePoint Servers

Each SharePoint Server, independent of its SharePoint role, requires a complete new installation of the SharePoint binaries in case of a catastrophic failure of the SharePoint farm. Follow these rules:

Keep the SharePoint source binaries in a central location (and not on the SharePoint server itself).

Keep the used product key in a central location.

Ensure the identical product level is installed.

Keep the Installed SharePoint Solutions in a central location.


4.3 Restore Targets SharePoint

The restore of the backed-up SharePoint targets is listed below in chronological order.

4.3.1 SharePoint configuration databases

Lost or undocumented settings of the SharePoint farm can be retrieved from a restored copy of the configuration database in SQL Server Management Studio.

Never restore a SharePoint configuration database to the farm with SQL Server restore operations! This will fail and damage the farm with unknown side effects!

Recovering a SharePoint farm configuration must be done from the native SharePoint farm backup file in Central Administration or in the SharePoint Management Shell.

4.3.2 SharePoint content databases

If a content database is corrupted or otherwise damaged, the database can be restored. There are situations where only a part of the content is affected. In that case, it is not acceptable to restore the whole database instead of only the desired part within the database.

Simply speaking, we distinguish two types of restoring SharePoint content:

Catastrophic failure. The content database is damaged to the point of being unusable. This requires a full SQL restore operation of the content database.

Data failure. The content database is operational, but the content has been modified in such a way that a part of it is useless or manipulated. In this case, the content database is restored to SQL Server to the specified recovery point (point-in-time recovery). The restored database requires a different name. Next, the SharePoint farm administrator restores the content from this database with the unattached database recovery feature in SharePoint. When finished, the restored content database can be removed from SQL Server.

4.3.3 SharePoint Configuration

The SharePoint farm configuration is restored only within Central Administration or with PowerShell. Do not recover the SharePoint configuration database with native SQL restore jobs.

4.3.4 Service Applications

Restore the Service Applications only with PowerShell or in the Central Administration of

SharePoint.

Chapter 5: Testing objective


5 Testing objective

5.1 System Tests

Before opening the SharePoint farm to end users, ensure that every required task is finished. This is important to prevent server reboots after the green light has been given to the end users.

Typically, the system test is divided into several tasks that are assigned to different responsible persons. The IT Administrator ensures that all systems are set up with the correct operating system. The SharePoint Administrator is responsible for finishing all required tasks, including checks of the SharePoint log files.

Each WebApp or Site Collection owner verifies that everything works and approves the functionality of his or her site collection.

5.2 Recovery complete

The SharePoint Disaster Recovery Manager collects the responses of the disaster recovery team. He is responsible for coordinating the tasks during the recovery phase and also intervenes when problems occur. Additionally, he documents the issues identified during the recovery process, delegates the resources and communicates with the business.

He is responsible for declaring the "Recovery complete" state.


Appendix

Plan for disaster recovery (SharePoint Server 2010):

http://technet.microsoft.com/en-us/library/ff628971.aspx

Backup a service application:

http://technet.microsoft.com/en-us/library/ee428318.aspx

Restore a service application (SharePoint Server 2010):

http://technet.microsoft.com/en-us/library/ee428305.aspx