SharePoint 2010 Disaster Recovery
This document covers the backup and recovery scenarios in a typical SharePoint 2010 environment.
Document Information
Status Release
Release Date 27.05.2011
Version 1.0
Filename TVD SharePoint 2010 Disaster Recovery.docx
Category Trivadis Best Practices
Product SharePoint 2010
Author [email protected];[email protected]
Customer Trivadis AG
Contents
Conventions and Features used in this document
Text conventions
Responsible
Referenced Documents
1 Introduction
2 Key concepts and terms
2.1 Disaster Recovery (DR)
2.2 Recovery time objective (RTO)
2.3 Recovery point objective (RPO)
2.4 Architectural overview
2.5 Disaster Recovery Manager
3 Backup objective
3.1 Resources
3.1.1 Documentation
3.2 Backup Targets
3.2.1 Database Backup and Maintenance
3.2.2 SharePoint Databases
3.2.3 Service Applications
4 Restore objective
4.1 Restore Targets general
4.2 Priority List
4.2.1 Windows Servers
4.2.2 SQL Server databases
4.2.3 SharePoint Servers
4.3 Restore Targets SharePoint
4.3.1 SharePoint configuration databases
4.3.2 SharePoint content databases
4.3.3 SharePoint Configuration
4.3.4 Service Applications
5 Testing objective
5.1 System Tests
5.2 Recovery complete
Appendix
Conventions and Features used in this document
This document uses special text and design conventions to make it easier for you to find the
information you need.
Text conventions
Convention | Feature
Italicized type | Indicates a reference to an object in this document (figure, chapter)
Italicized type with underline | Indicates a reference to an external source (document)
“Apostrophe” | Indicates a reference to a company, organization, club or institution
Responsible
Name | Job Title / Company | Responsibilities
Stephan Hurni | Principal Consultant, Trivadis | SQL Server
Stephan Jola | Consultant, Trivadis | SharePoint
Referenced Documents
ID | Document Title | Owner | Status
1 Introduction
Once a SharePoint 2010 farm is installed, configured and in operation, the IT operations team
is typically responsible for the SharePoint servers deployed in the corporate IT
infrastructure. But the mission is much more than monitoring and optimizing the platform.
What happens if a disaster strikes a SharePoint server or, even worse, the entire farm? Do you
have a rock-solid disaster recovery plan? Are the servers, services, applications and
configurations well documented, and is the documentation up to date?
This document covers the essential points of a solid backup and recovery plan for an
unexpected failure or data corruption that requires the intervention of the responsible IT
administrators.
If you follow this guide, you are prepared. But it also takes serious testing in your own
environment! If you are uncertain about any of these recommendations, do not hesitate to
contact the authors.
2 Key concepts and terms
2.1 Disaster Recovery (DR)
Disaster recovery means that the SharePoint farm is in a failure state and cannot be brought
back online within the expected amount of time. The DR plan, as part of the business continuity
specifications defined by the business owners, must specify how much unplanned downtime the
organization tolerates before it experiences significant negative business impact.
2.2 Recovery time objective (RTO)
The RTO defines the time allowed to bring the system or data back to operational status after
a data corruption or disaster. Within the RTO, the responsible persons perform all required
restore and recovery steps, including the corresponding actions.
2.3 Recovery point objective (RPO)
If a disaster happens (data corruption, including unintentional data deletion or manipulation),
what is the maximum acceptable amount of data loss? The RPO defines the maximum acceptable time
between the last data backup and the disaster.
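[Figure: timeline along a "Time" axis with the markers "Last Backup", "KABOOM!!!" (the disaster),
"Identified and declared disaster" and "Full Recovery done, back online"; the spans "RPO" and
"RTO" are marked on the axis.]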
Figure 1 RPO and RTO for a SharePoint farm
Implementing disaster recovery as illustrated in Figure 1 requires that the RTO and RPO are
specified as part of the SharePoint business continuity plan (BCP).
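The relationship between the two objectives can be made concrete with a little date arithmetic. The following Python sketch is only an illustration, not part of the Trivadis tooling: the function name, the dictionary keys and the example timestamps are assumptions, and it measures downtime from the moment of the disaster, as in the RTO definition above.

```python
from datetime import datetime, timedelta


def check_objectives(last_backup: datetime, disaster: datetime,
                     back_online: datetime, rpo: timedelta,
                     rto: timedelta) -> dict:
    """Compare an incident (real or rehearsed) against the agreed RPO and RTO."""
    # Data loss window: everything written after the last backup is lost.
    data_loss = disaster - last_backup
    # Downtime: time from the disaster until the farm is back online.
    downtime = back_online - disaster
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


if __name__ == "__main__":
    # Hypothetical restore rehearsal on the Integration/Testing stage.
    result = check_objectives(
        last_backup=datetime(2011, 5, 27, 12, 0),
        disaster=datetime(2011, 5, 27, 12, 10),
        back_online=datetime(2011, 5, 27, 16, 0),
        rpo=timedelta(minutes=15),
        rto=timedelta(hours=4),
    )
    print(result)
```

Running such a check after every restore rehearsal shows whether the values written into the BCP are realistic.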
2.4 Architectural overview
Professional server environments are deployed in different stages, typically Development,
Integration/Testing and Production. For SharePoint environments, these stages are well suited
to regular restore operations from production data to the Integration/Testing stage. This
helps to measure the time needed for the individual recovery steps and gives a better
understanding of the achievable RTO in a disaster recovery case. A further benefit is that
people are trained and have the knowledge required to restore the SharePoint farm successfully
after a system incident.
If a disaster strikes the SharePoint environment, it is crucial to identify which parts of the
whole system are affected. This paper assumes that all tiers of the SharePoint system are
affected. If in a specific case only parts of the system are in a nonfunctional state, an exact
analysis is essential to bring the system back to a flawless state. The different layers are
presentation, application, service and data.
2.5 Disaster Recovery Manager
When disaster recovery is initiated, we recommend identifying a person who takes over the
Recovery Manager role during the recovery process. If the SharePoint farm is damaged or
offline, there is high pressure from management to bring the farm back online as soon as
possible. Things can go wrong, especially when the process is not professionally coordinated.
The Disaster Recovery Manager must be defined before a disaster strikes. He is the owner of
the process. Depending on the farm size and the number of parties involved in recovering the
farm, we do not recommend assigning this role to a technical person.
He is responsible for the following tasks:
Regularly checks the availability of the technically responsible persons in the company (they
may be on holiday or may have left the company).
Checks whether the system documentation is up to date.
Specifies the communication matrix (to whom, how and at which interval).
Coordinates the resources in the recovery process.
Collects the issues and monitors the ongoing steps to estimate the progress.
Acts as the single point of contact and protects the technical persons from being disrupted
in their tasks.
3 Backup objective
3.1 Resources
It is in the nature of a failure event that things can get out of hand if they are not
properly planned and defined. Therefore, all resources have to be named and periodically
confirmed. These are basically:
System documentation (logical and physical), including product versions
Software and product keys
Certificates and passphrases to build the farm connectivity
Decision makers: Domain Administrators, Server Administrators, Backup Administrators,
Database Administrators, SharePoint Farm Administrators, Site Collection owners for the
business-critical sites, Firewall and Network Administrators
Well-documented test scenarios
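The periodic confirmation of these resources can be tracked with a very small script. The sketch below is a hypothetical Python helper: the resource names, the dates and the 90-day confirmation interval are assumptions chosen for illustration only.

```python
from datetime import date, timedelta
from typing import Dict, List


def stale_resources(last_confirmed: Dict[str, date], today: date,
                    max_age: timedelta) -> List[str]:
    """Return the resources whose last confirmation is older than max_age."""
    return sorted(
        name for name, confirmed_on in last_confirmed.items()
        if today - confirmed_on > max_age
    )


if __name__ == "__main__":
    # Hypothetical inventory: resource name -> date of the last confirmation.
    inventory = {
        "System documentation": date(2011, 4, 15),
        "Product keys": date(2011, 1, 10),
        "Farm passphrase": date(2011, 5, 1),
    }
    print(stale_resources(inventory, date(2011, 5, 27), timedelta(days=90)))
```

The resulting list tells the Disaster Recovery Manager which entries need to be re-confirmed with their owners.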
3.1.1 Documentation
Logical Architecture
Ensure that the logical architecture of the farm is well documented and up to date. The logical
architecture model of a system describes the logical components of the system, the role of
each component and how they interact with each other. At least the following architectural
aspects should be included:
IIS application pools
SharePoint web applications
SharePoint service applications
Zones and associated alternate access mappings
Web application policies
Content databases
Site Collections (incl. host name site collections)
Sites
My Sites
Physical Deployment
Like the logical architecture documentation, the documentation of the physical deployment is
another important part of the game. This documentation primarily focuses on the hardware
used in the farm. The elements most commonly documented are:
Physical servers that both SharePoint and SQL Server use
Storage equipment
Networking equipment
Firewalls that sit between SharePoint servers
Hardware load balancers or similar specialty equipment
Windows Active Directory Controllers
3.2 Backup Targets
To be able to restore a SharePoint system effectively, safe procedures to back up all items in
the system are essential!
3.2.1 Database Backup and Maintenance
To be able to restore SQL Server databases to a point in time, database backups and
transaction log backups are essential. This includes all SharePoint databases as well as all
SQL Server system databases. A properly deployed maintenance plan contains all the steps and
tasks necessary to achieve secure database operation for all databases in the specific instance.
Trivadis has developed Maintenance Scripts to achieve trouble-free and automated SQL
Server maintenance containing all necessary tasks.
Table 1 SQL Server Job List
Task/Job | Description | Schedule and interval
Full backup | Full database backup | Daily
Transaction log backup | Transaction log backup | Every 5-15 min
Index maintenance | Reorganize or rebuild indexes based on different parameters | Daily
Statistics update | For existing statistics; auto create statistics should be off for SharePoint databases | Daily
Clean up | Clean up history logs and old backup files | Daily
To fulfill the point-in-time restore requirements, the complete transaction log backup chain of
every database must be available. If any TX log backup file is missing, the chain is broken and
a restore is only possible up to the last TX log backup before the gap.
Figure 2 Transaction log chain (where subsequent Diff-Backup or Full-Backup can be missing but no TX Log Backup)
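Whether a log chain is complete can be checked before it is needed. SQL Server records a first and a last log sequence number (LSN) for every backup in its backup history, and in an unbroken chain each log backup starts at the LSN where the previous one ended. The Python sketch below applies this rule to a list of records; the `LogBackup` type, the file names and the small LSN values are hypothetical, and reading the history from the server is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogBackup:
    file_name: str
    first_lsn: int
    last_lsn: int


def restorable_chain(backups: List[LogBackup]) -> List[LogBackup]:
    """Return the unbroken prefix of the log chain.

    The chain continues only while each backup begins exactly at the LSN
    where its predecessor ended; everything after the first gap is unusable
    for a point-in-time restore.
    """
    ordered = sorted(backups, key=lambda b: b.first_lsn)
    chain: List[LogBackup] = []
    for backup in ordered:
        if chain and backup.first_lsn != chain[-1].last_lsn:
            break  # gap found: a log backup file is missing here
        chain.append(backup)
    return chain


if __name__ == "__main__":
    # Hypothetical history in which the third log backup file is missing.
    history = [
        LogBackup("log_1.trn", 100, 200),
        LogBackup("log_2.trn", 200, 300),
        LogBackup("log_4.trn", 400, 500),
    ]
    print([b.file_name for b in restorable_chain(history)])
```

If the returned chain is shorter than the input, the restore can only reach the end of the last backup in the returned list.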
3.2.2 SharePoint Databases
SharePoint has several SQL Server databases, containing configuration data and user content
from SharePoint web applications, site collections, sites, libraries and lists. The backup
procedure differs between configuration databases, service application databases and
content databases.
Configuration Databases
Each SharePoint farm has one configuration database and one Central Administration content
database. Back up those databases regularly. This ensures that configuration settings can be
retrieved in case of a disaster.
Create configuration database backups within SharePoint Central Administration or, better,
with PowerShell scripts. PowerShell scripts can be integrated into the SQL Server backup jobs
mentioned above. Doing so ensures that the backup items reflect the same point in time.
But never restore the configuration databases with a SQL Server based restore. This will
damage your SharePoint farm for sure!
Content Databases
SharePoint content databases are backed up with regular SQL Server database backups as
described before. From a SQL Server point of view, these databases are user databases like any
other user database.
Therefore, the backup plan and duration depend heavily on the SQL Server backup and restore
environment. Typically, a well-architected SharePoint farm has content databases with a size
limit of up to 500 GB. This ensures a well-performing backup/restore procedure. A larger
database automatically requires longer backup/restore time windows.
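How the database size translates into a time window can be estimated with simple arithmetic. The Python sketch below is a rough estimate only: it assumes a constant sustained throughput, and the value of 100 MB/s is a made-up example that has to be replaced by a figure measured in your own backup and restore environment.

```python
def estimated_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate the backup or restore duration in hours for a database.

    Assumes a constant sustained throughput; real jobs vary with compression,
    storage and network load.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = size_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical throughput of 100 MB/s for content databases of growing size.
    for size_gb in (100, 250, 500):
        print(size_gb, "GB:", round(estimated_hours(size_gb, 100), 2), "h")
```

Comparing the estimate for the largest content database with the agreed RTO shows early whether a database should be split.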
Service Application Databases
Some, but not all, of the service applications depend on their own databases. Because service
applications are not dynamic, we recommend backing up the service application databases
within the regular SQL Server backup job and/or with PowerShell scripts for service
applications that do not have a database in place.
Important note: Some of the service applications require passwords. The Secure Store Service
application, for example, requires a passphrase during configuration. This passphrase must be
available to recover the service application.
Table 2 Recommended SharePoint database backup schedule from the SQL server perspective
Database | Backup Type | Schedule and interval | Tools
SharePoint configuration | Full | Daily | SQL / PowerShell
SharePoint Central Administration Content | Full | Daily | SQL / PowerShell
SharePoint Service Application Databases | Full | Daily | SQL / PowerShell
SharePoint Content Databases | Full | Daily | SQL
For every database | Transaction log | Every 5 to 15 min (1) | SQL
(1) Must reflect the required RPO
3.2.3 Service Applications
User Profile Service Application
The User Profile Service Application is one of the trickiest service applications in SharePoint.
One of the most important steps is to document the initial configuration steps used to get the
service application successfully up and running.
Furthermore, the backup of this service application differs from that of the other service
applications. We recommend using a PowerShell script to back up the two required components
of the User Profile Service Application:
Service Application Name
Service Application Proxy Name
The associated databases of the Service Application can also be backed up with SQL Server
Management Studio. This requires additional steps to ensure that the backup is restore
aware (see footnote 1).
Search Service Application
The Search Service Application can be backed up with either PowerShell or SharePoint Central
Administration.
SharePoint Server 2010 starts a SQL Server backup of the Search administration database,
crawl databases, and property databases, and also backs up the index partition files in parallel.
Secure Store Service
When the Secure Store Service is configured the first time, a passphrase is entered. It is
important to keep the passphrase in a secure location.
Every time you change or manipulate the Secure Store Service, the Secure Store Service
Application Database is automatically re-encrypted. Therefore, backing up the Secure Store
Service Application ensures the automatic synchronization of the Master Key and the database.
1 http://technet.microsoft.com/en-us/library/gg576965.aspx
4 Restore objective
4.1 Restore Targets general
The restore of the backed-up targets is listed below in chronological order. Tasks can be
parallelized where appropriate.
4.2 Priority List
Each SharePoint WebApp or Site Collection has different business continuity requirements. For
the disaster task force, it is important to have a list of WebApps and Site Collections in which
the priority of each item is specified. This ensures the right, or better, the optimized order
of the recovery steps. Business-critical items are back online sooner than non-critical items.
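Such a priority list can be kept as structured data so that the recovery order is derived rather than debated during the incident. The Python sketch below is a minimal illustration: the `RecoveryItem` type, the URLs and the convention that 1 is the highest priority are assumptions, not something prescribed by SharePoint.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RecoveryItem:
    name: str       # URL or display name of the item
    kind: str       # "WebApp" or "SiteCollection"
    priority: int   # assumed convention: 1 = most business critical


def recovery_order(items: List[RecoveryItem]) -> List[RecoveryItem]:
    """Sort items so that business-critical ones are recovered first.

    Ties are broken by name to keep the order stable and predictable.
    """
    return sorted(items, key=lambda item: (item.priority, item.name))


if __name__ == "__main__":
    # Hypothetical priority list maintained by the disaster task force.
    items = [
        RecoveryItem("http://teams.example.local", "WebApp", 3),
        RecoveryItem("http://intranet.example.local/sites/finance", "SiteCollection", 1),
        RecoveryItem("http://intranet.example.local", "WebApp", 2),
    ]
    for item in recovery_order(items):
        print(item.priority, item.kind, item.name)
```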
4.2.1 Windows Servers
To restore the different Server Roles, every Server in the SharePoint system should be built or
deployed as a new, clean setup. Using imaging or bare-metal solutions can lead to inconsistent
situations and server states. Therefore it is recommended to use proper Server deployment
scenarios, add the needed roles and features and install the desired software on top. Then the
specific configuration of the services and applications can be applied based on the
requirements and documentation.
4.2.2 SQL Server databases
After the Windows server that holds the database role is back online and connected to the
Active Directory domain, the SQL Server instance has to be installed. Set up and configure the
SQL Server instance with exactly the same service pack and cumulative update level as the
former server.
This is followed by restoring the SQL Server system databases to get all system objects,
principals and permissions back to the state of the selected point in time.
4.2.3 SharePoint Servers
Each SharePoint server, independent of its SharePoint role, requires a complete new
installation of the SharePoint binaries in case of a catastrophic failure of the SharePoint
farm. Follow these rules:
Keep the SharePoint source binaries in a central location (and not on the SharePoint server
itself).
Keep the product key in use in a central location.
Ensure the identical product level is installed.
Keep the installed SharePoint solutions in a central location.
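The rule about the identical product level is easy to verify once the documented level and the level found on each rebuilt server are written down. The Python sketch below compares the two; the server names and the build strings are illustrative placeholders, and collecting the installed build from each server is outside its scope.

```python
from typing import Dict, List


def servers_with_wrong_level(expected_build: str,
                             installed: Dict[str, str]) -> List[str]:
    """Return the servers whose installed build differs from the documented one."""
    return sorted(
        server for server, build in installed.items()
        if build != expected_build
    )


if __name__ == "__main__":
    # Illustrative values: the documented farm build and what was found per server.
    documented = "14.0.4763.1000"
    found = {
        "WFE01": "14.0.4763.1000",
        "WFE02": "14.0.5114.5003",
        "APP01": "14.0.4763.1000",
    }
    print(servers_with_wrong_level(documented, found))
```

The same comparison applies to the service pack and cumulative update level of the SQL Server instance.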
4.3 Restore Targets SharePoint
The restore of the backed-up SharePoint targets is listed below in chronological order.
4.3.1 SharePoint configuration databases
Lost or undocumented settings of the SharePoint farm can be retrieved from a restored
copy of the configuration database using SQL Server Management Studio.
Never recover a SharePoint farm by restoring its configuration database with SQL Server
restore operations! This will fail and damage the farm with unknown side effects!
The SharePoint farm configuration must be recovered from the native SharePoint farm
backup file in Central Administration or in the SharePoint Management Shell.
4.3.2 SharePoint content databases
If a content database is corrupted or otherwise damaged, the database can be restored. There
are situations where only a part of the content is affected. In these situations, it is not
acceptable to restore the whole database; only the desired part within the database should
be restored.
Simply speaking, we distinguish two types of restoring SharePoint content:
Catastrophic failure. The content database is damaged to the point of being unusable. This
requires a full SQL Server restore of the content database.
Data failure. The content database is operational, but the content has been modified in a
way that leaves a part of it unusable or manipulated. In this case, the content database is
restored to SQL Server to the specified recovery point (point-in-time recovery). The restored
database requires a different name. Next, the SharePoint farm administrator restores the
content from this database with the unattached database recovery feature in SharePoint.
When finished, the restored content database can be removed from SQL Server.
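For the data failure case, the backup files needed for the point-in-time restore can be selected mechanically: the most recent full backup before the target time, followed by every later log backup up to and including the first one that ends at or after the target. The Python sketch below implements this selection by timestamp only. It is a simplification, since SQL Server itself works with log sequence numbers, and the `Backup` type and file names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    file_name: str
    kind: str            # "full" or "log"
    finished: datetime   # completion time of the backup


def restore_sequence(backups: List[Backup], target: datetime) -> List[Backup]:
    """Pick the files needed to restore a database to the target time."""
    fulls = [b for b in backups if b.kind == "full" and b.finished <= target]
    if not fulls:
        raise ValueError("no full backup exists before the target time")
    base = max(fulls, key=lambda b: b.finished)
    logs = sorted(
        (b for b in backups if b.kind == "log" and b.finished > base.finished),
        key=lambda b: b.finished,
    )
    sequence = [base]
    for log in logs:
        sequence.append(log)
        if log.finished >= target:
            break  # this log covers the target; restore it up to the target time
    # If no log reaches the target, the restore ends with the last log in the list.
    return sequence


if __name__ == "__main__":
    day = datetime(2011, 5, 27)
    files = [
        Backup("content_full.bak", "full", day.replace(hour=0, minute=0)),
        Backup("content_0015.trn", "log", day.replace(hour=0, minute=15)),
        Backup("content_0030.trn", "log", day.replace(hour=0, minute=30)),
        Backup("content_0045.trn", "log", day.replace(hour=0, minute=45)),
    ]
    for b in restore_sequence(files, day.replace(hour=0, minute=20)):
        print(b.file_name)
```

The selected files are then restored under a different database name, as described above, before the content is recovered from the unattached database.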
4.3.3 SharePoint Configuration
The SharePoint farm configuration is only restored within Central Administration, or with
PowerShell. Do not recover the SharePoint configuration database with SQL native restore jobs.
4.3.4 Service Applications
Restore the Service Applications only with PowerShell or in the Central Administration of
SharePoint.
5 Testing objective
5.1 System Tests
Before opening the SharePoint farm to end users, ensure that every required task is finished.
This is important to prevent server reboots after the green light has been given to the end
users.
Typically, the system test is divided into several tasks with different responsible persons.
The IT Administrator ensures that all systems are set up with the correct operating system. The
SharePoint Administrator is responsible for finishing all required tasks, including checks of
the SharePoint log files.
Each WebApp or Site Collection owner verifies that everything works and approves the
functionality of their site collection.
5.2 Recovery complete
The SharePoint Disaster Recovery Manager collects the responses of the disaster recovery
team. He is responsible for coordinating the tasks during the recovery phase and intervenes
when problems occur. Additionally, he documents the issues identified during the recovery
process, delegates the resources and communicates with the business.
He is responsible for declaring the "Recovery complete" state.
Appendix
Plan for disaster recovery (SharePoint Server 2010):
http://technet.microsoft.com/en-us/library/ff628971.aspx
Backup a service application:
http://technet.microsoft.com/en-us/library/ee428318.aspx
Restore a service application (SharePoint Server 2010):
http://technet.microsoft.com/en-us/library/ee428305.aspx