Amazon Web Services (AWS) Solution Cookbook
E B O O K
Introduction
Datos IO RecoverX is the industry's first scale-out, software-only
data protection product, delivering scalable and reliable backup
and recovery for next-generation applications built on scale-out
databases. With RecoverX, enterprises can deploy business-critical
applications with confidence in the recoverability of their data
and the ability to maintain high application uptime.
The Deployment Steps Will Include:
• Creating and configuring an Instance
from the shared AMI
• Provisioning AWS S3 storage to host the
versions of backed-up MongoDB data
• Launching RecoverX from the Instance
• Configuring RecoverX to Protect
MongoDB Data Sources
This solution cookbook describes the deployment and configuration
of RecoverX in AWS to back up and recover MongoDB clusters running
in AWS. Datos IO simplifies the installation of RecoverX by
delivering it as a shared AMI. The RecoverX AMI is an easy way to
install and configure RecoverX with your database clusters.
Once installation is complete, a user can connect to the RecoverX
GUI or console and immediately begin to configure and manage data
sources, versioning policies, and version storage, and to perform
recovery operations.
Launch RecoverX from the AMI
The first step is to locate the AMI in your AWS EC2
Management Console. Datos IO enables access to this AMI
with RecoverX through a pre-installed user ID, which is
authenticated with an SSH key that you configure while
provisioning an instance from this AMI.
Click on “launch” to launch an AWS
instance based on this AMI.
2EBOOK www.datos.io [email protected]
You will be asked to “choose an instance type” from a list of standard AWS EC2 machine configurations. Choose the
required EC2 instance for running RecoverX: m4.2xlarge.
Click next (in the lower right-hand corner) to choose “instance details”.
Be sure to select the network VPC and subnet that enable network connectivity between the RecoverX
instance being created and the existing Mongo cluster instances.
Click next to “add storage”.
The storage choices will already be configured: a 15GB root volume and a 256GB volume. This is the required minimum
configuration for running RecoverX (for capacity and storage IOPS). Please accept this configuration by clicking “next”.
Click next, and on the next screen add any tags for your environment.
Click next to configure “security group” firewall rules for the instance.
Configure Security Group firewall rules for your AWS environment that enable access to the following incoming ports
on the public address of the RecoverX instance you are creating:
• SSH/22
• TCP/9090
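If you prefer the AWS CLI to the console wizard, the two inbound rules above could be added with a sketch like the following. The security group ID and the source CIDR are hypothetical placeholders, and the `AWS` variable is only an illustration device that lets you print the commands (by setting `AWS=echo`) before running them.

```shell
# Hypothetical placeholders: substitute your own security group ID and the
# network range your administrators connect from.
SG_ID="${SG_ID:-sg-0123456789abcdef0}"
ADMIN_CIDR="${ADMIN_CIDR:-203.0.113.0/24}"

# Allow inbound TCP on port 22 (SSH) and port 9090 (RecoverX console).
# Set AWS=echo to print the commands instead of executing them.
open_recoverx_ports() {
  for rx_port in 22 9090; do
    ${AWS:-aws} ec2 authorize-security-group-ingress \
      --group-id "$SG_ID" --protocol tcp --port "$rx_port" --cidr "$ADMIN_CIDR"
  done
}
```

Run `AWS=echo open_recoverx_ports` to preview the calls, then `open_recoverx_ports` to apply them. Restricting the CIDR to a known administrative range is a design choice of this sketch, not a RecoverX requirement.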
Click on next to review and accept your new instance configured for RecoverX.
It will take a few minutes for the RecoverX EC2 instance to initialize. After the RecoverX instance is running, log into the
instance using the SSH key provided by Datos and username (“cloud-user”) and follow the steps below to accept the
end-user agreement and initialize RecoverX.
Page Through End User License Agreement and Type “Y” to Accept It on the Final Screen
Start Datos Server
Now that RecoverX is provisioned and initialized, the next series of steps are necessary to configure the MongoDB Data
Source nodes to be accessed and managed by RecoverX and for RecoverX to orchestrate backups to/from backup
version storage.
Configure Data Source
Create a Datos IO user account (e.g. “datos”) on each Data Source node for running commands to extract data from the
cluster. Be sure to perform this step on every MongoDB Data Source node (including arbiters in unsharded
replica sets).
We recommend adding the new user to the existing mongo group.
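A minimal sketch of this step on a Linux node follows. The user name “datos” and the group name “mongo” are taken from the text above; on many installations the MongoDB group is actually named `mongod` or `mongodb`, so check before running it. The `SUDO` variable is an illustration device for previewing the command.

```shell
# Assumed names: adjust to match your environment.
DATOS_USER="${DATOS_USER:-datos}"
MONGO_GROUP="${MONGO_GROUP:-mongo}"   # often mongod or mongodb in practice

# Create the account with a home directory and add it to the MongoDB group.
# Set SUDO=echo to print the command instead of executing it.
create_datos_user() {
  ${SUDO:-sudo} useradd --create-home --groups "$MONGO_GROUP" "$DATOS_USER"
}
```

Run `create_datos_user` on every Data Source node, arbiters included.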
Configure Authentication for the “datos” user using one of these authentication methods (we recommend using #3):
1. Username and Password
2. Username and Passphrase
3. Username and SSH Access key
Debian Version 6.0 and CentOS Version 6.0 and above typically have OpenSSL installed by default. If it is not
installed, the following commands can be used to install OpenSSL.
Debian/Ubuntu:
sudo apt-get install openssl
CentOS/RHEL:
sudo yum install openssl
This is a prerequisite that you will need when adding the MongoDB Data Source(s) to RecoverX.
Give <datos_db_user> write permission to its home directory (/home/<datos_db_user>) on all Data Source nodes.
Configure Maximum SSH Sessions
For each node, edit the file /etc/ssh/sshd_config to set the sshd parameters MaxSessions to “500” and
MaxStartups to “500:1:500”. Setting these parameters as shown enables more parallel RecoverX processes to
extract and version data. Verify the values on each node, and correct any that differ by editing
/etc/ssh/sshd_config.
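One way to check the two sshd parameters is sketched below. The function names are hypothetical helpers, not part of RecoverX; they only read the configuration file and change nothing.

```shell
# Print the active (uncommented) MaxSessions and MaxStartups lines.
show_ssh_limits() {
  grep -Ei '^[[:space:]]*(MaxSessions|MaxStartups)[[:space:]]' \
    "${1:-/etc/ssh/sshd_config}"
}

# Succeed only when both parameters carry the values given above.
check_ssh_limits() {
  sshd_conf="${1:-/etc/ssh/sshd_config}"
  grep -Eiq '^[[:space:]]*MaxSessions[[:space:]]+500[[:space:]]*$' "$sshd_conf" &&
  grep -Eiq '^[[:space:]]*MaxStartups[[:space:]]+500:1:500[[:space:]]*$' "$sshd_conf"
}
```

For example, `check_ssh_limits && echo "sshd limits OK"`. After editing the file, sshd has to reload its configuration before the new limits take effect; the exact service command depends on the distribution.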
Add Hostnames to RecoverX
Mongo node hostnames need to be resolvable from the RecoverX server. To do so, connect to the config server and get the
hostnames of all mongos instances in the MongoDB cluster.
Next, on the RecoverX server, add a hostname entry for each mongos with its hostname and IP address from above.
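The hostname entries can be added with a small helper such as the sketch below, assuming name resolution is handled through /etc/hosts on the RecoverX server. The IP address and hostname in the usage line are hypothetical. In a sharded cluster the mongos instances are typically listed in the `mongos` collection of the `config` database, which is one place to look up the names.

```shell
# Append "IP<TAB>hostname" to a hosts file unless the hostname is already
# present on a non-comment line, so repeated runs do not create duplicates.
add_host_entry() {
  host_ip="$1"; host_name="$2"; hosts_file="${3:-/etc/hosts}"
  if awk -v n="$host_name" '
       /^[[:space:]]*#/ { next }
       { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }
     ' "$hosts_file"; then
    return 0
  fi
  printf '%s\t%s\n' "$host_ip" "$host_name" >> "$hosts_file"
}
```

Usage, run as root on the RecoverX server: `add_host_entry 10.0.1.21 mongos-1.example.internal`.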
Configure Network Settings
Ensure that the following ports are allowed by the Linux system and AWS firewalls:
Provision Version Storage
A storage system is required to store the backup copies of the data source that is protected by RecoverX. The most
common way to enable this storage target is by provisioning Amazon S3 bucket(s) to be used as the secondary storage.
Create an S3 bucket, using a memorable name, and select the appropriate AWS region for your environment.
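As an alternative to the S3 console, the bucket could be created from the AWS CLI with a sketch like this one. The bucket name and region in the usage line are hypothetical. The branch exists because us-east-1 does not accept a LocationConstraint, while other regions require one.

```shell
# Create an S3 bucket in the given region.
# Set AWS=echo to print the command instead of executing it.
create_version_bucket() {
  vs_bucket="$1"; vs_region="$2"
  if [ "$vs_region" = "us-east-1" ]; then
    ${AWS:-aws} s3api create-bucket --bucket "$vs_bucket" --region "$vs_region"
  else
    ${AWS:-aws} s3api create-bucket --bucket "$vs_bucket" --region "$vs_region" \
      --create-bucket-configuration "LocationConstraint=$vs_region"
  fi
}
```

Usage: `create_version_bucket recoverx-versions us-west-2`.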
When assigning an S3 bucket as storage, ensure that all data source nodes and RecoverX nodes have Read+Write
access to the bucket (or directory).
To authenticate S3 with RecoverX, you may configure Identity and Access Management (IAM) roles. Follow the Amazon
IAM documentation to create an IAM role with S3 bucket privileges (R+W) and assign this role to EC2 instances on which
RecoverX is installed. This is the recommended method for authentication.
Alternatively, you may use Access Key and Secret Key for authentication.
Whatever choices you make to provision S3 and assign access management in your AWS environment, the AWS
user ID used to configure S3 access in RecoverX needs to have these S3 access rights, which can also be set in an S3
bucket policy for the specifically created S3 bucket:
• AbortMultipartUpload
• DeleteObject
• GetObject
• ListBucket
• ListBucketMultipartUploads
• ListMultipartUploadParts
• PutObject
Example S3 bucket policy
{
  "Id": "Policy1501517025250",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1501516967728",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::s3-bucket-for-ami",
        "arn:aws:s3:::s3-bucket-for-ami/*"
      ],
      "Principal": {
        "AWS": [
          "arn:aws:iam::999999999999:user/DickDale"
        ]
      }
    }
  ]
}
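If the policy is saved to a local file, it could be attached to the bucket from the AWS CLI, as in this sketch. The file name in the usage line is a hypothetical example, and the bucket name is the one used in the example policy.

```shell
# Attach a bucket policy stored in a local JSON file.
# Set AWS=echo to print the command instead of executing it.
apply_bucket_policy() {
  bp_bucket="$1"; bp_file="$2"
  if [ ! -r "$bp_file" ]; then
    echo "cannot read policy file: $bp_file" >&2
    return 1
  fi
  ${AWS:-aws} s3api put-bucket-policy --bucket "$bp_bucket" --policy "file://$bp_file"
}
```

Usage: `apply_bucket_policy s3-bucket-for-ami policy.json`.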
Configure Passwordless SSH on the RecoverX Instance and SSH Authentication for the User “datos” on Each of the Mongo Nodes
1. Create an SSH key for user “datos” on the RecoverX instance.
2. Copy the contents of the public key, id_rsa.pub.
3. On each mongo node, paste the copied contents of id_rsa.pub into a file named “authorized_keys” in the .ssh directory
of the datos user. If there is no .ssh directory for the user “datos”, create it.
Repeat this for each mongo node in any MongoDB cluster that will be protected by RecoverX.
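Step 3 can be scripted on each mongo node with a helper along these lines. This is a sketch rather than a RecoverX tool: the function name is hypothetical, and it assumes the public key file from step 2 has already been copied to the node. The key pair itself can be generated in step 1 with `ssh-keygen -t rsa -f ~/.ssh/id_rsa`.

```shell
# Install a public key into <home>/.ssh/authorized_keys.
# Creates the .ssh directory if needed, sets the restrictive permissions
# sshd expects by default, and skips keys that are already present.
install_pubkey() {
  pubkey_file="$1"
  ssh_dir="${2:-$HOME}/.ssh"
  auth_file="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  touch "$auth_file" && chmod 600 "$auth_file" || return 1
  # -x matches whole lines, -F treats the key as a fixed string.
  grep -qxF -f "$pubkey_file" "$auth_file" ||
    cat "$pubkey_file" >> "$auth_file"
}
```

Usage, run as the “datos” user on a mongo node: `install_pubkey /tmp/id_rsa.pub`. Afterwards, `ssh datos@<mongo_node>` from the RecoverX instance should log in without a password prompt.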
Configuring RecoverX
RecoverX is managed using a consumer-grade graphical user interface accessed through a web-based console. To log
into the console, follow these steps:
1. In a web browser, use the public IP address of the AWS instance where RecoverX is deployed to connect to the
console with the following URL: https://<IP_address>:9090/#/dashboard.
2. At the login screen, enter the default username “admin” and default password “admin”. On successful login, the
home page should appear.
3. After logging in for the first time, change the password for the administrator account by clicking the Settings menu
and choosing CHANGE PASSWORD.
You may change the password at any time. The username, however, is not changeable at this time.
RecoverX GUI
The RecoverX Home Page
Configure Version Storage
The first step of configuring RecoverX is to configure access to the S3 bucket
secondary storage for data protection.
Click on the “configuration” drop down and click on “version stores”.
Select S3 storage type and assign a name to this version store. An S3 bucket can
contain one or more version stores.
Configure the bucket assigned to this version store via its URL, in the form http://<bucket name>.s3.amazonaws.com.
In our example, the access is authenticated via access key/secret key for a specific AWS user with access rights to the
S3 bucket. Depending on your configuration of IAM roles, it may not be necessary to use an access key ID and secret key
combination in this step.
Once one or more version stores have been configured, the GUI displays a list of configured version stores, the
version store configuration and the capacity currently utilized by each one.
Add a MongoDB Source Cluster
The next step in configuring RecoverX is to add the source MongoDB cluster that contains one or more collections
to be protected.
The source can be a cluster configured as a single MongoDB primary, a single replica set, or multiple shards or
sharded replica sets.
Click on the “configuration” option in the top menu and select “Data Sources”.
This will open the “Add Data Source” configuration screen.
You will be configuring/adding the following selections:
• Source Data Type: select “MongoDB” from the drop down.
• Cluster Name: enter a name you want to assign to this MongoDB cluster data source. The name cannot contain spaces.
• Configuration IP Hostname: for a single Primary or unsharded replica set, enter the IP
Address or Hostname of the Primary node. For a sharded cluster, enter the IP Addresses
or Hostnames of the 3 cluster Configuration Server replica set members.
• Configuration Port: enter the configured MongoDB port number.
• Source Authentication: from the dropdown select “password”, “Key String” or “Key File Path”. This is the
authentication that RecoverX will utilize to access the Mongo nodes for orchestrated backup and recovery.
The best practice and most reliable source of authentication is to use
the SSH key, as configured earlier in this cookbook.
In the example below, the previously configured user “datos” is the username, and the
key file path to the previously configured SSH private key for the user “datos” is utilized.
• MongoDB Driver Authentication: if the MongoDB cluster is configured for MongoDB driver
authentication enter a valid MongoDB user, with appropriate rights assigned in the Mongo Roles
and Permissions file (see RecoverX installation guide reference) or a Mongo “Super User”.
• Ignore Secondaries: enter a list of secondaries (by IP Address) that are excluded if
not located in the same DC as the Mongo Primaries or are inaccessible.
When all the entries in the “Add Data Source” menu are completed, click on the “Add Data Sources” button.
RecoverX will display a window that shows the completion of a series of validation steps ending in successfully adding
the source cluster.
After completion of adding a data source, the “Data Sources” screen will show the details of this added data source and
any other previously added data sources.
The details include a listing of the cluster nodes by name or IP address, their port number and node status.
“Source Details” will display the status of each collection in the cluster, including collection name, number of documents,
collection size and protection status (including protection policy name if backups are configured for that collection).
Configure a Protection “Policy”
The final step in configuring RecoverX to protect data in a MongoDB cluster is adding a protection “Policy”.
The window for creating a policy can be opened either by clicking the “protect” action in the listing of
unprotected collections in “Data Source Protection Status” on the “Data Sources” screen, or by clicking on the
“Configuration” drop down at the top of the main screen and selecting “Policies”.
Either method will open the “Add Policy” window.
You will configure the following selections to complete adding a policy:
• Policy Name: Enter a name, that you choose, for the policy.
• Data Source Type: Click on the selection and select MongoDB from the drop down list.
• Cluster Name: select the MongoDB cluster, added as a data source, that contains
the collection or collections you want to protect with this “Policy”.
• Consistency Type: Accept the default “one”.
• Version Store: Click on the selection and select the Version Store into which this policy will write its backups.
• Management Objects: Click on this selection and select the collections that will be protected by this “Policy”.
• Policy Start Date: Select a Start Date from the drop down calendar. The default is the current date.
• Policy Start Time: Select a Start Time from the drop down. The default is the current time.
• Schedule Name: Create a name for the Schedule. If schedules have already been configured, they will appear
in the drop down list. The schedule name will be stored and made available in the drop down when creating additional policies.
• Version Interval: Choose a value from the drop down list for hours, days, weeks or months. As a best practice,
the most frequent interval is 4 hours.
• Data Retention Period: Choose a value, from the drop down list, for days, weeks or months. The minimum
retention time is 1 week.
• Once the configuration selections have been configured, click “add policy”.
• A window will display for successful completion of adding a policy.
Monitoring RecoverX Activity
After a protection “Policy” is created you can monitor the status of the initial backup
started by the addition of the policy and monitor the status of other RecoverX protection
and restore activity.
To monitor activity, click on the “Monitoring” menu choice in the RecoverX main
screen and select “Tasks”.
This will open a list of active and prior tasks. In the example below we can see that the “initial synch” (essentially the first full
backup) was started and has completed, and a subsequent “delta synch” (backup of incremental data) is scheduled per
the policy schedule.
The RecoverX “Dashboard” also presents a summary of task status on the bottom half of the screen and presents a
summary of Data Sources, Policies, Version Store and Capacity Savings on the top half of the screen.
About Datos IO
Datos IO is the application-centric data management company for the multi-cloud world. Our flagship Datos IO RecoverX
delivers a radically novel approach to data management helping organizations embrace the cloud with confidence by
delivering solutions that protect, mobilize, and monetize their data — at scale. Datos IO was recently awarded Product of
the Year by Storage Magazine, and was recognized by Gartner in the 2016 Hype Cycle for Storage Technologies. Backed
by Lightspeed Venture Partners and True Ventures, Datos IO is headquartered in San Jose, California.