Amazon Web Services (AWS) Solution Cookbook - …datos.io/wp-content/uploads/2017/10/datosio_AWS-eBook_FINAL.pdf · Amazon Web Services (AWS) Solution Cookbook EBOOK ... Debian Version

Amazon Web Services (AWS) Solution Cookbook


EBOOK

Introduction

Datos IO RecoverX is the industry's first scale-out, software-only data protection product to deliver scalable and reliable backup and recovery solutions for next-generation applications built on scale-out databases. With RecoverX, enterprises can deploy business-critical applications with confidence in the recoverability of their data and in their ability to maintain high application uptime.

The Deployment Steps Will Include:

• Creating and configuring an instance from the shared AMI

• Provisioning AWS S3 storage to host the versions of backed-up MongoDB data

• Launching RecoverX from the instance

• Configuring RecoverX to protect MongoDB data sources

This solution cookbook describes the deployment and configuration of RecoverX in AWS to back up and recover MongoDB clusters running in AWS. Datos IO simplifies the installation of RecoverX by delivering it as a shared AMI. The RecoverX AMI is an easy way to install and configure RecoverX with your database clusters. Once installation is complete, a user can connect to the RecoverX GUI or console and immediately begin to configure and manage data sources, versioning policies, and version storage, and to perform recovery operations.

Launch RecoverX from the AMI

The first step is to locate the AMI in your AWS EC2 Management Console. Datos IO will enable access to this RecoverX AMI, which includes a pre-installed user ID that is authenticated with an SSH key you configure when you provision an instance from the AMI.

Click on “launch” to launch an AWS instance based on this AMI.


You will be asked to “choose an instance type” from a list of standard AWS EC2 machine configurations. Choose the instance type required for running RecoverX: m4.2xlarge.

Click next (in the lower right-hand corner) to choose “instance details”.

Be sure to select the VPC and subnet that provide network connectivity between the RecoverX instance being created and the existing MongoDB cluster instances.

Click next to “add storage”.

The storage choices will already be configured: a 15 GB root volume and a 256 GB volume. This is the minimum configuration required for running RecoverX (for capacity and storage IOPS). Please accept this configuration by clicking “next”.
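For repeatable deployments, the console choices made so far can also be expressed as arguments to the EC2 RunInstances API. The minimal sketch below builds a keyword-argument dict for boto3's `ec2_client.run_instances(**params)`. The AMI ID, key pair, subnet, security group, and the device names `/dev/sda1` and `/dev/sdb` are hypothetical placeholders; check the AMI's actual root device name before relying on them.

```python
from typing import Any


def recoverx_launch_params(
    ami_id: str,
    key_name: str,
    subnet_id: str,
    security_group_id: str,
    root_device: str = "/dev/sda1",  # hypothetical: must match the AMI's root device name
    data_device: str = "/dev/sdb",   # hypothetical: any unused device name
) -> dict[str, Any]:
    """Build keyword arguments for boto3's ec2_client.run_instances(**params)."""
    return {
        "ImageId": ami_id,                   # the shared RecoverX AMI
        "InstanceType": "m4.2xlarge",        # instance type required for RecoverX
        "MinCount": 1,
        "MaxCount": 1,
        "KeyName": key_name,                 # key pair used for SSH login
        "SubnetId": subnet_id,               # must have connectivity to the MongoDB nodes
        "SecurityGroupIds": [security_group_id],
        "BlockDeviceMappings": [
            # 15 GB root volume plus a 256 GB volume: the stated minimum
            {"DeviceName": root_device, "Ebs": {"VolumeSize": 15}},
            {"DeviceName": data_device, "Ebs": {"VolumeSize": 256}},
        ],
    }


def launch_recoverx(params: dict[str, Any], region: str) -> str:
    """Launch one instance and return its ID (needs boto3 and AWS credentials)."""
    import boto3  # imported here so building params has no third-party dependency

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]
```

When `SubnetId` is passed this way, whether the instance receives a public IP address depends on the subnet's auto-assign setting; tags can be added through the `TagSpecifications` argument.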


Click next and, on the next screen, add any tags for your environment.

Click next to configure “security group” firewall rules for the instance.

Configure security group firewall rules for your AWS environment that allow inbound access to the following ports on the public address of the RecoverX instance you are creating:

• SSH/22

• TCP/9090
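The two inbound rules can also be expressed in the `IpPermissions` shape accepted by boto3's `authorize_security_group_ingress`. This sketch assumes IPv4 and restricts access to an administrator-supplied CIDR range rather than opening the ports to the whole internet; the example range is hypothetical.

```python
import ipaddress

RECOVERX_PORTS = (22, 9090)  # SSH and the RecoverX web console


def recoverx_ingress_rules(admin_cidr: str) -> list[dict]:
    """Build IpPermissions entries letting admin_cidr reach each RecoverX port."""
    # ip_network raises ValueError for malformed input; strict=False clears host bits
    network = ipaddress.ip_network(admin_cidr, strict=False)
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 ranges")
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": str(network), "Description": f"RecoverX port {port}"}],
        }
        for port in RECOVERX_PORTS
    ]
```

With boto3, the result would be passed as `ec2.authorize_security_group_ingress(GroupId="sg-...", IpPermissions=recoverx_ingress_rules("203.0.113.0/24"))`.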

Click on next to review and accept your new instance configured for RecoverX.

It will take a few minutes for the RecoverX EC2 instance to initialize. After the RecoverX instance is running, log in to the instance using the SSH key provided by Datos and the username “cloud-user”, then follow the steps below to accept the end-user license agreement and initialize RecoverX.


Page Through End User License Agreement and Type “Y” to Accept It on the Final Screen


Start Datos Server

Now that RecoverX is provisioned and initialized, the next series of steps configures the MongoDB Data Source nodes so that RecoverX can access and manage them and can orchestrate backups to and from version storage.

Configure Data Source

Create a Datos IO user account (e.g. “datos”) on each Data Source node for running the commands that extract data from the cluster. Be sure to perform this step on every MongoDB Data Source node (including arbiters in unsharded replica sets).

We recommend adding the new user to the existing mongo group.


Configure Authentication for the “datos” user using one of these authentication methods (we recommend using #3):

1. Username and Password

2. Username and Passphrase

3. Username and SSH Access key

Debian 6.0 and CentOS 6.0 and later typically have OpenSSL installed by default. If it is not installed, the following commands can be used to install it.

Debian/Ubuntu:

sudo apt-get install openssl

CentOS/RHEL:

sudo yum install openssl

This is a prerequisite that you will need when adding the MongoDB Data Source(s) to RecoverX.

Give <datos_db_user> write permission to its home directory (/home/<datos_db_user>) on all Data Source nodes.
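The user-creation and home-directory steps above can be sketched as a list of shell commands to run as root on every Data Source node, including arbiters. This is an illustration, not a Datos-provided script: the user name “datos” comes from the text, while the MongoDB group name is a required argument because it varies between packages (for example `mongod` or `mongodb`).

```python
import shlex


def datos_user_commands(mongo_group: str, user: str = "datos") -> list[str]:
    """Shell commands (run as root) that create the RecoverX user on one node."""
    u, g = shlex.quote(user), shlex.quote(mongo_group)
    return [
        # -m creates the home directory; -G adds the existing MongoDB group
        # as a supplementary group, as recommended above
        f"useradd -m -G {g} {u}",
        # "user:" (trailing colon) also sets the group to the user's login group
        f"chown {u}: /home/{u}",
        # give the user read, write, and search permission on its home directory
        f"chmod u+rwx /home/{u}",
    ]
```

Running `id datos` on the node afterwards should list the MongoDB group among the user's groups.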

Configure Maximum SSH Sessions

On each node, edit /etc/ssh/sshd_config to set the sshd parameters MaxSessions to “500” and MaxStartups to “500:1:500”. You can verify the values of these parameters as follows:

Setting these parameters as shown allows more RecoverX processes to extract and version data in parallel. If these parameters are not set as shown, set them by editing /etc/ssh/sshd_config, then restart the sshd service so that the new values take effect.
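The check can be approximated offline by parsing the file text. This sketch makes simplifying assumptions: it reads only the global section (it stops at the first `Match` block), ignores `Include` directives, and treats keywords case-insensitively with the first value winning, which is how sshd_config(5) describes most keywords.

```python
import re

REQUIRED = {"maxsessions": "500", "maxstartups": "500:1:500"}
# keyword, then whitespace and/or a single '=', then the arguments
_LINE = re.compile(r"([^\s=]+)\s*=?\s*(.*)")


def global_settings(config_text: str) -> dict[str, str]:
    """First value of each keyword before any Match block (keywords lower-cased)."""
    values: dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = _LINE.match(line)
        if m is None:
            continue
        key, value = m.group(1).lower(), m.group(2).strip()
        if key == "match":
            break  # settings after Match apply only to matching connections
        values.setdefault(key, value)  # keep the first value read
    return values


def settings_to_fix(config_text: str) -> dict[str, str]:
    """Required keywords that are missing or set differently, with the wanted value."""
    found = global_settings(config_text)
    return {k: v for k, v in REQUIRED.items() if found.get(k) != v}
```

On a live node, `sudo sshd -T` prints the effective configuration with lower-cased keywords, which is a more reliable check than parsing the file.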


Add Hostnames to RecoverX

MongoDB node hostnames must be resolvable from the RecoverX server. To make them resolvable, connect to the config server and get the hostnames of all mongos instances in the MongoDB cluster.

Next, on the RecoverX server, add a hostname entry for each mongos with the hostname and IP address obtained above.
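Assuming the entries go in /etc/hosts on the RecoverX server, the sketch below turns a mapping of mongos hostnames to IP addresses (gathered, for example, from the `config.mongos` collection on a config server, whose `_id` values have the form `host:port`) into hosts-file lines, skipping names already present in the existing file. The hostnames and addresses in the example are hypothetical.

```python
import ipaddress


def hosts_entries(nodes: dict[str, str], existing: str = "") -> list[str]:
    """Return 'IP<TAB>hostname' lines for hostnames not already in `existing`."""
    known: set[str] = set()
    for line in existing.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split on whitespace
        known.update(fields[1:])                # fields[0] is the address; the rest are names
    entries = []
    for hostname, ip in sorted(nodes.items()):
        ipaddress.ip_address(ip)  # raises ValueError for a malformed address
        if hostname not in known:
            entries.append(f"{ip}\t{hostname}")
    return entries
```

The returned lines would be appended to /etc/hosts with root privileges; `getent hosts <hostname>` then confirms that each name resolves.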

Configure Network Settings

Ensure that the following ports are allowed by the Linux system and AWS firewalls:
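Once the firewall rules are in place, TCP reachability from the RecoverX instance to each node can be spot-checked with a short script. The host names and port numbers passed in are hypothetical inputs; substitute the ports your deployment requires (for example 22 for SSH and the MongoDB port, 27017 by default).

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolvable
        return False


def port_report(hosts: list[str], ports: list[int]) -> dict[tuple[str, int], bool]:
    """Check every host/port combination."""
    return {(h, p): tcp_reachable(h, p) for h in hosts for p in ports}
```

A `False` result does not say which layer is blocking the port (security group, Linux firewall, or a stopped service), so follow up on the node itself.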

Provision Version Storage

A storage system is required to store the backup copies of the data source that is protected by RecoverX. The most common way to enable this storage target is by provisioning Amazon S3 bucket(s) to be used as the secondary storage.

Create an S3 bucket, using a memorable name, and select the appropriate AWS region for your environment.
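S3 bucket names must follow DNS-style naming rules, which matters here because the version store is later addressed as `http://<bucket name>.s3.amazonaws.com`. This sketch checks the core rules only (3 to 63 characters; lowercase letters, digits, dots, and hyphens; a letter or digit at each end; no adjacent dots; not formatted as an IP address), not the complete list, and builds arguments for boto3's `s3_client.create_bucket(**kwargs)`. The bucket name and regions in the example are hypothetical.

```python
import ipaddress
import re

# 3-63 chars: lowercase letters, digits, dots, hyphens; letter/digit at both ends
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check the core S3 bucket naming rules (not the complete list)."""
    if _BUCKET_NAME.fullmatch(name) is None or ".." in name:
        return False
    try:
        ipaddress.IPv4Address(name)  # names formatted like IP addresses are not allowed
    except ValueError:
        return True
    return False


def create_bucket_kwargs(name: str, region: str) -> dict:
    """Arguments for boto3's s3_client.create_bucket(**kwargs)."""
    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    kwargs: dict = {"Bucket": name}
    # us-east-1 is the default location and is not passed as a LocationConstraint
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```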


When assigning an S3 bucket as storage, ensure that all data source nodes and RecoverX nodes have Read+Write access to the bucket (or directory).

To authenticate RecoverX to S3, you may configure Identity and Access Management (IAM) roles. Follow the Amazon IAM documentation to create an IAM role with read and write (R+W) privileges on the S3 bucket, and assign this role to the EC2 instances on which RecoverX is installed. This is the recommended method for authentication.
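Behind the recommended IAM-role method are two JSON documents: a trust policy that lets EC2 assume the role, and a permissions policy granting the seven S3 rights listed in this section. The sketch builds both as Python dicts; bucket-level actions are granted on the bucket ARN and object-level actions on `<bucket>/*`. The bucket name in the example is hypothetical.

```python
BUCKET_ACTIONS = ["s3:ListBucket", "s3:ListBucketMultipartUploads"]
OBJECT_ACTIONS = [
    "s3:AbortMultipartUpload",
    "s3:DeleteObject",
    "s3:GetObject",
    "s3:ListMultipartUploadParts",
    "s3:PutObject",
]


def ec2_trust_policy() -> dict:
    """Trust policy allowing EC2 instances to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def s3_permissions_policy(bucket: str) -> dict:
    """Identity-based policy granting RecoverX's S3 rights on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # bucket-level actions apply to the bucket itself
            {"Effect": "Allow", "Action": BUCKET_ACTIONS,
             "Resource": f"arn:aws:s3:::{bucket}"},
            # object-level actions apply to the objects inside it
            {"Effect": "Allow", "Action": OBJECT_ACTIONS,
             "Resource": f"arn:aws:s3:::{bucket}/*"},
        ],
    }
```

With boto3, the documents would be serialized with `json.dumps` and passed to `iam.create_role(..., AssumeRolePolicyDocument=...)` and `iam.put_role_policy(..., PolicyDocument=...)`; EC2 instances receive a role through an instance profile (`create_instance_profile`, then `add_role_to_instance_profile`).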

Alternatively, you may use Access Key and Secret Key for authentication.

Whatever choices you make to provision S3 and assign access management in your AWS environment, the AWS user ID used to configure S3 access in RecoverX needs the following S3 access rights, which can also be set in a bucket policy on the S3 bucket created for RecoverX:

• AbortMultipartUpload

• DeleteObject

• GetObject

• ListBucket

• ListBucketMultipartUploads

• ListMultipartUploadParts

• PutObject

Example S3 bucket policy

{
  "Id": "Policy1501517025250",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1501516967728",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::s3-bucket-for-ami",
        "arn:aws:s3:::s3-bucket-for-ami/*"
      ],
      "Principal": {
        "AWS": [
          "arn:aws:iam::999999999999:user/DickDale"
        ]
      }
    }
  ]
}
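A common pitfall with bucket policies is scoping object-level actions such as `s3:GetObject` to the bucket ARN alone: AWS rejects a bucket policy in which an action matches none of its statement's resources, so a policy granting object-level actions must include the object ARN (`arn:aws:s3:::<bucket>/*`) alongside the bucket ARN. The sketch below is an illustrative check, not AWS's own validator, and classifies only the seven actions listed above.

```python
import json

# ListBucket and ListBucketMultipartUploads act on the bucket; the rest act on objects
BUCKET_LEVEL = {"s3:ListBucket", "s3:ListBucketMultipartUploads"}
OBJECT_LEVEL = {
    "s3:AbortMultipartUpload", "s3:DeleteObject", "s3:GetObject",
    "s3:ListMultipartUploadParts", "s3:PutObject",
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def unmatched_actions(policy_json: str) -> list[str]:
    """Return actions whose resource type is missing from their statement."""
    problems = []
    for stmt in json.loads(policy_json)["Statement"]:
        arns = [r for r in _as_list(stmt["Resource"]) if r.startswith("arn:aws:s3:::")]
        has_bucket = any("/" not in r for r in arns)  # e.g. arn:aws:s3:::bucket
        has_object = any("/" in r for r in arns)      # e.g. arn:aws:s3:::bucket/*
        for action in _as_list(stmt["Action"]):
            if action in BUCKET_LEVEL and not has_bucket:
                problems.append(action)
            elif action in OBJECT_LEVEL and not has_object:
                problems.append(action)
    return problems
```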


Configure Passwordless SSH on the RecoverX Instance and SSH Authentication for the User “datos” on Each of the MongoDB Nodes

1. Create an SSH key for the user “datos” on the RecoverX instance.

2. Copy the contents of the public key, id_rsa.pub.

3. On each MongoDB node, paste the copied contents of id_rsa.pub into a file named “authorized_keys” in the .ssh directory of the datos user. If there is no .ssh directory for the user “datos”, create it.

Insert the copied contents into the “authorized_keys” file and repeat this for each MongoDB node in any cluster that will be protected by RecoverX.
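Step 3 can be scripted on each node. The sketch below appends a public key to `~/.ssh/authorized_keys` only if it is not already there, and applies the conventional permissions (0700 on `.ssh`, 0600 on the file), which satisfy sshd's default `StrictModes` check. The home directory and key strings in the example are hypothetical; when run as root, the files must also be chowned to the datos user, which the sketch does not do.

```python
import os
from pathlib import Path


def install_authorized_key(home: Path, public_key: str) -> bool:
    """Append public_key to home/.ssh/authorized_keys; return False if already present."""
    key = public_key.strip()
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # set explicitly: mkdir's mode is filtered by the umask
    auth = ssh_dir / "authorized_keys"
    text = auth.read_text() if auth.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        added = False
    else:
        # start on a fresh line if the file does not already end with a newline
        prefix = "" if not text or text.endswith("\n") else "\n"
        with auth.open("a") as f:
            f.write(prefix + key + "\n")
        added = True
    os.chmod(auth, 0o600)
    return added
```

For step 1, `ssh-keygen -t rsa` run as the datos user on the RecoverX instance writes `~/.ssh/id_rsa` and `~/.ssh/id_rsa.pub` by default; where password login to the nodes is available, `ssh-copy-id` can perform the copy instead.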

Configuring RecoverX

RecoverX is managed using a consumer-grade graphical user interface accessed through a web-based console. To log into the console, follow these steps:

1. In a web browser, connect to the console at https://<IP_address>:9090/#/dashboard, where <IP_address> is the public IP address of the AWS instance on which RecoverX is deployed.

2. At the login screen, enter the default username “admin” and default password “admin”. On successful login, the home page should appear.

3. After logging in for the first time, change the password for the administrator account by clicking the Settings menu and choosing CHANGE PASSWORD.

You may change the password at any time. The username, however, is not changeable at this time.


RecoverX GUI

The RecoverX Home Page

Configure Version Storage

The first step of configuring RecoverX is to configure access to the S3 bucket secondary storage for data protection.

Click on the “configuration” drop down and click on “version stores”.

Select S3 storage type and assign a name to this version store. An S3 bucket can contain one or more version stores.


Specify the bucket assigned to this version store by URL, in the form http://<bucket name>.s3.amazonaws.com.

In our example, access is authenticated with the access key/secret key of a specific AWS user with access rights to the S3 bucket. Depending on your configuration of IAM roles, it may not be necessary to use an access key ID/secret key combination in this step.

Once one or more version stores have been configured, the GUI displays a list of the configured version stores, each version store's configuration, and the capacity currently used by each one.


Add a MongoDB Source Cluster

The next step in configuring RecoverX is to add the source MongoDB cluster that contains one or more collections to be protected.

The source can be a cluster configured as a single MongoDB primary, a single replica set, or multiple shards or sharded replica sets.

Click on the “configuration” option in the top menu and select “Data Sources”.

This will open the “Add Data Source” configuration screen.

You will be configuring/adding the following selections:

• Source Data Type: select “MongoDB” from the drop-down.

• Cluster Name: enter a name you want to assign to this MongoDB cluster data source. The name cannot contain spaces.

• Configuration IP Hostname: for a single primary or an unsharded replica set, enter the IP address or hostname of the primary node. For a sharded cluster, enter the IP addresses or hostnames of the 3 cluster Configuration Server replica set members.

• Configuration Port: enter the configured MongoDB port number.

• Source Authentication: from the drop-down, select “password”, “Key String”, or “Key File Path”. This is the authentication method RecoverX will use to access the MongoDB nodes for orchestrated backup and recovery.

The best practice, and the most reliable authentication method, is to use the SSH key configured earlier in this cookbook.

In the example below, the previously configured user “datos” is the username, and the key file path points to the SSH private key previously configured for the user “datos”.


• MongoDB Driver Authentication: if the MongoDB cluster is configured for MongoDB driver authentication, enter a valid MongoDB user with appropriate rights assigned in the Mongo Roles and Permissions file (see the RecoverX installation guide for reference) or a Mongo “Super User”.

• Ignore Secondaries: enter a list of secondaries (by IP address) to exclude because they are not located in the same data center as the MongoDB primaries or are inaccessible.
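The field rules above can be summarized as a validation function. RecoverX is configured through its GUI here, so the class and field names below are hypothetical and are not a RecoverX API; the sketch only encodes the constraints stated in this section (no spaces in the cluster name, one host for a single primary or unsharded replica set, three config server hosts for a sharded cluster). The default port of 27017 is MongoDB's default, not a RecoverX requirement.

```python
from dataclasses import dataclass, field


@dataclass
class MongoDataSource:
    cluster_name: str
    config_hosts: list[str]        # the primary, or the 3 config server members
    port: int = 27017              # MongoDB's default; use your configured port
    sharded: bool = False
    ssh_user: str = "datos"
    key_file_path: str = ""        # recommended: SSH private key for the datos user
    ignore_secondaries: list[str] = field(default_factory=list)  # listed for completeness


def validation_errors(src: MongoDataSource) -> list[str]:
    """Return human-readable problems with the form values (empty if none)."""
    errors = []
    if not src.cluster_name or any(c.isspace() for c in src.cluster_name):
        errors.append("cluster name must be non-empty and contain no spaces")
    expected = 3 if src.sharded else 1
    if len(src.config_hosts) != expected:
        errors.append(f"expected {expected} configuration host(s), got {len(src.config_hosts)}")
    if not 1 <= src.port <= 65535:
        errors.append("port must be between 1 and 65535")
    return errors
```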

When all the entries in the “Add Data Source” menu are completed, click on the “Add Data Sources” button.

RecoverX will display a window that shows the completion of a series of validation steps, ending in successfully adding the source cluster.


After completion of adding a data source, the “Data Sources” screen will show the details of this added data source and any other previously added data sources.


The details include a listing of the cluster nodes by name or IP address, their port number and node status.

“Source Details” will display the status of each collection in the cluster, including collection name, number of documents, collection size and protection status (including protection policy name if backups are configured for that collection).

Configure a Protection “Policy”

The final step in configuring RecoverX to protect data in a MongoDB cluster is adding a protection “Policy”.

The window for creating a policy can be opened either by clicking the “protect” action in the listing of unprotected collections under “Data Source Protection Status” on the “Data Sources” screen, or by clicking the “Configuration” drop-down at the top of the main screen and selecting “Policies”.

Either method will open the “Add Policy” window.

You will configure the following selections to complete adding a policy:

• Policy Name: Enter a name of your choice for the policy.

• Data Source Type: Click on the selection and select MongoDB from the drop down list.

• Cluster Name: select the MongoDB cluster, added earlier as a data source, that contains the collection or collections you want to protect with this “Policy”.

• Consistency Type: Accept the default “one”.

• Version Store: Click on the selection and select the Version Store into which this policy will write its backups.

• Management Objects: Click on this selection and select the collections that will be protected by this “Policy”.


• Policy Start Date: Select a Start Date from the drop down calendar. The default is the current date.

• Policy Start Time: Select a Start Time from the drop down. The default is the current time.

• Schedule Name: Create a name for the schedule. If schedules were previously configured, they will appear in the drop-down list. The schedule name will be saved and made available in the drop-down when creating additional policies.

• Version Interval: Choose a value from the drop-down list, in hours, days, weeks, or months. As a best practice, the most frequent interval is 4 hours.

• Data Retention Period: Choose a value from the drop-down list, in days, weeks, or months. The minimum retention time is 1 week.


• Once all selections have been made, click “add policy”.

• A window will confirm that the policy was added successfully.
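As a rough illustration of how interval and retention interact, assume one version is created per interval and versions expire once they are older than the retention period: a 4-hour interval with the 1-week minimum retention then keeps about 7 × 24 / 4 = 42 versions. The sketch below is hypothetical arithmetic under that assumption, not RecoverX's scheduler; the start date in the example is made up.

```python
from datetime import datetime, timedelta

MIN_RETENTION = timedelta(weeks=1)  # minimum retention stated above


def version_times(start: datetime, interval: timedelta, count: int) -> list[datetime]:
    """First `count` scheduled version times, beginning at the policy start."""
    return [start + i * interval for i in range(count)]


def versions_retained(interval: timedelta, retention: timedelta) -> int:
    """Approximate number of versions kept once the schedule reaches steady state."""
    if retention < MIN_RETENTION:
        raise ValueError("retention must be at least 1 week")
    return retention // interval  # timedelta // timedelta gives an int
```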

Monitoring RecoverX Activity

After a protection “Policy” is created, you can monitor the status of the initial backup started by the addition of the policy, as well as the status of other RecoverX protection and restore activity.

To monitor activity, click the “Monitoring” menu on the RecoverX main screen and select “Tasks”.

This will open a list of active and prior tasks. In the example below, the “initial synch” (essentially the first full backup) was started and has completed, and a subsequent “delta synch” (a backup of incremental data) is scheduled per the policy schedule.


The RecoverX “Dashboard” also presents a summary of task status on the bottom half of the screen and a summary of Data Sources, Policies, Version Store and Capacity Savings on the top half of the screen.


About Datos IO

Datos IO is the application-centric data management company for the multi-cloud world. Our flagship Datos IO RecoverX delivers a radically novel approach to data management, helping organizations embrace the cloud with confidence by delivering solutions that protect, mobilize, and monetize their data at scale. Datos IO was recently awarded Product of the Year by Storage Magazine, and was recognized by Gartner in the 2016 Hype Cycle for Storage Technologies. Backed by Lightspeed Venture Partners and True Ventures, Datos IO is headquartered in San Jose, California.
