
© Copyright 2020. All Rights Reserved. HVR Software.

hvr-software.com

[email protected]

HVR High Availability Replication on AWS

WHITEPAPER


Table of Contents

Document Overview

AWS Services Used When Making HVR Highly Available

HVR Architecture Overview

HVR Hub

HVR Stateless Capture and Integrate Agents

HA for the HVR Hub

HA for the HVR Hub Repository

HA for the HVR Hub Server

Same Instance Recovery

Mitigating Against and Recovery From Full Disk Volumes

Recovery from a Stopped HVR Scheduler Daemon/Service

Same Availability Zone Recovery

Recovery From Availability Zone and Region Outages

HA for HVR Stateless Agents

Conclusion


Document Overview

While HVR's high performance replication and data validation solution is both reliable and fault tolerant, as with any component of your mission critical data infrastructure it is important to take the necessary steps to make it highly available (HA). This document describes how to mitigate against and recover from downtime in your change data capture (CDC) pipelines when running some or all components of HVR in an Amazon Web Services (AWS) environment. While some prescriptions are provided and architectural considerations covered, this document is not intended to be an exhaustive step-by-step guide, nor to provide advice on how to best architect hybrid-cloud environments.

The primary area of focus for making HVR highly available is the HVR hub (explained below in the architecture overview section), which has both file system and database components. The secondary area of focus is making the HVR stateless agents, if used, highly available. There are numerous viable methods, products and services for making the HVR file system, database repository and networking highly available; this document focuses on native AWS features and services.

AWS Services Used When Making HVR Highly Available

• Amazon CloudWatch. Used to monitor the health of the servers and file systems upon which HVR runs; this service will also be used to initiate corrective actions.

• Amazon CloudWatch Agent. Allows the collection of additional metrics (such as free disk space) from Amazon EC2 instances (and on-premises servers) that are not otherwise available to the CloudWatch service.

• Amazon EBS. Elastic Block Store provides the storage volumes that will be attached to the EC2 instances and contain the HVR installation, operating system and additional libraries.

• Amazon EC2. Elastic Compute Cloud instances are where the HVR hub and optional remote stateless services will run.

• Amazon Elastic IP. Used to remap an IP address to another instance in the event of a failover.

• Amazon Elastic Load Balancer. Distributes HVR traffic across multiple Amazon EC2 instances. This will be used for both scale and availability of the HVR stateless agents.

• Amazon Lambda. A compute service that lets you run code on demand without provisioning or managing servers.

• Amazon RDS. The HVR hub repository will be located in an Amazon RDS instance.

• Amazon Route 53. Domain Name System (DNS) service used to route network traffic and help perform DNS failover.

• Amazon S3. The storage service that will be used to store and retrieve snapshots of EBS volumes.

• Amazon VPC. Virtual Private Cloud is used to logically isolate a section of AWS for security purposes.

• Amazon VPC Peering. Allows you to route traffic between two VPCs using private IP addresses.


HVR Architecture Overview

To enable the flow of change data between any number of source and target data locations, HVR requires a minimum of one installation, which will be designated as the hub. Any given data pipeline, or channel, will have one HVR hub, and this will be the primary focus in the discussion of making HVR highly available.

HVR Hub

An HVR data channel can consist of one or many HVR installations, with one and only one of these installations designated as the hub for a given channel. An HVR hub can contain multiple channels, and a channel can serve as a data pipeline between two or more endpoints. The hub acts as the central command and control location for one or more data channels. It manages all aspects of replication, from configuration to deployment, data queuing to auto recovery, historical statistics recording to real-time alerting. The hub-only architecture of HVR behaves similar to an "as a service" model, establishing remote capture connections to sources and remote integrate (apply) connections to targets directly from the hub.

The hub can run directly on the source, the target or an independent location. When installed in AWS, the hub and all endpoints typically reside in the same Virtual Private Cloud (VPC). Figure 1 shows the three possible architectures for a hub-only (agentless) configuration. In AWS deployments where the source and targets are in the same region, the first architecture is the most commonly deployed.

The HVR hub stores information both on disk and in a backend database repository, or catalogue. The database repository stores channel configurations, object metadata and replication status and statistics. The disk stores runtime data including recovery state files for all end points and stores highly compressed queued data until those data are successfully delivered to all destinations.

Figure 1: Options for a hub-only architecture. (1) The hub runs on an independent Server C, with the source on Server A and the target on Server B; (2) the hub runs on the source's Server A; (3) the hub runs on the target's Server B. In each case the servers reside within a VPC.


HVR Stateless Capture and Integrate Agents

In enterprise hybrid-cloud and inter-region environments, connecting data endpoints through an architecture of distributed HVR installations is common for superior throughput and security. In such scaled-out and highly secured environments, one HVR installation in a channel is designated the hub while any others are managed by the hub as stateless services, or light agents. A single stateless agent can spawn multiple processes to be used for initial bulk loads, change data capture (CDC), data integration (apply), or as a secure data proxy, often set up in an on-premises DMZ (demilitarized zone). Communications between HVR installations are secured using TLS 1.2 with AES 256-bit encryption, and data are compressed, often in the 10-20X range, using custom compression designed for efficiency at high volumes. If stateless agents are not used, then the hub performs the capture and integrate duties as needed.

The stateless agents can run directly on the source database server, on the target database server or from their own independent location. Having a stateless agent as close as possible to the source and target systems helps boost performance and optimize security. For example, in data architectures that span on-premises and cloud, multiple cloud vendors, or regions within the same cloud vendor, a stateless agent in the same data center or availability zone as the endpoint will prove much more performant than making distant remote connections directly to the data endpoints.

The agents should be in the same VPC as the hub and the endpoint to which they connect. While having stateless agents installed directly on the source (usually for on-premises endpoints) or in the same region or availability zone provides tremendous advantages, it is not required. The HVR hub, stateless agents and all AWS endpoints typically reside in the same Virtual Private Cloud (VPC).

Figure 2 shows three possible architectures with a hub (always required) and one or two agents. This rounds out all six configuration types for a single source to single target replication channel.

Note: A single HVR hub can consist of channels that connect hundreds of source and target endpoints using any combination of agent and agentless configurations simultaneously, supporting a vast array of scalability and security needs.

Figure 2: Architectures with a hub and one or two stateless agents. (4) A capture agent alongside the source with the hub downstream; (5) the hub with a separate integrate agent alongside the target; (6) a capture agent at the source (Server A), the hub in between (Server B), and an integrate agent at the target (Server C). In each case the servers reside within a VPC.


HA for the HVR Hub

Because the HVR hub controls all aspects of replication, including the state of any remote HVR services, it is central to the HA discussion. Steps and considerations for making it highly available follow.

HA for the HVR Hub Repository

The HVR hub database repository contains replication configurations, metadata for replicated objects, the status of replication jobs, performance statistics and an audit history of user actions. It is not included in the HVR installation; rather, the hub repository database is installed independently of HVR, and HVR is then configured to connect to it. The database can be local or remote to the hub server. Figure 3 shows the options for the hub repository placement.

In cloud environments the repository is often a managed service and therefore remote from the hub server. The repository database and hub server can be in different regions provided the database response time does not increase by more than 150 milliseconds (ms) compared with being in the same region. As a general guideline, the repository database should respond to the HVR hub within 200 ms. Splitting the hub server and repository database between on-premises and the cloud, or between two cloud vendors, is not recommended.

The hub repository can be almost any database that HVR supports including, but not limited to, MySQL/MariaDB, PostgreSQL, SQL Server and Oracle. For a list of supported databases that can be used for the HVR hub repository see:

https://www.hvr-software.com/docs/installing-and-upgrading-hvr

Because of their high level of availability, managed databases such as Amazon RDS and Amazon Aurora are encouraged in AWS-deployed environments. This reduces the additional hardware and administrative tasks needed to make the repository database highly available.

Figure 3: Options for hub repository placement. (1) The hub and its repository reside on the same Server A; (2) the hub resides on Server A and the repository on a separate Server B, within a VPC.


Figure 4 shows a typical Amazon RDS instance with a standby replica instance in the same Region. RDS performance is such that the instance does not need to be in the same Availability Zone as the hub, and failover of the instance is automated by AWS and transparent to HVR.
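As a sketch, a Multi-AZ RDS instance for the hub repository can be created from the AWS CLI as follows; the identifier, engine, instance class, storage size and credentials are all illustrative placeholders.

# Create a Multi-AZ RDS instance to serve as the HVR hub repository
# (all names and sizes below are placeholders)
aws rds create-db-instance \
  --db-instance-identifier hvrhubrepo \
  --engine postgres \
  --db-instance-class db.m5.large \
  --allocated-storage 100 \
  --master-username hvradmin \
  --master-user-password 'REPLACE_ME' \
  --multi-az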

HA for the HVR Hub Server

The HVR hub server maintains data on disk and communicates with the database repository as well as with replication endpoints. Endpoints are reached either directly or via stateless agents running as a service or daemon, and connecting to them sometimes requires third-party software (e.g. database clients or ODBC drivers). The areas that need to be made highly available for the hub server are as follows:

1. Disk volumes. Only one disk volume is needed per HVR hub server, so a single Amazon Elastic Block Store (EBS) volume can accommodate all directories needed for an HVR hub. HVR works directly from three directories: HVR_HOME, HVR_CONFIG and HVR_TMP. In addition to these directories, HVR needs access to third-party libraries as well as the core operating system libraries. The directories, and the files therein, that will reside on this EBS volume are as follows:

• HVR_HOME. Static HVR installation files on disk. The installation binary files and the HVR license file (hvr.lic) are contained here.

• HVR_CONFIG. Dynamic data and other generated files. These include temporarily queued compressed data, runtime job files, process state and recovery information, and log files. HVR replication can always pick up where it left off if this directory and its files are available on the original or failover system.

Figure 4: HVR repository HA when using Amazon RDS. The HVR hub runs on an EC2 instance in one Availability Zone; the hub repository runs on an RDS instance in a second Availability Zone, with an RDS standby in a third, all within the same Region.


• HVR_TMP. If processing requires more RAM than HVR has been configured to use, data will spill to disk in this directory. If this directory is not defined, HVR will create a subdirectory within the HVR_CONFIG directory structure.

• Third-party connection libraries. These libraries are used by HVR to connect to replication endpoints as well as to the database repository. Examples: ODBC libraries, database clients and security libraries.

• OS library files. HVR installations are compiled in C and are sensitive to the OS type. Generally, there is only one build per OS type, and that build can run on several major and minor versions of a given OS.

2. Network access. Whether the hub is connecting directly to endpoints (sources and targets) or connecting through a remote HVR stateless service, those connections will be made using IP addresses or DNS names. Therefore, appropriate network routing and security access permissions must be in place. This is often addressed with Amazon Elastic IP addresses and Elastic Load Balancers.

Same Instance Recovery

Most failures will not require failover of the HVR hub server. HVR's data queuing and state record keeping on the hub allow it to automatically handle source, destination and network outages, as well as killed processes and hard reboots. If an endpoint becomes unavailable, by default HVR will automatically retry the connection after 10 seconds and double the wait between retries until the wait reaches 20 minutes. The wait time between retries is separately configurable for each endpoint, and retries will continue indefinitely. Once the connection is re-established, HVR will automatically resume where it left off, based on the saved state information on the hub and in the target endpoint.

There are two runtime situations where replication can stop that require additional attention: full disk volumes and a stopped HVR Scheduler daemon.

Mitigating Against and Recovery From Full Disk Volumes

If the HVR_CONFIG directory has no space to temporarily queue compressed data, capture processes will time out and enter retry mode. This effectively stops replication and can occur when capacity planning did not correctly account for:

• Extended target system outages during continuous replication, or

• Longer than expected target system load times during the initial bulk load.

During these periods, change data is queued on the hub until the target is again available or the initial bulk load has completed. Note that once the HVR integrate process begins applying the accumulated change data, the backlog of queued data will be automatically deleted from the hub. If the disk volume becomes full, change data capture will stop. If capture has stopped for long enough, then upon startup it may have to rewind into the source system's backup or archived transaction log files (from which change data capture reads changed data). If those files are no longer available, a gap in change data will occur. At that point there are two options to ensure both source and target systems are in sync:

• Restore the required backup or archived transaction log files, or

• Perform a new bulk load into the target


The simplest way to avoid a full disk volume is to allocate enough space ahead of time. The amount of space to allocate is the rate of accumulation multiplied by the maximum time you can tolerate changes not being applied to the target; for example, at 5 GB of compressed change data per hour and a 24-hour tolerance, allocate at least 120 GB. HVR will not apply changes to the target if:

• An initial bulk data load (refresh) is in progress for one or more tables in a channel.

• The target is offline.

• The integrate process is suspended.

The bulk load period will depend on the amount of data being copied, the level of parallelization, network throughput, the capability of the source to unload the data and the capability of the target to load the data. Running empirical tests on a subset of data and extrapolating those metrics is the best method for producing reasonable estimates of load times. The target can be offline or suspended for maintenance activities, during a failover, or if the database is rejecting changes and HVR has not been configured to handle such errors.

Simply running a capture-only test and accumulating change data on the hub for an extended duration is a valid method to determine the amount of space needed to queue data for a given period of target unavailability. This yields a general ratio of source transaction volume to HVR captured data volume. Common disk utilities can be used to measure the volume of data accumulated on disk. Alternatively, and for use on an ongoing basis, HVR tracks the amount of transient data queued on the hub as statistics in the repository.

If more disk space is needed, then EBS volumes can be resized without taking downtime. The high-level steps are:

1. Request the volume modification.

2. Monitor the progress of the volume modification.

3. Once modification completes, extend the volume's file system via OS commands.
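As a sketch, these steps map to the following AWS CLI and OS commands on a Linux instance; the volume ID, new size, device name and ext4 file system are illustrative placeholders for your environment.

# 1. Request the volume modification (here, grow to 200 GiB)
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 200

# 2. Monitor until the modification state is "optimizing" or "completed"
aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0

# 3. On the instance, grow the partition and extend the file system
sudo growpart /dev/xvdf 1
sudo resize2fs /dev/xvdf1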

For more detailed steps on extending the EBS volume see the following links:

• https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html

• https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/requesting-ebs-volume-modifications.html

• https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html

• https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/recognize-expanded-volume-windows.html

Monitoring for disk-full situations can be done directly using custom cron shell (e.g. bash) scripts or Amazon CloudWatch alarms, and indirectly using native HVR latency alerts. Using cron to schedule the execution of shell scripts is the simplest method to detect the need for, and then dynamically allocate, additional disk space. Amazon CloudWatch alarms require using the CloudWatch Agent. HVR can send latency alerts via Amazon SNS, SNMP, email and Slack when the delay between source and target data has drifted beyond a user-defined, time-based threshold. Administrators can then take corrective action, such as allocating more target resources or more disk space to the EBS volume attached to the EC2 instance running the HVR hub.
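As a minimal sketch of the cron-based approach, the following script alerts when free space on the file system holding HVR_CONFIG drops below a threshold; the path, threshold and alert command are illustrative assumptions.

#!/bin/bash
# Alert when the HVR_CONFIG file system runs low on space.
# The directory, threshold and mail recipient are placeholders.
HVR_CONFIG_DIR=/opt/hvr/hvr_config
MIN_FREE_GB=20

free_gb=$(df -BG --output=avail "$HVR_CONFIG_DIR" | tail -1 | tr -dc '0-9')
if [ "$free_gb" -lt "$MIN_FREE_GB" ]; then
    # Alert an administrator (or trigger a volume resize) here
    echo "HVR hub volume low on space: ${free_gb} GB free" \
        | mail -s "HVR disk space alert" admin@example.com
fi

A crontab entry such as */5 * * * * /usr/local/bin/check_hvr_disk.sh would run this check every five minutes.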


Amazon CloudWatch alarms can be set for when the EBS disk becomes full. Thresholds of more than 90% used or less than 20 GB free are reasonable settings in many situations. CloudWatch disk monitoring requires running the CloudWatch Agent on the EC2 instance and collecting the disk_used_percent and disk_free metrics, respectively.
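As a sketch, the following AWS CLI call creates an alarm on the CloudWatch Agent's disk_used_percent metric; the instance ID, path dimension and SNS topic ARN are placeholders, and the dimensions must match those the agent actually publishes.

# Alarm when the monitored file system is more than 90% used
aws cloudwatch put-metric-alarm \
  --alarm-name hvr-hub-disk-used \
  --namespace CWAgent \
  --metric-name disk_used_percent \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 Name=path,Value=/ \
  --statistic Average \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 90 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:hvr-alerts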

For more information on setting up the Amazon CloudWatch Agent see:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html

The CloudWatch alarm cannot directly invoke the resizing of the EBS volume. Instead, the EBS volume can be resized manually by an administrator when deemed necessary, or automatically by way of an AWS Systems Manager (SSM) Run Command or a Lambda function, either of which can be triggered through a CloudWatch event.
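A sketch of the SSM Run Command approach follows; in practice the command would be the target of the CloudWatch event rather than run by hand, and the instance ID, device and file system commands are placeholders (the volume itself would first be grown with modify-volume, as shown earlier).

# Run the on-instance resize steps via SSM Run Command
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=instanceids,Values=i-0123456789abcdef0" \
  --parameters 'commands=["sudo growpart /dev/xvdf 1","sudo resize2fs /dev/xvdf1"]'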

For more detail on how to send a shell command to an EC2 instance see:

https://docs.aws.amazon.com/systems-manager/latest/userguide/walkthrough-cli.html#walkthrough-cli-run-scripts

https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EC2_Run_Command.html

For examples of how to make Lambda functions to execute command on an EC2 instance see:

https://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html

https://medium.com/@Electricste/run-shell-commands-on-a-ec2-from-a-lambda-function-108994911064

If the HVR_CONFIG directory becomes completely unrecoverable, then an older EBS snapshot will need to be restored and attached, and replication will need to be re-initialized. Re-initializing HVR is explained in the HVR documentation pages here:

https://www.hvr-software.com/docs/quick-start-guides

Recovery from a Stopped HVR Scheduler Daemon/Service

Stopping the HVR Scheduler daemon (or Service on Windows) will stop all replication processes. The HVR Scheduler can stop if an administrator issues a stop command, if the process crashes, or if it experiences a broken connection to the database repository due to a network timeout or a repository failover. Once the HVR Scheduler process restarts, it will automatically start all replication processes that were in a Running state prior to the Scheduler stopping.

To ensure that the HVR Scheduler automatically starts if it has stopped for one of the above reasons, it should be run as a systemd process on Linux or as a service on Windows. The following steps show how to make the process automatically restart on Linux and Windows.

Page 11: HVR High Availability Replication on AWS...an Amazon Web Services (AWS) environment. While some prescriptions are provided and architectural considerations covered, this document is

© Copyright 2020. All Rights Reserved. HVR Software.

For Linux:

1. Locate the service file for the HVR systemd process.

2. Add the following lines to its [Service] section:

Restart=always

RestartSec=3

3. After saving the file, reload the service configuration:

systemctl daemon-reload
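For context, a complete unit file might look like the following minimal sketch; the unit name, paths, user account, service type and scheduler command line are illustrative assumptions, so consult the HVR documentation for the definition that matches your installation.

# /etc/systemd/system/hvrscheduler.service -- illustrative sketch
[Unit]
Description=HVR Scheduler
After=network.target

[Service]
Type=forking
User=hvr
Environment=HVR_HOME=/opt/hvr/hvr_home
Environment=HVR_CONFIG=/opt/hvr/hvr_config
# Illustrative command; the hub database name is an assumption
ExecStart=/opt/hvr/hvr_home/bin/hvrscheduler hvrhub
# Restart automatically whenever the scheduler stops
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target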

For Windows:

1. Locate the HVR Scheduler service in the Windows Services dialog (services.msc).

2. Navigate to the Recovery tab and choose Restart the Service for First, Second and Subsequent failures.

3. Keep the defaults for Reset fail count after and Restart service after at 0 (zero) days and 1 minute, respectively. Windows does not support restart intervals of less than one minute.


Figure 5: Windows Services Recovery tab settings for the HVR Scheduler.


Same Availability Zone Recovery

The same prescriptions for handling process and disk recovery also apply to failovers of the EC2 instance hosting the HVR hub server within the same Availability Zone. In addition, those prescriptions are extended to include Amazon CloudWatch, Amazon EBS, Amazon Elastic IP and Amazon VPC.

The following recovery scenario is simple to set up but will incur some downtime while the new EC2 instance boots and the HVR processes perform automatic recovery. For a prescription that uses an Active-Passive failover where both HVR hub instances are "warm" (running), see the following section, Recovery From Availability Zone and Region Outages.

An HVR hub server failover can be initiated by CloudWatch after failing a system status check (StatusCheckFailed_System).

System status checks can register failures because of the following reasons:

• There is a loss of network connectivity

• The system lost power

• Software issues cause the host to be unreachable

• Hardware issues cause the host to be unreachable

This will trigger a CloudWatch alarm action, which can be set to reboot the instance.

In addition, an instance recovery can be performed. An instance recovery is preferred, as it will allocate new CPU and RAM hardware in the same Availability Zone. The recovered instance will be identical to the primary instance, including the instance ID, private IP addresses, Elastic IP addresses and all metadata.
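As a sketch, such an alarm with the built-in recover action can be created from the AWS CLI as follows; the instance ID and Region are placeholders.

# Recover the instance after two consecutive failed system status checks
aws cloudwatch put-metric-alarm \
  --alarm-name hvr-hub-auto-recover \
  --namespace AWS/EC2 \
  --metric-name StatusCheckFailed_System \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Maximum \
  --period 60 \
  --evaluation-periods 2 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:automate:us-east-1:ec2:recover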

In the case of an instance recovery, the EC2 Run Command would be used as a CloudWatch Events target. This allows the failover to the standby to be automated.

The following link describes configuring CloudWatch events for the run command:

• https://docs.aws.amazon.com/systems-manager/latest/userguide/rc-cwe.html

This link is to an AWS blog article for using this specifically for EC2:

• https://aws.amazon.com/blogs/aws/ec2-run-command-is-now-a-cloudwatch-events-target/

The sequence of events for a recovery is as follows (an illustrative CLI sketch appears after the list):

1. Detach the EBS volume(s) from the stopped instance.

2. Attach the EBS volume(s) to the stopped standby instance.

3. Attach the Elastic IP (EIP) to the network interface of the standby instance (this will remap the IP address from the primary instance to the failover instance).

4. Start the recovery instance.

Page 13: HVR High Availability Replication on AWS...an Amazon Web Services (AWS) environment. While some prescriptions are provided and architectural considerations covered, this document is

© Copyright 2020. All Rights Reserved. HVR Software.

5. The new instance should be configured to mount the attached EBS volumes on boot. If it is not, they can be mounted manually by connecting to the instance over SSH. The following link describes this process:

• https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html

6. All replication services will automatically start and pick up where they left off.
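A minimal sketch of steps 1 through 4 using the AWS CLI; the volume, instance and Elastic IP allocation IDs, as well as the device name, are placeholders.

# Move the data volume and Elastic IP to the standby, then start it
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
  --instance-id i-0fedcba9876543210 --device /dev/sdf
aws ec2 associate-address --allocation-id eipalloc-0123456789abcdef0 \
  --instance-id i-0fedcba9876543210
aws ec2 start-instances --instance-ids i-0fedcba9876543210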

For additional details and considerations, including race conditions, for automatically recovering EC2 instances see the following links:

• https://aws.amazon.com/premiumsupport/knowledge-center/automatic-recovery-ec2-cloudwatch/

• https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/UsingAlarmActions.html

Figure 6 shows the recovery of an HVR hub in the same Availability Zone using Amazon CloudWatch to initiate the recovery. The recovery EC2 instance uses the same EBS volume that was attached to the original instance.

Figure 6: Same Availability Zone recovery. A CloudWatch alarm triggers recovery of the HVR hub from instance A to instance B, remapping the Elastic IP address and reattaching the EBS volume.

Recovery From Availability Zone and Region Outages

While it is unlikely that an entire Amazon Availability Zone or Region will experience an outage, there are a few methods that can be used to minimize the impact of such failures. The following prescription tolerates the least amount of downtime while isolating the HVR hub's data pipelines from the internet using a VPC, or multiple VPCs connected through a VPC peering connection.



In this setup the HVR_CONFIG, HVR_HOME and HVR_TMP directories exist on an Amazon EFS file system that is mounted using NFSv4. Amazon Route 53 health checkers are public and can only monitor hosts with IP addresses that are publicly routable on the internet; to work within this constraint, a Route 53 health check record is associated with the private zone, invoking a failover when the primary record is unhealthy. The subnet must be private, and the monitored EC2 resource must have access to the internet.
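For reference, mounting an EFS file system over NFSv4.1 looks like the following; the file system ID, Region and mount point are placeholders.

# Mount the shared EFS file system that holds the HVR directories
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-0123456789abcdef0.efs.us-east-1.amazonaws.com:/ /opt/hvr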

Figure 7 shows the flow: an Amazon CloudWatch event calls a Lambda function, which in turn sends a metric to CloudWatch. The Route 53 health check then uses a CloudWatch alarm to invoke the failover routing. Because both EC2 instances share the same file system and both are active, failover is almost instant. Any running HVR jobs will automatically be restarted by the HVR Scheduler and pick up where they left off. While there are several networking options for communicating across the Availability Zones, one common method is to have them in a single security group communicating across a single shared private subnet, as shown below.
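The essence of that Lambda function is publishing a pass/fail data point as a custom CloudWatch metric for the health check's alarm to watch. The equivalent call from the AWS CLI, with an illustrative namespace, metric name and dimension, is:

# Publish a health data point (1 = healthy, 0 = unhealthy);
# the namespace, metric name and dimension are illustrative
aws cloudwatch put-metric-data \
  --namespace "HVR/HubHealth" \
  --metric-name HubReachable \
  --dimensions Hub=primary \
  --value 1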

Details, as well as an Amazon CloudFormation template, for creating the Amazon Route 53 health check on a VPC with a Lambda function and CloudWatch can be found in the following link:

https://aws.amazon.com/blogs/networking-and-content-delivery/performing-route-53-health-checks-on-private-resources-in-a-vpc-with-aws-lambda-and-amazon-cloudwatch/

For information on sharing subnets across Availability Zones in the same Region, see:

https://docs.aws.amazon.com/vpc/latest/userguide/vpc-sharing.html#vpc-sharing-share-subnet

If the HVR hub is available to the public internet then also see:

https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-types.html#dns-failover-types-active-passive

The inter-Region recovery is similar to the Availability Zone recovery with the addition of VPC peering. This allows VPC resources such as EC2 instances, EFS file systems, RDS databases and Lambda functions that run in different Regions to communicate using private IP addresses. Figure 8 shows an example of this architecture.

Figure 7: Failover recovery between Availability Zones. A CloudWatch event calls a Lambda function, which sends a metric to CloudWatch; a Route 53 health check uses a CloudWatch alarm to fail traffic over from the HVR hub instance in Availability Zone A to the instance in Availability Zone B, both of which mount the shared file system over NFSv4.


Figure 8: Failover recovery between Regions. The Availability Zone failover architecture is extended with VPC peering so that the HVR hub instances in Region A and Region B communicate using private IP addresses.

For more information on VPC peering see:

https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html




HA for HVR Stateless Agents

As with most stateless services that run on Amazon EC2 instances, high availability (and scaling) can be achieved by using an Amazon Elastic Load Balancer. This allows the hub to automatically connect to a different stateless agent should the agent, or the server on which it runs, become unavailable. If needed, the HVR hub will automatically take care of any lower-level data pipeline recovery actions. This auto recovery may include some situations where a capture or integrate process has transient data in memory that could temporarily spill to disk.

Figure 9 below shows an Amazon ELB distributing connections made to HVR stateless agents for high availability. While out of scope for this document, it is notable that this same method can also be used with auto-scaling groups to further scale HVR.

For steps on setting up an Amazon ELB, see the following link. Note that the traffic will be TCP and that the default port for the HVR stateless agent is 4343 (configurable):

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-increase-availability.html
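As a sketch, the following AWS CLI calls create a Network Load Balancer listening on TCP 4343 and forwarding to two agent instances; the names, VPC, subnet and instance IDs are placeholders.

# Target group for the HVR stateless agent port (TCP 4343 by default)
TG_ARN=$(aws elbv2 create-target-group --name hvr-agents \
  --protocol TCP --port 4343 --vpc-id vpc-0123456789abcdef0 \
  --query 'TargetGroups[0].TargetGroupArn' --output text)

# Register the EC2 instances running HVR stateless agents
aws elbv2 register-targets --target-group-arn "$TG_ARN" \
  --targets Id=i-0123456789abcdef0 Id=i-0fedcba9876543210

# Network Load Balancer with a TCP listener on the agent port
LB_ARN=$(aws elbv2 create-load-balancer --name hvr-agent-nlb \
  --type network --subnets subnet-0123456789abcdef0 \
  --query 'LoadBalancers[0].LoadBalancerArn' --output text)

aws elbv2 create-listener --load-balancer-arn "$LB_ARN" \
  --protocol TCP --port 4343 \
  --default-actions Type=forward,TargetGroupArn="$TG_ARN"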

Note that the HVR agents do not require an HVR license file to operate. License files are required on the hub only.

Figure 9: One method of failing over an HVR stateless agent using an Amazon Elastic Load Balancer. A CloudWatch alarm triggers recovery of the stateless agent instance from Availability Zone A to Availability Zone B.


In HVR, the data replication endpoint (a "Location" in HVR terms) is identified by a DNS entry or IP address. When this location points to an ELB, multiple EC2 edge nodes, or edge nodes of variable sizes, can be allocated without any change to the definitions in HVR. This provides the ability to dynamically adjust the resources available on edge nodes. Note that this scaling is not currently automated out of the box (in line with the documentation provided).


Because HVR integrate agents are stateless, and one agent can handle multiple connections to one or more target destinations, load balancers can be used to help scale parallel processing for bulk loads and continuous data streaming. For example, if you are about to onboard new source systems feeding a data lake in AWS, you can register new target instances with your Amazon Elastic Load Balancer. In addition, Amazon Auto Scaling groups could be used to add new EC2 instances running an HVR agent when CloudWatch Agent alarms detect CPU or memory at 90% capacity.

Supporting links:

https://docs.aws.amazon.com/elasticloadbalancing/latest/application/application-load-balancer-getting-started.html#add-targets

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/metrics-collected-by-CloudWatch-agent.html

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch_alarm_autoscaling.html


Conclusion

While both reliable and fault tolerant, HVR's high performance replication and data validation solution can be used with AWS services to automate the next level of high availability. The HVR AMI available in the AWS Marketplace, which can be used as an HVR hub or an HVR stateless agent, is a quick method to deploy replication services on EC2 instances. Additional Amazon services such as RDS, EBS and EFS are the foundation for making the HVR hub highly available. Used in conjunction with Amazon Lambda, Route 53 and VPC peering, the HVR hub can fail over even faster, as well as across Availability Zones and Regions. The Amazon ELB helps ensure the stateless agents are always up and running at scale. Combined with Amazon CloudWatch and native HVR alerting, your data replication pipelines can deliver mission critical data with minimal to no interruption.

© Copyright 2020. All Rights Reserved. HVR Software.

GETTING STARTED

If you're looking to simplify high-volume real-time data movement, look no further than HVR. Whether you're seeking a flexible real-time data integration and validation platform to create a data lake, want to migrate your transactional database to a new system, set up a hot standby system or share data across locations worldwide in real time, HVR has you covered. Trusted by organizations around the globe on their most critical systems, we offer a set of simple, powerful, and comprehensive data integration and synchronization tools that lower your infrastructure costs while allowing you to create new and enhanced services that improve your business insights.

For more information on how HVR can benefit your organization, visit us:

www.hvr-software.com/contact