Configuring a Disaster Recovery Host

A Disaster Recovery host, aka "DR Host", is a physical server that is installed in a physically different location from the production Anvil! cluster.

The purpose of the DR host is to provide a fall-back location to run servers should the production cluster suffer a catastrophic failure. Consider failure scenarios like;

  • Accidental fire suppression discharge
  • Transformer failure feeding the data-center
  • Localized fire in the cluster location

In such scenarios, the facility might still be perfectly able to function, but all cluster equipment is damaged or destroyed.

The DR host is often installed in an opposite corner of the facility, in another building on campus, or in an entirely different city. Wherever it happens to be, the DR host can be pressed into service!

Storage replication to the DR host is streaming and ordered, but it is not synchronous. This way, the latency of the remote connection does not impact day-to-day performance, but the data replicated to the DR host remains ordered, even across multiple virtual disks. As such, the DR host may be allowed to fall a few seconds behind production, but the data will be contiguous.

What this means is that your servers will boot on the DR host, file system journals will replay, database write-ahead logs will work, and your applications will start, no differently than if the machine had simply rebooted.

The time to get the DR site online is measured in minutes, far faster than recovering from even onsite backups onto standby hardware!

DR Host Hardware Considerations

In an ideal configuration, there would be a dedicated DR host to match the hardware of each Anvil! node. In such a setup, a full fail-over to the DR site would be possible without any loss in performance.

For those with stricter budgets, a DR host's hardware can be sized such that only a subset of core production servers is protected.

Another possible configuration is to have one (or a few) much larger machines, each of which can provide DR hosting for two or more production nodes. Of course, performance in such a configuration may be impacted. It is strongly advised to test that performance will be acceptable prior to deployment.
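
As a rough, purely illustrative sizing example (these numbers are hypothetical, not drawn from the cluster used later in this article): if two production nodes each host servers totalling 24 vCPUs and 96 GiB of RAM, a single DR host covering both would need roughly 48 vCPUs and 192 GiB of RAM to run every protected server at full speed. A smaller DR host means either accepting degraded performance during a disaster, or protecting only a subset of servers.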

Connecting a DR Host to an Anvil! Node

The first step is to link a DR host to a node. Initially, a DR host is "floating", meaning it's connected to the cluster but not assigned to any node.

Note: In a future release, DR functions will be moved into the Striker UI. Until then, management is done via the command line.

We will use the command line tool anvil-manage-dr.

Note: The anvil-manage-dr tool can be run from any machine in the cluster.

We can check the current associations using the '--show' switch.

anvil-manage-dr --show
Anvil! Nodes
- Node Name: [an-anvil-01], Description: [Demo VM Anvil!]
 - No linked DR hosts yet.

-=] DR Hosts
- Name: [an-a01dr01.alteeve.com]

-=] Servers
- Server name: [srv01-bar] on Anvil! Node: [an-anvil-01]
- Server name: [srv02-win2019] on Anvil! Node: [an-anvil-01]
- Server name: [srv03-el6] on Anvil! Node: [an-anvil-01]
- Server name: [srv04-min] on Anvil! Node: [an-anvil-01]

In this example cluster, there is one node called "an-anvil-01", and one DR host called "an-a01dr01.alteeve.com". There are four servers that we'll protect later.

Linking a DR Host to a Node

The first step is to "link" the DR host "an-a01dr01" to the node "an-anvil-01". When a DR host is linked, it simply tells the cluster that the host is a candidate target for protecting that node's servers.

anvil-manage-dr --anvil an-anvil-01 --dr-host an-a01dr01 --link
The DR host: [an-a01dr01] has been linked to the Anvil! node: [an-anvil-01].

We can confirm that this link has been created by re-running "--show".

anvil-manage-dr --show
Anvil! Nodes
- Node Name: [an-anvil-01], Description: [Demo VM Anvil!]
 - Linked: [an-a01dr01], link UUID: [daea7173-748a-4874-8126-5858d6226e5b]

-=] DR Hosts
- Name: [an-a01dr01.alteeve.com]

-=] Servers
- Server name: [srv01-bar] on Anvil! Node: [an-anvil-01]
- Server name: [srv02-win2019] on Anvil! Node: [an-anvil-01]
- Server name: [srv03-el6] on Anvil! Node: [an-anvil-01]
- Server name: [srv04-min] on Anvil! Node: [an-anvil-01]

Excellent!

Adding a DR Host's Volume Group to a Node's Storage Group

In an Anvil! cluster, storage groups are groupings of LVM volume groups across subnodes and DR hosts. They are used to determine where to create the backing logical volumes for a server's virtual hard drives.

When a server is "protected" by a DR host, a new logical volume is created on that DR host and it is then added to the server's replicated storage. In order to know which VG to use when creating these new LVs, the DR host's volume group(s) must be added to the node's storage group(s).

Consider this example;

Server: srv01-database on an-a01n01.

Storage Group      Subnode/DR Host   Volume Group
Storage Group 1    an-a01n01         an-a01n01_vg0
                   an-a01n02         an-a01n02_vg0

When we use anvil-manage-storage-groups, we can see this in detail;

anvil-manage-storage-groups --anvil an-anvil-01
Anvil Node: [an-anvil-01] - Demo VM Anvil!
- Subnode: [an-a01n01] volume groups;
 - [an-a01n01_vg0], size: [248.41 GiB], free: [62.68 GiB], internal UUID: [Lzlhon-E4gr-2PEE-IInb-JmGr-uRID-gLrtDy]
- Subnode: [an-a01n02] volume groups;
 - [an-a01n02_vg0], size: [248.41 GiB], free: [62.68 GiB], internal UUID: [z2an5E-JHq9-p1Hl-ln02-17P7-ciZk-OtjDZI]
- Storage group: [Storage group 1], UUID: [c6f1b34d-052b-49e3-a4a2-b9e5b31f3280]
 - [an-a01n01]:[an-a01n01_vg0]
 - [an-a01n02]:[an-a01n02_vg0]

Disaster Recovery Hosts:
- DR Host: [an-a01dr01] VGs:
 - [an-a01dr01_vg0], size: [248.41 GiB], free: [170.53 GiB], internal UUID: [N9TvqT-IhEr-cjb0-Xs3l-2dQG-enDv-DZugyK]

This example is pretty simple, as there is only one volume group per subnode and only one volume group on the DR host. So adding the DR host's VG to the storage group is straightforward;

anvil-manage-storage-groups --anvil an-anvil-01 --group "Storage group 1" --add --member N9TvqT-IhEr-cjb0-Xs3l-2dQG-enDv-DZugyK
Added the volume group: [an-a01dr01_vg0] on the host: [an-a01dr01] to the storage group: [Storage group 1]. The new member UUID is: [ac276cff-982a-4b4a-8956-c3291c81ef73].

If we look again, the storage group now includes the DR host's VG.


Server: srv01-database on an-a01n01.

Storage Group      Subnode/DR Host   Volume Group
Storage Group 1    an-a01n01         an-a01n01_vg0
                   an-a01n02         an-a01n02_vg0
                   an-a01dr01        an-a01dr01_vg0

The more detailed view;

anvil-manage-storage-groups --show
Anvil Node: [an-anvil-01] - Demo VM Anvil!
- Subnode: [an-a01n01] volume groups;
 - [an-a01n01_vg0], size: [248.41 GiB], free: [62.68 GiB], internal UUID: [Lzlhon-E4gr-2PEE-IInb-JmGr-uRID-gLrtDy]
- Subnode: [an-a01n02] volume groups;
 - [an-a01n02_vg0], size: [248.41 GiB], free: [62.68 GiB], internal UUID: [z2an5E-JHq9-p1Hl-ln02-17P7-ciZk-OtjDZI]
- Storage group: [Storage group 1], UUID: [c6f1b34d-052b-49e3-a4a2-b9e5b31f3280]
 - [an-a01dr01]:[an-a01dr01_vg0]
 - [an-a01n01]:[an-a01n01_vg0]
 - [an-a01n02]:[an-a01n02_vg0]

Disaster Recovery Hosts:
- DR Host: [an-a01dr01] VGs:
 - [an-a01dr01_vg0], size: [248.41 GiB], free: [170.53 GiB], internal UUID: [N9TvqT-IhEr-cjb0-Xs3l-2dQG-enDv-DZugyK]

We're now ready to protect servers!

Protecting a Server

Protecting a server is the process of configuring the cluster to allow the server to run on a DR host. This involves copying the server's "definition file" to the DR host and extending the server's replicated storage to it.

Replication Protocols

To extend the storage replication to the DR host, we must decide how we want the storage to replicate. This is controlled by selecting a "protocol".

The two subnodes in an Anvil! node always replicate using the 'sync' protocol. This is fine, as the dedicated link between the subnodes ensures that storage replication happens very quickly, and synchronous replication ensures no data loss in the case of a catastrophic and unexpected fault.

The replication link to the DR host, however, could have higher latency and/or lower bandwidth. In such cases, using the 'sync' protocol would cause a performance hit, as a write to disk would not be considered complete until the data reaches persistent storage on the DR host. For this reason, two alternative replication protocols are supported; 'short-throw' and 'long-throw'.

Note: The 'long-throw' protocol uses a closed-source utility and, as such, requires a licence. Contact us for more information.

DR Replication Protocols

Protocol      Benefit                                      Drawback
sync          Maximum data protection                      Storage performance is limited by the slowest network link's latency/bandwidth
short-throw   Minimal performance hit over the DR link     The DR host "falls behind" production
long-throw    Supports high-latency, low-bandwidth links   The DR host is allowed to fall further behind; requires a license
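
With a protocol chosen, the server itself can be protected using anvil-manage-dr. The command below is a rough sketch only; the '--protect', '--server' and '--protocol' switches are assumptions based on the linking examples above, not confirmed in this article, so check the tool's built-in help for the exact syntax.

anvil-manage-dr --anvil an-anvil-01 --server srv01-bar --protect --protocol short-throw

Assuming switches along those lines, this would copy the definition file for 'srv01-bar' to the linked DR host, create a new logical volume in the DR host's volume group that was added to the storage group earlier, and begin streaming replication using the 'short-throw' protocol.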



 
