NSX-T Multisite Options Part 1

Intro

I’ve previously written a post on Multisite vs Federation, which turned out to be the most-read post on my blog. The content is still very relevant; however, there is another way to do Multisite that, as it turns out, is pretty obvious but isn’t covered in any VMware documentation unless you look at the VCF design and deployment guides: Multisite using vSphere HA.

First off, let’s recap the basic architecture of the first two methods, then we’ll take a look at the HA method. In the second part of this post I’ll run through what’s needed to set it up.

Manual Multisite failover

Use Case

The main use case is to allow a multisite configuration when you do not have an L2 stretched management network or stretched storage. vSphere HA is also not required.

Requirements

Each site has its own management vCenter Server, which stays in that site. These vCenter Servers are added as compute managers to NSX-T.
NSX-T Managers must be associated with a DNS name with a short TTL.
All transport nodes (Edge nodes and hypervisors) must connect to the NSX Manager using their DNS name. To save time, you can optionally pre-install an NSX Manager cluster in the secondary site.

The Edge nodes can be VMs or bare metal.
The tier-0 gateway can be active-standby or active-active.
Edge node VMs can be installed in different vCenter Servers. No vSphere HA is required.
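Because every transport node has to reach the NSX Manager by its DNS name, it is worth auditing that before you ever need to fail over. The Python below is a minimal sketch, not a VMware tool: it assumes you have already exported a mapping of transport node name to the NSX Manager address each node uses (the `inventory` dict and its host names are hypothetical) and simply flags any node configured with an IP literal instead of an FQDN.

```python
import ipaddress


def is_ip_literal(address: str) -> bool:
    """Return True if the address is an IPv4/IPv6 literal rather than a DNS name."""
    try:
        ipaddress.ip_address(address)
        return True
    except ValueError:
        # Not parseable as an IP address, so treat it as a DNS name.
        return False


def nodes_not_using_fqdn(node_manager_addresses: dict[str, str]) -> list[str]:
    """Return (sorted) the transport nodes that point at the NSX Manager by IP."""
    return sorted(
        node for node, addr in node_manager_addresses.items() if is_ip_literal(addr)
    )


# Hypothetical inventory: transport node name -> NSX Manager address it uses.
inventory = {
    "esxi-01": "nsxmgr.site-a.example.local",
    "esxi-02": "192.168.10.21",
    "edge-01": "nsxmgr.site-a.example.local",
}
print(nodes_not_using_fqdn(inventory))  # ['esxi-02']
```

The short TTL matters for the same reason: after the managers are recovered in the secondary site the DNS record is repointed, and the nodes only follow it once their cached answer expires.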

Site Failover

The biggest issue with the site failover is the need to manually recover the NSX Managers from backup. In reality I’ve seen customers use snapshot backups to do the recovery, and while this may work, it is not supported by VMware.

Management Plane Recovery

Once that is done, the T1s are manually connected to the T0 in the recovery site.
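Reattaching the T1s can be done in the UI, but with many gateways it is easier to script. The sketch below only builds the requests and does not send them. It assumes the NSX-T Policy API, where a Tier-1 gateway’s `tier0_path` field names the T0 it connects to and a PATCH to `/policy/api/v1/infra/tier-1s/<id>` updates it; verify the path and field name against the API reference for your NSX-T release. The gateway IDs `t1-web`, `t1-app` and `t0-site-b` are made up for illustration.

```python
import json

# Assumed NSX-T Policy API base path; check against your release's API reference.
POLICY_BASE = "/policy/api/v1/infra"


def t1_reattach_requests(tier1_ids: list[str], recovery_tier0_id: str) -> list[tuple[str, str, str]]:
    """Build (method, path, JSON body) tuples that point each T1 at the recovery-site T0.

    Nothing is sent here; the caller issues the requests against the restored
    NSX Manager with its own HTTP client and credentials.
    """
    tier0_path = f"/infra/tier-0s/{recovery_tier0_id}"
    calls = []
    for t1 in tier1_ids:
        # Only the uplink T0 changes, so the body carries just tier0_path.
        body = json.dumps({"tier0_path": tier0_path})
        calls.append(("PATCH", f"{POLICY_BASE}/tier-1s/{t1}", body))
    return calls


# Hypothetical gateway IDs, for illustration only.
for method, path, body in t1_reattach_requests(["t1-web", "t1-app"], "t0-site-b"):
    print(method, path, body)
```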

Data Plane Recovery

Automated Multisite

Use Case

The main use case is to allow a multisite configuration when you do have an L2 stretched management network and stretched storage. vSphere HA is required.

Requirements

The NSX Manager cluster is deployed on the management VLAN and is physically in the primary site. There is a single management vCenter Server, also in the primary site.
If there is a primary site failure, vSphere HA will restart the NSX Managers and the vCenter Server in the secondary site.
All the transport nodes will reconnect to the restarted NSX Managers automatically. This takes about 10 minutes; during this time the management plane is not available, but the data plane is not impacted.

The Edge nodes can be VMs or bare metal.
The maximum latency between Edge nodes is 10 ms.
The HA mode for the tier-0 gateway must be active-standby, and the failover mode must be preemptive.
Note: The failover mode of the tier-1 gateway can be preemptive or non-preemptive.
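These requirements are strict enough that a quick pre-flight check can help. This is a hypothetical validator: the `AutomatedMultisiteDesign` fields are illustrative and not an NSX-T schema, although the `ACTIVE_STANDBY` and `PREEMPTIVE` strings are assumed to mirror the values the Policy API uses for the tier-0 `ha_mode` and `failover_mode`. The tier-1 failover mode is deliberately not checked, since either setting is allowed.

```python
from dataclasses import dataclass


@dataclass
class AutomatedMultisiteDesign:
    # Hypothetical design record; field names are illustrative, not an NSX-T schema.
    edge_latency_ms: float
    t0_ha_mode: str        # assumed values: "ACTIVE_STANDBY" or "ACTIVE_ACTIVE"
    t0_failover_mode: str  # assumed values: "PREEMPTIVE" or "NON_PREEMPTIVE"
    stretched_l2_mgmt: bool
    stretched_storage: bool
    vsphere_ha: bool


MAX_EDGE_LATENCY_MS = 10


def automated_multisite_violations(d: AutomatedMultisiteDesign) -> list[str]:
    """Return a list of requirement violations; an empty list means the design passes."""
    problems = []
    if not (d.stretched_l2_mgmt and d.stretched_storage):
        problems.append("needs stretched L2 management network and stretched storage")
    if not d.vsphere_ha:
        problems.append("vSphere HA is required")
    if d.edge_latency_ms > MAX_EDGE_LATENCY_MS:
        problems.append(f"edge latency {d.edge_latency_ms} ms exceeds {MAX_EDGE_LATENCY_MS} ms")
    if d.t0_ha_mode != "ACTIVE_STANDBY":
        problems.append("tier-0 HA mode must be ACTIVE_STANDBY")
    if d.t0_failover_mode != "PREEMPTIVE":
        problems.append("tier-0 failover mode must be PREEMPTIVE")
    # Tier-1 failover mode is not checked: preemptive or non-preemptive are both fine.
    return problems


design = AutomatedMultisiteDesign(
    edge_latency_ms=12,
    t0_ha_mode="ACTIVE_ACTIVE",
    t0_failover_mode="PREEMPTIVE",
    stretched_l2_mgmt=True,
    stretched_storage=True,
    vsphere_ha=True,
)
print(automated_multisite_violations(design))
```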

Site Failover

vSphere HA will restart the NSX-T Managers in the recovery site, and since the T0 gateways are set up as active-standby and a failure domain is configured, they will fail over automatically to the recovery site as well.

Management Plane Recovery
Data Plane Recovery

HA Multisite

Use Case

The use case is to allow a multisite configuration when you do have an L2 stretched management network and stretched storage. vSphere HA is required. The big difference between HA Multisite and Automated Multisite is that the T0 can be active-active and Edge failure domains are not needed. It also requires fewer Edge nodes, since the Edge nodes themselves can fail over. I think the change came from later releases of NSX-T supporting vSphere HA and vMotion for the Edge nodes, along with improvements to the BGP functionality.

Requirements

The NSX Manager cluster is deployed on the management VLAN and is physically in the primary site. There is a single management vCenter Server, also in the primary site.
If there is a primary site failure, vSphere HA will restart the NSX Managers, the vCenter Server and the Edge nodes in the secondary site.

The Edge nodes must be VMs.
The HA mode for the tier-0 gateway can be active-active or active-standby.
vSphere HA is required.
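To pull the three options together, here is a rough decision helper that encodes only the constraints covered in this post. It is a sketch for illustration, not VMware sizing guidance: the function and parameter names are hypothetical, and it ignores factors such as the 10 ms Edge latency limit for Automated Multisite.

```python
def suggest_multisite_option(
    stretched_l2_mgmt: bool,
    stretched_storage: bool,
    vsphere_ha: bool,
    bare_metal_edges: bool,
    need_active_active_t0: bool,
) -> str:
    """Hypothetical helper: pick a multisite option from the constraints in this post."""
    # Without stretched L2 management, stretched storage and vSphere HA, only the
    # manual method applies (it also supports bare metal Edges and an A/A T0).
    if not (stretched_l2_mgmt and stretched_storage and vsphere_ha):
        return "Manual Multisite"
    if not bare_metal_edges:
        # HA Multisite: VM Edges only, T0 may be A/A or A/S, fewer Edge nodes needed.
        return "HA Multisite"
    if not need_active_active_t0:
        # Bare metal Edges can't be restarted by vSphere HA, so rely on Edge failure
        # domains with an active-standby, preemptive T0.
        return "Automated Multisite"
    # Bare metal Edges plus an active-active T0 rules out both stretched options.
    return "Manual Multisite"
```

For example, stretched L2 and storage with VM Edge nodes returns `HA Multisite`, while the same environment with bare metal Edge nodes and an active-standby T0 returns `Automated Multisite`.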

Site Failover

vSphere HA will restart the NSX-T Managers and the Edge nodes in the recovery site, so the recovery is 100% hands-off.

Management Plane Recovery

BGP peering to the recovery site was already in place before the failure, so the Edge nodes simply reconnect and away it goes.

Data Plane Recovery

OK, now that we can see the differences and why we would use each one, let’s build the HA Multisite configuration.

Part 2 – NSX-T Multisite Options Part 2
