Intro
Welcome to part two of the NSX-T multisite options. In part one we covered the two original ways to do multisite, both manual and automated, and introduced the HA method. In this post I’ll cover how to set up HA multisite. To read part one, go here.
The Build
Let’s get on and build the HA multisite failover configuration. Our starting point is a stretched vSphere cluster, also known as a dual-AZ deployment (AZ being Availability Zone).
We have stretched storage and stretched Management, Overlay and Uplink VLANs. It is possible to do this without stretched Uplink VLANs.
For a production deployment of this configuration you will need DRS rules to place the Edge nodes on hosts in AZ1. This needs to be a “should” rule and not a “must” rule, otherwise the Edge nodes will not fail over onto the AZ2 hosts. Essentially you need a host group for the AZ1 ESXi hosts, then a VM group with the Edge nodes as members; the rule is then applied, as can be seen in the screenshots below.
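If you’d rather script the groups and rule than click through the UI, here’s a minimal pyVmomi sketch of the same thing. Treat it as a sketch under lab assumptions: the vCenter address, credentials, the cluster name Stretched-Cluster, the host names and the second Edge name Mul-EN02 are all placeholders, not values from this post.

from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim
import ssl

ctx = ssl._create_unverified_context()  # lab only; use verified certs in production
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ctx)
content = si.RetrieveContent()

def find_cluster(name):
    # Walk the inventory for the stretched cluster by name
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    try:
        return next(c for c in view.view if c.name == name)
    finally:
        view.Destroy()

cluster = find_cluster("Stretched-Cluster")
az1_hosts = [h for h in cluster.host if h.name in ("esx01.lab.local", "esx02.lab.local")]
edge_vms = [v for v in cluster.resourcePool.vm if v.name in ("Mul-EN01", "Mul-EN02")]

spec = vim.cluster.ConfigSpecEx(
    groupSpec=[
        vim.cluster.GroupSpec(operation="add", info=vim.cluster.HostGroup(
            name="AZ1-Hosts", host=az1_hosts)),
        vim.cluster.GroupSpec(operation="add", info=vim.cluster.VmGroup(
            name="AZ1-Edges", vm=edge_vms)),
    ],
    rulesSpec=[vim.cluster.RuleSpec(operation="add", info=vim.cluster.VmHostRuleInfo(
        name="Edges-should-run-in-AZ1",
        enabled=True,
        mandatory=False,  # False = "should": vSphere HA can still restart the Edges in AZ2
        vmGroupName="AZ1-Edges",
        affineHostGroupName="AZ1-Hosts"))])

WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))
Disconnect(si)

The mandatory=False flag is the “should” part; set it to True and you would have a “must” rule that pins the Edges to AZ1 even during a failure, which is exactly what we want to avoid.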
Starting Config
I have two Edge nodes deployed in Site A and they are currently BGP peered to the ToR switches in Site A.
Here are the routes that we are interested in on the ToR A switch in Site A:
Network Next Hop Metric LocPrf Weight Path
*> 0.0.0.0/0 10.200.1.1 0 0 66000 i
*> 10.0.1.0/24 0.0.0.0 1 32768 ?
*= 10.2.0.0/24 10.50.1.11 0 0 65600 ?
*> 10.50.1.12 0 0 65600 ?
*= 10.20.0.0/24 10.50.1.11 0 0 65600 ?
*> 10.50.1.12 0 0 65600 ?
* 10.50.1.0/24 10.50.1.11 0 0 65600 ?
* 10.50.1.12 0 0 65600 ?
*> 0.0.0.0 1 32768 ?
*> 10.55.1.0/24 10.50.1.11 0 0 65600 ?
*= 10.50.1.12 0 0 65600 ?
*= 10.100.1.0/24 10.50.1.11 0 0 65600 ?
*> 10.50.1.12 0 0 65600 ?
*= 10.100.2.0/24 10.50.1.11 0 0 65600 ?
*> 10.50.1.12 0 0 65600 ?
*= 10.100.3.0/24 10.50.1.11 0 0 65600 ?
*> 10.50.1.12 0 0 65600 ?
*> 0.0.0.0 1 32768 ?
*> 10.165.1.0/24 10.200.1.3 0 66000 65200 ?
*> 10.175.1.0/24 10.200.1.1 0 66000 65200 ?
* 10.200.1.0/24 10.200.1.1 1 0 66000 ?
*> 0.0.0.0 1 32768 ?
*> 10.250.1.0/24 10.200.1.1 1 0 66000 ?
*= 15.10.0.0/24 10.50.1.11 0 0 65600 ?
*> 10.50.1.12 0 0 65600 ?
* 192.168.1.0/24 10.200.1.1 1 0 66000 ?
*> 0.0.0.0 1 32768 ?
* 192.168.10.0/24 10.200.1.1 1 0 66000 ?
*> 0.0.0.0 1 32768 ?
*> 192.168.15.0/24 10.200.1.1 0 66000 65200 ?
If I run get bgp neighbor <Neighbour IP> advertised-routes I can see the routes my first Edge Node is pushing to the neighbour.
Mul-EN01(tier0_sr)> get bgp neighbor 10.50.1.2 advertised-routes
BGP IPv4 table version is 23965
Local router ID is 10.50.1.11
Status flags: > - best, I - internal
Origin flags: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
> 10.2.0.0/24 0.0.0.0 0 100 32768 ?
> 10.20.0.0/24 0.0.0.0 0 100 32768 ?
> 10.50.1.0/24 0.0.0.0 0 100 32768 ?
> 10.55.1.0/24 0.0.0.0 0 100 32768 ?
> 10.100.1.0/24 0.0.0.0 0 100 32768 ?
> 10.100.2.0/24 0.0.0.0 0 100 32768 ?
> 10.100.3.0/24 0.0.0.0 0 100 32768 ?
> 10.170.1.0/24 0.0.0.0 0 100 0 65100 ?
> 10.250.1.0/24 0.0.0.0 0 100 0 65100 ?
> 15.10.0.0/24 0.0.0.0 0 100 32768 ?
The 10.100.x.x networks are the ones we are interested in here, as they are the main connected segments where my three-tier app connects to the T1.
If I take a look at the core router I can also see the networks we are interested in; this time the next hops are the ToRs:
Network Next Hop Metric LocPrf Weight Path
*> 0.0.0.0/0 0.0.0.0 0 32768 i
*> 10.0.1.0/24 10.250.1.2 0 65100 ?
* 10.200.1.2 1 0 65100 ?
*> 10.2.0.0/24 10.200.1.2 0 65100 65600 ?
*= 10.250.1.2 0 65100 65600 ?
*> 10.10.0.0/24 10.200.1.2 0 65100 65500 ?
*= 10.250.1.2 0 65100 65500 ?
*> 10.11.0.0/24 10.200.1.2 0 65100 65500 ?
*= 10.250.1.2 0 65100 65500 ?
*> 10.20.0.0/24 10.200.1.2 0 65100 65600 ?
*= 10.250.1.2 0 65100 65600 ?
*> 10.30.0.0/24 10.200.1.2 0 65100 65500 ?
*= 10.250.1.2 0 65100 65500 ?
* 10.50.1.0/24 10.250.1.2 0 65100 ?
*= 10.200.1.3 1 0 65200 ?
*> 10.200.1.2 1 0 65100 ?
* 10.55.1.0/24 10.200.1.2 0 65100 65600 ?
*= 10.250.1.3 1 0 65200 ?
*> 10.250.1.2 1 0 65100 ?
*> 10.100.1.0/24 10.200.1.2 0 65100 65600 ?
*= 10.250.1.2 0 65100 65600 ?
*> 10.100.2.0/24 10.200.1.2 0 65100 65600 ?
*= 10.250.1.2 0 65100 65600 ?
*> 10.100.3.0/24 10.200.1.2 0 65100 65600 ?
*= 10.250.1.2 0 65100 65600 ?
Dual AZ config
Let’s add the neighbours for the ToRs in Site B, shown here in orange highlight.
If we check the core switch we can still see the 10.100.x.x networks; however, notice that there are now four paths.
Two paths are via the ToRs in Site A and the other two are via the ToRs in Site B, so the core switch can reach those networks over the ToR switches in both sites. We therefore need a way to steer the traffic to the preferred site.
Network Next Hop Metric LocPrf Weight Path
*> 0.0.0.0/0 0.0.0.0 0 32768 i
* 10.50.1.0/24 10.250.1.2 0 65100 65600 ?
*> 10.200.1.2 1 0 65100 ?
* 10.250.1.3 0 65200 65600 ?
*= 10.200.1.3 1 0 65200 ?
*= 10.55.1.0/24 10.250.1.2 1 0 65100 ?
* 10.200.1.2 0 65100 65600 ?
*> 10.250.1.3 1 0 65200 ?
* 10.200.1.3 0 65200 65600 ?
*= 10.100.1.0/24 10.250.1.2 0 65100 65600 ?
*> 10.200.1.2 0 65100 65600 ?
*= 10.250.1.3 0 65200 65600 ?
*= 10.200.1.3 0 65200 65600 ?
*= 10.100.2.0/24 10.250.1.2 0 65100 65600 ?
*> 10.200.1.2 0 65100 65600 ?
*= 10.250.1.3 0 65200 65600 ?
*= 10.200.1.3 0 65200 65600 ?
*= 10.100.3.0/24 10.250.1.2 0 65100 65600 ?
*> 10.200.1.2 0 65100 65600 ?
*= 10.250.1.3 0 65200 65600 ?
*= 10.200.1.3 0 65200 65600 ?
Preferred Site Traffic Steering Config
We’ll now configure the settings needed to steer the traffic the way we want.
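Before we start clicking, a quick illustration of the two BGP knobs we are about to turn. This toy Python model (mine, not anything from NSX-T) captures just the tie-breakers that matter here: higher local preference wins, and between equal local preferences the shorter AS path wins, which is why prepending an extra ASN on the AZ2 peers keeps the AZ1 paths preferred.

from dataclasses import dataclass

@dataclass
class Path:
    via: str
    weight: int = 0
    local_pref: int = 100
    as_path: tuple = ()

def best(paths):
    # Simplified best-path selection: weight, then local preference (higher
    # wins), then AS-path length (shorter wins). Real BGP has more steps.
    return max(paths, key=lambda p: (p.weight, p.local_pref, -len(p.as_path)))

core_view = [
    Path(via="AZ1 ToR", as_path=(65100, 65600)),
    Path(via="AZ2 ToR", as_path=(65200, 65600, 65600)),  # prepended path
]
print(best(core_view).via)  # "AZ1 ToR": preferred while AZ1 is up

When the AZ1 ToRs disappear, the prepended AZ2 paths are the only ones left, so traffic fails over without any reconfiguration, which is exactly what we will see in the failover test later.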
IP Prefix Configuration
Navigate to Networking > Tier-0 Gateways, edit your T0 by clicking the ellipsis, expand Routing, and then click the link next to IP Prefix Lists.
Click ADD IP PREFIX LIST.
Enter Any as the Name, then click Set.
In the Set Prefixes dialog box, click ADD PREFIX, then under Network enter any and set the Action to Permit. Click ADD.
Click APPLY.
Click SAVE, then click ADD IP PREFIX LIST again.
This time set the Name to Default Route and click Set under Prefixes.
Click ADD PREFIX and under Network enter 0.0.0.0/0 with the Action set to Permit, then click ADD.
Now click APPLY.
Finally, click SAVE and then CLOSE.
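If you want the same prefix lists without the clicking, they can be created through the NSX-T Policy API. A hedged sketch follows: the manager address, credentials and the Tier-0 ID T0-Gateway are placeholders for my lab, and I’m relying on the documented PrefixList schema where a network of ANY matches all networks.

import requests
import urllib3

urllib3.disable_warnings()  # lab only: self-signed manager certificate
NSX = "https://nsxmgr.lab.local"
AUTH = ("admin", "***")

def patch(path, body):
    # PATCH against the Policy API; raises on any non-2xx response
    r = requests.patch(f"{NSX}/policy/api/v1{path}", json=body, auth=AUTH, verify=False)
    r.raise_for_status()

# Prefix list "Any": permit every network
patch("/infra/tier-0s/T0-Gateway/prefix-lists/Any",
      {"display_name": "Any",
       "prefixes": [{"network": "ANY", "action": "PERMIT"}]})

# Prefix list "Default Route": permit only the default route
patch("/infra/tier-0s/T0-Gateway/prefix-lists/Default-Route",
      {"display_name": "Default Route",
       "prefixes": [{"network": "0.0.0.0/0", "action": "PERMIT"}]})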
Creating Route Maps in the Tier-0 Gateway
The next step is to create some route maps, using the prefix lists we just created, to define how routes are treated in the domain. First off, let’s do the inbound route map for traffic coming into AZ2, our Site B in this case. The idea is that routes learned from the AZ2 peers get a lower local preference (80 for the default route, 90 for everything else) than the default of 100 on the AZ1 peers, so the Edges prefer AZ1 for northbound traffic while it is available.
Under the ROUTING section on the T0, click Set next to Route Maps.
Click ADD ROUTE MAP, give it a Name, e.g. rm-in-az2, and click Set.
Click ADD MATCH CRITERIA; the Type should be IP Prefix. Click Set under Members.
Select Default Route and click SAVE.
Set the Local Preference to 80 and the Action to PERMIT, then click ADD.
Click ADD MATCH CRITERIA again.
The Type should be IP Prefix. Click Set.
Select Any and click SAVE.
Set the Local Preference to 90 and the Action to PERMIT, then click ADD.
Now click APPLY.
Now click SAVE. Next we will configure our outbound route map, so click ADD ROUTE MAP again.
Give it a Name, e.g. rm-out-az2, and under Match Criteria click Set.
Click ADD MATCH CRITERIA; the Type should be IP Prefix. Under Members click Set.
Select Any and click SAVE.
Enter a value for AS Path Prepend (I have just used the BGP ASN of the T0), set the Local Preference to 100 and the Action to PERMIT, then click ADD.
Click APPLY.
Click SAVE and then CLOSE.
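The route maps can be scripted the same way. This continues the prefix-list sketch above (same imports and patch() helper, same placeholder IDs); the local preference and prepend values mirror the UI steps.

# Inbound route map: default route at local-pref 80, everything else at 90
patch("/infra/tier-0s/T0-Gateway/route-maps/rm-in-az2",
      {"display_name": "rm-in-az2",
       "entries": [
           {"action": "PERMIT",
            "prefix_list_matches": ["/infra/tier-0s/T0-Gateway/prefix-lists/Default-Route"],
            "set": {"local_preference": 80}},
           {"action": "PERMIT",
            "prefix_list_matches": ["/infra/tier-0s/T0-Gateway/prefix-lists/Any"],
            "set": {"local_preference": 90}}]})

# Outbound route map: prepend the T0's own ASN to everything it advertises
patch("/infra/tier-0s/T0-Gateway/route-maps/rm-out-az2",
      {"display_name": "rm-out-az2",
       "entries": [
           {"action": "PERMIT",
            "prefix_list_matches": ["/infra/tier-0s/T0-Gateway/prefix-lists/Any"],
            "set": {"as_path_prepend": "65600", "local_preference": 100}}]})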
Apply the Route maps
Now that we have our route maps, we need to apply them to the relevant BGP neighbors in AZ2. To do this we must edit the BGP peers, so navigate to the BGP section of the T0 config and click the link under BGP Neighbors.
Select the first AZ2 neighbor, click the ellipsis and select Edit.
In the Route Filter section click the link. If you are setting up a new neighbor this will be named Set; otherwise it shows 1, as the default filter is applied when none is selected during neighbor configuration.
If you are setting up a new neighbor, click ADD ROUTE FILTER; if you already have an existing one, click the ellipsis and select Edit. The IP Address Family should be IPv4 and it should be Enabled. Under the Out Filter click Configure.
Select the out route map we created earlier and click SAVE.
Now click the Configure link under the In Filter.
This time select the in route map we created earlier and click SAVE.
Click ADD, then APPLY, and then SAVE on the neighbor edit.
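And the equivalent API call for attaching both filters to a peer, continuing the same sketch. The locale-services ID default, the neighbor ID and the neighbor address are placeholders; the remote AS 65200 is the Site B ToR ASN seen in the route tables above.

# Attach the in/out route maps to one AZ2 BGP neighbor
patch("/infra/tier-0s/T0-Gateway/locale-services/default/bgp/neighbors/ToR-B1",
      {"neighbor_address": "10.50.1.3",  # placeholder AZ2 peer address
       "remote_as_num": "65200",
       "route_filtering": [
           {"address_family": "IPV4",
            "enabled": True,
            "in_route_filters": ["/infra/tier-0s/T0-Gateway/route-maps/rm-in-az2"],
            "out_route_filters": ["/infra/tier-0s/T0-Gateway/route-maps/rm-out-az2"]}]})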
A quick check on our core router shows the change so far. Note the difference between the top two rows, 10.200.1.3 and 10.250.1.3; these are the AZ2 ToRs, and the first one has an additional BGP ASN prepended to its path.
Network Next Hop Metric LocPrf Weight Path
*> 0.0.0.0/0 0.0.0.0 0 32768 i
10.100.1.0/24 10.200.1.3 0 65200 65600 65600 ?
*= 10.250.1.3 0 65200 65600 ?
*> 10.200.1.2 0 65100 65600 ?
*= 10.250.1.2 0 65100 65600 ?
Now repeat the process for the second AZ2 neighbor.
We can now see the path prepending on the two AZ2 ToRs.
Network Next Hop Metric LocPrf Weight Path
*> 0.0.0.0/0 0.0.0.0 0 32768 i
* 10.100.1.0/24 10.200.1.3 0 65200 65600 65600 ?
* 10.250.1.3 0 65200 65600 65600 ?
*> 10.200.1.2 0 65100 65600 ?
*= 10.250.1.2 0 65100 65600 ?
* 10.100.2.0/24 10.200.1.3 0 65200 65600 65600 ?
* 10.250.1.3 0 65200 65600 65600 ?
*> 10.200.1.2 0 65100 65600 ?
*= 10.250.1.2 0 65100 65600 ?
* 10.100.3.0/24 10.200.1.3 0 65200 65600 65600 ?
* 10.250.1.3 0 65200 65600 65600 ?
*> 10.200.1.2 0 65100 65600 ?
*= 10.250.1.2 0 65100 65600 ?
Failover testing
After all that we are ready to do some failover testing to make sure it works!
First off, a simple ping test to the backend server.
64 bytes from 10.100.1.11: icmp_seq=8 ttl=61 time=1.922 ms
64 bytes from 10.100.1.11: icmp_seq=9 ttl=61 time=1.998 ms
64 bytes from 10.100.1.11: icmp_seq=10 ttl=61 time=1.999 ms
64 bytes from 10.100.1.11: icmp_seq=12 ttl=61 time=1.933 ms
Now a traceroute
traceroute to 10.100.1.11 (10.100.1.11), 64 hops max, 52 byte packets
1 192.168.10.1 (192.168.10.1) 3.804 ms 2.138 ms 2.111 ms
2 192.168.10.252 (192.168.10.252) 0.506 ms 0.515 ms 0.490 ms
3 10.200.1.2 (10.200.1.2) 0.635 ms 0.634 ms 0.572 ms
4 10.50.1.11 (10.50.1.11) 1.112 ms 0.997 ms 0.788 ms
5 100.64.96.1 (100.64.96.1) 0.894 ms 0.925 ms 0.769 ms
6 10.100.1.11 (10.100.1.11) 1.522 ms 1.379 ms 1.526 ms
Hop 1 is my physical switch where my laptop is connected, hop 2 is the core virtual router, hop 3 is the ToR in AZ1, hop 4 is Edge01’s uplink 01, hop 5 is the T0-T1 interface and hop 6 is the endpoint VM. All good so far.
Next I’ll power off the two ToRs in AZ1 while running a continuous ping.
No pings were dropped, hence no screen grab 😉 Let’s run the traceroute again.
traceroute to 10.100.1.11 (10.100.1.11), 64 hops max, 52 byte packets
1 192.168.10.1 (192.168.10.1) 2.742 ms 2.854 ms 2.082 ms
2 192.168.10.252 (192.168.10.252) 0.539 ms 0.535 ms 0.538 ms
3 10.200.1.3 (10.200.1.3) 0.568 ms 0.826 ms 0.655 ms
4 10.50.1.11 (10.50.1.11) 0.988 ms 0.984 ms 0.911 ms
5 100.64.96.1 (100.64.96.1) 0.881 ms 0.940 ms 0.915 ms
6 10.100.1.11 (10.100.1.11) 1.647 ms 1.536 ms 1.652 ms
All looks the same, only this time hop 3 is the ToR in AZ2. If we take a look at the routes on the core router, here is what we now have. The only routes remaining are those of the AZ2 ToRs, which is clearly visible as they are the ones with the additional prepended ASN in the path.
Network Next Hop Metric LocPrf Weight Path
*> 0.0.0.0/0 0.0.0.0 0 32768 i
*> 10.100.1.0/24 10.200.1.3 0 65200 65600 65600 ?
*= 10.250.1.3 0 65200 65600 65600 ?
*> 10.100.2.0/24 10.200.1.3 0 65200 65600 65600 ?
*= 10.250.1.3 0 65200 65600 65600 ?
*> 10.100.3.0/24 10.200.1.3 0 65200 65600 65600 ?
*= 10.250.1.3 0 65200 65600 65600 ?
Powering the ToRs back on brings the original routes back, and traffic then switches back to routing via AZ1.
While we simulated a site failure by killing the AZ1 ToRs, the result is the same as a real site failure. In a real site failure the NSX-T Managers and the Edge nodes will move to the AZ2 hosts via vSphere HA, power back on, the Edges will re-peer with the AZ2 ToRs, and routing will recover.
Once AZ1 is back up, the DRS rules will migrate the Managers and Edge nodes back to the AZ1 hosts, and routing will return to being via the AZ1 ToRs.
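If you want to put a number on that failover gap rather than eyeball a ping window, a small script helps. A minimal sketch, assuming a Linux-style ping (the -W timeout flag differs on macOS) and the backend VM from the tests above:

import subprocess
import time

TARGET = "10.100.1.11"  # the backend server used in the tests above
outage_start = None
while True:
    # One ping per second with a one-second timeout
    ok = subprocess.run(["ping", "-c", "1", "-W", "1", TARGET],
                        capture_output=True).returncode == 0
    stamp = time.strftime("%H:%M:%S")
    if not ok and outage_start is None:
        outage_start = time.time()
        print(f"{stamp} replies stopped")
    elif ok and outage_start is not None:
        print(f"{stamp} replies resumed after {time.time() - outage_start:.0f}s")
        outage_start = None
    time.sleep(1)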
That’s all there is to it: a nice clean configuration. The only downside to the HA multisite method is that all traffic will egress via a single site, as is the case for all the multisite options, and there is also some downtime after a failure while the vCenter, NSX-T Managers and Edge nodes are failed over and powered back on. This is a lot less downtime than the manual failover method but more than the automated method, so all are still viable multisite options; you just have to decide which is best for your needs.