Intro
Welcome to Part 15 of the NSX-V Lab Series. In the previous post, we deployed our ESGs.
In this post we will configure our Edge Services Gateways and set up dynamic routing between the DLR and the ESGs, and between the ESGs and our simulated physical routers. This is a big post, so get ready!
The Build
As mentioned in the previous post, I’m using Sophos XG virtual routers to simulate the physical routers. This lab is nested and my only physical switch is a single Cisco 3750, which cannot do BGP, so I needed a way to have virtual routers in the lab. To match a normal customer deployment I also need two virtual routers, so that I can pair each Edge node uplink with a separate router for ECMP.
I have created VLANs on the routers.
For Site A:
Router 1 will host VLAN 60 while Router 2 will host VLAN 70.
Each VLAN will be assigned the relevant gateway IP: 10.60.1.1/24 for VLAN 60 and 10.70.1.1/24 for VLAN 70.
For Site B:
Router 1 will host VLAN 65 while Router 2 will host VLAN 75.
Each VLAN will be assigned the relevant gateway IP: 10.65.1.1/24 for VLAN 65 and 10.75.1.1/24 for VLAN 75.
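As a quick sanity check of the addressing plan above, here is a minimal Python sketch. It is not part of the lab build; the names VLAN_PLAN and check_plan are my own, and the "second octet matches the VLAN ID" rule is just the convention this lab happens to follow.

```python
import ipaddress

# VLAN plan as described above: (site, router, VLAN ID, gateway interface).
VLAN_PLAN = [
    ("Site A", "Router 1", 60, "10.60.1.1/24"),
    ("Site A", "Router 2", 70, "10.70.1.1/24"),
    ("Site B", "Router 1", 65, "10.65.1.1/24"),
    ("Site B", "Router 2", 75, "10.75.1.1/24"),
]


def check_plan(plan):
    """Return a list of problems found in the plan (empty if none)."""
    problems = []
    seen = []
    for site, router, vlan, gateway in plan:
        iface = ipaddress.ip_interface(gateway)
        # Lab convention (an assumption, not an NSX rule): the second
        # octet of the gateway matches the VLAN ID.
        if iface.ip.packed[1] != vlan:
            problems.append(f"{site} {router}: {gateway} does not match VLAN {vlan}")
        # No two uplink VLANs should share or overlap a subnet.
        for other in seen:
            if iface.network.overlaps(other):
                problems.append(f"{site} {router}: {iface.network} overlaps {other}")
        seen.append(iface.network)
    return problems


if __name__ == "__main__":
    print(check_plan(VLAN_PLAN) or "plan looks consistent")
```

Running it against a typo (for example two routers both given 10.60.1.1/24) reports the overlap before you ever touch the routers.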
Site A Router 1
Site B Router 1
Next I need to configure the BGP Router ID and Local AS for each router.
Because I am using the same routers for my NSX-T lab, I have set the Router ID to the NSX-T VLAN 160 interface IP. If you are only using the router for NSX-V, you can use the VLAN 60 IP instead.
For Site A Router 1 I set the ID to 10.160.1.1 and the local AS to 65000.
For Site B Router 1 I set the ID to 10.165.1.1 and the local AS to 65000.
We’ll come back to the Sophos XGs later, when we add the BGP neighbours (i.e. the Edge VMs) and confirm we are getting routes.
The first step is to disable the firewall on all our ESGs. This is a requirement because we are using ECMP Edges.
To disable the firewall we must log in to the vSphere client using the Flex (Flash) client, as the firewall tab is not available in the HTML 5 interface.
Double click on each ESG, go to the ‘Manage’ tab, then ‘Firewall’, and click the ‘Stop’ button.
Click ‘Publish Changes’. Repeat for all ESGs in all sites.
A point to note when you have a primary and secondary site setup: make sure you are on the correct site when you are editing settings.
For example, if I change the NSX Manager drop down to the secondary and then edit an ESG, clicking the back button sets the interface back to the Primary NSX Manager again, so I need to change it back to secondary before editing the other ESG. I’ve seen many customers make errors thinking they were configuring the secondary Edges when in fact the interface had switched back to primary and they didn’t realize.
The rest of the configuration can be done in either the Flex or HTML 5 client.
On the first ESG go to the ‘Routing’ tab and click the Start button next to ECMP. Next click the ‘Edit’ link next to Dynamic Routing Configuration.
Click Save to set the Router ID.
Now click Publish to save the changes.
Change to the BGP window and click on ‘EDIT’.
Change the Status to ‘Enable’, set Graceful Restart to ‘Disable’ and set the Local AS to 65150.
Graceful Restart should be disabled for ECMP Edges to prevent traffic from being black-holed to a dead Edge when an Edge fails. With it disabled, the forwarding state for the failed Edge is removed, which stops traffic being sent to it.
Once set, click ‘SAVE’ then Publish Changes.
Click on ‘+ ADD’ to add our first BGP neighbor.
Enter the IP address of Router 1 VLAN 60, the remote AS is 65000.
Leave all other settings at the default. Click ‘ADD’ then Publish.
For a production deployment we would normally tweak the keep alive and hold down timers and reduce them for faster convergence times, but my virtual Sophos XG routers cannot change these settings (at least if they can I’ve not figured out how to yet).
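To show why the timers matter for convergence, here is a rough Python sketch of how long a silently failed peer can go unnoticed. The 60/180 second values are what I understand the NSX-V defaults to be, and 1/3 is a commonly quoted aggressive setting for ECMP Edges; treat both as assumptions and check your own version. The function name is my own.

```python
def failure_detection_window(keepalive_s, hold_s):
    """Return (best, worst) case seconds before the hold timer expires
    after a peer stops sending keepalives without closing the session."""
    if hold_s < keepalive_s:
        raise ValueError("hold timer must be at least the keepalive interval")
    # The hold timer restarts on every keepalive received.
    # Worst case: the peer dies just after sending one (full hold time).
    # Best case: it dies just before the next one was due.
    return (hold_s - keepalive_s, hold_s)


if __name__ == "__main__":
    for keepalive, hold in [(60, 180), (1, 3)]:
        best, worst = failure_detection_window(keepalive, hold)
        print(f"keepalive={keepalive}s hold={hold}s -> detect in {best}-{worst}s")
```

With the assumed defaults, traffic could keep being sent to a dead neighbor for up to three minutes, which is why the timers are usually reduced in production.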
Repeat for the Router 2 Neighbor.
Now we’ll add our DLR as a neighbor.
Enter the DLR Protocol Address in the IP Address field. The Protocol Address needs to be an unused IP on the same subnet as the Forwarding Address, which is the Uplink interface IP.
Use the same AS number as the ESG, 65150, for the Remote AS.
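The neighbor plan for the first ESG can be sketched in Python as below. This is only an illustration: the addresses are the ones used in this lab (10.56.1.1 as the DLR forwarding address on the 10.56.1.0/28 transit network and 10.56.1.2 as its protocol address, both of which appear later in this post), and the helper names are my own.

```python
import ipaddress

LOCAL_AS = 65150  # Local AS configured on the ESG

# DLR uplink addressing used in this lab.
DLR_FORWARDING = ipaddress.ip_interface("10.56.1.1/28")
DLR_PROTOCOL = ipaddress.ip_address("10.56.1.2")

# Neighbor IP -> remote AS, as entered on Site A ESG01.
NEIGHBORS = {
    "10.60.1.1": 65000,   # Router 1, VLAN 60
    "10.70.1.1": 65000,   # Router 2, VLAN 70
    "10.56.1.2": 65150,   # DLR protocol address
}


def session_type(local_as, remote_as):
    """Same AS on both ends means iBGP, otherwise eBGP."""
    return "iBGP" if local_as == remote_as else "eBGP"


def protocol_address_ok(forwarding, protocol):
    """The protocol address must sit in the forwarding address's subnet
    and must not be the forwarding address itself."""
    return protocol in forwarding.network and protocol != forwarding.ip


if __name__ == "__main__":
    for ip, remote_as in NEIGHBORS.items():
        print(ip, remote_as, session_type(LOCAL_AS, remote_as))
    print("protocol address valid:", protocol_address_ok(DLR_FORWARDING, DLR_PROTOCOL))
```

So the ESG ends up with two eBGP sessions to the routers and one iBGP session to the DLR. The check does not tell you whether the protocol address is already in use, so that still needs confirming by hand.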
Next we need to configure route redistribution.
Change to the Route Redistribution page, enable BGP, then hit ‘Publish’.
Under ‘Route Redistribution Table’ click the ‘+ ADD’ link.
Set the Learner Protocol to BGP, tick Static Routes and Connected, then click ‘ADD’.
Then Publish.
Repeat for all ESGs.
For the ESGs on Site B you may run into the route redistribution greyed-out issue if you are using 6.4.5. Follow this blog post for details on how to get around the issue.
Next we will configure routing on our DLR and Peer with our ESGs.
From the Primary NSX manager open up the DLR configuration screen and go to Routing and Global Configuration.
Under Routing Configuration click Edit.
Enable ECMP and click ‘SAVE’.
Then under Dynamic Routing Configuration click ‘Edit’.
Leave the default Uplink-To-ESG as the Router ID and click ‘SAVE’.
Now click ‘Publish’.
Next go to BGP and click ‘EDIT’ under the Configuration section.
Set the Status to Enable, change Graceful Restart to Disable and set the Local AS to 65150.
Click ‘SAVE’.
Before we add any neighbors, let’s configure the route redistribution. You don’t have to do this before adding the neighbors, but for the purpose of this demonstration I’ll add one neighbor at a time to show the change to the routes and neighbor peers on the ESG, DLR and routers, so we need redistribution working first.
Switch to the Route Redistribution screen. If, like mine, OSPF is already enabled, we need to change it. If yours is empty you can just go ahead and add the BGP configuration.
In my case I’ll select the OSPF entry in the table and click ‘Edit’.
Now I just change the Learner Protocol to BGP and click ‘SAVE’.
You can enable Static Routes as well, but I’ll only add those to the ESGs so it’s not needed here.
Now set OSPF to Disabled, enable BGP, then click ‘Publish’.
Switch back to the BGP page and under Neighbors click ‘+ ADD’
Select the Uplink-To-ESG interface. This will auto-complete the Forwarding IP address. Enter the IP of the neighbor. I’ll start off with the Site A ESG01.
Enter a Protocol IP, which needs to be on the same network as the Forwarding Address. Enter the Remote AS, leave all the other settings at the default, click ‘ADD’ and then Publish.
OK, let’s jump on the DLR console and see what effect that had.
If we go to the Summary screen we can see that the 0 Edge is the active one.
Log in and run show ip bgp neighbors summary
We can see the neighbor 10.56.1.11 with a status of E for Established.
Run the same command on the ESG01 and we see three neighbors.
10.56.1.2 is the DLR and is Established.
10.60.1.1 and 10.70.1.1 are the Sophos XG routers, with statuses of C (Connect) and A (Active). They are not yet set up, so they have not established the connection.
If we run show ip route we can see the three test networks we created earlier with a B for BGP.
We now repeat the process to add the second Site A ESG.
For the ESGs in Site B we need to set a different Weight. This makes them the less preferred route out, so traffic will egress via the active Site A.
To do that we set the Weight to 30.
Repeat for Site B ESG02.
The next step is to add the ESGs as neighbors to our Sophos XG routers.
This needs to be done on each XG router.
First I log in to Site A XG router 1 and add NSXV-ESG01 as a neighbor.
Going to the Information page and clicking on Summary shows that I have peered with the ESG.
10.160.1.11 and 10.160.1.12 are my NSX-T routers, so we are only interested in the top row.
BGP router identifier 10.160.1.1, local AS number 65000
RIB entries 13, using 832 bytes of memory
Peers 3, using 7452 bytes of memory
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.60.1.11      4 65150      37      36        0    0    0 00:00:09        6
10.160.1.11     4 65100       0       0        0    0    0 never    Active
10.160.1.12     4 65100       0       0        0    0    0 never    Active
Total number of neighbors 3
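If you would rather check peer state with a script than by eye, a summary like the one above can be parsed with a few lines of Python. This is a minimal sketch that assumes the ten-column layout shown here; the function name and the sample text are my own, and how you collect the output from the router is left out.

```python
import ipaddress

SAMPLE = """\
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.60.1.11      4 65150      37      36        0    0    0 00:00:09        6
10.160.1.11     4 65100       0       0        0    0    0 never    Active
10.160.1.12     4 65100       0       0        0    0    0 never    Active
"""


def parse_summary(text):
    """Return one dict per neighbor row found in the summary text."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue
        try:
            ipaddress.ip_address(fields[0])  # skips the header row
        except ValueError:
            continue
        last = fields[9]
        # A number in the last column is the received prefix count,
        # which is only shown once the session is Established.
        established = last.isdigit()
        peers.append({
            "neighbor": fields[0],
            "remote_as": int(fields[2]),
            "state": "Established" if established else last,
            "prefixes": int(last) if established else None,
        })
    return peers


if __name__ == "__main__":
    for peer in parse_summary(SAMPLE):
        print(peer)
```

Only the row for 10.60.1.11 comes back as Established with a prefix count, which matches what we read off the summary.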
If I check the routes on the XG router I can see my test app networks.
BGP table version is 0, local router ID is 10.160.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 10.10.1.0/24     10.60.1.11                             0 65150 ?
*> 10.10.2.0/24     10.60.1.11                             0 65150 ?
*> 10.10.3.0/24     10.60.1.11                             0 65150 ?
*> 10.56.1.0/28     10.60.1.11                             0 65150 ?
*> 10.60.1.0/24     10.60.1.11                             0 65150 ?
*> 10.70.1.0/24     10.60.1.11                             0 65150 ?
*> 192.168.88.0     0.0.0.0                  0         32768 i
Total number of prefixes 7
NSXV-ESG01 now shows two connected neighbors: the DLR and the XG router.
Now for Site A XG router 2.
BGP router identifier 10.170.1.1, local AS number 65000
RIB entries 13, using 832 bytes of memory
Peers 3, using 7452 bytes of memory
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.70.1.11      4 65150       3       4        0    0    0 00:00:24        6
10.170.1.11     4 65100       0       0        0    0    0 never    Active
10.170.1.12     4 65100       0       0        0    0    0 never    Active
Total number of neighbors 3
Success! We now have the DLR and the two XG routers peered and exchanging routes.
The 192.168.88.0 routes are my home LAN, which the ESG is now getting a route to.
I now repeat the process to add the second Site A ESG.
BGP router identifier 10.160.1.1, local AS number 65000
RIB entries 13, using 832 bytes of memory
Peers 4, using 9936 bytes of memory
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.60.1.11      4 65150      46      44        0    0    0 00:08:39        6
10.60.1.12      4 65150       3       5        0    0    0 00:00:12        6
10.160.1.11     4 65100       0       0        0    0    0 never    Active
10.160.1.12     4 65100       0       0        0    0    0 never    Active
Total number of neighbors 4
BGP table version is 0, local router ID is 10.160.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*  10.10.1.0/24     10.60.1.12                             0 65150 ?
*>                  10.60.1.11                             0 65150 ?
*  10.10.2.0/24     10.60.1.12                             0 65150 ?
*>                  10.60.1.11                             0 65150 ?
*  10.10.3.0/24     10.60.1.12                             0 65150 ?
*>                  10.60.1.11                             0 65150 ?
*  10.56.1.0/28     10.60.1.12                             0 65150 ?
*>                  10.60.1.11                             0 65150 ?
*  10.60.1.0/24     10.60.1.12                             0 65150 ?
*>                  10.60.1.11                             0 65150 ?
*  10.70.1.0/24     10.60.1.12                             0 65150 ?
*>                  10.60.1.11                             0 65150 ?
*> 192.168.88.0     0.0.0.0                  0         32768 i
Total number of prefixes 7
And then do the same thing on my two Site B XG routers adding the Site B ESGs.
BGP router identifier 10.165.1.1, local AS number 65000
RIB entries 13, using 832 bytes of memory
Peers 2, using 4968 bytes of memory
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.65.1.11      4 65150      31      39        0    0    0 00:13:54        6
10.65.1.12      4 65150       3       5        0    0    0 00:00:06        6
Total number of neighbors 2
BGP table version is 0, local router ID is 10.165.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*  10.10.1.0/24     10.65.1.12                             0 65150 ?
*>                  10.65.1.11                             0 65150 ?
*  10.10.2.0/24     10.65.1.12                             0 65150 ?
*>                  10.65.1.11                             0 65150 ?
*  10.10.3.0/24     10.65.1.12                             0 65150 ?
*>                  10.65.1.11                             0 65150 ?
*  10.56.1.0/28     10.65.1.12                             0 65150 ?
*>                  10.65.1.11                             0 65150 ?
*  10.65.1.0/24     10.65.1.12                             0 65150 ?
*>                  10.65.1.11                             0 65150 ?
*  10.75.1.0/24     10.65.1.12                             0 65150 ?
*>                  10.65.1.11                             0 65150 ?
*> 192.168.88.0     0.0.0.0                  0         32768 i
Total number of prefixes 7
Next, to test my test app, I have migrated Web02 and App02 from my Primary Site A to the Secondary Site B.
The nested hosts are also running on separate physical hosts so the traffic is having to hit the physical switch.
A ping from each server to the other confirms the site to site connectivity.
As a final test I want to ensure I can reach my test app from an external PC.
I have a computer attached to my local home LAN, and the virtual routers also have a network adapter in this LAN.
On the external PC I need to add four static summary routes to cover my three test networks.
I add the following:
route add 10.10.0.0 MASK 255.255.252.0 192.168.88.80
route add 10.10.0.0 MASK 255.255.252.0 192.168.88.90
route add 10.10.0.0 MASK 255.255.252.0 192.168.88.85
route add 10.10.0.0 MASK 255.255.252.0 192.168.88.95
192.168.88.80 is the home LAN IP for Site A Router 1 and 192.168.88.90 is Site A Router 2.
192.168.88.85 is the home LAN IP for Site B Router 1 and 192.168.88.95 is Site B Router 2.
These cover my three test app networks, which are 10.10.1.0/24, 10.10.2.0/24 and 10.10.3.0/24, and the four routes go one to each of the routers.
In normal conditions they tend to use .80 but shutting down that router will cause the traffic to flick to the other one.
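To confirm that the single summary really does cover all three test networks, here is a short Python check using the standard ipaddress module. The variable and function names are my own.

```python
import ipaddress

# The summary used in the route add commands above (mask 255.255.252.0).
SUMMARY = ipaddress.ip_network("10.10.0.0/255.255.252.0")

TEST_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("10.10.1.0/24", "10.10.2.0/24", "10.10.3.0/24")
]


def covered(summary, networks):
    """Map each network to True if it falls inside the summary."""
    return {str(n): n.subnet_of(summary) for n in networks}


if __name__ == "__main__":
    print("summary:", SUMMARY)
    for net, ok in covered(SUMMARY, TEST_NETWORKS).items():
        print(net, "covered" if ok else "NOT covered")
```

The mask works out to a /22, which spans 10.10.0.0 to 10.10.3.255, so a fourth test network at 10.10.4.0/24 would need the summary widening.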
So let’s test it. First, a trace route to my web server.
C:\WINDOWS\system32>tracert 10.10.1.11
Tracing route to 10.10.1.11 over a maximum of 30 hops
1 8 ms 2 ms 1 ms 192.168.88.80
2 4 ms 2 ms 1 ms 10.60.1.11
3 12 ms 3 ms 5 ms 10.56.1.1
4 8 ms 2 ms 2 ms 10.10.1.11
Trace complete.
So the first hop is to 192.168.88.80, which is the LAN IP for Site A Router 1.
Next hop is 10.60.1.11, which is the Uplink interface A of NSXV-ESG01.
Then 10.56.1.1 is the uplink interface of the DLR.
Finally we hit 10.10.1.11, which is the web server.
Next, a trace route to the App server.
C:\WINDOWS\system32>tracert 10.10.2.11
Tracing route to 10.10.2.11 over a maximum of 30 hops
1 8 ms 3 ms 2 ms 192.168.88.85
2 10 ms 2 ms 3 ms 10.65.1.11
3 12 ms 2 ms 2 ms 10.56.1.1
4 5 ms 2 ms 3 ms 10.10.2.11
Trace complete.
The first hop is to 192.168.88.85, which is the LAN IP for Site B Router 1.
I’ve not configured a way to control ingress traffic, which is something you would need to do in a real deployment, as you’d want the traffic to enter at the same site as the VM.
Next hop is 10.65.1.11, which is the Uplink interface A of SiteB-NSXV-ESG01.
Then 10.56.1.1 is the uplink interface of the DLR.
Finally we hit 10.10.2.11, which is the App server.
The DB server trace route is the same as the App server one.
If I now power off the ESG01 Edges and run the trace route again I get the following.
WEB
C:\WINDOWS\system32>tracert 10.10.1.11
Tracing route to 10.10.1.11 over a maximum of 30 hops
1 9 ms 2 ms 2 ms 192.168.88.80
2 3 ms 2 ms 2 ms 10.60.1.12
3 12 ms 2 ms 3 ms 10.56.1.1
4 9 ms 3 ms 2 ms 10.10.1.11
Trace complete.
APP
C:\WINDOWS\system32>tracert 10.10.2.11
Tracing route to 10.10.2.11 over a maximum of 30 hops
1 23 ms 1 ms 1 ms 192.168.88.85
2 16 ms 2 ms 1 ms 10.65.1.12
3 10 ms 2 ms 2 ms 10.56.1.1
4 4 ms 2 ms 3 ms 10.10.2.11
Trace complete.
Notice the changes. For Web, hop 2 is now hitting 10.60.1.12, which is Uplink interface A of Site A NSXV-ESG02.
For App, hop 2 is now hitting 10.65.1.12, which is Uplink interface A of SiteB-NSXV-ESG02.
The rest of the hops are the same.
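With this many addresses in play it can help to label trace route hops automatically. Below is a small Python lookup built only from the addresses mentioned in this post; the table and function names are my own and you would swap in your own addressing.

```python
# Hop IP -> device, taken from the addresses used in this lab.
HOP_LABELS = {
    "192.168.88.80": "Site A Router 1 (home LAN)",
    "192.168.88.90": "Site A Router 2 (home LAN)",
    "192.168.88.85": "Site B Router 1 (home LAN)",
    "192.168.88.95": "Site B Router 2 (home LAN)",
    "10.60.1.11": "Site A NSXV-ESG01 uplink A",
    "10.60.1.12": "Site A NSXV-ESG02 uplink A",
    "10.65.1.11": "Site B NSXV-ESG01 uplink A",
    "10.65.1.12": "Site B NSXV-ESG02 uplink A",
    "10.56.1.1": "DLR uplink",
}


def label_hops(hops):
    """Return 'ip: device' for each hop, or 'unknown' if not in the table."""
    return [f"{ip}: {HOP_LABELS.get(ip, 'unknown')}" for ip in hops]


if __name__ == "__main__":
    # Web trace with ESG01 powered off, as shown above.
    for line in label_hops(["192.168.88.80", "10.60.1.12", "10.56.1.1", "10.10.1.11"]):
        print(line)
```

The final hop shows as unknown because the servers themselves are not in the table, which is a handy reminder of where the infrastructure ends.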
Let’s power the ESG01 Edges back on.
Now I want to test the egress traffic from my servers in each site.
First off, Web01, which is in Site A.
192.168.88.102 is my external machine IP.
The first hop is the DLR Gateway IP for the web logical switch.
10.56.1.11 is the internal interface on the Site A NSXV-ESG01. That’s the interface that connects the ESG to the DLR.
10.60.1.1 is the VLAN interface on the Site A XG router 1.
Finally I hit my external machine.
Next I test my Web02 VM which is sitting in Site B.
The first hop as before is the DLR Gateway IP for the web logical switch.
10.56.1.12 is the internal interface on the Site A NSXV-ESG02. That’s the interface that connects the ESG to the DLR.
So you may ask: if the VM is in Site B, why is this hop hitting the Site A ESG?
This is expected. Remember that we configured the DLR with a preference for Site A by setting the BGP weights to 60 for Site A and 30 for Site B. The higher weight is preferred, so all egress traffic will normally leave via the Site A Edges.
10.60.1.1 is the VLAN interface on the Site A XG router 1
Finally then I hit my external machine.
To prove that egress normally leaves via Site A, I can run it again. This time I have shut down Site A ESG02, so the traffic should now go out via 10.56.1.11, which is the Site A ESG01 Edge.
The first hop is the DLR Gateway IP for the web logical switch.
10.56.1.11 is the internal interface on the Site A NSXV-ESG01, which is what we expected, so it’s still using the BGP weight to prefer the remaining Site A Edge.
10.70.1.1 is the VLAN interface on the Site A XG router 2
Then I hit my external machine.
Now if I power off the remaining Edge in Site A…
The first hop is the DLR Gateway IP for the web logical switch.
10.56.1.14 is the internal interface on the Site B NSXV-ESG02, which is as we expected. The DLR has flushed the routes for the Site A Edges and is now using the Site B Edges for egress.
10.75.1.1 is the VLAN interface on the Site B XG router 2
Then I hit my external machine.
A quick check of the DLR routing table shows that it has two routes out via the Site B Edges, 10.56.1.13 and 10.56.1.14.
The final test is to power the Site A Edges back on to show that the traffic will revert back to using the Site A Edges as per the BGP weights.
The first hop is the DLR Gateway IP for the web logical switch.
10.56.1.11 is the internal interface on the Site A NSXV-ESG01, so the traffic has reverted to the preferred Site A.
10.60.1.1 is the VLAN interface on the Site A XG router 1
Then I hit my external machine.
Final screenshot I promise.
A quick check of the DLR again.
Note that the two bottom routes have now changed to 10.56.1.11 and 10.56.1.12.
So what happened to 13 and 14? They are less preferred routes, so they only show when 11 and 12 are down and they become the active routes.
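The behaviour seen in these tests can be modelled in a few lines of Python. This is a deliberately simplified sketch: it only looks at weight and ignores the rest of the BGP best-path process. The weights are the 60 and 30 configured on the DLR neighbors earlier, and the function name is my own.

```python
# DLR neighbor -> BGP weight, as configured in this lab.
DLR_NEIGHBORS = {
    "10.56.1.11": 60,  # Site A ESG01
    "10.56.1.12": 60,  # Site A ESG02
    "10.56.1.13": 30,  # Site B ESG01
    "10.56.1.14": 30,  # Site B ESG02
}


def installed_next_hops(neighbors, down=()):
    """Return the next hops that would be installed: every live neighbor
    sharing the highest weight (these are the ECMP paths)."""
    live = {ip: weight for ip, weight in neighbors.items() if ip not in down}
    if not live:
        return []
    best = max(live.values())
    return sorted(ip for ip, weight in live.items() if weight == best)


if __name__ == "__main__":
    print("all up:        ", installed_next_hops(DLR_NEIGHBORS))
    print("A ESG02 down:  ", installed_next_hops(DLR_NEIGHBORS, down={"10.56.1.12"}))
    print("Site A down:   ", installed_next_hops(
        DLR_NEIGHBORS, down={"10.56.1.11", "10.56.1.12"}))
```

The three cases line up with the tests above: both Site A Edges when everything is up, only 10.56.1.11 when Site A ESG02 is down, and the two Site B Edges once Site A is gone.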
So there we have it: dynamic routing is working and we can reach our test app from an external network.
And that concludes my NSX-V Lab install series. I hope you found it useful.
Thanks for reading