Welcome to Part 4 of the NSX-T Lab Series. In the previous post, we covered adding the vCenter as a compute manager.
In this post we’ll be adding an additional NSX-T manager to form a cluster and setting the VIP used for load balancing, and for good measure I’ll cover removing said cluster as well 🙂
Why do we need to form a cluster?
For production deployments a cluster of NSX managers is a must, and three is the recommended and supported number.
Not only does a cluster make the manager highly available, but the managers now include the controllers, and as such you need three.
There is currently a bit of confusion about what is required, especially for a lab build. This extract is from the NSX-T 2.4 installation guide.
- As of NSX-T Data Center 2.4, the NSX Manager contains the NSX Central Control Plane process. This service is critical for the operation of NSX. If there is a complete loss of NSX Managers, or if the cluster is reduced from three NSX Managers to one NSX Manager, you will not be able to make topology changes to your environment, and vMotion of machines depending on NSX will fail.
- For lab and proof-of-concept deployments where there are no production workloads, you can run a single NSX Manager to save resources. NSX Manager nodes can be deployed on either ESXi or KVM. However, mixed deployments of managers on both ESXi and KVM are not supported.
So for lab deployments it states that one manager is fine, but that also means only one controller, and as the first bullet point states, you need more than one to make topology changes and for vMotion to work.
In my lab I’ve been running with just one manager and thus one controller and I’ve not had any issues so far.
There is also no way for the system to know whether it’s a lab or production, so again it’s a confusing statement as to whether or not it goes into read-only mode with just one manager/controller.
My gut reaction is that it doesn’t, unlike NSX-V where you needed a minimum of two controllers even in a lab. I can only assume that in a lab build you only deploy one Manager and so it functions as expected, whereas if you deploy a cluster and two of the Managers go down you run into the topology change issue, but this is just me speculating.
Take it how you will, but remember that for a production build you need to deploy three managers.
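To put some numbers on that speculation, here is a minimal sketch that assumes the manager cluster follows a simple-majority rule. That rule is my assumption for illustration, not something the extract above spells out, and the function names are made up.

```python
def majority(cluster_size: int) -> int:
    """Smallest node count that is more than half of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Nodes that can be lost while a majority is still up (assumed rule)."""
    return cluster_size - majority(cluster_size)


for size in (1, 2, 3):
    print(f"{size} manager(s): majority={majority(size)}, "
          f"can lose {tolerated_failures(size)}")
```

Under that assumption a single-manager lab is its own majority, while a three-node cluster reduced to one node has lost it, which would line up with the wording in the guide.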
Point to note
As per VMware KB article https://kb.vmware.com/s/article/66796
– Vmkernel Interfaces such as Management, vMotion connected to N-VDS, go into blocking state causing host to show disconnected in vSphere Client and the hosts are unmanageable after a 24 hour countdown.
By design, if the host loses access to the control plane a 24-hour countdown starts; after 24 hours the ports are put into a blocking state.
In my lab I’m not currently placing VMKs on an N-VDS so I’ve not experienced the issue, but it’s something to be aware of and another reason to make sure that in production you have multiple Managers/Controllers deployed!
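If you want to work out how long you have left after a control plane outage, the arithmetic is simple enough to sketch. The 24 hours comes from the KB article; the timestamps and function names below are hypothetical.

```python
from datetime import datetime, timedelta

# Countdown length as described in the KB article.
BLOCKING_DELAY = timedelta(hours=24)


def blocking_deadline(control_plane_lost_at: datetime) -> datetime:
    """Time at which the ports would be put into a blocking state."""
    return control_plane_lost_at + BLOCKING_DELAY


def time_remaining(control_plane_lost_at: datetime, now: datetime) -> timedelta:
    """Time left before the deadline, never negative."""
    remaining = blocking_deadline(control_plane_lost_at) - now
    return max(remaining, timedelta(0))


lost = datetime(2019, 6, 1, 9, 30)    # hypothetical outage start
check = datetime(2019, 6, 1, 21, 30)  # hypothetical "now"
print("Blocking at:", blocking_deadline(lost))
print("Remaining:  ", time_remaining(lost, check))
```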
So let’s get started. From the NSX console we go to ‘System’ and, after ensuring that our current NSX manager is up and stable (albeit resource constrained), we hit the ‘Add Nodes’ link.
From here we select the compute manager, choose whether or not to enable SSH and root access, then set the complex passwords and DNS server details.
One thing to note is that regardless of what passwords you set, they will be replaced by the current NSX manager’s credentials, which makes sense since they are a cluster.
Another thing to note is that we only have a choice of Small, Medium and Large deployments. Again this is not unexpected, since the Extra Small deployment that we used in the lab is not intended to be used in production or for anything other than the Cloud Service Manager, so we’ll select Small.
Give the manager a name and set the locations for the VM, then select the network and provide an IP and gateway. In order for the cluster to form, the IPs of the NSX managers must reside within the same Layer 2 network.
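A quick sanity check before deploying is to confirm the manager IPs at least fall in the same subnet. This is only a rough stand-in for "same Layer 2 network", and the /24 prefix length is an assumption about my lab rather than something shown above.

```python
import ipaddress


def same_subnet(addresses, prefix_len):
    """Return True if every address falls inside the same IP network."""
    networks = {
        ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        for addr in addresses
    }
    return len(networks) == 1


# Manager IPs from this lab; the prefix length of 24 is assumed.
managers = ["192.168.10.50", "192.168.10.60"]
print(same_subnet(managers, 24))
```

Sharing a subnet does not prove the port groups are on the same VLAN, so treat a True here as necessary rather than sufficient.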
The deployment will start and NSX will deploy the OVA via the vCenter.
Once complete we’ll see that the cluster is stable, the connectivity is up and the sync has completed.
If we go back to the home page and then select ‘System’ from the Dashboards dropdown, we can take a different look at the cluster and fabric status.
Back on the System Overview page, the next stage for a production deployment is to add an additional Manager node, so go ahead and repeat the previous steps and come back here when you have three managers deployed and ready.
Now that we have created a highly available cluster, we need a way to access the system that isn’t tied to an individual manager, so we’ll configure a cluster VIP to give us a single point of access. There are a few different ways to configure the VIP and I’ll cover those in a separate post another time. For now we’ll utilise the NSX-T built-in VIP, which is VMware’s recommended approach anyway.
From the Overview page click on ‘Edit’ next to Virtual IP.
Set the IP for the VIP.
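The same setting can, as far as I’m aware, also be driven through the REST API via the cluster api-virtual-ip resource. The sketch below only builds the URLs and sends nothing; the VIP address 192.168.10.70 is made up, and you should confirm the endpoint and actions against the API reference for your NSX-T version before relying on it.

```python
from typing import Optional
from urllib.parse import urlencode


def vip_action_url(manager: str, action: str,
                   ip_address: Optional[str] = None) -> str:
    """Build the URL for a cluster VIP action (to be sent as a POST)."""
    params = {"action": action}
    if ip_address is not None:
        params["ip_address"] = ip_address
    return f"https://{manager}/api/v1/cluster/api-virtual-ip?{urlencode(params)}"


# Hypothetical VIP address; the manager IP is the one used in this lab.
print(vip_action_url("192.168.10.50", "set_virtual_ip", "192.168.10.70"))
print(vip_action_url("192.168.10.50", "clear_virtual_ip"))
```

The request would need to be a POST with admin credentials against a manager’s own address, not the VIP.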
You can now test access by connecting to the VIP IP.
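If you’d rather script the check than open a browser, a plain TCP connection to port 443 is enough to show that something is answering on the VIP. This is a minimal sketch; the function name is my own and it does not validate certificates or log in.

```python
import socket


def https_port_open(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling https_port_open with your VIP address should return True once the VIP is active.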
As you can see, now that the second manager has been up for a while, the resource usage of the original manager has come down as the load is spread across both of them.
We can also verify the status of the cluster from the command line by SSHing to the NSX manager and running the command ‘get cluster status’
NSX CLI (Manager, Policy, Controller 188.8.131.52.0.13716579). Press ? for command list or enter: help
NSXTMan01> get cluster status
Cluster Id: ecaef6ae-f0e6-4900-b8fd-d2ed2184a64e

Group Type: DATASTORE
Group Status: STABLE
Members:
    UUID                                   FQDN        IP              STATUS
    3ece2742-ca3a-9e61-d833-06077701b293   NSXTMan01   192.168.10.50   UP
    c61fc8ee-6986-4b6c-b769-02083483febf   NSXTMan02   192.168.10.60   UP

Group Type: CLUSTER_BOOT_MANAGER
Group Status: STABLE
Members:
    UUID                                   FQDN        IP              STATUS
    3ece2742-ca3a-9e61-d833-06077701b293   NSXTMan01   192.168.10.50   UP
    c61fc8ee-6986-4b6c-b769-02083483febf   NSXTMan02   192.168.10.60   UP

Group Type: CONTROLLER
Group Status: STABLE
Members:
    UUID                                   FQDN        IP              STATUS
    8fd960f5-1bf8-4b9e-805b-0e02822f6865   NSXTMan02   192.168.10.60   UP
    6d15c565-f996-47ca-994a-18102ed7bb2d   NSXTMan01   192.168.10.50   UP

Group Type: MANAGER
Group Status: STABLE
Members:
    UUID                                   FQDN        IP              STATUS
    3ece2742-ca3a-9e61-d833-06077701b293   NSXTMan01   192.168.10.50   UP
    c61fc8ee-6986-4b6c-b769-02083483febf   NSXTMan02   192.168.10.60   UP

Group Type: POLICY
Group Status: STABLE
Members:
    UUID                                   FQDN        IP              STATUS
    3ece2742-ca3a-9e61-d833-06077701b293   NSXTMan01   192.168.10.50   UP
    c61fc8ee-6986-4b6c-b769-02083483febf   NSXTMan02   192.168.10.60   UP

Group Type: HTTPS
Group Status: STABLE
Members:
    UUID                                   FQDN        IP              STATUS
    3ece2742-ca3a-9e61-d833-06077701b293   NSXTMan01   192.168.10.50   UP
    c61fc8ee-6986-4b6c-b769-02083483febf   NSXTMan02   192.168.10.60   UP

NSXTMan01>
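With six groups to eyeball, it can be handy to let a script pick out anything that isn’t STABLE. This is a rough sketch that works on pasted ‘get cluster status’ text; the function names are my own, and the DEGRADED value in the sample is only there to exercise the check.

```python
import re

# Matches each "Group Type: X ... Group Status: Y" pair, however it is wrapped.
GROUP_RE = re.compile(r"Group Type:\s+(\S+)\s+Group Status:\s+(\S+)")


def group_statuses(cli_output: str) -> dict:
    """Map each group type to its reported status."""
    return dict(GROUP_RE.findall(cli_output))


def unstable_groups(cli_output: str) -> list:
    """List the group types whose status is anything other than STABLE."""
    return [g for g, s in group_statuses(cli_output).items() if s != "STABLE"]


# Trimmed sample; the second status is a made-up value for illustration.
sample = """
Group Type: DATASTORE
Group Status: STABLE
Group Type: CONTROLLER
Group Status: DEGRADED
"""
print(unstable_groups(sample))
```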
We can also see the cluster configuration by running the command ‘get cluster config’
NSXTMan01> get cluster config
Cluster Id: ecaef6ae-f0e6-4900-b8fd-d2ed2184a64e
Cluster Configuration Version: 4
Number of nodes in the cluster: 2

Node UUID: 3ece2742-ca3a-9e61-d833-06077701b293
Node Status: JOINED
    ENTITY                 UUID                                   IP ADDRESS      PORT   FQDN
    HTTPS                  464dd4a8-94c4-41e8-a7d1-b574a559f42c   192.168.10.50   443    NSXTMan01
    CONTROLLER             6d15c565-f996-47ca-994a-18102ed7bb2d   192.168.10.50   -      NSXTMan01
    CLUSTER_BOOT_MANAGER   ce464bae-60c6-492b-a0ea-d623fb8f4690   192.168.10.50   -      NSXTMan01
    DATASTORE              449461d3-9786-4263-8778-ee163c7b9c7c   192.168.10.50   9000   NSXTMan01
    MANAGER                37e1c92a-79f5-49ad-b388-b564da0f54d3   192.168.10.50   -      NSXTMan01
    POLICY                 a4e64055-c9b0-4bf6-9d3e-57a7aec38282   192.168.10.50   -      NSXTMan01

Node UUID: c61fc8ee-6986-4b6c-b769-02083483febf
Node Status: JOINED
    ENTITY                 UUID                                   IP ADDRESS      PORT   FQDN
    HTTPS                  c8e3e88c-4b46-49c7-a190-1ac623940bd7   192.168.10.60   443    NSXTMan02
    CONTROLLER             8fd960f5-1bf8-4b9e-805b-0e02822f6865   192.168.10.60   -      NSXTMan02
    CLUSTER_BOOT_MANAGER   918e11f9-74de-4182-a899-79d00ea4ef49   192.168.10.60   -      NSXTMan02
    DATASTORE              e61894f4-4747-4aeb-ad4b-5697a819a122   192.168.10.60   9000   NSXTMan02
    MANAGER                b6f61067-d026-43d5-bf4b-e10705c361c9   192.168.10.60   -      NSXTMan02
    POLICY                 232a89df-565e-476c-bf5b-15acdbbead67   192.168.10.60   -      NSXTMan02

NSXTMan01>
Running ‘get services’ will display all services and their status.
NSXTMan01> get services
Service name: cluster_manager
Service state: running

Service name: cm-inventory
Service state: running

Service name: controller
Service state: running
Listen address:

Service name: datastore
Service state: running

Service name: http
Service state: running
Session timeout: 1800
Connection timeout: 30
Redirect host: (not configured)
Client API rate limit: 100 requests/sec
Client API concurrency limit: 40
Global API concurrency limit: 199

Service name: install-upgrade
Service state: running
Enabled on: 192.168.10.50

Service name: liagent
Service state: stopped

Service name: manager
Service state: running
Logging level: info

Service name: mgmt-plane-bus
Service state: running

Service name: migration-coordinator
Service state: stopped

Service name: node-mgmt
Service state: running

Truncated list
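The services list is long, so here is a small sketch that pulls out anything not in the running state from pasted ‘get services’ text. The function name is my own, and the sample is a cut-down copy of the output above.

```python
import re

# Matches each "Service name: X ... Service state: Y" pair.
SERVICE_RE = re.compile(r"Service name:\s+(\S+)\s+Service state:\s+(\S+)")


def stopped_services(cli_output: str) -> list:
    """Return the names of services whose state is not 'running'."""
    return [name for name, state in SERVICE_RE.findall(cli_output)
            if state != "running"]


# Cut-down sample taken from the output above.
sample = (
    "Service name: http Service state: running Session timeout: 1800 "
    "Service name: liagent Service state: stopped "
    "Service name: manager Service state: running Logging level: info "
    "Service name: migration-coordinator Service state: stopped"
)
print(stopped_services(sample))
```

A stopped service is not necessarily a problem; it may simply be one that isn’t in use in your environment.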
Removing cluster node and VIP
In order to remove the VIP you’ll need to be logged into an NSX manager directly and not via the VIP. As you can see in the previous screenshot, the Edit and Reset options next to the VIP are greyed out.
When we log in to the NSX manager directly we have the option to make changes, so we will select Reset.
Confirm the reset and it will remove the VIP.
Next we’ll remove the NSX manager we just deployed. In the corner there is a little cog icon; click this and select Delete.
Confirm that you indeed want to delete the Node.
After a while the node will be removed, and that’s all there is to it.
Onto the next stage where we will be creating a TEP IP Pool.
NSX-T Lab Part:5 NSX-T Create IP Pool