NSX-T Lab: Deploy Additional Manager and Form a Cluster

Intro

Welcome to Part 4 of the NSX-T Lab Series. In the previous post, we covered adding the vCenter as a compute manager.
In this post we’ll add an additional NSX-T Manager to form a cluster and set the cluster VIP that gives us a single point of access, and for good measure I’ll cover removing a cluster node and the VIP as well 🙂

Why do we need to form a cluster?

For production deployments a cluster of NSX Managers is a must, and three is the recommended and supported number to have.
Not only does a cluster make the management plane highly available, but since the managers now also include the controller function, you need three of them.
There is currently a bit of confusion about what is required, especially for a lab build. This extract is from the NSX-T 2.4 installation guide.

  • As of NSX-T Data Center 2.4, the NSX Manager contains the NSX Central Control Plane process. This service is critical for the operation of NSX. If there is a complete loss of NSX Managers, or if the cluster is reduced from three NSX Managers to one NSX Manager, you will not be able to make topology changes to your environment, and vMotion of machines depending on NSX will fail.
  • For lab and proof-of-concept deployments where there are no production workloads, you can run a single NSX Manager to save resources. NSX Manager nodes can be deployed on either ESXi or KVM. However, mixed deployments of managers on both ESXi and KVM are not supported.

So for lab deployments it states that one manager is fine, but that also means only one controller, and as the first bullet point says, you need more than one for topology changes and NSX-dependent vMotion to keep working.
In my lab I’ve been running with just one manager, and therefore one controller, and I’ve not had any issues so far.
There is also no way for the system to know whether it’s a lab or production, so it’s a confusing statement on whether or not the environment goes read-only with just one manager/controller.
My gut reaction is that it doesn’t, unlike NSX-V where even a lab needed a minimum of two controllers. I can only assume that if you deploy just one Manager it functions as expected, and that the topology-change issue only bites when you deploy a cluster and then lose two of the three Managers, but this is just me speculating.
Take it how you will, but remember that for a production build you need to deploy three managers.

Point to note

As per VMware KB article https://kb.vmware.com/s/article/66796

– VMkernel interfaces such as Management and vMotion that are connected to an N-VDS go into a blocking state, causing the host to show as disconnected in the vSphere Client and become unmanageable after a 24 hour countdown.

By design, if the host loses access to the control plane a 24 hour countdown starts; after 24 hours the ports are put into a blocking state.
In my lab I’m not currently placing VMkernel interfaces on an N-VDS, so I’ve not experienced the issue, but it’s something to be aware of and another reason to make sure you have multiple Manager/Controller nodes deployed in production!

The build

So let’s get started. From the NSX console we go to ‘System’ and, after ensuring that our current NSX Manager is up and stable (albeit resource constrained), we hit the ‘Add Nodes’ link.

From here we select the compute manager, choose whether to enable SSH and root access, then set the complex passwords and DNS server details.

One thing to note is that regardless of what passwords you set, they will be replaced by the current NSX Manager’s credentials, which makes sense since the nodes form a cluster.

Another thing to note is that we only have a choice of Small, Medium and Large deployments. Again this is not unexpected, since the Extra Small form factor that we used in the lab is only intended for the Cloud Service Manager and not for production Manager nodes, so we’ll select Small.

Give the manager a name, set the locations for the VM, select the network and provide an IP and gateway. Note that the IPs of the NSX Managers must reside within the same Layer 2 network if you want to use the built-in cluster VIP, which we’ll configure shortly.

The deployment will start and NSX will deploy the OVA via the vCenter.
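
As a side note, the whole ‘Add Nodes’ workflow can also be driven through the REST API via the cluster node deployments endpoint (POST /api/v1/cluster/nodes/deployments), if you’d rather script it. The sketch below is only a rough illustration based on my reading of the 2.4-era API: the passwords, compute manager UUID and the vCenter object IDs (compute_id, storage_id, management_network_id) are placeholders for your own values, and the exact field names may differ between versions, so verify against the API guide before using it.

curl -k -u admin -X POST "https://192.168.10.50/api/v1/cluster/nodes/deployments" \
  -H "Content-Type: application/json" \
  -d '{
    "deployment_requests": [
      {
        "roles": ["CONTROLLER", "MANAGER"],
        "form_factor": "SMALL",
        "user_settings": {
          "cli_password": "<complex-password>",
          "root_password": "<complex-password>"
        },
        "deployment_config": {
          "placement_type": "VsphereClusterNodeVMDeploymentConfig",
          "vc_id": "<compute-manager-uuid>",
          "compute_id": "domain-c<id>",
          "storage_id": "datastore-<id>",
          "management_network_id": "network-<id>",
          "hostname": "NSXTMan02",
          "default_gateway_addresses": ["<gateway-ip>"],
          "management_port_subnets": [
            { "ip_addresses": ["192.168.10.60"], "prefix_length": 24 }
          ]
        }
      }
    ]
  }'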

Once complete we’ll see that the cluster is stable, the connectivity is up and the sync has completed.
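
If you’d rather check this without the UI, the same health information is exposed by the REST API at GET /api/v1/cluster/status, which reports the management and control cluster status, so you can poll it while the new node syncs. A quick sketch using this lab’s manager IP and the admin account (curl will prompt for the password):

curl -k -u admin "https://192.168.10.50/api/v1/cluster/status"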

If we go back to the home page and then select ‘System’ from the Dashboards dropdown, we can take a different look at the cluster and fabric status.

Back on the System Overview page, the next stage for a production deployment is to add another Manager node, so go ahead and repeat the previous steps and come back here when you have three managers deployed and ready.

Now that we have created a highly available cluster, we need a single point of access to it, which we get by configuring a cluster VIP. There are a few different ways to provide the VIP and I’ll cover those in a separate post another time. For now we’ll use the NSX-T built-in VIP, which is VMware’s recommended approach anyway.

From the Overview page click on ‘Edit’ next to Virtual IP.

Set the IP for the VIP.
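
If you prefer to script this step, the built-in VIP can, as far as I’m aware, also be set through the REST API on the cluster api-virtual-ip endpoint. A minimal sketch, assuming a VIP of 192.168.10.55 (any unused address in the same subnet as the managers) and run against one of the node IPs rather than the VIP; the second call simply reads back what is configured:

curl -k -u admin -X POST "https://192.168.10.50/api/v1/cluster/api-virtual-ip?action=set_virtual_ip&ip_address=192.168.10.55"

curl -k -u admin "https://192.168.10.50/api/v1/cluster/api-virtual-ip"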

You can now test access by connecting to the VIP IP.
As you can see, now that the second manager has been up for a while the resource usage of the original manager has come down, as the load is being spread across the two nodes.

We can also verify the status of the cluster from the command line by SSHing to the NSX Manager and running the command ‘get cluster status’.

NSX CLI (Manager, Policy, Controller 2.4.1.0.0.13716579). Press ? for command list or enter: help
NSXTMan01> get cluster status
Cluster Id: ecaef6ae-f0e6-4900-b8fd-d2ed2184a64e
Group Type: DATASTORE
Group Status: STABLE

Members:
    UUID                                       FQDN                                       IP               STATUS          
    3ece2742-ca3a-9e61-d833-06077701b293       NSXTMan01                                  192.168.10.50    UP              
    c61fc8ee-6986-4b6c-b769-02083483febf       NSXTMan02                                  192.168.10.60    UP              

Group Type: CLUSTER_BOOT_MANAGER
Group Status: STABLE

Members:
    UUID                                       FQDN                                       IP               STATUS          
    3ece2742-ca3a-9e61-d833-06077701b293       NSXTMan01                                  192.168.10.50    UP              
    c61fc8ee-6986-4b6c-b769-02083483febf       NSXTMan02                                  192.168.10.60    UP              

Group Type: CONTROLLER
Group Status: STABLE

Members:
    UUID                                       FQDN                                       IP               STATUS          
    8fd960f5-1bf8-4b9e-805b-0e02822f6865       NSXTMan02                                  192.168.10.60    UP              
    6d15c565-f996-47ca-994a-18102ed7bb2d       NSXTMan01                                  192.168.10.50    UP              

Group Type: MANAGER
Group Status: STABLE

Members:
    UUID                                       FQDN                                       IP               STATUS          
    3ece2742-ca3a-9e61-d833-06077701b293       NSXTMan01                                  192.168.10.50    UP              
    c61fc8ee-6986-4b6c-b769-02083483febf       NSXTMan02                                  192.168.10.60    UP              

Group Type: POLICY
Group Status: STABLE

Members:
    UUID                                       FQDN                                       IP               STATUS          
    3ece2742-ca3a-9e61-d833-06077701b293       NSXTMan01                                  192.168.10.50    UP              
    c61fc8ee-6986-4b6c-b769-02083483febf       NSXTMan02                                  192.168.10.60    UP              

Group Type: HTTPS
Group Status: STABLE

Members:
    UUID                                       FQDN                                       IP               STATUS          
    3ece2742-ca3a-9e61-d833-06077701b293       NSXTMan01                                  192.168.10.50    UP              
    c61fc8ee-6986-4b6c-b769-02083483febf       NSXTMan02                                  192.168.10.60    UP

NSXTMan01> 

We can also see the cluster configuration by running the command ‘get cluster config’.

NSXTMan01> get cluster config 
Cluster Id: ecaef6ae-f0e6-4900-b8fd-d2ed2184a64e
Cluster Configuration Version: 4
Number of nodes in the cluster: 2

Node UUID: 3ece2742-ca3a-9e61-d833-06077701b293
Node Status: JOINED
    ENTITY                               UUID                                       IP ADDRESS      PORT     FQDN                                      
    HTTPS                                464dd4a8-94c4-41e8-a7d1-b574a559f42c       192.168.10.50   443      NSXTMan01                                 
    CONTROLLER                           6d15c565-f996-47ca-994a-18102ed7bb2d       192.168.10.50   -        NSXTMan01                                 
    CLUSTER_BOOT_MANAGER                 ce464bae-60c6-492b-a0ea-d623fb8f4690       192.168.10.50   -        NSXTMan01                                 
    DATASTORE                            449461d3-9786-4263-8778-ee163c7b9c7c       192.168.10.50   9000     NSXTMan01                                 
    MANAGER                              37e1c92a-79f5-49ad-b388-b564da0f54d3       192.168.10.50   -        NSXTMan01                                 
    POLICY                               a4e64055-c9b0-4bf6-9d3e-57a7aec38282       192.168.10.50   -        NSXTMan01                                 

Node UUID: c61fc8ee-6986-4b6c-b769-02083483febf
Node Status: JOINED
    ENTITY                               UUID                                       IP ADDRESS      PORT     FQDN                                      
    HTTPS                                c8e3e88c-4b46-49c7-a190-1ac623940bd7       192.168.10.60   443      NSXTMan02                                 
    CONTROLLER                           8fd960f5-1bf8-4b9e-805b-0e02822f6865       192.168.10.60   -        NSXTMan02                                 
    CLUSTER_BOOT_MANAGER                 918e11f9-74de-4182-a899-79d00ea4ef49       192.168.10.60   -        NSXTMan02                                 
    DATASTORE                            e61894f4-4747-4aeb-ad4b-5697a819a122       192.168.10.60   9000     NSXTMan02                                 
    MANAGER                              b6f61067-d026-43d5-bf4b-e10705c361c9       192.168.10.60   -        NSXTMan02                                 
    POLICY                               232a89df-565e-476c-bf5b-15acdbbead67       192.168.10.60   -        NSXTMan02

NSXTMan01> 

Running ‘get services’ will display all services and their status.

NSXTMan01> get services
Service name:                  cluster_manager     
Service state:                 running             

Service name:                  cm-inventory        
Service state:                 running             

Service name:                  controller          
Service state:                 running             
Listen address:                                    

Service name:                  datastore           
Service state:                 running             

Service name:                  http                
Service state:                 running             
Session timeout:               1800                
Connection timeout:            30                  
Redirect host:                 (not configured)    
Client API rate limit:         100 requests/sec    
Client API concurrency limit:  40                  
Global API concurrency limit:  199                 

Service name:                  install-upgrade     
Service state:                 running             
Enabled on:                    192.168.10.50       

Service name:                  liagent             
Service state:                 stopped             

Service name:                  manager             
Service state:                 running             
Logging level:                 info                

Service name:                  mgmt-plane-bus      
Service state:                 running             

Service name:                  migration-coordinator
Service state:                 stopped             

Service name:                  node-mgmt           
Service state:                 running             

Truncated list       

Removing cluster node and VIP

In order to remove the VIP you’ll need to be logged into an NSX Manager directly and not via the VIP; as you can see in the previous screenshot, the Edit and Reset options next to the VIP are greyed out.
When we log in to the NSX Manager directly we have the option to make changes, so we’ll select Reset.

Confirm the reset and it will remove the VIP.
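
For completeness, the API equivalent should be the clear_virtual_ip action on the same api-virtual-ip endpoint used to set it; again, run this against a node IP rather than the VIP you’re removing:

curl -k -u admin -X POST "https://192.168.10.50/api/v1/cluster/api-virtual-ip?action=clear_virtual_ip"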

Next we’ll remove the NSX Manager we just deployed. In the corner there is a little cog icon, click this and select Delete.

Confirm that you indeed want to delete the Node.

After a while the node will be removed, and that’s all there is to it.
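
As an aside, if the Delete option ever isn’t available, for example because the node has failed or wasn’t deployed through the UI, my understanding is that it can also be removed from the cluster with the ‘detach node’ command run from the CLI of one of the remaining managers. A rough example using the UUID of NSXTMan02 taken from the ‘get cluster status’ output earlier (check the UUID carefully, as this permanently removes the node from the cluster):

NSXTMan01> detach node c61fc8ee-6986-4b6c-b769-02083483febf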

On to the next stage, where we will be creating a TEP IP Pool.
NSX-T Lab Part 5: NSX-T Create IP Pool
