Startup plan for vSphere clusters hosting a Kubernetes Supervisor

Knowing how to cold-start a vSphere cluster that hosts a Kubernetes Supervisor and workload cluster, after it was shut down gracefully, can be useful. I created the instructions below to help with my recent Homelab datacenter move.

As I have mentioned before, I recently moved my Homelab from one datacenter to another. Having gracefully shut everything down in the first datacenter, I needed to bring it all back up in the new one. Because I had previously enabled the Workload Management feature in vSphere and deployed both a Kubernetes Supervisor and a Kubernetes workload cluster, there were some requirements to satisfy for this to happen smoothly. Luckily the process isn’t quite as prescriptive as the shutdown.

Many of the applications hosted by Kubernetes have their ingress managed by VMware NSX Advanced Load Balancer and the entire Supervisor and workload cluster sits atop a series of VxLAN segments managed by VMware NSX.

Documentation does exist for the procedure, but I wanted to test it out before I needed to do it in anger. So, based on that documentation, I created my own runbook, adding some steps and clarification. I also disagree slightly with the order in the documentation. As I mentioned above, the Supervisor cluster sits on a VxLAN-backed network segment that is managed by NSX. When vCenter is started, it will automatically power on the Supervisor cluster. However, without the NSX Edge nodes running, the Supervisor will have no way to communicate with vCenter. I’d argue therefore that the Edge nodes should be powered on before vCenter. That is what I have documented below, and it is where my runbook differs from the product documentation.
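The resulting sequence can be summarised as a simple ordered list. This is a sketch of my runbook’s order (Edge nodes deliberately ahead of vCenter), not a VMware-prescribed one:

```shell
#!/usr/bin/env bash
# Cold-start order for this Homelab runbook, earliest first.
startup_order=(
  "ESXi hosts"
  "Active Directory / DNS"
  "NTP"
  "NSX Edge nodes"
  "NSX Manager"
  "NSX ALB controller"
  "vCenter Server"
)

for i in "${!startup_order[@]}"; do
  printf '%d. %s\n' "$((i + 1))" "${startup_order[$i]}"
done
```

Everything after vCenter (Supervisor control plane, Service Engines, workload cluster VMs) powers on automatically, which is exactly why the dependencies above need to be up first.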

Update 25/03/2026: My lab was not running VCF at the time that I created the runbook, nor was vSAN in use. Although some of the steps are still relevant, the ordering has changed and there are some differences if using this with VCF 9 and with vSAN present.

Prerequisites

  • Power on all ESXi hosts.
  • Login to the UI of each ESXi host and check the following:
    • Required datastores are mounted.
    • Physical NICs are up and show connectivity to the right networks. (If you can reach the ESXi UI then there’s a good chance that this is the case, but best to verify!)
    • There are no new hardware alarms.
  • If DNS and NTP are external to this cluster then verify connectivity to them.
  • Locate the following virtual machines and record which hosts they are currently registered to:
    • Active Directory (DNS)
    • NTP
    • vCenter
    • NSX Edge nodes
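The DNS and NTP connectivity checks can be scripted from any Linux jump host. A minimal sketch, assuming `getent` is available and using bash’s `/dev/tcp` so no extra tools are needed; all hostnames and addresses are placeholders for your own lab values:

```shell
#!/usr/bin/env bash
# Quick prerequisite checks before proceeding with the startup.

# Can we resolve a name? (uses the host's configured resolvers)
check_dns() {
  getent hosts "$1" >/dev/null && echo "DNS ok: $1" || echo "DNS FAIL: $1"
}

# Does a TCP port answer? (bash /dev/tcp probe, no netcat required)
check_tcp() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null \
    && echo "TCP ok: $1:$2" || echo "TCP FAIL: $1:$2"
}

check_dns localhost
# check_dns vcenter.lab.local          # placeholder name
# check_tcp 192.168.0.10 53            # placeholder DNS server
```

NTP is UDP, so a simple port probe won’t prove it is serving time; `ntpdate -q <server>` or `chronyc sources` on a client is a better verification if those tools are installed.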

Process

Take ESXi hosts out of Maintenance Mode

  1. Connect to each ESXi host and login as the root user.
  2. With the host selected in the UI, click the Actions button.
  3. Select the option to Exit maintenance mode.
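If SSH is enabled on the hosts, the same thing can be done from the ESXi shell with `esxcli` rather than clicking through each UI. These commands run on the host itself:

```shell
# Run on each ESXi host over SSH.

# Check the current maintenance mode state (prints Enabled/Disabled):
esxcli system maintenanceMode get

# Exit maintenance mode:
esxcli system maintenanceMode set --enable false
```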

Startup core services hosted by Virtual Machines

In my Homelab, core services like Active Directory (DNS) and NTP are provided by virtual machines within the cluster. For much of anything else to work properly, these need to be up first.

  1. Connect to each required ESXi host and login as the root user.
  2. Repeat for each Active Directory virtual machine:
    • Select Actions > Power > Power on.
  3. Repeat for each NTP virtual machine:
    • Select Actions > Power > Power on.
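These power-on clicks can also be scripted from the ESXi shell with `vim-cmd`. The VM ID below is a placeholder — look yours up first:

```shell
# Run on the ESXi host over SSH.

# List VMs registered on this host; the first column is the Vmid:
vim-cmd vmsvc/getallvms

# Power on a VM by its Vmid (42 is a placeholder):
vim-cmd vmsvc/power.on 42

# Confirm the power state:
vim-cmd vmsvc/power.getstate 42
```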

Note: It’s always wise before continuing to verify that these core services have come up cleanly and are reachable.

Startup NSX Edge nodes

As noted earlier, I believe that the NSX Edge nodes should be up before vCenter attempts to start the Supervisor cluster. So, if they aren’t already started…

  1. Connect to each required ESXi host and login as the root user.
  2. Repeat for each NSX Edge node virtual machine:
    • Select Actions > Power > Power on.

Startup the NSX Manager server

  1. Connect to the required ESXi host(s) and login as the root user.
  2. Locate the NSX Manager virtual machine(s):
    • Select Actions > Power > Power on.

Startup the NSX Advanced Load Balancer controller server

  1. Connect to the required ESXi host(s) and login as the root user.
  2. Locate the NSX ALB controller virtual machine(s):
    • Select Actions > Power > Power on.

Startup the vCenter server

  1. Connect to the required ESXi host and login as the root user.
  2. Locate the vCenter Server virtual machine:
    • Select Actions > Power > Power on.

vCenter will take a few minutes to power on. Even before the UI is available you may see the “SupervisorControlPlaneVM” virtual machines get powered on.

As soon as the NSX ALB controller is able to communicate with vCenter, you will also see the Service Engines get powered on.

After a time, the Supervisor cluster will automatically power on the workload cluster VMs. If you want, watch the Virtual Services console in the NSX ALB interface and wait for the services to start coming up. It will take some time for them to turn “green” because the health metric is averaged over a period of time.
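Once the Supervisor is up, a quick sketch of how I would verify it from a workstation using the vSphere plugin for kubectl — the server address and username below are placeholders:

```shell
# Log in to the Supervisor (address and username are placeholders):
kubectl vsphere login --server=192.168.10.2 \
  --vsphere-username administrator@vsphere.local

# Supervisor control plane nodes should report Ready:
kubectl get nodes

# Workload clusters should report a running/ready phase:
kubectl get tanzukubernetescluster -A
```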

Start vSphere Cluster Services

Once the vCenter UI is available you should be nearly done. You can safely power on any VM still remaining. One final task though is to re-enable the vSphere Cluster Services.

  1. Login to vCenter.
  2. Select the vCenter object in the Hosts and Clusters inventory view.
  3. Select the Configure tab.
  4. Select Advanced Settings.
  5. Click Edit Settings.
  6. Locate the ‘config.vcls.clusters.domain-c(number).enabled’ properties. (Hint: Just filter the properties by “vcls”.)
  7. Set the value to ‘true’.
  8. Save the changes.

The vCLS VMs will be provisioned automatically. vMotion and other cluster-level services will now function as normal.
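The same “retreat mode” flag can be flipped from the CLI with govc — a sketch, assuming govc is already configured (GOVC_URL and credentials set) and that `domain-c8` stands in for your cluster’s actual domain ID, which you can read from the browser URL with the cluster selected:

```shell
# domain-c8 is a placeholder for your cluster's domain ID.
govc option.set config.vcls.clusters.domain-c8.enabled true

# Verify the setting took effect:
govc option.ls config.vcls.clusters.domain-c8.enabled
```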

Summary

That was my runbook. It worked well in testing and very well on the day.
