During my migration from vSphere 4.1 to vSphere 5.5, I ran into an issue I had never experienced before. When your VMware HA network is partitioned, vCenter will not let you perform VMotions. At first I was surprised, but after some searching I learned the reason behind it, and it makes complete sense now.
I had a cluster running ESX 4.0/4.1 hosts with a very basic network config: Service Console in VLAN100 and VMotion in VLAN110, each in a different IP range. The new ESXi 5.5 hosts would get a completely different network config: the first vmkernel port in VLAN200 and the second vmkernel port in VLAN210, both enabled for management traffic, plus two VMotion vmkernel ports in VLAN220. When adding these new hosts to the same cluster as the ESX 4.0/4.1 hosts, I received a number of HA error messages:
- The vSphere HA Agent on the host is alive and has management network connectivity, but the management network has been partitioned.
- This state is reported by a vSphere HA Master Agent that is in a partition other than the one containing this host.
- The vSphere HA protected VMs running on the host are monitored by one or more vSphere HA Master Agents, and the agents will attempt to restart the VMs after a failure.
This seemed logical, since the old hosts had their Service Console interface in a different IP range than the new ESXi 5.5 hosts. It shouldn’t be a problem, since this situation would only last one day during working hours while I was working on that cluster, and my plan was to VMotion the VMs in the cluster from the old hosts to the new hosts. And that is where I was stopped: I was unable to free a 4.1 host because VMotion to a new ESXi 5.5 host was not allowed. Fortunately, KB 1033634 “vSphere HA and FT Error Messages” came to the rescue and explained why I could not VMotion.
And as in many situations before, it is completely logical that VMware decided to treat a partitioned HA situation in this way. Taken from the KB article: “The powered-on virtual machine you are attempting to migrate will not be protected by vSphere HA after the vMotion operation completes because the vSphere HA master agent is not currently responsible for it. vSphere HA will not restart the virtual machine if it subsequently fails. To restore vSphere HA protection, resolve any network partitions or disk accessibility issues.”
Makes perfect sense, and now you know too. Or even better, you already knew :-)
This issue has little to do with any restrictions imposed by VMware. Rather, it’s the fact that your partitioned hosts literally have no means to communicate with each other because of your VLAN configuration. For vMotion to work, the source and destination host must be able to communicate initially via the Management Network (or Service Console network for ESX) and then via the vMotion network to transfer the VM’s memory state. Your hosts have neither of these networks in common, hence the partition and the inability to vMotion.
If you can get all the 4.x and 5.5 hosts onto the same Management VLAN and vMotion VLAN, then your partitions should rejoin and you’ll be able to vMotion your VMs across (assuming they’re sharing storage – you won’t be able to use xvMotion (vMotion without shared storage) until all the hosts are 5.1 or above).
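The "no networks in common" argument can be sketched as a quick set comparison. This is only an illustration: the VLAN IDs come from the post, the host names and helper functions are hypothetical, and it looks purely at shared VLANs, so it says nothing about whether traffic is routed between those VLANs.

```python
# Hypothetical host names; the VLAN IDs are the ones described in the post.
HOSTS = {
    "esx41-old": {"management": {100}, "vmotion": {110}},
    "esxi55-new": {"management": {200, 210}, "vmotion": {220}},
}


def shared_vlans(hosts, a, b, role):
    """Return the VLAN IDs that hosts a and b have in common for a role."""
    return hosts[a][role] & hosts[b][role]


def share_both_networks(hosts, a, b):
    """True if a and b share a management VLAN and a vMotion VLAN.

    Assumption: only layer-2 adjacency is checked; routing between
    VLANs is not modelled.
    """
    return bool(shared_vlans(hosts, a, b, "management")) and bool(
        shared_vlans(hosts, a, b, "vmotion")
    )


if __name__ == "__main__":
    print(share_both_networks(HOSTS, "esx41-old", "esxi55-new"))
```

With the VLAN layout from the post, the old and new hosts have no management or vMotion VLAN in common, so the check comes back false. Moving both groups onto one management VLAN and one vMotion VLAN would make it true.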
Not sure if that is completely correct, because when I disable HA on the cluster, I can perform VMotions.