I received a call about a typical error message in the vSphere world: when powering on VMs, the following warning appeared:

‘the operation is not allowed in the current state’

Scenario summary: vCenter/ESXi 5.5U3

  1. Storage LUNs were replicated to a second device (async)
  2. Failover to second storage device was triggered
  3. Datastores were made visible to the ESXi hosts and the VMFS volumes were resignatured
  4. VMs were registered on the ESXi hosts (steps 3 and 4 are sketched below)
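
For reference, steps 3 and 4 roughly map to the following ESXi shell commands. This is only a minimal sketch; the volume label, datastore and VM paths are examples, not the ones from this environment.

# List VMFS copies (replicated / snapshot LUNs) the host has detected
esxcli storage vmfs snapshot list

# Resignature a detected copy by its original volume label (example label "DS01")
esxcli storage vmfs snapshot resignature --volume-label=DS01

# Register a recovered VM from the resignatured datastore (example path)
vim-cmd solo/registervm /vmfs/volumes/snap-xxxxxxxx-DS01/VM01/VM01.vmx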

Symptoms:

When the recovered VMs were powered on, the error mentioned above occurred.

[Screenshot: the error message shown when powering on the VM]

Rebooting the ESXi host, vCenter and its services, and even reconnecting the ESXi host did not solve the problem, so I started a more deterministic root cause analysis.

Root cause:

The recovered virtual machines' CD drives were pointing to an ISO file on a non-existent NFS datastore that had not been recovered. Unfortunately, the error message itself did not point to the root cause.

Root cause analysis:

Checking the vCenter vpxd.log didn't give us much information about the problem:


vim.VirtualMachine.powerOn: vim.fault.InvalidHostConnectionState:
--> Result:
--> (vim.fault.InvalidHostConnectionState) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> host = '',
--> msg = "",
--> }
--> Args:
Hmm, yeah… not much useful information there. So, next step -> checking the hostd.log on the ESXi host.
2015-03-27T12:03:36.340Z [69C40B70 info 'Solo.Vmomi' opID=hostd-6dc9 user=root] Throw vmodl.fault.RequestCanceled
2015-03-27T12:03:36.340Z [69C40B70 info 'Solo.Vmomi' opID=hostd-6dc9 user=root] Result:
--> (vmodl.fault.RequestCanceled) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> msg = "",
--> }
2015-03-27T12:03:36.341Z [FFBC6B70 error 'SoapAdapter.HTTPService.HttpConnection'] Failed to read header on stream <io_obj p:0x6ab82a48, h:66, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)
2015-03-27T12:03:40.024Z [FFBC6B70 info 'Libs'] FILE: FileVMKGetMaxFileSize: Could not get max file size for path: /vmfs/volumes/XXXXXX, error: Inappropriate ioctl for device
2015-03-27T12:03:40.024Z [FFBC6B70 info 'Libs'] FILE: File_GetVMFSAttributes: Could not get volume attributes (ret = -1): Function not implemented
2015-03-27T12:03:40.024Z [FFBC6B70 info 'Libs'] FILE: FileVMKGetMaxOrSupportsFileSize: File_GetVMFSAttributes Failed

So it seems we had some kind of I/O problem. Checking /vmfs/volumes/XXXX, we realized that we were not able to access the device.
The volume itself was an NFS share mounted as a datastore and, as you probably know, NFS datastores are also mounted under /vmfs/volumes/ on the ESXi host.
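
A quick way to confirm that on the host is to check the NFS mount state directly (the volume path below is just the placeholder used above):

# Show all NFS mounts and whether they are currently accessible
esxcli storage nfs list

# Listing the affected volume hangs or fails when the NFS server is unreachable
ls /vmfs/volumes/XXXXXX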

Even though the VMs were running on block-based storage (iSCSI), I found out that there was still a dependency between the VMs and the unreachable NFS device -> the VMs had an ISO file from an NFS datastore mounted in their CD drives. During the storage failover the NFS datastore had not been restored, and the VMs were still trying to access the NFS share to attach the ISO file.
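
The quickest way to spot this dependency is to look at the VM's .vmx file directly; the CD-ROM backing still points at the missing NFS volume. The device name and paths below are examples, not the real ones from this environment:

# Check whether the VM configuration still references an ISO file (example path)
grep -i "\.iso" /vmfs/volumes/iscsi-ds01/VM01/VM01.vmx
# example output:
# ide1:0.fileName = "/vmfs/volumes/nfs-iso/images/example.iso"

# After detaching the ISO (edit the .vmx or change the CD drive in the vSphere Client),
# reload the VM so hostd picks up the new configuration
vim-cmd vmsvc/getallvms      # find the VM ID
vim-cmd vmsvc/reload <vmid>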

Summary:

Those things happen all the time, so take care to unmount devices (and detach ISO files) when you don't need them anymore (use RVTools/scripts and establish an overall operating process -> check my ops-manual framework 😉). Those little things can be a real show-stopper in any kind of automated recovery procedure (scripted, vSphere Site Recovery Manager, etc.).
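
A simple (and admittedly crude) host-side check to spot such leftover ISO references before a planned failover could look like this; in practice a PowerCLI or RVTools report gives you the same information across the whole environment:

# List .vmx files on the mounted datastores that still reference an ISO image
grep -l -i "\.iso" /vmfs/volumes/*/*/*.vmx 2>/dev/null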