vSphere Replication 6.0 part 3: enhanced linked mode, backup and recovery scenarios

In the third part of my series I am going to talk about vCenter enhanced linked mode together with vSphere Replication 6.0 and how it can be used to protect the vSphere Replication infrastructure itself.

In the newest version vSphere Replication makes use of the Lookup Service provided by SSO on the new Platform Services Controller (PSC). With multiple vCenter instances sharing the same PSC, the so-called vCenter enhanced linked mode, we are not just able to manage all vCenters within a single vSphere Web Client. We can also use vSphere Replication to replicate, and therefore protect, VMs from one site to another and simply migrate a VM back after a restore of the protected site, all within an integrated view.

The following diagram shows a logical view of a recommended vCenter enhanced linked mode setup.


This architecture has a lot of benefits. You have a vCenter in both sites, which is required when you are forced to recover your VMs (in a supported way). As soon as our vCenters are in enhanced linked mode, we are able to select any joined vCenter as a target for our vSphere Replication protection.

vSphere Replication linked mode target site

I see very often that the backup strategy of some organizations does not take into consideration that you very often MUST have a vCenter to recover a VM with your backup solution (if there is no ’emergency-direct-to-ESXi-recovery-feature’ included). For sure there are ways to register the replicated VM back on the recovery site, but hey … (officially) we need to make sure that our recovery procedures are supported by the vendor.

In the current situation there is one thing I am uncomfortable with. The currently recommended way by VMware is to create a dedicated PSC cluster with a network load balancer in front of it. Since only NSX, F5 and NetScaler are supported, this adds an additional cost for licensing, operating and implementing the solution. To be honest, I don’t expect to see such a setup very often in non-enterprise environments (on this level people are waiting for vVol replication support ;-)).

The only ‘easier’ suitable option would be to put a solution like the following in place:


Referring to VMware’s blog post on the new PSC architecture possibilities, the only recommended option is the one mentioned at the beginning. I am currently evaluating and searching for discussions about the pros and cons of the mentioned configuration. I will write about the findings in a different post.

Protect and Recover vSphere Replication Appliances and Server (Demo)

It’s worth remembering to protect the vSphere Replication appliances as well, so that in case of an outage you are able to bring the replication infrastructure back fairly painlessly. I am going to show you how to recover from a vSphere Replication appliance data loss.

In my lab environment I have two sites and I am going to protect the vSphere Replication appliance from LE02 (managed by LE02VCE03) to LE01 (managed by vCSA01). The PSC of each vCenter has joined the same SSO domain.

On my protected site I have 6 machines protected.

In the first scenario I have lost my vSphere Replication appliance data on the protected site, so I recover it (vSRA) with the help of vSphere Replication


and once the original site has been restored, I failback to it via cross vCenter vMotion.


One thing you need to take care of is that the vSphere Replication appliance and server are registered against a vCenter. If you restore this machine in the way I described above, or with any other backup solution that restores the VM, you need to make sure to re-register the VM with the vCenter; otherwise you will see the following error within the vSphere Replication menu.



So what to do? Register the recovered VM as a vSphere Replication server


and verify that all of your vSphere replication jobs are still in place / running.


Voila… we recovered the vSphere Replication Appliance and can go on with our next test.

Recover protected virtual machines and fail back with cross vCenter vMotion (Demo)

My protected site has failed and the data has been lost. Luckily I was able to recover all protected VMs on my recovery site. Depending on the network characteristics you might be forced to change the IPs of your VMs (PowerCLI can be your friend 😉 )


After the rebuild of my primary site I was able to fail back/migrate all VMs with cross vCenter vMotion to the original site.



Finalize the steps and voila: you have successfully failed back the VMs.

Make sure to configure a re-protection of the virtual machines.

Final words

The thing I am still missing is a smooth way of having a simple setup of vCenters in linked mode. Once I lost my protected site, the behaviour of the Web Client got really slow and sluggish. Even after the site recovery I needed to reboot my primary vCenter to get it fully functional again. At this time I am still not sure what the best way is to establish vCenter enhanced linked mode in a ‘stretched’ environment. Any input / discussions / opinions are very appreciated.



vSphere Replication 6.0 – Part 2: vSphere Replication performance and SLAs

After a few weeks and for several reasons (professional and non-professional), I have finally resumed writing my current vSphere Replication 6.0 series. This part 2 focuses on some network design options and how they might impact SLAs defined for the Recovery Point Objective (RPO).

Since I summarized the architecture and components in part 1 I am now going to analyze the effects on the performance based on the network design decisions.

Option 1: “Keep as much traffic as possible within the ESXi”


Result via ESXTOP:

ESXTOP - option 1

-> With the network configuration minimizing the routing effort I was able to nearly saturate the complete vmnic adapter (ca. 900 Mbit/s)

Option 2: “Having replication traffic routed between vSphere Replication Appliance and the VMkernel port”


Result via ESXTOP:

ESXTOP Option 2

-> As expected, the throughput dropped by nearly 50% to around 440 Mbit/s.

I know that those two results depend on the specific characteristics of my homelab environment. The reason I have written them down is to create an awareness that the network design decision has an impact on the replication performance and therefore maybe on whether you can meet an SLA or NOT.

Let’s make a short calculation within a small scenario.

RPO – Recovery Point Objective: how much data can get lost during a failure. This value is configured during the setup of a replication job and defines the time interval within which a replication is started.

  • Number of machines: 15
  • VM size: 100 GB = 102,400 MB
  • Max average daily disk-change rate: 5%
  • Max replication transfer rate option 1: 901 Mbit/s ≈ 112.625 MB/s
  • Max replication transfer rate option 2: 440 Mbit/s = 55 MB/s

The initial replication time can be estimated with the following formula:

initial replication time = (number of VMs × VM size) / replication transfer rate

and will take the following amount of time in our scenario:

Option 1: (15 × 102,400 MB) / 112.625 MB/s ≈ 13,638 s ≈ 3.8 hours

Option 2: (15 × 102,400 MB) / 55 MB/s ≈ 27,927 s ≈ 7.8 hours

To meet an SLA we are in most cases more interested in how long the ongoing replication will take. A simple worst-case estimate is to assume the whole daily change set of all VMs has to be transferred within one replication cycle:

ongoing replication time = (number of VMs × VM size × daily change rate) / replication transfer rate

Option 1: (15 × 102,400 MB × 0.05) / 112.625 MB/s ≈ 682 s ≈ 11.4 minutes

Option 2: (15 × 102,400 MB × 0.05) / 55 MB/s ≈ 1,396 s ≈ 23.3 minutes

So if you have an RPO defined with 15 minutes there is a risk not to meet the SLA within option 2.
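For readers who want to plug in their own numbers, the same arithmetic can be sketched in a few lines of Python (my own sketch; the VM count, size, change rate and transfer rates are the scenario values above, and the ongoing time assumes the worst case of the whole daily change set in one cycle):

```python
vm_count = 15
vm_size_mb = 100 * 1024   # 100 GB per VM
change_rate = 0.05        # max average daily disk-change rate

def initial_sync_hours(rate_mb_s):
    # Time to transfer every VM completely once
    return vm_count * vm_size_mb / rate_mb_s / 3600

def ongoing_sync_minutes(rate_mb_s):
    # Worst case: the whole daily change set replicated in one cycle
    return vm_count * vm_size_mb * change_rate / rate_mb_s / 60

print(round(initial_sync_hours(112.625), 1))   # option 1 -> 3.8 hours
print(round(initial_sync_hours(55), 1))        # option 2 -> 7.8 hours
print(round(ongoing_sync_minutes(112.625), 1)) # option 1 -> 11.4 minutes
print(round(ongoing_sync_minutes(55), 1))      # option 2 -> 23.3 minutes
```

Swap in your own throughput, change rate and VM inventory to see how close you get to your RPO before you commit to a network design.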

Maybe I repeat myself, but this is just an example of a calculation (and depending on the use case the limiting factor will be the link between the protected and the recovery site). Nevertheless you need to be aware of the following relevant metrics when you design replication:

  • replication-throughput
  • change-rate
  • number and size of your VMs.

In production we don’t want to receive an RPO violation alarm (technically or by the service manager ;-). If you can’t meet the requirements in a theoretical calculation, you will not be able to meet them during daily operations.

Which tools can we use to get the above metrics? Replication throughput via ESXTOP (network view: n); number and size of your VMs via PowerCLI (if you haven’t done stuff with PowerCLI so far, this is a great starting task for it ;-).

For gathering data about the data change rate within a VM I refer to a PowerCLI script Scott Herold (his name was in the comments) created a few years ago that uses the changed-block-tracking mechanism. I found the script via Google and you can download it here (Download: CBT_Tracker – Howto). Needless to say, you should understand it (and its influence on your system – it uses CBT and snapshots – see the comments within the script) and test the script first before you use it for your analysis.

Compression – The X-Files continues

As I have already said, VMware has included a new compression mechanism in 6.0 to speed up the initial copy job. During my first tests (setup 1 with compression enabled) I had a higher CPU utilization (that’s expected on my vSphere Replication appliance), but also a lower throughput of the replication data. I am totally unaware of what went wrong here. I will try to figure out more about this effect and keep you informed ;-). If you have any ideas/hints about what went wrong in my setup, please comment or contact me via Twitter (@lenzker).

Enhance your #homelab by re-enabling transparent page sharing

“Sharing is caring!”

I decided to quickly write down a few last words about a widely discussed topic before my 4 week journey to Brazil begins.

Transparent page sharing (TPS)

The concept of transparent page sharing has been widely explained and its impact discussed over the years (whitepaper). Short version: multiple identical virtual memory pages point to the same page within the host memory.

The general behaviour has changed multiple times with enhancements in AMD’s & Intel’s newer CPU generations, where large pages were used intensively (2MB instead of 4KB areas –> increasing the chance of a TLB hit) and the benefits of TPS couldn’t be used until the ESXi host gets under memory pressure and changes its memory state to break up with his girlfri… I mean, to break the large pages into small ones.

Last year some security concerns came up with this feature and VMware pro-actively deactivated the TPS feature with all newer ESXi versions and updates (KB).

I don’t want to talk about the impacts and design decisions on productive systems, but on homelab environments instead. It is nearly a physical constant that a homelab always lacks memory.

By deactivating large pages & the new security mechanisms you can save a nice and predictable amount of consumed/host-backed memory.

And especially in homelab environments the risks of lower performance (caused by higher memory access times due to the higher probability of TLB misses) and of the security concerns might be tolerable.

What to do?

On each ESXi host, change the following advanced system setting:



Mem.ShareForceSalting = 0


and wait a couple of minutes until TPS kicks in.

Effect on my homelab

60 minutes after the setting was applied, the amount of consumed RAM on my cluster (96GB RAM setup) decreased from 55GB to 44.8GB, which means around 20% of the memory consumption has been saved in my VM constellation (multiple identical Windows 2012 R2 and nested ESXi VMs, which have a high ‘shared’ value)
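As a quick cross-check of that number (a trivial calculation using the before/after values from my cluster):

```python
# Consumed RAM on the 96GB cluster before and after re-enabling TPS
before_gb, after_gb = 55.0, 44.8

saved_gb = before_gb - after_gb
saved_pct = saved_gb / before_gb * 100
print(round(saved_gb, 1), "GB saved,", round(saved_pct, 1), "% of consumed memory")
# -> 10.2 GB saved, 18.5 % of consumed memory (roughly the 20% quoted above)
```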


So if you need a very quick approach to work around the memory pressure in your homelab and you can live with the potential performance loss –> re-activate transparent page sharing as a first step to optimize your environment. Sure, you can also skip the deactivation of large pages and hope that during a change in the memory state the large-page breakup process is quick enough and therefore TPS works again. But I preferred a permanent and auditable approach of monitoring the amount of shared memory in my lab.

2nd step to optimize the memory consumption –> Size your VMs right… but this is nothing I will tell you about since my plane is going to start… my substitute vRealize Operations will do the job 😉

#PowerCLI: Distributed switch portgroup usage. Good vs. bad vs. Max Power approach

From now on there are three ways of doing things: the right way, the wrong way, and the Max Power way.

Isn’t that the wrong way?

Yes! But faster!

This quote from Homer Simpson came directly into my mind when I was doing some PowerCLI scripting during the week.


I started with the wrong / Max Power way and suddenly came to a much smarter solution – the right way.

The task was to gather the usage of the distributed virtual switch portgroups within a world-wide environment with around 30 vCenters. (Final script at the bottom of this blog)

Once again I realized there are many roads to Rome and even with PowerCLI you can either go there by crawling or using a plane.

My first approach was to get each portgroup and check each of its ports for a connection to the virtual network adapter of a VM (each VM only has one network adapter).

$ports = Get-VDPort -VDPortgroup $pg   # $pg is the current portgroup within a surrounding loop
$portsconnected = $ports | Where-Object {$_.ConnectedEntity.Name -eq 'Network adapter 1'}

That approach was incredibly slow (> 12 hours) since it took a while to get all port objects of the distributed switch (more than 5000 per vDS).

Thanks to Brian Graf‘s great blog article we know how to access vSphere objects’ extension data in a much more elegant way.

$networks = Get-View -ViewType Network
foreach ($network in $networks){
    $pgname = $network.Name
    $connectedports = ($network.VM).Count
}

Doing it that way it took 15 minutes instead of 12++ hours.

It really makes a huge difference whether you code something right or wrong. That counts for software, SQL queries and also for all kinds of scripts we use and build in our daily IT infrastructure world.

The final script gives you a CSV output file with the values

Datacenter, PortgroupName, VLANID, NumberOfConnectedPorts

Make sure to use PowerShell 3.0++ so you can use the -Append option of the Export-Csv cmdlet.


Good one

$results = @()

$dcName = (Get-Datacenter).Name
$networks = Get-View -ViewType Network

foreach ($network in $networks){
    $pgname = $network.Name
    $pg = Get-VDPortgroup -Name $pgname
    $vlanId = $pg.VlanConfiguration.VlanId
    $connectedports = ($network.VM).Count

    $details = @{
        PortgroupName = $pgname
        VLANID = $vlanId
        NumberOfConnectedPorts = $connectedports
        Datacenter = $dcName
    }

    $results += New-Object PSObject -Property $details
}

$results | Export-Csv -Append -Path c:\temp\newvDSnetworkConnected.csv -NoTypeInformation

Bad one

$results = @()

$dcName = (Get-Datacenter).Name
$pgs = Get-VDSwitch | Get-VDPortgroup | Where-Object {$_.IsUplink -ne 'True'}

foreach ($pg in $pgs){

    # Slow: fetches every port object of the portgroup over the wire
    $ports = Get-VDPort -VDPortgroup $pg
    $portsconnected = $ports | Where-Object {$_.ConnectedEntity.Name -eq 'Network adapter 1'}

    $pgname = $pg.Name
    $vlanId = $pg.VlanConfiguration.VlanId
    $connectedports = $portsconnected.Count

    $details = @{
        PortgroupName = $pgname
        VLANID = $vlanId
        NumberOfConnectedPorts = $connectedports
        Datacenter = $dcName
    }

    $results += New-Object PSObject -Property $details
}

$results | Export-Csv -Append -Path c:\temp\vDSnetworkConnected.csv -NoTypeInformation

Shall I upgrade vHW? Enhanced vNUMA support with Hardware Version 11 (vHW11) in vSphere 6.

As with every release VMware increased the version of the virtual hardware of its virtual machines.

ESXi 5.0 vHW8
ESXi 5.1 vHW9
ESXi 5.5 vHW10
ESXi 6.0 vHW11

Each ESXi / vCenter has only a limited set of vHW compatibility (e.g. vHW11 MUST run on ESXi 6.0++ and therefore be managed by a vCenter 6.0++. Check the VMware solution interoperability matrix for that) and multiple virtual-machine-related feature enhancements (e.g. new OS support, new device support, more vCPU, more vRAM, more more more).

vHW11 - upgrade

Andreas Lesslhummer did a short summary of the what’s-new features within vHW 11, and one feature directly caught my eye:

‘vNUMA aware hot-add RAM’

In my job people often ask me.

‘Do I need to upgrade my virtual machine hardware version?’ 

and I respond with a typical consultant/trainer answer

‘It depends! ;-)’

You need to answer the following questions:

Q1. What is the current and the oldest version of vSphere where your VMs might run in a worst-case (failover, recovery, etc.)?

A1. Make sure you are able to run the VMs in all environments (check the compatibility upwards (newest version) and downwards (oldest version)). You don’t want to have to deal with vCenter Converter to downgrade the vHW again.

Q2. Do you need the additional features offered by the specific vHW?

A2. For most use cases I have dealt with so far it was not necessary to upgrade the hardware version for the new features, except if you have pretty BIG MONSTER VMS or the customer had VMs with a vHW < 7. With vHW11 a new feature might come in handy if you are dealing with … I don’t know how to call them … ‘a little bit smaller business-critical MONSTER VMs (> 8 vCPUs)’.

vSphere 6.0 and vHW11 resolve one out of two constraints regarding (v)NUMA that not everyone is aware of. vNUMA exposes the non-uniform memory access characteristics of the server to the guest operating system inside a virtual machine. The guest operating system can then schedule its processes to the vCPUs more efficiently. By default this is only relevant if you have > 8 vCPUs and multiple sockets defined in your virtual machine hardware (Mathias Ewald wrote an excellent article about it a while ago).

NEW: If you enable memory hot-add for a VM (to increase the memory size during runtime – ideal for performance baseline definitions OR quick responses to increasing demand), the additional memory will be distributed equally over all existing NUMA nodes of the VM if you are on vHW11.

Unfortunately the other constraint still remains in vSphere 6.0: if you enable CPU hot-add on your VM, the vNUMA characteristics will be hidden from the guest OS (KB 2040375).

Make sure you are aware of the hot-plug settings you have configured for your business-critical VMs, since they might have a performance impact (sample here).

If you want to have memory hot-add available including vNUMA support and your complete environment is running on vSphere 6.0: upgrade to vHW11, enable memory hot-plug and disable CPU hot-plug.

vSphere Replication 6.0 Part 1: Architecture and features at a glance. vSphere Replication standalone

vSphere Replication is a really cool application helping us to replicate our virtual machines on a VM level without the need for dedicated, replicating storage. Besides the conservative replication methods it can also be used to replicate to a public cloud provider (call it DaaS or whatever 😉 like VMware vCloud Air (I am still lacking deep technical knowledge of VMware’s hybrid cloud approach. It will be a separate blog post once I know more ;-).

In the following I want to give an architectural overview of the new version 6.0 of vSphere Replication. I realized during some of my Site Recovery Manager classes that people get confused by some KB-mentioned terminologies and maximums, so I wanted to create something that clarifies all of those things.

This article is the first part of the vSphere replication 6.0 series (not sure if series is the right word if only 3 episodes are planned 😉 )

General features and what’s new in vSphere Replication 6.0 (NEW)

  • Replication on a virtual machine level
  • Replication functionality embedded in the vmkernel as vSphere replication agent
  • RPO (Recovery Point Objective = data that can be lost): 15min – 24hours
  • Quiescing of the guest OS to ensure file-system-consistent replicas for Windows (via VSS) and Linux (NEW)
  • Replication independent of the underlying storage technology (vSAN, NAS, SAN, local)
  • Support for enhanced vSphere functionalities (Storage DRS, svMotion, vSAN)
  • Initial replication can be optimized by manually transferring the virtual machine to the recovery location (replication seeds)
  • Transferring of the changed blocks (not CBT) and initial synchronization can be compressed and therefore minimize the required network-bandwidth (NEW)
  • Network traffic can be configured more granularly and isolated (NEW) with VMkernel functions for NFC and vSphere Replication (vSphere 6.0 required)
  • Up to 2000 Replications (NEW)

Components required for vSphere replication

For vSphere Replication we need, besides the mandatory components (vCenter, SSO, Inventory Service, Web Client and ESXi), to download the vSphere Replication appliance.

The general task of the vSphere Replication appliance is to get data (VM files and changes) from the vSphere Replication agent of a protected ESXi and transfer it to the configured recovery ESXi (via a mechanism called Network File Copy – NFC).

Now it might get a little bit confusing. The appliance we are downloading is in fact two different appliances: two different OVF files pointing to the same disk file.

  1. 1x OVF (vSphere_Replication_OVF10.ovf) which is the virtual machine descriptor for the vSphere replication (manager) appliance – used for SRM OR single vSphere Replication – 2 or 4 vCPU / 4GB RAM
  2. 1x OVF (vSphere_Replication_AddOn_OVF10.ovf) which is the virtual machine descriptor for the vSphere replication server – can be used to balance the load and increase the maximum number of replicated VMs – 2 vCPU / 512 MB RAM

vSphere replication (manager) appliance

The vSphere Replication (manager) appliance is the ‘brain’ of the vSphere Replication process and is registered with the vCenter so that the vCenter Web Client is aware of the new functionality. It stores the configuration data in the embedded PostgreSQL database or in an externally added SQL database. The VMware documentation typically talks about the vSphere Replication appliance; to make sure not to mix it up with the replication server, I put ‘(manager)’ in the terminology. The vSphere Replication (manager) appliance also includes the first vSphere Replication server. Only one vSphere Replication appliance can be registered with a vCenter, and it theoretically supports up to 2000 replications if we have 10 vSphere Replication servers in total. Please be aware of the following KB if you want to replicate more than 500 VMs, since minor changes on the appliance are mandatory.

vSphere replication server

The vSphere Replication server is responsible for the replication job itself (data gathering from the source ESXi and data transfer to the target ESXi). It is included within the vSphere Replication appliance and can effectively handle 200 replication jobs. Even though I have read in some blogs that it is only possible to spread the replication load over several vSphere Replication servers in conjunction with Site Recovery Manager, it works out of the box without the need for Site Recovery Manager.
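A quick sanity check of how those maximums fit together (my own arithmetic based on the numbers above: the appliance’s built-in replication server plus add-on servers, ten in total, each handling up to 200 jobs):

```python
servers_total = 10      # built-in replication server + add-on servers
jobs_per_server = 200   # replication jobs one server can effectively handle

max_replications = servers_total * jobs_per_server
print(max_replications)  # -> 2000, the per-vCenter maximum mentioned above
```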

Sample Architecture

The following picture should illustrate the components and traffic flow during the vSphere replication process.



The following diagrams show two sample architectures regarding the network.

In the first diagram the vSphere Replication network MUST be routed/switched on layer 3, while in the second example we are able to stay in a single network segment with our replication traffic (thanks to the new VMkernel functionalities for NFC/replication traffic in vSphere 6).

Option 1: External routing/switch mandatory (would be a good use case for the internal routing of NSX ;-)):


Option 2: No routing mandatory & switching occurs within the ESXi


Of course those are only two simple configuration samples, but I want to make you aware that the (virtual) network design has an impact on the replication performance in the end.

I will focus on the performance difference between those two options (and the usage of the compression mode within an LAN) in part 2. Stay tuned 😉

[#Troubleshooting] the operation is not allowed in the current state after replicated storage failover

I received a call with a typical error message within the vSphere world: When powering on VMs we received a warning with the following message

‘the operation is not allowed in the current state’

Scenario summary: vCenter/ESXi 5.5U3

  1. Storage LUNs were replicated to a second device (async)
  2. Failover to second storage device was triggered
  3. Datastores were made visible to the ESXi and the VMFS was resignatured
  4. VMs were registered to the ESXi hosts


When the recovered VMs are powered on, the mentioned error occurred.


A reboot of the ESXi, the vCenter and its services, and even an ESXi reconnect did not solve the problem, so I started a more deterministic root cause analysis.

Root cause:

The recovered virtual machines’ CD drives were referring to an ISO file on a non-existent NFS datastore that hadn’t been recovered. Unfortunately the error message itself was not pointing to the root cause.

Root cause analysis:

Checking the vCenter vpxd.log didn’t give us much information about the problem:

vim.VirtualMachine.powerOn: vim.fault.InvalidHostConnectionState:
--> Result:
--> (vim.fault.InvalidHostConnectionState) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> host = '',
--> msg = "",
--> }
--> Args:
Hmm, yeah… not very much useful information. So next step -> checking the hostd.log on the ESXi host.
2015-03-27T12:03:36.340Z [69C40B70 info 'Solo.Vmomi' opID=hostd-6dc9 user=root] Throw vmodl.fault.RequestCanceled
2015-03-27T12:03:36.340Z [69C40B70 info 'Solo.Vmomi' opID=hostd-6dc9 user=root] Result:
--> (vmodl.fault.RequestCanceled) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> msg = "",
--> }
2015-03-27T12:03:36.341Z [FFBC6B70 error 'SoapAdapter.HTTPService.HttpConnection'] Failed to read header on stream <io_obj p:0x6ab82a48, h:66, <TCP ''>, <TCP ''>>: N7Vmacore15SystemExceptionE(Connection reset by peer)
2015-03-27T12:03:40.024Z [FFBC6B70 info 'Libs'] FILE: FileVMKGetMaxFileSize: Could not get max file size for path: /vmfs/volumes/XXXXXX, error: Inappropriate ioctl for device
2015-03-27T12:03:40.024Z [FFBC6B70 info 'Libs'] FILE: File_GetVMFSAttributes: Could not get volume attributes (ret = -1): Function not implemented
2015-03-27T12:03:40.024Z [FFBC6B70 info 'Libs'] FILE: FileVMKGetMaxOrSupportsFileSize: File_GetVMFSAttributes Failed

So it seems that we had some kind of IO problem. Checking /vmfs/volumes/XXXX we realized that we were not able to access the device.
The volume itself was an NFS share mounted as a datastore, and as you probably know those are also mounted in the /vmfs/ folder of the ESXi.

Even though the VMs are running on block-based storage (iSCSI), I found out that there was still a dependency between the VM and the unreachable NFS device -> the VMs had an ISO file from an NFS datastore mounted. During the failover of the storage the NFS datastore hadn’t been restored and the VM was trying to access the NFS share to include the ISO file.


Those things happen all the time, so take care to unmount devices when you don’t need them anymore (use RVTools/scripts and establish an overall operating process -> check my ops-manual framework 😉 ). Those little things can be a real show-stopper in any kind of automated recovery procedure (scripted, vSphere Site Recovery Manager, etc.)

vSphere Web Client 6.0: working remote via tethering. Bandwidth comparison RDP vs. local Browser

When I first installed vSphere 6.0 I was pretty impressed by the performance gain of the vSphere Web Client. Finally the Web Client is a tool I can work productively with without being afraid of being marked as unproductive by my customers (it’s tough to justify a higher hourly rate if I wait 20% of my time for the UI 😉 ).

So my homelab was installed with vSphere 6.0 and I tried to connect to it via VPN from my hotel wifi. Since the wifi was blocking my VPN attempts I was forced to tether/share the internet via my smartphone.

Sharing internet on… starting OpenVPN against my homelab… opening Chrome to use the Web Client and… the usability of the Web Client 6.0 was really, really good.

After a few minutes I received a warning from my provider T-Mobile that my data plan had reached the 80% threshold. I know the 500MB included in my data plan is not that much, but still I was really surprised to see these OpenVPN statistics after a few minutes.



Since I hadn’t used any other services than the vSphere Web Client, I wanted to know how much bandwidth working in a local browser via the Web Client really needs.

I created a test case (it’s Sunday, the weather is bad, the Bundesliga is pausing) which should take around 3-4 minutes:

  1. Login to the vCenter via the Web Client
  2. Navigate around in the home menu
  3. Select Hosts and Clusters
  4. Choose an ESXi Host and navigate to Manage / Settings / Networking
  5. View the vSwitch, virtual adapter and physical NIC settings
  6. Go to related Objects and select a datastore
  7. Browse the content of the datastore
  8. Create a new VM with default settings
  9. Power on the VM

I did this test with two browsers, Chrome and Firefox (to make sure the results are nearly identical), and observed the results via the Activity Monitor of macOS. As a third alternative I chose a remote connection via Microsoft Remote Desktop (native resolution, 16-bit color) and did the same test case steps mentioned above.

Here are the results:

  1. Chrome – duration: <4 minutes, bandwidth: ca. 21 MB
  2. Firefox – duration: < 4 minutes, bandwidth: ca. 26 MB
  3. RDP – duration: < 3.5 minutes, bandwidth: ca. 2 MB

Of course there are a lot of factors not considered (high activity on the vCenter would most certainly increase the data transferred), but the numbers should give you a feeling that the better performance of the Web Client seems to come side by side with pretty bandwidth-intensive caching on the client side. So if you work with limited bandwidth or any kind of throughput limitation, use an RDP connection to a control-center VM within your environment that is able to use the Web Client for your daily vSphere operations.
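To put those numbers into a per-hour perspective (my own extrapolation – the test ran only about four minutes, so treat the results as rough orders of magnitude, not measurements):

```python
def mb_per_hour(mb, minutes):
    # Linear extrapolation of the measured transfer volume to one hour
    return mb / minutes * 60

chrome = mb_per_hour(21, 4)   # local browser (Chrome) measurement
rdp = mb_per_hour(2, 3.5)     # RDP session measurement
print(round(chrome), "MB/h vs.", round(rdp), "MB/h ->", round(chrome / rdp, 1), "x")
```

On a 500MB data plan, an hour of local Web Client work at that rate would burn well over half the plan, while RDP stays comfortably small.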



vSphere and CPU power management: performance vs. costs in the VDI field

Sometimes I love being an instructor. Two weeks of Optimize and Scale and finally I have more valid and realistic values from two participants of mine regarding performance vs. power usage.

First of all thanks to Thomas Bröcker and Alexander Ganser, who not just discussed this topic with me but also did this experiment in their environment. I am proud that it seems I have motivated Alexander to blog about his findings in English :-). While he is focusing in his post on hosting server applications on Dell/Fujitsu hardware (-> please have a look at it), I will extend this information by using data from an HP-based VDI environment, where the impact on performance, power usage and costs was much higher than I had expected.

The trend of green IT not only had an effect on more efficient consumer CPUs, it is also getting more and more of a trend in modern datacenters. Hosts are powered down and on automatically (DPM – productive users of this feature please contact me 😉 ), CPU frequencies are dynamically changed or cores are disabled on demand (core parking). Since I always recommend NOT using any power management features in a server environment, I am now following up on this topic by giving suitable and realistic numbers from a production environment.

A few details about the setup and the scenario I am going to talk about. For my calculations later on I selected a common VDI size of around 1000 Windows 7 virtual machines.

  • VDI: 1000
  • Number of ESXi hosts (NoESXi): 20 (vSphere 5.5 U2)
  • CPU type: 2x Intel Xeon E5-2665 (8 cores, 2.4 – 3.1 GHz, TDP 115 W)
  • vCPU per VM: 4 (pretty high for regular VDI, but multimedia / video capability was a requirement; on average 80% of the VDIs have active users)
  • vCPU / core ratio: 12.5
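As a quick plausibility check, the vCPU-to-core ratio follows directly from the sizing above (a minimal sketch in Python; all figures are taken from the environment data):

```python
# vCPU/core consolidation ratio for the VDI sizing above
vms = 1000            # Windows 7 virtual desktops
vcpus_per_vm = 4      # sized up for multimedia workloads
hosts = 20            # ESXi 5.5 U2 hosts
sockets_per_host = 2  # 2x Intel Xeon E5-2665 per host
cores_per_socket = 8

total_vcpus = vms * vcpus_per_vm                           # 4000 vCPUs
total_cores = hosts * sockets_per_host * cores_per_socket  # 320 physical cores
ratio = total_vcpus / total_cores
print(ratio)  # 12.5 vCPUs per physical core
```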

A few comments about the data. Intranet video quality was miserable with the initial VM sizing (1 vCPU). We took a known and approved methodology covering the two performance-affecting dimensions:

  • 1st dimension: sizing of a virtual machine (is the virtual hardware enough for the proposed workload?) – verified by checking whether the end user is satisfied with the performance (embedded videos play fluently).
  • 2nd dimension: sharing of resources (how much contention can we tolerate when multiple virtual hardware instances (vCPUs) share the physical hardware (cores)?) – verified by defining thresholds for specific ESXi metrics.

As a baseline approach we defined that an intranet video needs to run fluently and that the ESXTOP metrics %RDY (per vCPU – to determine general scheduling contention) and %CSTP (to determine scheduling difficulty caused by the 4-vCPU SMP) must not reach a specific threshold (3% Ready / 0% Co-Stop) during working hours. *

// * of course we would run into resource contention once every user on this ESXi host starts watching a video within the virtual desktop, resulting in a much higher %RDY value.

So far so good. The following parameters describe the dependent variables for the power costs of such an environment. Of course the values can differ between countries (price of energy) and datacenter types (cooling).

Power usage per host: This data was taken in real-time via iLO on an HP DL380 G8 and describes the current power usage of the server. We tested the following energy-saver settings (they can be changed during runtime and have an immediate effect):

  • HP Dynamic Power Savings

  • Static High Performance Mode

Climate factor: A metric defining how much additional power is needed to cool down the IT systems within a datacenter. This varies a lot between datacenters; as a source I refer to Centron, who did an analysis in German concluding that the factor is between 1.3 and 1.5, meaning that for every 100 Watt used by a component we need another 30 to 50 Watt of cooling energy. I will use 1.5, but this can differ a lot in each datacenter.

Power price: This price differs the most between countries, depending on regulation. It is normed per kilowatt-hour (kWh), i.e. how much you pay for 1000 Watt of power usage over 1 hour. Small companies in Germany pay around 25 cents per kWh, while large enterprises with a huge power demand pay much less (around 10 cents per kWh).

Data was collected on a workday (Friday) at around 11 AM, so we assume it represents a regular office-hour workload.

Avg. power usage per host, Power Savings (PU-PS) = 170 Watt = 0.170 kW

Avg. power usage per host, High Performance (PU-HP) = 230 Watt = 0.230 kW

Price per kWh (price) = 0.25 Euro

Climate factor (cli-fa) = 1.5

So let's take the data and do some calculations based on the VDI server data mentioned above:

VDI power costs per year = NoESXi * (price * PU-XX * 24 * 365) * cli-fa

Power costs per year, Power Saving mode = 20 * (0.25 Euro/kWh * 0.170 kW * 24 * 365) * 1.5 = 11169 Euro

Power costs per year, High Performance mode = 20 * (0.25 Euro/kWh * 0.230 kW * 24 * 365) * 1.5 = 15111 Euro

11169 Euro vs. 15111 Euro a year (for the power of around 1000 VDIs)
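The yearly cost formula can be replayed in a few lines (a minimal sketch in Python; the inputs are the figures above, and the power price and climate factor are of course environment-specific):

```python
def yearly_power_cost(hosts, kw_per_host, price_per_kwh, climate_factor):
    """Yearly power cost in Euro: per-host draw over all hours of a year,
    scaled by the cooling overhead (climate factor)."""
    return hosts * (price_per_kwh * kw_per_host * 24 * 365) * climate_factor

# Figures from the VDI environment above
saving    = yearly_power_cost(hosts=20, kw_per_host=0.170, price_per_kwh=0.25, climate_factor=1.5)
high_perf = yearly_power_cost(hosts=20, kw_per_host=0.230, price_per_kwh=0.25, climate_factor=1.5)

print(round(saving))              # 11169 Euro with HP Dynamic Power Savings
print(round(high_perf))           # 15111 Euro with Static High Performance
print(round(high_perf - saving))  # 3942 Euro yearly delta
```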

The saving of the power-saving mode is very high/aggressive in a VDI environment and is far smaller when the ESXi host is used for server virtualization (I refer back to the blog post of Alexander Ganser, since we observed nearly the same numbers for our servers). Server virtualization has a higher constant CPU load, while the VDI workload pattern is much more infrequent and gives the CPU more chances to quiesce down a little. We observed around 10% power savings in the server field.

So now let's go a step further and compare the influence of the energy-saving mode on performance.

  • HP Dynamic Power Savings: CPU Ready avg of 2% per vCPU (=400ms in Real-Time charts)

  • Static High Performance Mode: CPU Ready avg of 1% per vCPU (=200ms in Real-Time charts)
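The percent and millisecond figures are two views of the same thing: the real-time charts use a 20-second (20,000 ms) sampling interval, so the per-vCPU ready time in milliseconds is just the percentage applied to that interval. A minimal sketch of the conversion:

```python
SAMPLE_INTERVAL_MS = 20_000  # vSphere real-time charts sample every 20 seconds

def ready_percent_to_ms(rdy_percent):
    """Convert a per-vCPU %RDY value to the CPU Ready summation (ms)
    shown in the vSphere real-time performance charts."""
    return rdy_percent * SAMPLE_INTERVAL_MS / 100

print(ready_percent_to_ms(2))  # 400.0 ms -> HP Dynamic Power Savings
print(ready_percent_to_ms(1))  # 200.0 ms -> Static High Performance
```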



As you can see, the power management setting has a direct impact on the ready values of our virtual machines' vCPUs. At the end of the day the power savings have only a small financial impact in the VDI field, and I still recommend deactivating ALL power-saving methods, since I always try to ensure the highest performance.

Especially in the VDI field, with its irregular sudden CPU spikes, the wake-up / clock increase of the core takes too much time, and if you read through the VMware community on a regular basis you will see that a lot of strange symptoms are very often resolved by disabling energy-saving mechanisms.

Please be aware that those numbers may differ in your environment depending on your server, climate-factor, consolidation-rate, etc.

Automatic virtual hardware change after shutdown – vCenter alarms and #PowerCLI

While teaching an Optimize and Scale class this week I was asked if there is a chance to automatically change the virtual hardware once a VM has been shut down or powered off.

Scenario – the customer wants to change the CPU / RAM of their XenDesktops with minimal impact on the end user. If a VM is powered off, vCenter should change the hardware and wait until another component powers the VM on again.

At first I was not sure how this could be done. Using PowerCLI to change the virtual hardware? – easy… but how can we call this script once the VM is powered off? After a minute I remembered that we can use vCenter alarms to trigger more than just SNMP/email actions if a specific event/condition occurs. We can also run commands and therefore call a PowerCLI script.


Using AD-based authentication for the PowerCLI script makes the authentication mechanism easier. For this, the vCenter server must run with an Active Directory service account that has the proper permissions on the vCenter level.



Chapter 1:  Easy going – Hardcode the CPU count in the script

Create the PowerCLI script

Now we create the modify-cpu.ps1 script, stored in C:\scripts\, with the following content. (Note: the CPU count must be hardcoded in the script by changing the $numCPU parameter. Be aware that this script changes the number of cores and stays with 1 virtual socket.)

Complete Script – modify-cpu.ps1:


# The VM name is passed as the first argument by the alarm action ({targetName})
param($vmname)

$vCenter = 'localhost'
$numCPU = 4

# Load the VMware PowerCLI snap-in if it is not already present
if (!(Get-PSSnapin | Where-Object {$_.Name -eq "VMware.VimAutomation.Core"})) {
    Write-Host "Adding PowerCLI snap in"
    Add-PSSnapin VMware.VimAutomation.Core -ErrorAction SilentlyContinue | Out-Null
    if (!(Get-PSSnapin | Where-Object {$_.Name -eq "VMware.VimAutomation.Core"})) {
        throw "Could not load PowerCLI snapin"
    }
}

Connect-VIServer $vCenter

$vm = Get-VM $vmname
$spec = New-Object -Type VMware.Vim.VirtualMachineConfigSpec -Property @{"NumCoresPerSocket" = $numCPU; "NumCPUs" = $numCPU}
$vm.ExtensionData.ReconfigVM_Task($spec) | Out-Null

Disconnect-VIServer $vCenter -Confirm:$false


Create the vCenter alarm that will call the script above with the VM name as parameter

Select the VM whose virtual CPU count should be changed after a shutdown and create a vCenter alarm (wow, I am really using the Web Client).

Give it a meaningful name and description and select to monitor for a specific event.


As the event on which the alarm and therefore the action will be triggered, we select VM – Powered off.


And finally as an action we call our created PowerCLI script in C:\scripts with the following statement:

"c:\windows\system32\cmd.exe" "/c echo.|powershell.exe -nologo -noprofile -noninteractive C:\scripts\modify-cpu.ps1 {targetName}"

With the {targetName} variable we can pass the name of the virtual machine that caused the trigger.


And voilà: if the VM now gets powered off, the virtual hardware change will be carried out.


(German UI language, hm!? Doesn't it sound quite nice :P)


If the configuration is not working as expected, validate the functionality by calling the command as the user that is running the vCenter service:

> Open CMD

> runas /user:Domain\vCenteruser cmd

> "c:\windows\system32\cmd.exe" "/c echo.|powershell.exe -nologo -noprofile -noninteractive C:\scripts\modify-cpu.ps1 {targetName}"

Chapter 2:  Make it more awesome by using VMs and Templates Folders

To be more flexible on the hardware configuration I wanted to have the following functionality:

Drag and drop a VM into a specific folder. If the VM is powered off, change the virtual hardware based on specific settings. The settings are extracted from the folder name.

Folder name specification: modify_resourcetype_value

For example, I create 4 folders:

  • modify_cpu_2
  • modify_cpu_4
  • modify_ram_4
  • modify_ram_8

If you drag a VM into the modify_ram_8 folder and power it off, the VM will be configured with 8 GB of memory.

So I needed to change my script so that it gathers the folder of the VM whose name is passed with the script call:

Excerpt from modify-hardware.ps1:

$vm = get-vm $vmname
$vmFolderName = $vm.Folder.Name

and split the foldername at the ‘_’ chars:

$items = $vmFolderName.Split('_')
$ressourceType = $items[1]
$amount = $items[2]

Now I can select the CPU or RAM logic with a ‘switch’ statement:

switch ($ressourceType) {
    'cpu' {
        $vCPU = [int]$amount
        $spec = New-Object -Type VMware.Vim.VirtualMachineConfigSpec -Property @{"NumCoresPerSocket" = $vCPU; "NumCPUs" = $vCPU}
        $vm.ExtensionData.ReconfigVM_Task($spec) | Out-Null
    }
    'ram' {
        $NmbrRam = [int]$amount
        Set-VM -VM $vm -MemoryGB $NmbrRam -Confirm:$false
    }
    default {
        Write-Verbose 'Wrong folder name specification'
    }
}

If the VM is not in a folder that matches the specification, nothing will happen.

Transfer the following modify-hardware.ps1 script into C:\scripts\ and configure the alarms the same way as described above – but this time on the specific folders we have created for this solution.

The action call needs to be adjusted this time, since I created another script:

"c:\windows\system32\cmd.exe" "/c echo.|powershell.exe -nologo -noprofile -noninteractive C:\scripts\modify-hardware.ps1 {targetName}"


Complete Script – modify-hardware.ps1:


# The VM name is passed as the first argument by the alarm action ({targetName})
param($vmname)

$vCenter = 'localhost'

# Load the VMware PowerCLI snap-in if it is not already present
if (!(Get-PSSnapin | Where-Object {$_.Name -eq "VMware.VimAutomation.Core"})) {
    Write-Host "Adding PowerCLI snap in"
    Add-PSSnapin VMware.VimAutomation.Core -ErrorAction SilentlyContinue | Out-Null
    if (!(Get-PSSnapin | Where-Object {$_.Name -eq "VMware.VimAutomation.Core"})) {
        throw "Could not load PowerCLI snapin"
    }
}

Connect-VIServer $vCenter

# Gather objects and data
$vm = Get-VM $vmname
$vmFolderName = $vm.Folder.Name

# Split the folder name to extract the parameters - folder name must be modify_ressource_x,
# where ressource must be ram or cpu, and x the number of vCPUs or the amount of RAM in GB
$items = $vmFolderName.Split('_')
$ressourceType = $items[1]
$amount = $items[2]

# Apply the change depending on the resource type encoded in the folder name
switch ($ressourceType) {
    'cpu' {
        $vCPU = [int]$amount
        $spec = New-Object -Type VMware.Vim.VirtualMachineConfigSpec -Property @{"NumCoresPerSocket" = $vCPU; "NumCPUs" = $vCPU}
        $vm.ExtensionData.ReconfigVM_Task($spec) | Out-Null
    }
    'ram' {
        $NmbrRam = [int]$amount
        Set-VM -VM $vm -MemoryGB $NmbrRam -Confirm:$false
    }
    default {
        Write-Verbose 'Wrong folder name specification'
    }
}

Disconnect-VIServer $vCenter -Confirm:$false

Maybe some of you will benefit from this little script-collection. Have fun with it 😉

© 2017 v(e)Xpertise
