Sometimes I love being an instructor. After 2 weeks of Optimize and Scale I finally have more valid and realistic values from 2 of my participants regarding performance vs. power usage.
First of all, thanks to Thomas Bröcker and Alexander Ganser, who not only discussed this topic with me but also ran this experiment in their environment. I am also proud that I seem to have motivated Alexander to blog about his findings in English :-). While his post focuses on hosting server applications on Dell/Fujitsu hardware (-> please have a look at it), I will extend this information with data from an HP-based VDI environment, where the impact on performance, power usage and costs was much higher than I had expected.
The green IT trend has not only led to more efficient consumer CPUs, it is also becoming more and more common in modern datacenters. Hosts are powered down and on automatically (DPM – productive users of this feature please contact me 😉 ), CPU frequencies are changed dynamically, or cores are disabled on demand (core parking). Since I always recommend NOT using any power management features in a server environment, I am now following up on this topic with suitable and realistic numbers from a production environment.
A few details about the setup and the scenario I am going to talk about. For my calculations later on I selected a common VDI size of around 1000 Windows 7 virtual machines.
Number of ESXi (NoESXi): 20 (vSphere 5.5 U2)
CPU-type: 2x Intel Xeon E5-2665 (8 Cores, 2.4 – 3.1 GHz, TDP 115 W)
vCPU per VM: 4 (pretty high for regular VDI, but multimedia / video capability was a requirement; on average 80% of the VDIs have active users)
vCPU / Core rate: 12.5
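The ratio can be reproduced from the setup figures. This is only a quick sketch, assuming the roughly 1000 VMs are spread evenly across all 20 hosts and that only physical cores are counted (no Hyper-Threading); the variable names are mine:

```python
# Consolidation ratio derived from the setup figures listed above.
hosts = 20
sockets_per_host = 2
cores_per_socket = 8
vms = 1000            # "around 1000" in the text, taken as exact here
vcpus_per_vm = 4

physical_cores = hosts * sockets_per_host * cores_per_socket  # 320
total_vcpus = vms * vcpus_per_vm                              # 4000
vcpu_per_core = total_vcpus / physical_cores

print(f"{total_vcpus} vCPUs on {physical_cores} cores -> {vcpu_per_core} vCPU per core")
```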
A few comments on the data. Intranet video quality was miserable with the initial VM sizing (1 vCPU). We applied a known and proven methodology based on the 2 dimensions that affect performance:
- 1st dimension: Sizing of a virtual machine (is the virtual hardware sufficient for the proposed workload?) – verified by checking whether the end user is satisfied with the performance (embedded videos play smoothly).
- 2nd dimension: Sharing of resources (how much contention can we tolerate when multiple virtual hardware instances (vCPUs) share the physical hardware (cores)?) – verified by defining thresholds for specific ESXi metrics.
As a baseline we defined that an intranet video needs to play smoothly and that the ESXTOP metrics %RDY (per vCPU, to detect general scheduling contention) and %CO-STOP (to detect scheduling difficulties caused by the 4 vCPU SMP) must not reach a specific threshold (3% Ready / 0% CO-STOP) during working hours. *
// * Of course we would run into resource contention as soon as every user on an ESXi host watches a video within the virtual desktop at the same time, resulting in a much higher %RDY value.
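The baseline check can be written down as a small sketch. The function name and the sample values are hypothetical, and I am interpreting the thresholds as "per-vCPU ready stays below 3%" and "co-stop stays at 0%"; how you collect the samples (esxtop batch mode, vCenter, etc.) is left open:

```python
READY_LIMIT_PCT = 3.0    # per-vCPU %RDY threshold from the baseline
COSTOP_LIMIT_PCT = 0.0   # %CO-STOP threshold from the baseline

def within_baseline(samples):
    """Return True if every (ready_pct, costop_pct) sample respects both limits.

    Interpretation (an assumption): ready must stay below 3%,
    co-stop must not rise above 0%.
    """
    return all(
        ready < READY_LIMIT_PCT and costop <= COSTOP_LIMIT_PCT
        for ready, costop in samples
    )

# Hypothetical samples taken during working hours: (per-vCPU %RDY, %CO-STOP)
office_hours = [(1.0, 0.0), (2.0, 0.0), (1.4, 0.0)]
print(within_baseline(office_hours))
```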
So far so good. The following parameters are the variables that determine the power costs of such an environment. Of course the values can differ between countries (energy price) and datacenter types (cooling).
Power usage per host: This data was taken in real time via iLO on an HP DL380 G8 and describes the current power usage of the server. We tested the following power-saver settings (they can be changed at runtime and take effect immediately):
HP Dynamic Power Savings
Static High Performance Mode
Climate factor: A metric defining how much power is needed to cool the IT systems within a datacenter. This varies a lot between datacenters. As a source I am referring to Centron, who did an analysis (in German) with the outcome that the factor is between 1,3 and 1,5, which means that for every 100 Watt used by a component we need another 30 to 50 Watt for cooling. I arbitrarily picked 1,5; the value can differ a lot in each datacenter.
Power price: This price will differ the most between countries, depending on the regulations. It is quoted per kilowatt hour (kWh), i.e. how much you pay for using 1000 Watt for 1 hour. Small companies in Germany have to pay around 25 Cent per kWh, while large enterprises with a huge power demand pay much less (around 10 Cent per kWh).
Data was collected on a workday (a Friday) at around 11 AM. We assume that this represents a regular office-hour workload.
Avg. power usage per host Power Savings (PU-PS) = 170 Watt = 0,170 kW
Avg. power usage per host High Performance (PU-HP) = 230 Watt = 0,230 kW
Price per kWh (price) = 0,25 Euro
climate factor (cli-fa) = 1,5
So let's take the data and do some calculations based on the VDI server data mentioned above:
VDI – Power-costs per year = NoESXi * (price * PU-XX * 24 * 365) * cli-fa
Power-costs per year Power Savings mode = 20 * (0,25 Euro/kWh * 0,17 kW * 24 * 365) * 1,5 = 11169 Euro
Power-costs per year High Performance mode = 20 * (0,25 Euro/kWh * 0,23 kW * 24 * 365) * 1,5 = 15111 Euro
11169 Euro vs 15111 Euro a year (for the power of around 1000 VDIs)
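If you want to plug in your own numbers, the calculation above fits into a few lines. This is a minimal sketch; the function and parameter names are my own, the input values are the ones measured above:

```python
HOURS_PER_YEAR = 24 * 365

def annual_power_cost(hosts, price_per_kwh, kw_per_host, climate_factor):
    """Yearly power cost in Euro: hosts * (price * power * hours) * cooling factor."""
    return hosts * price_per_kwh * kw_per_host * HOURS_PER_YEAR * climate_factor

power_savings = annual_power_cost(20, 0.25, 0.170, 1.5)
high_performance = annual_power_cost(20, 0.25, 0.230, 1.5)

print(round(power_savings))                     # 11169
print(round(high_performance))                  # 15111
print(round(high_performance - power_savings))  # 3942 Euro difference per year
```

With the large-enterprise price of around 0,10 Euro per kWh, pass 0.10 as price_per_kwh instead; all results scale linearly with the price.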
The effect of the power-saving mode is very large in a VDI environment (170 Watt instead of 230 Watt per host, i.e. about 26% less power in our measurements) and far smaller when the ESXi host is used for server virtualization (I refer back to Alexander Ganser's blog post, since we observed nearly the same numbers for our servers). Server virtualization has a higher constant CPU load, while the VDI workload pattern is much more irregular and gives the CPU more chances to quiesce a little. We observed around 10% power savings in the server field.
So now let's go a step further and compare the influence of the energy-saving mode on performance.
HP Dynamic Power Savings: CPU Ready avg of 2% per vCPU (=400ms in Real-Time charts)
Static High Performance Mode: CPU Ready avg of 1% per vCPU (=200ms in Real-Time charts)
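The millisecond values in brackets come from the 20-second sample interval of the real-time charts: the ready time is summed up per interval, so a percentage translates into milliseconds as follows. A small sketch, with helper names of my own choosing:

```python
REALTIME_INTERVAL_S = 20  # real-time charts use a 20-second sample interval

def ready_pct_to_ms(ready_pct, interval_s=REALTIME_INTERVAL_S):
    """Convert a CPU ready percentage into the ms summation shown in the chart."""
    return ready_pct * interval_s * 1000 / 100

def ready_ms_to_pct(ready_ms, interval_s=REALTIME_INTERVAL_S):
    """Convert a chart ready summation (ms) back into a percentage."""
    return ready_ms * 100 / (interval_s * 1000)

print(ready_pct_to_ms(2))    # 400.0 ms -> HP Dynamic Power Savings
print(ready_pct_to_ms(1))    # 200.0 ms -> Static High Performance Mode
print(ready_ms_to_pct(400))  # 2.0 %
```

For historical charts with longer intervals, pass the corresponding interval_s; the same millisecond value then means a much lower percentage.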
As you can see, the power management mode has a direct impact on the ready values of our virtual machines' vCPUs. At the end of the day the power savings have a modest financial impact in the VDI field; still, I always recommend deactivating ALL power-saving methods, since I always try to ensure the highest performance.
Especially in the VDI field with its irregular, sudden CPU spikes, the wake-up / clock ramp-up of the cores takes too much time. If you read through the VMware community on a regular basis, you will see that a lot of strange symptoms are very often resolved by disabling energy-saving mechanisms.
Please be aware that those numbers may differ in your environment depending on your server, climate-factor, consolidation-rate, etc.