Tag: vSphere

vSphere Replication 6.0 part 3: enhanced linked mode, backup and recovery scenarios

In the third part of my series I am going to talk about using vCenter enhanced linked mode together with vSphere Replication 6.0 and how this combination can be used to protect the vSphere Replication infrastructure itself.

In the newest version, vSphere Replication makes use of the LookupService provided by SSO on the new Platform Services Controller (PSC). When multiple vCenter instances share the same PSC (the so-called vCenter enhanced linked mode), we are not just able to manage all vCenter instances within a single vSphere Web Client. We can also use vSphere Replication to replicate, and therefore protect, VMs from one site to another and simply migrate a VM back after a restore of the protected site, all within an integrated view.
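A side note: the same linked-mode convenience exists on the CLI side as well. A minimal PowerCLI sketch (assuming a PowerCLI version that already offers the -AllLinked switch and a hypothetical vCenter name) connects to every vCenter of the SSO domain in one call:

# Connecting to one vCenter with -AllLinked also connects to all other
# vCenters in enhanced linked mode (hypothetical host name)
Connect-VIServer -Server vcsa01.lab.local -AllLinked

# All linked vCenters are now available to subsequent cmdlets
$global:DefaultVIServers | Select-Object Name, Version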

The following shows a logical view of a recommended vCenter enhanced linked mode setup.

[Figure: logical view of the recommended enhanced linked mode setup]

This architecture has a lot of benefits. You have a vCenter in both sites, which is required when you are forced to recover your VMs (in a supported way). As soon as our vCenter instances are in enhanced linked mode, we are able to select any joined vCenter as a target for our vSphere Replication protection.

[Screenshot: selecting a linked-mode vCenter as the vSphere Replication target site]

I very often see backup strategies that do not take into consideration that you usually MUST have a vCenter to recover a VM with your backup solution (if there is no 'emergency direct-to-ESXi recovery' feature included). For sure there are ways to register the replicated VM manually on the recovery site, but hey ... (officially) we need to make sure that our recovery procedures are supported by the vendor.

In the current situation there is one thing I am uncomfortable with. The way currently recommended by VMware is to create a dedicated PSC cluster with a network load balancer in front of it. Since only NSX, F5 and NetScaler are supported, this adds additional cost for licensing, operating and implementing the solution. To be honest, I don't expect to see such a setup very often in non-enterprise environments (at that level people are waiting for vVol replication support ;-)).

The only 'easier' suitable option would be to put a solution like the following in place:

[Figure: alternative enhanced linked mode setup]

Referring to VMware's blog post on the new PSC architecture possibilities, the only recommended option is the one mentioned in the beginning. I am currently evaluating and looking for discussions about the pros and cons of the configuration shown above. I will write about my findings in a separate post.

Protect and Recover vSphere Replication Appliances and Server (Demo)

It's worth remembering to protect the vSphere Replication appliances as well, so that in case of an outage you are able to bring back the replication infrastructure pretty painlessly. I am going to show you how to recover from a vSphere Replication Appliance data loss.

In my lab environment I have two sites and I am going to protect the vSphere Replication appliance from LE02 (managed by LE02VCE03) to LE01 (managed by vCSA01). The PSC of each vCenter has joined the same SSO domain.

On my protected site I have 6 machines protected.

In the first scenario I have lost my vSphere Replication appliance data on the protected site, so I recover it (vSRA) with the help of vSphere Replication

[Screenshot: recovering the vSphere Replication appliance]

and once the original site has been restored, I fail back to it via cross-vCenter vMotion.

[Screenshots: failback via cross-vCenter vMotion]

One thing you need to take care of is that the vSphere Replication Appliance and Server are registered against a vCenter. If you restore this machine in the way I described above, or with any other backup solution that restores the VM, you need to make sure to re-register the VM with the vCenter; otherwise you will see the following error within the vSphere Replication menu.

[Screenshot: error in the vSphere Replication menu]

 

So what to do? Register the recovered VM as a vSphere replication server

[Screenshot: registering the recovered VM as a vSphere Replication server]

and verify that all of your vSphere replication jobs are still in place / running.

[Screenshot: overview of the vSphere Replication jobs]

Voila… we recovered the vSphere Replication Appliance and can go on with our next test.

Recover protected virtual machines and fail back with cross-vCenter vMotion (Demo)

My protected site has failed and the data has been lost. Lucky me, I was able to recover all protected VMs on my recovery site. Depending on the network characteristics you might be forced to change the IPs of your VMs (PowerCLI can be your friend 😉 ).
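As a rough sketch of that idea (assuming Windows guests with VMware Tools running, a hypothetical CSV mapping file and the guest adapter name 'Ethernet0'), re-addressing the recovered VMs could look something like this:

# Hypothetical mapping file with the columns VMName,NewIP,Netmask,Gateway
$map = Import-Csv C:\temp\new-ips.csv

foreach ($entry in $map) {
    $vm = Get-VM -Name $entry.VMName
    # Re-address the first adapter inside the guest via VMware Tools
    $cmd = "netsh interface ip set address name=`"Ethernet0`" static $($entry.NewIP) $($entry.Netmask) $($entry.Gateway)"
    Invoke-VMScript -VM $vm -ScriptText $cmd -ScriptType Bat -GuestUser administrator -GuestPassword 'P@ssw0rd'   # placeholder credentials
}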

[Screenshots: recovery of the protected VMs on the recovery site]

After the rebuild of my primary site, I was able to fail back / migrate all VMs with cross-vCenter vMotion to the original site.

[Screenshot: failback with cross-vCenter vMotion]

 

Finalize the steps and voilà: you have successfully failed back the VMs.

Make sure to configure a re-protection of the virtual machines.

Final words

The thing I am still missing is a smooth way of getting a simple vCenter linked mode setup. Once I lost my protected site, the Web Client became really slow and sluggish. Even after the site recovery I needed to reboot my primary vCenter to get it fully functional again. At this time I am still not sure what the best way is to establish vCenter enhanced linked mode in a 'stretched' environment. Any input / discussions / opinions are very much appreciated.

 

 

vSphere Replication 6.0 – Part 2: vSphere Replication performance and SLAs

After a few weeks and for several reasons (professional and non-professional), I have finally restarted writing my vSphere Replication 6.0 series. Part 2 focuses on some network design options and how they might impact SLAs defined for the Recovery Point Objective (RPO).

Since I summarized the architecture and components in part 1, I am now going to analyze the effect of network design decisions on replication performance.

Option 1: “Keep as much traffic as possible within the ESXi”

[Figure: network design option 1]

Result via ESXTOP:

[Screenshot: ESXTOP result for option 1]

-> With the network configuration that minimizes routing effort, I was able to nearly saturate the complete vmnic adapter (ca. 900 Mbit/s).

Option 2: “Having replication traffic routed between vSphere Replication Appliance and the VMkernel port”

[Figure: network design option 2]

Result via ESXTOP:

[Screenshot: ESXTOP result for option 2]

-> As expected, the throughput dropped by nearly 50% to around 440 Mbit/s.

I know that those two results depend on the specific characteristics of my homelab environment. The reason I have written them down is to create awareness that the network design decision has an impact on replication performance and therefore maybe on whether you can meet an SLA or NOT.

Let’s make a short calculation within a small scenario.

RPO (Recovery Point Objective): how much data may get lost during a failure. This value is configured during the setup of a replication job and defines the time interval within which a replication cycle must be started.

Amount of machines: 15
VM size: 100 GB = 102,400 MB
Max. average daily disk change rate: 5%
Max. replication transfer rate, option 1: 901 Mbit/s = 112.625 MB/s
Max. replication transfer rate, option 2: 440 Mbit/s = 55 MB/s

The initial replication can be calculated with the following formula:

[Formula screenshot: initial replication time]

and will take the following amount of time in our scenario:

Option 1:

[Calculation screenshot: option 1]

Option 2:

[Calculation screenshot: option 2]

To meet an SLA we are in most cases more interested in how long the ongoing replication cycles will take.

[Formula screenshot: ongoing replication time]

Option 1:

[Calculation screenshot: option 1]

Option 2:

[Calculation screenshot: option 2]
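Since the formula screenshots are not reproduced here, here is a rough back-of-the-envelope version of the numbers. My assumption about the formulas: initial replication time = total VM data / transfer rate, and ongoing replication time = worst-case changed data per replication cycle / transfer rate.

# Scenario data from above
$vmCount    = 15
$vmSizeMB   = 102400     # 100 GB per VM
$changeRate = 0.05       # max. average daily disk change rate
$rateOpt1   = 112.625    # MB/s, option 1
$rateOpt2   = 55         # MB/s, option 2

# Initial replication: a full copy of all VMs
($vmCount * $vmSizeMB) / $rateOpt1 / 3600   # ~3.8 hours (option 1)
($vmCount * $vmSizeMB) / $rateOpt2 / 3600   # ~7.8 hours (option 2)

# Ongoing replication, worst case: the whole daily change has to be shipped in one cycle
($vmCount * $vmSizeMB * $changeRate) / $rateOpt1 / 60   # ~11 minutes (option 1)
($vmCount * $vmSizeMB * $changeRate) / $rateOpt2 / 60   # ~23 minutes (option 2)

Under these assumptions option 1 stays below a 15-minute RPO while option 2 does not, which matches the conclusion below.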

So if you have an RPO of 15 minutes defined, there is a risk of not meeting the SLA with option 2.

Maybe I repeat myself, but this is just an example calculation (and depending on the use case the limiting factor will be the link between the protected and the recovery site). Nevertheless, you need to be aware of the following relevant metrics when you design replication:

  • replication-throughput
  • change-rate
  • number and size of your VMs.

In production we don’t want to receive an RPO violation alarm (technically or by the service manager ;-). If you can’t meet the requirements in a theoretical calculation, you will not be able to meet them during daily operations.

Which tools can we use to get the above metrics? Replication throughput via ESXTOP (network view: n), number and size of your VMs via PowerCLI (if you haven't done anything with PowerCLI so far, this is a great starting task for it ;-).
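For the 'number and size' part, a minimal PowerCLI sketch could look like this (using the provisioned size; used space would be another option):

# Number of VMs and their disk footprint
$vms = Get-VM
"Number of VMs: $($vms.Count)"
$vms | Select-Object Name, ProvisionedSpaceGB, UsedSpaceGB |
    Sort-Object ProvisionedSpaceGB -Descending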

For gathering data about the change rate within a VM I refer to a PowerCLI script that Scott Herold (his name was in the comments) created a few years ago, which uses the changed-block-tracking mechanism. I found the script via Google and you can download it here (Download: CBT_Tracker – Howto). Needless to say, you should understand it (and its influence on your system: it uses CBT and snapshots, see the comments within the script) and test the script before you use it for your analysis.

Compression – The X-Files continues

As I have already said, VMware has included a new compression mechanism in 6.0 to speed up the initial copy job. During my first tests (setup 1 with compression enabled) I saw higher CPU utilization (that's expected on my vSphere Replication Appliance), but also a lower throughput of the replication data. I am totally unaware of what went wrong here. I will try to figure out more about this effect and keep you informed ;-). If you have any ideas/hints about what went wrong in my setup, please comment or contact me via Twitter (@lenzker).

#PowerCLI: Distributed switch portgroup usage. Good vs. bad vs. Max Power approach

From now on there are three ways of doing things: the right way, the wrong way, and the Max Power way.

Isn’t that the wrong way?

Yes! But faster!

This quote from Homer Simpson came directly to my mind when I was doing some PowerCLI scripting this week.

[Image: Max Power]

I started with the wrong / Max Power way and eventually came to a much smarter solution: the right way.

The task was to gather the usage of the vSphere Distributed Switch portgroups within a world-wide environment with around 30 vCenter instances (final scripts at the bottom of this post).

Once again I realized that many roads lead to Rome, and even with PowerCLI you can either crawl there or take a plane.

My first approach was to get each portgroup and check every port for a connection to the virtual network adapter of a VM (each VM only has one network adapter).

# Executed once per portgroup $pg (see the full script at the bottom):
# fetch every port of the distributed switches and keep the connected ones
$ports = Get-VDSwitch | Get-VDPort -VDPortgroup $pg
$portsconnected = $ports | Where {$_.ConnectedEntity.Name -eq 'Network adapter 1'}

That approach was incredibly slow (> 12 hours) since it took a while to get all port objects of the distributed switch (more than 5000 per vDS).

Thanks to Brian Graf's great blog article we know how to access the extension data of vSphere objects in a much more elegant way.

# Fetch all network objects (including distributed portgroups) in one API call
$networks = Get-View -ViewType Network
Foreach ($network in $networks){
    $pgname = $network.Name
    # The network object already lists every connected VM
    $connectedports = ($network.VM).count
}

Done that way, it took 15 minutes instead of 12+ hours.

It really makes a huge difference whether you code something right or wrong. That counts for software, SQL queries and also for all kinds of scripts we use and build in our daily IT infrastructure world.

The final script gives you a CSV output file with the columns

Datacenter, PortgroupName, VLANID, NumberOfConnectedPorts

Make sure to use PowerShell 3.0+ so you can use the -Append option of the Export-Csv cmdlet.

Enjoy.

Good one

$results = @()

# Datacenter name of the currently connected vCenter
$dcName = (Get-Datacenter).Name

# One API call for all network objects instead of walking every single port
$networks = Get-View -ViewType Network

Foreach ($network in $networks){
    $pgname = $network.Name
    # Resolve the distributed portgroup to read its VLAN configuration
    $pg = Get-VDPortgroup -Name $pgname
    $vlanid = $pg.VlanConfiguration.VlanId
    # Every VM listed on the network object occupies one connected port
    $connectedports = ($network.VM).count

    $details = @{
        PortgroupName = $pgname
        VLANID = $vlanid
        NumberOfConnectedPorts = $connectedports
        Datacenter = $dcName
    }

    $results += New-Object PSObject -Property $details
}

$results | Export-Csv -Append -Path c:\temp\newvDSnetworkConnected.csv -NoTypeInformation
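The script collects data from whatever vCenter your PowerCLI session is connected to. For the 30-vCenter scenario mentioned above, a small wrapper along these lines (the $vCenterList array and your credential handling are placeholders) could run it once per vCenter:

# Hypothetical list of vCenter servers to walk through
$vCenterList = @('vcenter01.example.com', 'vcenter02.example.com')

foreach ($vc in $vCenterList) {
    Connect-VIServer -Server $vc | Out-Null
    # ... run the collection code above; -Append collects everything into one CSV ...
    Disconnect-VIServer -Server $vc -Confirm:$false
}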

 Bad one

$results = @()

$dcName = (Get-Datacenter).Name
# All non-uplink distributed portgroups
$pgs = Get-VDSwitch | Get-VDPortgroup | Where {$_.IsUplink -ne 'True'}

foreach ($pg in $pgs){

    # Slow part: every iteration pulls thousands of port objects from the vDS
    $ports = Get-VDSwitch | Get-VDPort -VDPortgroup $pg
    $portsconnected = $ports | Where {$_.ConnectedEntity.Name -eq 'Network adapter 1'}

    $pgname = $pg.name
    $vlanId = $pg.VlanConfiguration.VlanId

    $connectedports = $portsconnected.count
    $details = @{
        PortgroupName = $pgname
        VLANID = $vlanId
        NumberOfConnectedPorts = $connectedports
        Datacenter = $dcName
    }
    $results += New-Object PSObject -Property $details
}

$results | export-csv -Append -Path c:\temp\vDSnetworkConnected.csv -NoTypeInformation

[#Troubleshooting] the operation is not allowed in the current state after replicated storage failover

I received a call about a typical error message in the vSphere world: when powering on VMs, the following warning appeared:

‘the operation is not allowed in the current state’

Scenario summary: vCenter/ESXi 5.5U3

  1. Storage LUNs were replicated to a second device (async)
  2. Failover to second storage device was triggered
  3. Datastores were made visible to the ESXi and the VMFS was resignatured
  4. VMs were registered to the ESXi hosts

Symptoms

When the recovered VMs were powered on, the mentioned error occurred.

[Screenshot: 'The operation is not allowed in the current state' error]

A reboot of the ESXi host, of vCenter and its services, and even an ESXi reconnect did not solve the problem, so I started a more systematic root cause analysis.

Root cause:

The recovered virtual machines' CD drives were referring to an ISO file on a non-existent NFS datastore that had not been recovered. Unfortunately the error message itself was not pointing to the root cause.

Root cause analysis:

Checking the vCenter vpxd.log didn't give us much information about the problem:


vim.VirtualMachine.powerOn: vim.fault.InvalidHostConnectionState:
--> Result:
--> (vim.fault.InvalidHostConnectionState) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> host = '',
--> msg = "",
--> }
--> Args:

Hmm, yeah ... not much useful information there. So the next step was checking the hostd.log on the ESXi host:
2015-03-27T12:03:36.340Z [69C40B70 info 'Solo.Vmomi' opID=hostd-6dc9 user=root] Throw vmodl.fault.RequestCanceled
2015-03-27T12:03:36.340Z [69C40B70 info 'Solo.Vmomi' opID=hostd-6dc9 user=root] Result:
--> (vmodl.fault.RequestCanceled) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> msg = "",
--> }
2015-03-27T12:03:36.341Z [FFBC6B70 error 'SoapAdapter.HTTPService.HttpConnection'] Failed to read header on stream <io_obj p:0x6ab82a48, h:66, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>: N7Vmacore15SystemExceptionE(Connection reset by peer)
2015-03-27T12:03:40.024Z [FFBC6B70 info 'Libs'] FILE: FileVMKGetMaxFileSize: Could not get max file size for path: /vmfs/volumes/XXXXXX, error: Inappropriate ioctl for device
2015-03-27T12:03:40.024Z [FFBC6B70 info 'Libs'] FILE: File_GetVMFSAttributes: Could not get volume attributes (ret = -1): Function not implemented
2015-03-27T12:03:40.024Z [FFBC6B70 info 'Libs'] FILE: FileVMKGetMaxOrSupportsFileSize: File_GetVMFSAttributes Failed

So it seems that we had some kind of I/O problem. Checking /vmfs/volumes/XXXXXX, we realized that we were not able to access the device.
The volume itself was an NFS share mounted as a datastore, and as you probably know, NFS datastores are also mounted under the /vmfs/volumes/ folder of the ESXi host.

Even though the VMs were running on block-based storage (iSCSI), I found out that there was still a dependency between the VMs and the unreachable NFS device: the VMs had an ISO file from an NFS datastore mounted. During the storage failover the NFS datastore had not been restored, and the VMs were trying to access the NFS share to attach the ISO file.

Summary:

Those things happen all the time, so take care to unmount devices when you don't need them anymore (use RVTools/scripts and establish an overall operating process -> check my ops-manual framework 😉 ). Those little things can be a real show-stopper in any kind of automated recovery procedure (scripted, vSphere Site Recovery Manager, etc.).
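If you want to spot such hidden dependencies up front, a quick PowerCLI sketch (nothing more than a starting point) lists all VMs whose CD drive points to an ISO file and can optionally detach them:

# List VMs whose CD drive is backed by an ISO file on a datastore
Get-VM | Get-CDDrive | Where-Object { $_.IsoPath } |
    Select-Object @{N='VM';E={$_.Parent.Name}}, IsoPath

# Optional: detach the ISOs so the VMs no longer depend on that datastore
# Get-VM | Get-CDDrive | Where-Object { $_.IsoPath } | Set-CDDrive -NoMedia -Confirm:$false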

vSphere and CPU power management: performance vs. costs in the VDI field

Sometimes I love being an instructor. Two weeks of Optimize and Scale classes, and I finally have more valid and realistic values regarding performance vs. power usage from two of my participants.

First of all, thanks to Thomas Bröcker and Alexander Ganser, who not only discussed this topic with me but also ran this experiment in their environment. I am also proud that I apparently motivated Alexander to blog about his findings in English :-). While his post focuses on hosting server applications on Dell/Fujitsu hardware (-> please have a look at it), I will extend this information with data from an HP-based VDI environment, where the impact on performance, power usage and costs was much higher than I had expected.

The green IT trend has not just led to more efficient consumer CPUs, it is also becoming more and more common in modern datacenters. Hosts are powered down and on automatically (DPM: productive users of this feature, please contact me 😉 ), CPU frequencies are changed dynamically, or cores are disabled on demand (core parking). Since I always recommend NOT using any power management features in a server environment, I am now following up on this topic with suitable and realistic numbers from a production environment.

A few details about the setup and the scenario I am going to talk about. For my calculations later on I selected a common VDI size of around 1000 Windows 7 virtual machines.

VDI: 1000

Number of ESXi hosts (NoESXi): 20 (vSphere 5.5 U2)

CPU type: 2x Intel Xeon E5-2665 (8 cores, 2.4-3.1 GHz, TDP 115 W)

vCPUs per VM: 4 (pretty high for regular VDI, but multimedia/video capability was a requirement; on average 80% of the VDIs have active users)

vCPU/core ratio: 12.5

A few comments on the data: intranet video quality was miserable with the initial VM sizing (1 vCPU). We took a known and approved methodology based on the two performance-affecting dimensions:

  • 1st dimension: sizing of a virtual machine (is the virtual hardware enough for the proposed workload?), verified by checking whether the end user is satisfied with the performance (embedded videos play fluently).
  • 2nd dimension: sharing of resources (how much contention can we tolerate when multiple virtual hardware instances (vCPUs) share the physical hardware (cores)?), verified by defining thresholds for specific ESXi metrics.

As a baseline approach we defined that an intranet video needs to play fluently while the ESXTOP metrics %RDY (per vCPU, to determine general scheduling contention) and %CO-STOP (to determine scheduling difficulties caused by the 4-vCPU SMP) do not reach a specific threshold (3% ready / 0% co-stop) during working hours. *

// * Of course we would run into resource contention once every user on this ESXi host watches a video within their virtual desktop, resulting in a much higher %RDY value.

So far so good. The following parameters describe the variables that the power costs of such an environment depend on. Of course the values can differ between countries (price of energy) and datacenter types (cooling).

Power usage per host: this data was taken in real time via iLO on an HP DL380 Gen8 and describes the current power usage of the server. We tested the following energy-saver settings (they can be changed at runtime and have a direct effect):

  • HP Dynamic Power Savings

  • Static High Performance Mode

Climate factor: a metric defining how much additional power is needed to cool down the IT systems within a datacenter. This varies a lot between datacenters; as a source I refer to Centron, who did an analysis (in German) with the outcome that the factor is between 1.3 and 1.5, which means that for 100 watts used by a component we need 30-50 watts of cooling energy. The value I will use is 1.5, and it can differ a lot in each datacenter.

Power price: this price differs the most between countries, depending on regulation. The price is normalized per kilowatt-hour, i.e. how much you pay for 1000 watts of power usage over 1 hour. Small companies in Germany pay around 25 cents per kWh, while large enterprises with a huge power demand pay much less (around 10 cents per kWh).

Data was collected on a workday (Friday) at around 11 AM. We assume that the data represents a regular office-hours workload.

Avg. power usage per host, Power Savings (PU-PS) = 170 W = 0.170 kW

Avg. power usage per host, High Performance (PU-HP) = 230 W = 0.230 kW

Price per kWh (price) = 0.25 Euro

Climate factor (cli-fa) = 1.5

so let’s take the data and do some calculations based on the VDI-server data mentioned above:

VDI power costs per year = NoESXi * (price * PU-XX * 24 * 365) * cli-fa

Power costs per year, Power Saving mode = 20 * (0.25 Euro/kWh * 0.17 kW * 24 * 365) * 1.5 = 11,169 Euro

Power costs per year, High Performance mode = 20 * (0.25 Euro/kWh * 0.23 kW * 24 * 365) * 1.5 = 15,111 Euro

11,169 Euro vs. 15,111 Euro per year (for the power of around 1000 VDIs)
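Just to double-check the arithmetic, the same calculation in quick PowerShell one-liner style:

# Yearly power cost = hosts * (price per kWh * kW * 24 h * 365 d) * climate factor
$hosts = 20; $price = 0.25; $cliFa = 1.5
$costPS = $hosts * ($price * 0.170 * 24 * 365) * $cliFa   # = 11169 Euro (Power Savings)
$costHP = $hosts * ($price * 0.230 * 24 * 365) * $cliFa   # = 15111 Euro (High Performance)
$costHP - $costPS                                         # = 3942 Euro difference per year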

The effect of the power-saving mode is very high/aggressive in a VDI environment and far smaller when the ESXi host is used for server virtualization (I refer back to Alexander Ganser's blog post, since we observed nearly the same numbers for our servers). Server virtualization has a higher constant CPU load, while the VDI workload pattern is much more bursty and gives the CPU more chances to quiesce down a little. We observed around 10% power savings in the server field.

So now let's go a step further and compare the influence of the energy-saving mode on performance.

  • HP Dynamic Power Savings: CPU Ready avg of 2% per vCPU (=400ms in Real-Time charts)

  • Static High Performance Mode: CPU Ready avg of 1% per vCPU (=200ms in Real-Time charts)

[Screenshots: CPU ready values in vSphere and power usage in iLO]

As you can see, the power management setting has a direct impact on the ready values of our virtual machines' vCPUs. At the end of the day the power savings do have some financial impact in the VDI field, but I still recommend deactivating ALL power-saving methods, since I always try to ensure the highest performance.

Especially in the VDI field, with its irregular, sudden CPU spikes, the wake-up / clock increase of a core takes too much time, and if you read through the VMware community on a regular basis you will see that a lot of strange symptoms are very often resolved by disabling energy-saving mechanisms.

Please be aware that those numbers may differ in your environment depending on your server, climate-factor, consolidation-rate, etc.
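The HP settings above live in the server firmware, but ESXi also exposes its own CPU power policy through the API. A small, hedged PowerCLI check to see which policy your hosts are currently using:

# Report the active CPU power management policy per host
Get-VMHost |
    Select-Object Name,
        @{N='PowerPolicy';     E={$_.ExtensionData.Hardware.CpuPowerManagementInfo.CurrentPolicy}},
        @{N='HardwareSupport'; E={$_.ExtensionData.Hardware.CpuPowerManagementInfo.HardwareSupport}}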

IMO: Is SMP fault tolerance even useful? My view on it!

Maish Saidel-Keesing wrote a post about fault tolerance with multiple vCPUs a few weeks ago. He has valid points in his argumentation, but I still want to give you a little bit of my view on this topic (IMO).

With fault tolerance, two VMs run nearly symmetrically on two different ESXi hosts, with one (the primary) processing I/O and the other one (the secondary) dropping it. With the release of vSphere 6.0, VMware supports this feature for VMs with up to 4 vCPUs and 64 GB of memory. [More details here]

Let me try to summarize the outcome of Maish's argumentation:

FT is not that big a deal of a feature, since it only protects against a hardware failure of the ESXi host without any interruption of the protected VM's service. It does NOT detect or deal with failures at the operating system or application level.

So what Maish thinks we really need are cluster mechanisms at the application level rather than an infrastructure feature like FT.

In general I would not disagree with this opinion. In an ideal world all applications would be stateless, scalable and protectable with a load balancer in front of them. But it will take 1X or more years until all applications are built in such a 'modern' way. We will not get rid of legacy applications in the short term.

Within the last four years of being an instructor, I have received one question nearly every time I delivered a vSphere class:

‘Can we finally protect our SMP-VMs now with Fault Tolerance? No?! Awww :(‘

So I would not say there is no need out there for this feature. Being involved in some biddings last year, we very often had the requirement to deliver systems for automation solutions within large building complexes (airports, factories, etc.).

Software used in such domains is sometimes legacy application par excellence (ironic), programmed with a paradigm from long before agile/REST/virtualization played a role in the tech world. Sometimes you can license a cluster feature (and pay 10 times as much as for a 1-node license); sometimes you can't cluster it at all and need other ideas or workarounds to increase availability.

Some biddings were lost to competitors who were able to deliver solutions that can (on paper) tolerate a hardware outage without any service or session impact.

For me, with SMP-FT the typical design considerations come into play:

  • How does the cluster work? Does it work at the application/OS level or does it only protect against a general outage?
  • What were the failure/failover reasons in the past? (e.g. for vCenter, the failures I have seen were caused by database problems [40%], Active Directory / SSO problems [10%], hardware failures [45%] or other reasons [5%]) -> A feature like FT would have protected against a large share of the failures experienced in the past. The same considerations apply to all kinds of applications (e.g. virtual load balancers, Horizon View Connection Servers, etc.).
  • How much would a suitable solution cost to make, buy or update?

Sure, we need to get rid of legacy applications, but to be honest ... this will be a very long road (the business decides and pays for it), and once we get to the point where today's legacy applications are gone, the next generation of legacy applications will be in place and need to be transformed (Docker?! 😉 ).

We should see FT for what it is: a new tool within our VMware toolkit to fit specific requirements and protect VMs (legacy or new ones) on a new level, with pros and cons (as always). IMO, every tool/feature that gives us more opportunities to protect our IT is very welcome.

Nested ESXi with OpenStack

For all of you who want to run VMware's ESXi 5.x on an OpenStack cloud that uses vSphere as the hypervisor, I have a tiny little tip that might save you some research. The difficulty I faced was "How do I enable nesting (vHV) for an OpenStack-deployed instance?". I was almost going to write a script to add

featMask.vm.hv.capable="Min:1"
vhv.enable="True"

and run it after the “nova boot” command, and then I found what I am going to show you now.
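For reference, the script I almost wrote could have been a few lines of PowerCLI along these lines (only a sketch with a hypothetical instance name; the Glance property shown below makes it unnecessary anyway):

# Enable nested virtualization on an already deployed instance (hypothetical VM name)
$vm = Get-VM -Name 'my-openstack-instance'
New-AdvancedSetting -Entity $vm -Name 'vhv.enable' -Value 'True' -Confirm:$false
New-AdvancedSetting -Entity $vm -Name 'featMask.vm.hv.capable' -Value 'Min:1' -Confirm:$false
# The settings take effect after the VM has been powered off and on again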

Remember that when uploading an image into Glance you can specify key/value pairs called properties? Well, you are probably already aware of this:

root@controller:~# glance image-show 9eb827d3-7657-4bd5-a6fa-61de7d12f649
+-------------------------------+--------------------------------------+
| Property                      | Value                                |
+-------------------------------+--------------------------------------+
| Property 'vmware_adaptertype' | ide                                  |
| Property 'vmware_disktype'    | sparse                               |
| Property 'vmware_ostype'      | windows7Server64Guest                |
| checksum                      | ced321a1d2aadea42abfa8a7b944a0ef     |
| container_format              | bare                                 |
| created_at                    | 2014-01-15T22:35:14                  |
| deleted                       | False                                |
| disk_format                   | vmdk                                 |
| id                            | 9eb827d3-7657-4bd5-a6fa-61de7d12f649 |
| is_public                     | True                                 |
| min_disk                      | 0                                    |
| min_ram                       | 0                                    |
| name                          | Windows 2012 R2 Std                  |
| protected                     | False                                |
| size                          | 10493231104                          |
| status                        | active                               |
| updated_at                    | 2014-01-15T22:37:42                  |
+-------------------------------+--------------------------------------+
root@controller:~#

At this point, take a look at the vmware_ostype property, which is set to "windows7Server64Guest". This value is passed to the vSphere API when deploying an image through ESXi's API (VMwareESXDriver) or the vCenter API (VMwareVCDriver). Looking at the vSphere API/SDK Reference you can find the valid values, and since vSphere 5.0 the list contains "vmkernel4guest" and "vmkernel5guest", representing ESXi 4.x and 5.x respectively. According to my testing, this works with Nova's VMwareESXDriver as well as the VMwareVCDriver.

This is how you change the property in case you set it differently:

# glance image-update --property "vmware_ostype=vmkernel5Guest" IMAGE

And to complete the picture, this is the code in Nova that implements this functionality:

def get_vm_create_spec(client_factory, instance, name, data_store_name,
                       vif_infos, os_type="otherGuest"):
    """Builds the VM Create spec."""
    config_spec = client_factory.create('ns0:VirtualMachineConfigSpec')
    config_spec.name = name
    config_spec.guestId = os_type
    # The name is the unique identifier for the VM. This will either be the
    # instance UUID or the instance UUID with suffix '-rescue' for VM's that
    # are in rescue mode
    config_spec.instanceUuid = name

    # Allow nested ESX instances to host 64 bit VMs.
    if os_type == "vmkernel5Guest":
        config_spec.nestedHVEnabled = "True"
You can see that vHV is only enabled if the os_type is set to vmkernel5Guest. I would assume that this means you cannot nest Hyper-V or KVM this way, but I haven't validated that.

Pretty good already. But what I am really looking for is running ESXi on top of KVM as I need nested ESXi combined with Neutron to create properly isolated tenant networks. The most current progress with this can probably be found in the VMware Community.

Video Recommendation: Nicira NVP vs VMware NSX

Please take a look at the following questions:

  • What is NSX?
  • What the heck is the difference to Nicira NVP or are they the same?
  • What are the technologies behind NSX and how does it work?

Is there any one of them you cannot answer yet? If so, I would like to direct your attention to two great videos on YouTube which will clarify things:

OpenStack Networking – Theory Session, Part 1

OpenStack Networking – Theory Session, Part 2

Watching this will be the best 1h 45min you have invested for a while!

Have fun!

vCloud Director: Low Performance Powering On A vApp

I have been working on a project involving vCloud Director as well as most other parts of VMware's cloud stack for a while now. Until a couple of days ago, everything was running fine regarding the deployment of vApps from the vCloud Director UI or through vCenter Orchestrator. Then we noticed that starting and stopping vApps took way too long: powering on a single-VM vApp directly connected to an external network takes three steps in vCenter:

  1. Reconfigure virtual machine
  2. Reconfigure virtual machine (again)
  3. Power On virtual machine

The first 'Reconfigure virtual machine' step showed up in vCenter right after we triggered the vApp power-on in vCloud Director. From there it took around 5 minutes to reach step two. Once step two was completed, the stack paused for another 10 minutes before the VM was actually powered on. This even seemed to have implications for vCenter Orchestrator, including timeouts and failed workflows.

We spent an entire day trying to track the problem down and came to the conclusion that it had to be inside vCloud Director. But before we dug into log files, message queues etc., we decided to simply reboot the entire stack: BINGO! After the reboot the problem vanished.

Shutdown Process:

  1. vCO
  2. vCD
  3. vCD NFS
  4. VSM
  5. vCenter
  6. SSO
  7. DB

Then boot the stack in reverse order and watch vCloud Director powering on VMs within seconds 😉

Using OVFtool via Powercli with a session ticket – lessons learned

PowerCLI is a really great automation tool. Nevertheless, from time to time we need other tools as well to fulfil our needs. If you want to automate the distribution of templates in your environment, OVFTool is a really nice way to achieve this. I wanted to deploy a template to multiple clusters in an environment.

Having two arrays that store all the vCenter servers and all the clusters in the environment, the following would do the trick:

function DistributeTemplates {
    param (
        [Parameter(Mandatory=$True)]
        [String]$templateLocation
    )

    $ovftool = "C:\Program Files\VMware\OVFtool\ovftool.exe"

    # $vCenterlist and $clusterList are the two arrays mentioned above
    foreach ($vCenter in $vCenterlist) {
        Connect-VIServer $vCenter

        foreach ($cluster in $clusterList) {
            # Double quotes are needed so $templateLocation, $vCenter and $cluster expand
            $arglist = " --name=TemplateName --network=NetworkName -ds=DatastoreName $($templateLocation) vi://vCenterUser:Password@$($vCenter)/DCNAME/host/$($cluster)"
            $process = Start-Process $ovftool -ArgumentList $arglist -Wait
        }
    }
}

Even though it was working, I was not happy about the authentication mechanism (password in cleartext ... nooooo way).

Luckily I found a post at geekafterfive.com that explained how we can use a ticketing mechanism with OVFTool:

$Session = Get-View -Id Sessionmanager
$ticket = $session.AcquireCloneTicket()

Unfortunately I struggled with two things:

1. Make sure you are only connected to one vCenter, otherwise

 $ticket = $session.AcquireCloneTicket()

will throw the error "Method invocation failed because [System.Object[]] doesn't contain a method named 'AcquireCloneTicket'." (with multiple connections, Get-View returns one session manager object per connected vCenter, i.e. an array).

[Screenshot: PowerCLI error]

2. I could only upload the templates to the first cluster. The second one always failed. It seemed that my ticket was not valid anymore. Luckily, a closer look at the vSphere SDK Programming Guide told me that "A client application executing on behalf of a remote user can invoke the AcquireCloneTicket operation of SessionManager to obtain a one-time user name and password for logging on without entering a subsequent password" ... ahhh ... a one-time password. I thought the ticket would be valid for multiple operations once I was connected to a vCenter. But since my thoughts don't count on this topic (*yeahyeah ... what a rough world) I needed to create a new ticket before every OVFTool operation.

So the following script completely satisfied my (template automation) needs:

function DistributeTemplates {
    param (
        [Parameter(Mandatory=$True)]
        [String]$templateLocation
    )

    $ovftool = "C:\Program Files\VMware\OVFtool\ovftool.exe"

    foreach ($vCenter in $vCenterlist) {
        Connect-VIServer $vCenter

        foreach ($cluster in $clusterList) {
            # A clone ticket is only valid for one login, so acquire a fresh one per OVFTool call
            $Session = Get-View -Id Sessionmanager
            $Ticket = $Session.AcquireCloneTicket()
            # Double quotes so $Ticket, $templateLocation, $vCenter and $cluster expand
            $arglist = " --I:targetSessionTicket=$($Ticket) --name=TemplateName --network=NetworkName -ds=DatastoreName $($templateLocation) vi://$($vCenter)/DCNAME/host/$($cluster)"
            $process = Start-Process $ovftool -ArgumentList $arglist -Wait
        }
    }
}
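A call could then look like this (the vCenter and cluster names are placeholders for your own environment):

# The two arrays the function loops over
$vCenterlist = @('vcenter-a.example.com', 'vcenter-b.example.com')
$clusterList = @('Cluster01', 'Cluster02')

DistributeTemplates -templateLocation 'C:\templates\mytemplate.ova'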

Now a fresh authentication ticket is generated for every operation and I don't have to deal with storing any credentials while deploying a template to the whole wide world :) ... yeha ...
