AuthorMathias Ewald

Loading VMware Appliances into OpenStack

Today, I want to talk about the challenges of loading VMware appliances into OpenStack Glance and give you a recipe of how to do it.

I migrated my lab to OpenStack but need to be able to test the latest VMware products in order to keep myself up to speed. As VMware provides more and more of its software as virtual appliances in OVF or OVA format it makes sense to have them in Glance for provisioning on OpenStack.

The following procedure is illustrated at the example of vCAC Appliance 6.0:

 

Challenge 1: Format Conversion

If you get to download the appliance as OVF you are already one step ahead. OVF is simply an XML-based configuration and does not include any information required to run the VM on OpenStack.

OVAs on the other hand need to be unpacked first. Luckily, OVA is nothing but TAR:

$ file VMware-vCAC-Appliance-6.0.0.0-1445145_OVF10.ova
VMware-vCAC-Appliance-6.0.0.0-1445145_OVF10.ova: POSIX tar archive (GNU)
$

So we continue extracting the archive:

$ tar xvf ../VMware-vCAC-Appliance-6.0.0.0-1445145_OVF10.ova
VMware-vCAC-Appliance-6.0.0.0-1445145_OVF10.ovf
VMware-vCAC-Appliance-6.0.0.0-1445145_OVF10.mf
VMware-vCAC-Appliance-6.0.0.0-1445145_OVF10.cert
VMware-vCAC-Appliance-6.0.0.0-1445145-system.vmdk
VMware-vCAC-Appliance-6.0.0.0-1445145-data.vmdk

The .ovf, .mf and .cert files can be deleted right away. We will not need those anymore. After that the VMDK files must be converted to QCOW2 or RAW:

$ qemu-img convert -O qcow2 VMware-vCAC-Appliance-6.0.0.0-1445145-system.vmdk VMware-vCAC-Appliance-6.0.0.0-1445145-system.img
$ qemu-img convert -O qcow2 VMware-vCAC-Appliance-6.0.0.0-1445145-data.vmdk VMware-vCAC-Appliance-6.0.0.0-1445145-data.img

 

Challenge 2: Multi-Disk Virtual Machines

Unfortunately, OpenStack does not support images existing of multiple disks. Bad luck that VMware has the habit of distributing their appliances with a system disk and a separate data disk (*system.vmdk and *-data.vmdk). To still load this appliance into Glance we need to merge the disks into a single one:

First, we use guestfish to get some information on the filesystems inside the disk images:

$ guestfish

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
'man' to read the manual
'quit' to quit the shell

> add VMware-vCAC-Appliance-6.0.0.0-1445145-system.img
> add VMware-vCAC-Appliance-6.0.0.0-1445145-data.img
> run
>
> list-filesystems
/dev/sda1: ext3
/dev/sda2: swap
/dev/sdb1: ext3
/dev/sdb2: ext3
>

It is safe to assume that the EXT3 on /dev/sda1 is the root file system, so we mount it and have a look into /etc/fstab to see where the other filesystems should be mounted:

> mount /dev/sda1 /
> cat /etc/fstab
/dev/sda2 swap swap defaults 0 0
/dev/sda1 / ext3 defaults 1 1
proc /proc proc defaults 0 0
sysfs /sys sysfs noauto 0 0
debugfs /sys/kernel/debug debugfs noauto 0 0
devpts /dev/pts devpts mode=0620,gid=5 0 0
/dev/sdb1 /storage/log ext3 rw,nosuid,nodev,exec,auto,nouser,async 0 1
/dev/sdb2 /storage/db ext3 rw,nosuid,nodev,exec,auto,nouser,async 0 1

>

Next, as we want to get rid of sdb all together, we remove the entries in /etc/fstab

> vi /etc/fstab

and mount /dev/sdb1 to /mnt. This way we can copy the contents over to /storage/log:

> mount /dev/sdb1 /mnt
> cp_a /mnt/core /storage/log/
> cp_a /mnt/vmware /storage/log/

Of course, the same has to happen for /dev/sdb2:

> umount /mnt
> mount /dev/sdb2 /mnt
> cp_a /mnt/pgdata /storage/db/

And we are done! Please not it is important to use cp_a instead of cp_r as it will preserve permissions and ownership.

> exit

 

Challenge 3: Disk Space

Well, so far so good! But the image that originally served as the system disk now has to hold all the data as well. Therefore, we need more space! So we have to resize the disk image, the partition(s) and filesystem(s). Here is the easiest way I’ve found:

$ qemu-img create -f qcow2 newdisk.qcow2 50G
$ virt-resize --expand /dev/sda1 VMware-vCAC-Appliance-6.0.0.0-1445145-system.img newdisk.qcow2
Examining VMware-vCAC-Appliance-6.0.0.0-1445145-system.img ...
**********

Summary of changes:

/dev/sda1: This partition will be resized from 12.0G to 47.0G. The
filesystem ext3 on /dev/sda1 will be expanded using the 'resize2fs'
method.

/dev/sda2: This partition will be left alone.

**********
Setting up initial partition table on newdisk.qcow2 ...
Copying /dev/sda1 ...
Copying /dev/sda2 ...
Expanding /dev/sda1 using the 'resize2fs' method ...

Resize operation completed with no errors. Before deleting the old
disk, carefully check that the resized disk boots and works correctly.

$

What’s happening here? First, we create a new empty disk of the desired size with “qemu-img create”. After that, virt-resize copies data from the original file to the new one resizing partitions on the fly. Next, the filesystems are resized using resize2fs.

The image can now be uploaded into Glance. Please make sure to add properties that set the SCSI controller chipset properly. For example, IDE is going to work. The reason for this is the name of disks: using VirtIO we will get /dev/vda1 and would have to adjust those name e.g. in /etc/fstab, too. I have only had luck with ide so far:

glance image-update --property hw_disk_bus=ide 724ad050-a636-4c98-8ae5-9ff58973c84c

Have fun!

Deleting vCloud Director Temporary Data

Recently, I learned a bit about vCloud Director internal database and table structure which I am going to share with you here.

vCD holds two types of tables worth pointing out: the ORTZ and INV tables. The latter store information about vCenter Server inventory (INV) and are kept up-to-date as the inventory changes. This could be due to changes vCD makes itself or those carried out be an administrator. When the vCD service starts up it connects to vCenter to read in its inventory and update its INV tables. The QRTZ tables are used for heart beating and synchronization between cells (at least from what I understood, no guarantees).

Why am I telling you this? Both types of tables can be cleared without losing any vital data. You can make use of this knowledge whenever you feel your vCloud DB is out of sync with the vCenter inventory. This happens for example in the case where you have to restore your vCenter Database from a Backup without restoring vCloud’s database.

WARNING: Following this procedure is completely unsupported by VMware GSS. Follow the instructions below on your own risk or when you have no support for your installation anyway πŸ˜‰

  1. Shutdown both vCloud Director cells
  2. Clear the QRTZ and INV tables using “delete from <table name>”
  3. Start one of the cells and watch it start (/opt/vmware/vcloud-director/log/cell.log)
  4. Start the other cells

Here some SQL statements that will automate step 2 for you:

delete from QRTZ_SCHEDULER_STATE;
delete from QRTZ_FIRED_TRIGGERS;
delete from QRTZ_PAUSED_TRIGGER_GRPS;
delete from QRTZ_CALENDARS;
delete from QRTZ_TRIGGER_LISTENERS;
delete from QRTZ_BLOB_TRIGGERS;
delete from QRTZ_CRON_TRIGGERS;
delete from QRTZ_SIMPLE_TRIGGERS;
delete from QRTZ_TRIGGERS;
delete from QRTZ_JOB_LISTENERS;
delete from QRTZ_JOB_DETAILS;

delete from compute_resource_inv;
delete from custom_field_manager_inv;
delete from cluster_compute_resource_inv;
delete from datacenter_inv;
delete from datacenter_network_inv;
delete from datastore_inv;
delete from datastore_profile_inv;
delete from dv_portgroup_inv;
delete from dv_switch_inv;
delete from folder_inv;
delete from managed_server_inv;
delete from managed_server_datastore_inv;
delete from managed_server_network_inv;
delete from network_inv;
delete from resource_pool_inv;
delete from storage_pod_inv;
delete from storage_profile_inv;
delete from task_inv;
delete from vm_inv;
delete from property_map;

 

Looking Back at a Day of vShield Manager Troubleshooting

Dear diarrhea diary,

this entire day was f***** up by a VMware product called vShield Manager …

Like this or similar should today’s entry in my non-existing diary look like. It was one of these typical “piece of cake” tasks that turn into nightmares πŸ˜€ Literally the task read “Configure VXLAN for the ***** cluster” – easy hmm!?

1. Ok, let’s go: The physical switch configuration turned out easy as it was already done for me πŸ™‚ CHECK.

2. So, naive me, I connected to vShield Manager UI, went Datacenter -> Network Virtualization -> Prepare and added the cluster, gave it the name of the also already existing Distributed Switch and the VLAN ID and let it run. FAIL: “not ready”.

vsm

VSM itself doesn’t give a lot of details but I knew that probably the deployment and installation of the VXLAN VIB package failed. Looking at esxupdate.log I could see a “temporary failure in DNS lookup” (exact wording was probably different).Β  Looking at the ESXi hosts’ DNS configuration: empty. Cool! Fix -> CHECK. Later I found out that I myself blogged about this a while ago πŸ˜€

3. Now lets try again, but first we have to “unprepare” the cluster: removed check in VSM: Error. Of course. VSM didn’t create the Port Group nor the VMkernel ports and now tries to remove them … computer logic πŸ˜€ At this point, simply hit “Refresh” and it will be gone. Now we can try the preparation process once more: Error:

domain-c3943 already has been configured with a mapping.

Grrrr … luckily, found this: http://blog.jgriffiths.org/?p=482 To be honest the sentence “VMware support was able to help… and I suggest unless you don’t care about your cluster or vShield implementation that you call them to solve it” scared my a bit, BUT to balls to gain (wait is that right?). WORKS! PARTYYY! But once again: preparation failed (devil)

4. I can’t quite remember which error message or log entry helped me find VMware KB 2053782.Β  Following the steps sounds simply but hey, why should anything work today?! πŸ˜€ Check my other blog post about this particular one. After applying the – I like to call it – “curl”-hack to VSM (see the step before) again, I prepared the cluster one more time and finally the VXLAN VIB could be deployed, BUT …

5. … The Port Group was not created … f*** this sh***. After a 30ish minutes of blind flight through VSM and vCD, I figured out that other clusters could not deploy VXLANs anymore. Due to this insight and a good amount of despair, I just rebootet VSM. Then unprepare, “curl”-hack, prepare … and: WORKS!

vxlan

Portgroup is there. BUT:

6. No VMkernel Ports were created (I ran out of curses by that time). Another 30min passed until I unprepared, “curl”-hacked and prepared the cluster one last time before the VMkernel Ports were then magically created. THANK GOD! So I went ahead creating a Network Scope for the cluster.

I tested creating VXLAN networks over VSM a couple of times and it seemed properly create additional Port Groups. You think the days was over, yet? WROOONG!

7. Next, I tried through vCloud Director. The weird thing was that a Network Pool for that cluster already existed with a different name than the Network Scope I just created. Had to be some relict from before my time in the project. Trying to deploy a vApp I ran into something I am going to write about tomorrow. As this was fixed, I kept receiving this:

7

Telling from the error message, vCloud Director tries to allocate a network from a pool for which VSM has no network scope defined. Those thing did not work out:
– Click “Repair” on the Network Pool
– Create a Network Scope with the same name as the Network Pool as vCD uses some kind of ID instead of the name of the Network Scope.

The only possible solutions I could come up with are deleting and re-creating the Provider vCD or going into the vCD database and do some magic there. The only information on this I could find was in the Vmware Comunities: https://communities.vmware.com/thread/448106. So I am going to open a ticket.

Good night.

VMware vSphere Update Manager causes VXLAN Agent to fail on install and uninstall

The title of this article is that of VMware KB article 2053782. Following the steps seems simple but turned out to provide a couple of pitfalls and inaccuracies:

Mean pitfall:

Open a browser to the MOB at: https://vCenter_Server_IP/eam/mob/

When I opened this URL to my vCenter server, I received the following error message:

The vSphere ESX Agent Manager (vEAM) failed to complete the command.

The thing to point out here is the slash / after “mob” in the URL! Without this slash it won’t work

Unclear Instructions:

b. In the <config> field, change the value from true to false:
<config>
<bypassVumEnabled>false</bypassVumEnabled>
</config>

Reading this, the way I understood the instructions was to leave the XML data as is but turn “true” into “false” for the bypassVumEnabled element. In the code example they gave, they removed all the other elements but I thought that’s probably because of saving space in the KB article. WRONG! Turned out you have to:

  1. Delete all the XML data from the text field
  2. Paste the code above (config element with nested bypassVumEnabled element) – nothing else!

 

Hope that helps πŸ™‚

 

Nested ESXi with OpenStack

For all of you who want to run VMware’s ESXi 5.x on an OpenStack cloud running vSphere as the hypervisor, I have a tiny little tip that might save you some researching: The difficulty I faced was “How do I enable nesting (vHV) for an OpenStack deployed instance?”. I was almost going to write a script to add

featMask.vm.hv.capable="Min:1"
vhv.enable="True"

and run it after the “nova boot” command, and then I found what I am going to show you now.

Remember that uploading an image into Glance you can specify key/value pairs called properties? Well, you are probably already aware of this:

root@controller:~# glance image-show 9eb827d3-7657-4bd5-a6fa-61de7d12f649
+-------------------------------+--------------------------------------+
| Property                      | Value                                |
+-------------------------------+--------------------------------------+
| Property 'vmware_adaptertype' | ide                                  |
| Property 'vmware_disktype'    | sparse                               |
| Property 'vmware_ostype'      | windows7Server64Guest                |
| checksum                      | ced321a1d2aadea42abfa8a7b944a0ef     |
| container_format              | bare                                 |
| created_at                    | 2014-01-15T22:35:14                  |
| deleted                       | False                                |
| disk_format                   | vmdk                                 |
| id                            | 9eb827d3-7657-4bd5-a6fa-61de7d12f649 |
| is_public                     | True                                 |
| min_disk                      | 0                                    |
| min_ram                       | 0                                    |
| name                          | Windows 2012 R2 Std                  |
| protected                     | False                                |
| size                          | 10493231104                          |
| status                        | active                               |
| updated_at                    | 2014-01-15T22:37:42                  |
+-------------------------------+--------------------------------------+
root@controller:~#

At this point, take a look at the vmware_ostype property, which is set to “windows7Server64Guest”. This value is passed to the vSphere API when deploying an image through ESXi’s API (VMwareESXDriver) or the vCenter API (VMwareVCDriver). Looking at the vSphere API/SDK API Reference you can find valid values and since vSphere 5.0 we find “vmkernel4guest” and “vmkernel5guest” in the list representing ESXi 4.x and 5.x respectively. According to my testing, this works with Nova’s VMwareESXDriver as well as VMwareVCDriver.

This is how you change the property in case you set it differently:

# glance image-update --property "vmware_ostype=vmkernel5Guest" IMAGE

And to complete the pictures, this is the code in Nova that implements this functionality:

  93 def get_vm_create_spec(client_factory, instance, name, data_store_name,
  94                        vif_infos, os_type="otherGuest"):
  95     """Builds the VM Create spec."""
  96     config_spec = client_factory.create('ns0:VirtualMachineConfigSpec')
  97     config_spec.name = name
  98     config_spec.guestId = os_type
  99     # The name is the unique identifier for the VM. This will either be the
 100     # instance UUID or the instance UUID with suffix '-rescue' for VM's that
 101     # are in rescue mode
 102     config_spec.instanceUuid = name
 103 
 104     # Allow nested ESX instances to host 64 bit VMs.
 105     if os_type == "vmkernel5Guest":
 106         config_spec.nestedHVEnabled = "True"

You can see that vHV is only enabled if the os_type is set to vmkernel5Guest. I would assume that like this you cannot nest Hyper-V or KVM but I haven’t validated.

Pretty good already. But what I am really looking for is running ESXi on top of KVM as I need nested ESXi combined with Neutron to create properly isolated tenant networks. The most current progress with this can probably be found in the VMware Community.

Fixing Failed OVF Deployment Due to Integrity Error

We were just about to deploy the latest version of VCE’s UIMP when vSphere Client failed with an error message about a failed integrity check of the VMDK file. Although not the best idea, the fix was easy: When an integrity check is performed at all, OVF brings an additional file: the manifest file (.mf). In our case, it contained SHA1 checksum information about the .ovf and the .vmdk file:

mathias@x1c:/media/mathias/Volume$ ls -hl UIMP*
-rw------- 1 mathias mathias 2,6G MΓ€r  5 19:25 UIMP-4.0.0.2.359-Install-Media.iso
-rw------- 1 mathias mathias 1,3G MΓ€r  6 09:28 UIMP_OVF10-disk1.vmdk
-rw------- 1 mathias mathias  133 MΓ€r  6 10:54 UIMP_OVF10.mf
-rw------- 1 mathias mathias 125K Jan  4 23:13 UIMP_OVF10.ovf
mathias@x1c:/media/mathias/Volume$ cat UIMP_OVF10.mf
SHA1(UIMP_OVF10.ovf)= 881533ff36aebc901555dfa2c1d52a6bd4c47d99
SHA1(UIMP_OVF10-disk1.vmdk)= f175f150decb2bf5a859903b050f4ea4a3982023

Interestingly, the whole OVF data was contained in an ISO file which passed the MD5 check after download. So something must have gone wrong packaging the appliance. I calculated the SHA1 checksum for the file and compared it to the one in the manifest:

mathias@x1c:/media/mathias/Volume$ sha1sum UIMP_OVF10.ovf
881533ff36aebc901555dfa2c1d52a6bd4c47d99  UIMP_OVF10.ovf
mathias@x1c:/media/mathias/Volume$ sha1sum UIMP_OVF10-disk1.vmdk
f175f150decb2bf5a859903b050f4ea4a3982023  UIMP_OVF10-disk1.vmdk
mathias@x1c:/media/mathias/Volume$

VMDK: f175f150decb2bf5a859903b050f4ea4a3982023
MF: 9371305140e8c541b0cea6b368ad4af47221998e

Strange πŸ˜€ Well, the fix was to simply edit the manifest with the correct SHA1 checksum. But please keep in mind that just because we just forged the checksum doesnt mean the data is not corrupt. In this case, as the MD5 check of the ISO was successful we assumed that the data is probably fine an decided to give it a try. In the end, the file was still corrupt and brought errors mounting the root file system.u

Do not run TSM as a vCD Workload!

I just got called in for a Tivoli Backup troubleshooting. The symptoms seen were extremely strange:

The TSM proxy successfully connected to vCenter and a backup job could be started. In vSphere Client we could see that the VM to be backed up was snapshotted. The next step would be to attach die VMDK to the TSM virtual machine but instead it was attached to an entirely different VM πŸ˜€ Of course, the backup job failed.

Looking at the TSM VM, I found out it was part of a vApp deployed through vCenter Orcestrator and vCloud Director. I figured this was probably a bad idea to run TSM proxy in a vCD vApp for several reasons:

1. TSM is going to back up vCloud Director VMs and running that same backup server as a vCD VM itself seemed strange. Any scripts or similar to backup the entire vCD vApp workload would probably try to back up the TSM proxy, too.

2. TSM talks to vCenter requesting the creating of snapshots and attachment of VMDKs to itself. As vCD VM the VM is marked as controlled by vCD and any changes through vSphere Client are not recommended. But exactly this would happen when a VMDK gets attached to TSM for backup.

So the first try was to clone the vCloud VM into an ordinary vCenter VM and shut the vApp down. Booom, works! We resolved the issue quickly but unfortunately, the actual technical cause for this is still unknown to us. So in case one of you knows what exactly was going on, please drop me a mail πŸ™‚

cheers
Mathias

Video Recommendation: Nicira NVP vs VMware NSX

Please take a look at the following questions:

  • What is NSX?
  • What the heck is the difference to Nicira NVP or are they the same?
  • What are the technologies behind NSX and how does it work?

Is there any you cannot answer, yet? If so, I would like to direct your attention to two just great videos on Youtube which will clarify:

OpenStack Networking – Theory Session, Part 1

OpenStack Networking – Theory Session, Part 2

Watching this will be the best 1h 45min you have invested for a while!

Have fun!

vCloud Director: Low Performance Powering On A vApp

I am working in a project including vCloud Director as well as most other parts of VMware’s cloud stack for a while now. Until a couple of days ago, everything was running fine regarding the deployment process of vApps from vCloud Director UI or through vCenter Orchestrator. Now we noticed that starting and stopping vApps takes way too long: Powering on a single VM vApp directly connected to an external network takes three steps in vCenter:

  1. Reconfigure virtual machine
  2. Reconfigure virtual machine (again)
  3. Power On virtual machine

The first step of reconfigure virtual machine showed up in vCenter right after we triggered the vApp power on in vCloud Director. From then it took around 5min to reach step two. Once step 2 was completed, the stack paused for another 10min before the VM was actually powered on. This even seemed to have implications on vCenter Orchestrator including timeouts and failed workflows.

We spent an entire day on trying to track the problem down and came up with the opinion that it had to be inside vCloud Director. But before we went into log files, message queues etc, we decided to simply reboot the entire stack: BINGO! After the reboot the problem vanished.

Shutdown Process:

  1. vCO
  2. vCD
  3. vCD NFS
  4. VSM
  5. vCenter
  6. SSO
  7. DB

Then boot the stack in reverse order and watch vCloud Director powering on VMs withing seconds πŸ˜‰

Video Recommendation on Virtualization Security

Unfortunately, I didn’t find a lot of time this year between Christmas and New Year to stay up2date with and watch this year’s Chaos Computer Club Conference (CCC). But I had a look at the schedule before and pin pointed the talk Virtually Impossible: The Reality Of Virtualization Security as something I definitely wanted to see. Luckily, everything is recorded and already available to watch online by now, so I took the 59min this morning to watch it πŸ™‚

 

[Click here for the original page.]

Honestly, the talk is highly technical. A lot of things you need to know about virtualizing instructions sets, memory architectures and I/O devices is presumed to fully understand what the guy is talking about. But don’t let that discourage you now! It’s just the fist – let’s say – 30min that are a bit tricky. Still, you will get a pretty good overview of what security in a virtualized environment actually means and why it is almost impossible to achieve. Seriously, I myself was a bit shocked to hear all that. Virtualization is what I work with all day and I wasn’t aware it was that bad πŸ˜€

For those of you the talk might have motivated to learn more about virtualization, I recommend reading this:

51b3EzPw4WL

800 pages with the full dose of virtualization. I’ve been through this for my studies so I know what I am talking about πŸ˜‰

© 2017 v(e)Xpertise

Theme by Anders NorenUp ↑