Tag: vCloud Director

Deleting vCloud Director Temporary Data

Recently, I learned a bit about vCloud Director's internal database and table structure, which I am going to share with you here.

vCD holds two types of tables worth pointing out: the QRTZ and INV tables. The latter store information about the vCenter Server inventory (INV) and are kept up to date as the inventory changes. Such changes may be made by vCD itself or carried out by an administrator. When the vCD service starts up, it connects to vCenter to read in its inventory and update its INV tables. The QRTZ tables are used for heartbeating and synchronization between cells (at least from what I understood, no guarantees).

Why am I telling you this? Both types of tables can be cleared without losing any vital data. You can make use of this knowledge whenever you feel your vCloud DB is out of sync with the vCenter inventory. This happens for example in the case where you have to restore your vCenter Database from a Backup without restoring vCloud’s database.

WARNING: Following this procedure is completely unsupported by VMware GSS. Follow the instructions below at your own risk or when you have no support for your installation anyway 😉

  1. Shut down all vCloud Director cells
  2. Clear the QRTZ and INV tables using “delete from <table name>”
  3. Start one of the cells and watch it start (/opt/vmware/vcloud-director/log/cell.log)
  4. Start the other cells

Here are some SQL statements that will take care of step 2 for you:

delete from QRTZ_SCHEDULER_STATE;
delete from QRTZ_FIRED_TRIGGERS;
delete from QRTZ_PAUSED_TRIGGER_GRPS;
delete from QRTZ_CALENDARS;
delete from QRTZ_TRIGGER_LISTENERS;
delete from QRTZ_BLOB_TRIGGERS;
delete from QRTZ_CRON_TRIGGERS;
delete from QRTZ_SIMPLE_TRIGGERS;
delete from QRTZ_TRIGGERS;
delete from QRTZ_JOB_LISTENERS;
delete from QRTZ_JOB_DETAILS;

delete from compute_resource_inv;
delete from custom_field_manager_inv;
delete from cluster_compute_resource_inv;
delete from datacenter_inv;
delete from datacenter_network_inv;
delete from datastore_inv;
delete from datastore_profile_inv;
delete from dv_portgroup_inv;
delete from dv_switch_inv;
delete from folder_inv;
delete from managed_server_inv;
delete from managed_server_datastore_inv;
delete from managed_server_network_inv;
delete from network_inv;
delete from resource_pool_inv;
delete from storage_pod_inv;
delete from storage_profile_inv;
delete from task_inv;
delete from vm_inv;
delete from property_map;
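
If you would rather generate the script than paste it, the statements above can be built from two table lists. This is only a sketch in Python: the table names and their order are copied from the list above, while the helper name `build_cleanup_script` is made up for illustration. It only prints the SQL; running it against the database is still up to you (and still unsupported).

```python
# Sketch: build the "delete from <table>;" statements for step 2.
# Table names and order are copied from the post; the helper name is hypothetical.

QRTZ_TABLES = [
    "QRTZ_SCHEDULER_STATE", "QRTZ_FIRED_TRIGGERS", "QRTZ_PAUSED_TRIGGER_GRPS",
    "QRTZ_CALENDARS", "QRTZ_TRIGGER_LISTENERS", "QRTZ_BLOB_TRIGGERS",
    "QRTZ_CRON_TRIGGERS", "QRTZ_SIMPLE_TRIGGERS", "QRTZ_TRIGGERS",
    "QRTZ_JOB_LISTENERS", "QRTZ_JOB_DETAILS",
]

# Includes property_map, which the post lists together with the *_inv tables.
INV_TABLES = [
    "compute_resource_inv", "custom_field_manager_inv",
    "cluster_compute_resource_inv", "datacenter_inv", "datacenter_network_inv",
    "datastore_inv", "datastore_profile_inv", "dv_portgroup_inv",
    "dv_switch_inv", "folder_inv", "managed_server_inv",
    "managed_server_datastore_inv", "managed_server_network_inv",
    "network_inv", "resource_pool_inv", "storage_pod_inv",
    "storage_profile_inv", "task_inv", "vm_inv", "property_map",
]


def build_cleanup_script(tables):
    """Return one 'delete from <table>;' statement per table, in the given order."""
    return [f"delete from {table};" for table in tables]


if __name__ == "__main__":
    for statement in build_cleanup_script(QRTZ_TABLES + INV_TABLES):
        print(statement)
```

Keeping the tables in the original order matters if the database enforces foreign keys between them, so the sketch deliberately does not sort the lists.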

 

Looking Back at a Day of vShield Manager Troubleshooting

Dear diarrhea diary,

this entire day was f***** up by a VMware product called vShield Manager …

This, or something like it, is what today’s entry in my non-existent diary would look like. It was one of those typical “piece of cake” tasks that turn into nightmares 😀 Literally, the task read “Configure VXLAN for the ***** cluster” – easy, hmm!?

1. Ok, let’s go: The physical switch configuration turned out easy as it was already done for me 🙂 CHECK.

2. So, naive me, I connected to vShield Manager UI, went Datacenter -> Network Virtualization -> Prepare and added the cluster, gave it the name of the also already existing Distributed Switch and the VLAN ID and let it run. FAIL: “not ready”.

VSM itself doesn’t give a lot of details but I knew that probably the deployment and installation of the VXLAN VIB package failed. Looking at esxupdate.log I could see a “temporary failure in DNS lookup” (exact wording was probably different).  Looking at the ESXi hosts’ DNS configuration: empty. Cool! Fix -> CHECK. Later I found out that I myself blogged about this a while ago 😀

3. Now let’s try again, but first we have to “unprepare” the cluster: removed the check in VSM: Error. Of course. VSM didn’t create the Port Group or the VMkernel ports and now tries to remove them … computer logic 😀 At this point, simply hit “Refresh” and it will be gone. Now we can try the preparation process once more: Error:

domain-c3943 already has been configured with a mapping.

Grrrr … luckily, I found this: http://blog.jgriffiths.org/?p=482 To be honest, the sentence “VMware support was able to help… and I suggest unless you don’t care about your cluster or vShield implementation that you call them to solve it” scared me a bit, BUT to balls to gain (wait is that right?). WORKS! PARTYYY! But once again: preparation failed (devil)

4. I can’t quite remember which error message or log entry helped me find VMware KB 2053782. Following the steps sounds simple but hey, why should anything work today?! 😀 Check my other blog post about this particular one. After applying the – I like to call it – “curl”-hack to VSM (see the step before) again, I prepared the cluster one more time and finally the VXLAN VIB could be deployed, BUT …

5. … The Port Group was not created … f*** this sh***. After 30ish minutes of flying blind through VSM and vCD, I figured out that other clusters could not deploy VXLANs anymore. Due to this insight and a good amount of despair, I just rebooted VSM. Then unprepare, “curl”-hack, prepare … and: WORKS!

Portgroup is there. BUT:

6. No VMkernel Ports were created (I ran out of curses by that time). Another 30min passed until I unprepared, “curl”-hacked and prepared the cluster one last time before the VMkernel Ports were then magically created. THANK GOD! So I went ahead creating a Network Scope for the cluster.

I tested creating VXLAN networks through VSM a couple of times and it seemed to properly create additional Port Groups. You think the day was over yet? WROOONG!

7. Next, I tried through vCloud Director. The weird thing was that a Network Pool for that cluster already existed with a different name than the Network Scope I had just created. It had to be some relic from before my time in the project. Trying to deploy a vApp, I ran into something I am going to write about tomorrow. Once this was fixed, I kept receiving this:

[Screenshot: error message in vCloud Director]

Judging from the error message, vCloud Director tries to allocate a network from a pool for which VSM has no network scope defined. These things did not work:
– Clicking “Repair” on the Network Pool
– Creating a Network Scope with the same name as the Network Pool (vCD apparently uses some kind of ID instead of the name of the Network Scope)

The only possible solutions I could come up with are deleting and re-creating the Provider vDC or going into the vCD database and doing some magic there. The only information on this I could find was in the VMware Communities: https://communities.vmware.com/thread/448106. So I am going to open a ticket.

Good night.

vCloud Director: Low Performance Powering On A vApp

I have been working on a project including vCloud Director as well as most other parts of VMware’s cloud stack for a while now. Until a couple of days ago, everything was running fine regarding the deployment process of vApps from the vCloud Director UI or through vCenter Orchestrator. Now we noticed that starting and stopping vApps takes way too long: Powering on a single-VM vApp directly connected to an external network takes three steps in vCenter:

  1. Reconfigure virtual machine
  2. Reconfigure virtual machine (again)
  3. Power On virtual machine

The first “Reconfigure virtual machine” task showed up in vCenter right after we triggered the vApp power-on in vCloud Director. From then on, it took around 5 min to reach step 2. Once step 2 was completed, the stack paused for another 10 min before the VM was actually powered on. This even seemed to affect vCenter Orchestrator, causing timeouts and failed workflows.

We spent an entire day trying to track the problem down and came to the conclusion that it had to be inside vCloud Director. But before we went into log files, message queues etc., we decided to simply reboot the entire stack: BINGO! After the reboot the problem vanished.

Shutdown Process:

  1. vCO
  2. vCD
  3. vCD NFS
  4. VSM
  5. vCenter
  6. SSO
  7. DB

Then boot the stack in reverse order and watch vCloud Director powering on VMs within seconds 😉
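
To make "reverse order" explicit, here is a tiny Python sketch. The component names are taken from the list above; the function name `boot_order` is just an illustration. It does not shut down or start anything, it only prints the sequence.

```python
# Shutdown order as listed in the post; the boot order is simply the reverse.
SHUTDOWN_ORDER = ["vCO", "vCD", "vCD NFS", "VSM", "vCenter", "SSO", "DB"]


def boot_order(shutdown_order):
    """Return the components in the order they should be started again."""
    return list(reversed(shutdown_order))


if __name__ == "__main__":
    for step, component in enumerate(boot_order(SHUTDOWN_ORDER), start=1):
        print(f"{step}. {component}")
```

So the database comes up first and vCO last.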

Import vCenter Operations (vCOps) as an OVF into a vCloud Director environment

Having a standard like OVF or OVA in the virtualization world is a really cool thing. Nevertheless, from time to time I stumble over some issues with it. This just happened while I was trying to import vCenter Operations into a vCloud environment at a customer site.

Even though you can import OVF files (unfortunately no OVA files, so extract the OVA first, e.g. with ovftool or tar), the import did not work in the vCloud environment, since the vCOps vApp consists of two virtual machines and asks for additional information (EULA, network, etc.) during the deployment process.

So my next idea was to deploy the OVF in vCenter and import it into vCloud Director afterwards.

After the import, I wasn’t able to start the VM either:

Unable to start vAPP ….., Invalid vApp properties: Unknown property vami.ip0.VM_2 referenced in property ip.

To solve this problem, I moved the two virtual machines (analytics and UI) out of the vApp in the vCenter and deactivated their vApp properties in the virtual machine settings.

If an error message occurs during this configuration step and a prompt asks you to refresh the browser, ignore it by clicking NO (this worked only in Chrome for me).

Now you can navigate to vCloud Director and import both virtual machines into your vApp, and voilà: the vCenter Operations VMs can boot in your vCloud environment.

Keep in mind that you may need to change the IP addresses of your vCenter Operations components. Just follow the instructions on virtualGhetto.

vCloud Director – Fenced Networks Explained

I recently stumbled across vCloud Director fenced networks again and wondered how they actually work. Reading the documentation did not help too much:

Fencing allows identical virtual machines in different vApps to be powered on without conflict by isolating the MAC and IP addresses of the virtual machines.

This is pretty much all I could find in the vCloud Director User’s Guide, not very deep dive, right? Then I thought “This is probably a vShield Edge feature, so I should check that documentation, too!” but I was disappointed: The document does not even contain the word “fenced” – damn! Instead, I found that vShield Edge can do load balancing for HTTP traffic – I might be writing about this soon. Looking at the description quoted above, I wondered “Doesn’t a NAT-routed network do that, too?”. I mean without a doubt we could have the same IP addresses and MAC addresses behind two NAT-routed vApp networks, right? NATed to the Organization Network, this should not be a problem. Bottom line: the VMware documentation does not tell how vApp network fencing actually works. So I did a little research which I would like to share with you now:

What Does a Fenced vApp Network Actually Do?

The term “fenced” only means that this vApp is to some extent isolated from the rest of the network. Going with this definition, any vCloud Director network that is not a directly connected one can be called “fenced”, because a vShield Edge device isolates the vApp from other vApps and possibly the internet. This isolation includes the MAC address (outside the vApp, no other VM will see the MAC address of a VM on the inside) as well as the IP address in the case of a NAT router.

A fenced network in the vCloud Director sense means having the same subnet in the vApp network as in the organization network. Well, that’s not very special because a directly connected vApp network does that, too, right? Yes, but this time there is a vShield Edge router in between!

Configuring vApp Network Fencing

Even the configuration of a fenced network is not the most intuitive: Before I started fiddling with that feature, I only knew the option was there but had never actually seen it enabled in vCloud Director’s web portal. To actually configure a vApp in fenced mode, you first need to directly connect the vApp network to the organization network.

Then, in the next screen, you will see the option “Fence vApp” enabled. So check “Fence vApp” and click “Next”.

Now, the vApp will be created and you will see “Fenced” in the connectivity column of the vApp. Further, a vShield Edge is deployed even though we chose to directly connect to the organization network!

Behind the Scenes

Everyone with a networking background will feel a piercing pain in their chest at having the same subnet in front of and behind the same router! Don’t worry, I feel that too! The next question I came up with after the pain faded away was “How in hell does that work?”. Though I have a networking background, I had not come across this scenario yet. Well, I hadn’t heard of “Proxy ARP” at the time. Please check the Wikipedia article on ARP in case you are not familiar with the protocol before you continue reading.

Proxy ARP means responding to ARP requests in place of a different machine. In this case, the vShield Edge device responds to requests for the MAC addresses of machines behind its internal NIC. It then receives the Ethernet frame destined for the VM in the vApp network and forwards it out of its internal network interface.

Of course, the same procedure has to be gone through for the response of the inner VM back to the outer machine.
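
The decision the edge device makes can be sketched as a toy model in Python. This is not how vShield Edge is implemented; all IP and MAC addresses and both function names are made up for illustration. It only shows the two halves of the trick: answering the ARP request with the edge's own MAC, then passing the traffic on to the real VM.

```python
# Toy model of proxy ARP. All addresses are hypothetical; this is not vShield Edge code.

EDGE_OUTER_MAC = "00:50:56:00:00:01"  # made-up MAC of the edge's external NIC

# Made-up VMs behind the edge's internal NIC (IP -> real MAC).
INNER_VMS = {
    "192.168.1.10": "00:50:56:00:00:10",
    "192.168.1.11": "00:50:56:00:00:11",
}


def arp_reply_on_outer_side(requested_ip):
    """Return the MAC the edge answers with on the outer network, or None.

    For an IP that lives behind the internal NIC, the edge answers with its
    own external MAC, so the outer machine sends its frames to the edge.
    """
    if requested_ip in INNER_VMS:
        return EDGE_OUTER_MAC
    return None  # not behind the edge, so the edge stays silent


def inner_destination_mac(dest_ip):
    """Return the MAC the edge forwards to on the inner side, or None."""
    return INNER_VMS.get(dest_ip)


if __name__ == "__main__":
    ip = "192.168.1.10"
    print("outer machine learns:", arp_reply_on_outer_side(ip))
    print("edge forwards to:   ", inner_destination_mac(ip))
```

The outer machine never learns the inner VM's real MAC address, which matches the MAC isolation described earlier.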

I am the type who has to replicate functionality on a Linux system to fully understand what is going on and to be fully satisfied. If you would like to do that, too, check the second link below to see how it works!

  • http://geekafterfive.com/2012/04/24/deepdive-vcloud-vapp-networks/
  • http://www.sjdjweis.com/linux/proxyarp/

Nesting ESXi on ESXi 5.1 in vCloud Director 5.1

Nesting hypervisors – especially ESXi – is becoming more and more popular with lab/testing and/or development in mind. I just set up such an environment using vCloud Director and found some bad, bad news:

We used vCloud Director 5.1 with vSphere 5.1 underneath and everything running on Distributed vSwitch version 5.1, too (it has to be, as vCloud Director needs VXLAN). Deploying ESXi 5.1 in vCloud Director 5.1 has become quite easy, as everything needed to enable virtualized hardware-assisted virtualization has made its way into the GUI.

Networking

But there are two issues with networking:

  1. The virtual port group that the virtual ESXi hosts connect to has to have promiscuous mode enabled.
  2. That same port group has to have forged transmits enabled, too!

While issue number one can be resolved by editing the vCloud Director database (read http://www.virtuallyghetto.com/2011/10/missing-piece-in-creating-your-own.html for more information), problem number two is very bad news. Why? Well, the latest version of the Distributed vSwitch rejects all three security policies by default. That means promiscuous mode, MAC address changes and forged transmits are set to “reject”. In earlier versions those were set to reject/accept/accept, remember!? (Refer to http://kb.vmware.com/kb/2030982 to see how the default settings evolved.) So, forged transmits were accepted already.

Well, as a result, editing the vCloud Director database to enable promiscuous mode on provisioned port groups is not enough anymore. Right now, the only solution is to reconfigure the port groups, manually or by script, every time a vApp with virtual ESXi hosts is started.

That really sucks, my friends! And I think we need a solution here quick! So in case you know one, please share!

VM Configuration

SCSI Adapter

I followed the instructions on http://www.virtuallyghetto.com/2011/10/missing-piece-in-creating-your-own.html and executed the following SQL command to add “ESXi 5.x” as an OS type in vCloud Director:

INSERT INTO guest_osfamily (family,family_id) VALUES ('VMware ESX/ESXi',6);
INSERT INTO guest_os_type (guestos_id,display_name, internal_name, family_id, is_supported, is_64bit, min_disk_gb, min_memory_mb, min_hw_version, supports_cpu_hotadd, supports_mem_hotadd, diskadapter_id, max_cpu_supported, is_personalization_enabled, is_personalization_auto, is_sysprep_supported, is_sysprep_os_packaged, cim_id, cim_version) VALUES (seq_config.NextVal,'ESXi 4.x', 'vmkernelGuest', 6, 1, 1, 8, 3072, 7,1, 1, 4, 8, 0, 0, 0, 0, 107, 40);
INSERT INTO guest_os_type (guestos_id,display_name, internal_name, family_id, is_supported, is_64bit, min_disk_gb, min_memory_mb, min_hw_version, supports_cpu_hotadd, supports_mem_hotadd, diskadapter_id, max_cpu_supported, is_personalization_enabled, is_personalization_auto, is_sysprep_supported, is_sysprep_os_packaged, cim_id, cim_version) VALUES (seq_config.NextVal, 'ESXi 5.x', 'vmkernel5Guest', 6, 1, 1, 8, 3072, 7,1, 1, 4, 8, 0, 0, 0, 0, 107, 50);

This sets the diskadapter_id value to 4, which refers to the LSI Logic SAS adapter. The problem with this adapter and virtual ESXi hosts is that disks on this controller appear as remote disks to ESXi:

This might not seem to be a big problem as ESXi still installs on the HDD it can see and will boot properly. But working with Host Profiles you will run into trouble: Grabbing a Host Profile from this ESXi host, the profile will include this disk and will not be able to find the disk as you apply the profile to a different host. As a result, you would have to dig into the Host Profile disabling sub profiles to see the other host in compliance.

To avoid this problem, use LSI Logic Parallel instead. If you haven’t already executed the big SQL statements above, execute this instead:

INSERT INTO guest_osfamily (family,family_id) VALUES ('VMware ESX/ESXi',6);
INSERT INTO guest_os_type (guestos_id,display_name, internal_name, family_id, is_supported, is_64bit, min_disk_gb, min_memory_mb, min_hw_version, supports_cpu_hotadd, supports_mem_hotadd, diskadapter_id, max_cpu_supported, is_personalization_enabled, is_personalization_auto, is_sysprep_supported, is_sysprep_os_packaged, cim_id, cim_version) VALUES (seq_config.NextVal,'ESXi 4.x', 'vmkernelGuest', 6, 1, 1, 8, 3072, 7,1, 1, 3, 8, 0, 0, 0, 0, 107, 40);
INSERT INTO guest_os_type (guestos_id,display_name, internal_name, family_id, is_supported, is_64bit, min_disk_gb, min_memory_mb, min_hw_version, supports_cpu_hotadd, supports_mem_hotadd, diskadapter_id, max_cpu_supported, is_personalization_enabled, is_personalization_auto, is_sysprep_supported, is_sysprep_os_packaged, cim_id, cim_version) VALUES (seq_config.NextVal, 'ESXi 5.x', 'vmkernel5Guest', 6, 1, 1, 8, 3072, 7,1, 1, 3, 8, 0, 0, 0, 0, 107, 50);

Should those entries already be in your database, execute this to change the diskadapter_id value from 4 to 3:

UPDATE guest_os_type SET diskadapter_id=3 WHERE display_name='ESXi 4.x';
UPDATE guest_os_type SET diskadapter_id=3 WHERE display_name='ESXi 5.x';

Don’t forget to restart the vCD service after that:

$ service vmware-vcd restart

IP Assignment

Every virtual NIC attached to a vCloud Director controlled VM is most likely going to be connected to a virtual network. Once connected, an IP address has to be assigned. This assignment can be done in any of the following ways:

  • Static – IP Pool
  • Static – Manual
  • DHCP

But where is the “None”-option? Some reasons for having “None” as an option:

  1. What if I want to use NIC teaming on a vSwitch for the management network? In that case the same IP would be valid for both NICs, which cannot be configured.
  2. With ESXi, no NIC ever has an IP address directly configured to it, so none of the options above would apply!
  3. For VM networks, we probably use NICs only to forward VM traffic. ESXi itself might not even have an IP address in that network.

So far, I have used DHCP for NICs that should not have an IP address at all and “Static – Manual” for the first NIC of a NIC teaming group that carries VMkernel Port traffic. It works, but it’s not perfect.

Hope that helped!

vCloud Director – VMRC Bandwidth Usage

Last week I was asked about the estimated bandwidth requirement for a VMRC based console connection through a vCloud Director cell. Well, I did not know at the time, so I set up a little test environment. I want to share the results with you now.

In vSphere the question for the bandwidth consumption of a vSphere Client console window is rather pointless. Unless we are talking about a ROBO (remote offices and branch offices) installation, console connections are made from within the company LAN where bandwidth is much less of an issue.

Figure 1: Remote console connection with vSphere Client.

The fat dashed line indicates the connection made from vSphere Client directly to ESXi in order to pick up the image of a virtual machine’s console.

With vCloud Director things are a bit different: Customers have access to a web portal and create console connections to their VMs through the VMRC (virtual machine remote console) plug-in. Though the plug-in displays the console image, the connection to ESXi is not initiated by it. Instead the VMRC connects to the vCD cell’s console proxy interface. vCD then connects to ESXi. This means a vCD cell acts as a proxy for the VMRC plug-in.

Figure 2: Remote console through the vCloud Director web portal.

Of course, the browser could be located inside the LAN, but especially in public cloud environments this traffic will flow through your WAN connection.

Testing the Bandwidth

The bandwidth consumed by a single remote console connection depends on what is being done inside the VM. So, in my tests I monitored bandwidth in three different situations:

  1. writing some text in notepad
  2. browsing the internet
  3. watching a video

Of course, the configured screen resolution and color depth have to be considered, too. But this is not going to be a big performance evaluation but rather an attempt to give you – and myself – an impression and rough values to work with.

To get the actual values, I used the Advanced Performance Charts to monitor the console proxy NIC of my vCloud Director cell:

Figure 3: Network performance of a vCD cell during a VMRC connection.

I started the testing after 8 PM, so please ignore the spikes on the left. The first block of peaks after 8 PM is the result of writing text in notepad. I did not use a script to simulate this workload, which is probably the reason why the values are not very constant. Towards the end, I reached a fairly high number of keystrokes per second, probably higher than what would be an average value. The estimated average bandwidth is around 1400 KBps. After that, I started a YouTube video. The video was high resolution but the player window remained small. Still, I reached an average of maybe 3000 KBps! Browsing a few web sites and scrolling the browser window seems to create a slightly lower amount of network I/O. Most likely, a realistic workload includes reading sections before scrolling, so the bandwidth consumption would be even lower than the measured average of – let’s say – 1600 KBps.

As we have seen, the protocol used for the VMRC connection is not a low-bandwidth implementation. When implementing your cloud, you should definitely keep that in mind. A single VMRC connection does not harm anyone, but several tens of concurrent connections might congest your WAN connection, depending on what you have there. Also, a single customer could affect the performance of another!
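
To get a feeling for what that means for a WAN link, the rough averages from my test can be multiplied out. The following Python sketch assumes that "KBps" in the performance charts means kilobytes per second (with 1 KB = 1000 bytes) and that all sessions behave like the single one I measured; both are assumptions, and the function name is made up.

```python
# Rough WAN sizing from the averages measured above.
# Assumption: "KBps" means kilobytes per second, with 1 KB = 1000 bytes.

MEASURED_KBPS = {"notepad": 1400, "browsing": 1600, "video": 3000}


def aggregate_mbit_per_s(connections, kbps_per_connection):
    """Total bandwidth in Mbit/s for a number of identical VMRC sessions."""
    return connections * kbps_per_connection * 8 / 1000


if __name__ == "__main__":
    for workload, kbps in MEASURED_KBPS.items():
        total = aggregate_mbit_per_s(10, kbps)
        print(f"10 x {workload}: {total:.0f} Mbit/s")
```

Under these assumptions, ten sessions of plain typing already add up to about 112 Mbit/s, so treat the numbers as a reason to measure in your own environment rather than as a sizing guide.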

How do we solve this? Well, if you have a problem with VMRC bandwidth this is a limitation of your WAN connection. All you can do from the vCloud Director’s side is set a limit on the maximum number of concurrent connections per VM:

Figure 4: Customer Policies: Limits

But this works only for connections to the same VM! A more professional solution would include an appliance placed in front of the vCD cells that performs traffic shaping per VMRC connection. Maybe your load balancer can do this!

© 2019 v(e)Xpertise
