Category: vCloud Director

Deleting vCloud Director Temporary Data

Recently, I learned a bit about vCloud Director's internal database and table structure, which I am going to share with you here.

vCD holds two types of tables worth pointing out: the QRTZ and INV tables. The latter store information about the vCenter Server inventory (INV) and are kept up to date as the inventory changes. This could be due to changes vCD makes itself or those carried out by an administrator. When the vCD service starts up, it connects to vCenter to read in its inventory and update its INV tables. The QRTZ tables are used for heartbeating and synchronization between cells (at least from what I understood, no guarantees).

Why am I telling you this? Both types of tables can be cleared without losing any vital data. You can make use of this knowledge whenever you feel your vCloud DB is out of sync with the vCenter inventory. This happens, for example, when you have to restore your vCenter database from a backup without restoring vCloud's database.

WARNING: Following this procedure is completely unsupported by VMware GSS. Follow the instructions below at your own risk or when you have no support for your installation anyway 😉

  1. Shut down both vCloud Director cells
  2. Clear the QRTZ and INV tables using “delete from <table name>”
  3. Start one of the cells and watch it come up (/opt/vmware/vcloud-director/log/cell.log)
  4. Start the remaining cells

Here are some SQL statements that will automate step 2 for you:

delete from QRTZ_SCHEDULER_STATE;
delete from QRTZ_FIRED_TRIGGERS;
delete from QRTZ_PAUSED_TRIGGER_GRPS;
delete from QRTZ_CALENDARS;
delete from QRTZ_TRIGGER_LISTENERS;
delete from QRTZ_BLOB_TRIGGERS;
delete from QRTZ_CRON_TRIGGERS;
delete from QRTZ_SIMPLE_TRIGGERS;
delete from QRTZ_TRIGGERS;
delete from QRTZ_JOB_LISTENERS;
delete from QRTZ_JOB_DETAILS;

delete from compute_resource_inv;
delete from custom_field_manager_inv;
delete from cluster_compute_resource_inv;
delete from datacenter_inv;
delete from datacenter_network_inv;
delete from datastore_inv;
delete from datastore_profile_inv;
delete from dv_portgroup_inv;
delete from dv_switch_inv;
delete from folder_inv;
delete from managed_server_inv;
delete from managed_server_datastore_inv;
delete from managed_server_network_inv;
delete from network_inv;
delete from resource_pool_inv;
delete from storage_pod_inv;
delete from storage_profile_inv;
delete from task_inv;
delete from vm_inv;
delete from property_map;
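If you would rather generate that statement list than maintain it by hand, a small shell loop does the job. This is just a sketch: the table names come from the lists above, and you would extend the variable with the remaining tables before running the file against your database.

```shell
# Sketch: emit the DELETE statements into a .sql file so they can be
# fed to your database client in one go. Extend the list with the
# remaining QRTZ_* and *_inv tables from above.
tables="QRTZ_SCHEDULER_STATE QRTZ_FIRED_TRIGGERS QRTZ_TRIGGERS vm_inv task_inv property_map"
for t in $tables; do
  echo "delete from $t;"
done > clear_vcd_tables.sql
cat clear_vcd_tables.sql
```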

 

Do not run TSM as a vCD Workload!

I just got called in for a Tivoli Backup troubleshooting. The symptoms seen were extremely strange:

The TSM proxy successfully connected to vCenter and a backup job could be started. In vSphere Client we could see that the VM to be backed up was snapshotted. The next step would be to attach the VMDK to the TSM virtual machine, but instead it was attached to an entirely different VM 😀 Of course, the backup job failed.

Looking at the TSM VM, I found out it was part of a vApp deployed through vCenter Orchestrator and vCloud Director. I figured it was probably a bad idea to run the TSM proxy in a vCD vApp, for several reasons:

1. TSM is going to back up vCloud Director VMs, and running that same backup server as a vCD VM itself seemed strange. Any scripts or similar used to back up the entire vCD vApp workload would probably try to back up the TSM proxy, too.

2. TSM talks to vCenter requesting the creation of snapshots and the attachment of VMDKs to itself. As a vCD VM, the VM is marked as controlled by vCD, and any changes through vSphere Client are not recommended. But exactly this happens when a VMDK gets attached to TSM for backup.

So the first try was to clone the vCloud VM into an ordinary vCenter VM and shut the vApp down. Boom, it works! We resolved the issue quickly but unfortunately, the actual technical cause is still unknown to us. So in case one of you knows what exactly was going on, please drop me a mail 🙂

cheers
Mathias

vCloud Director: Low Performance Powering On A vApp

I have been working on a project involving vCloud Director as well as most other parts of VMware's cloud stack for a while now. Until a couple of days ago, everything was running fine regarding the deployment of vApps from the vCloud Director UI or through vCenter Orchestrator. Then we noticed that starting and stopping vApps takes way too long: powering on a single-VM vApp directly connected to an external network takes three steps in vCenter:

  1. Reconfigure virtual machine
  2. Reconfigure virtual machine (again)
  3. Power On virtual machine

The first reconfigure task showed up in vCenter right after we triggered the vApp power-on in vCloud Director. From there it took around 5 minutes to reach step two. Once step two was completed, the stack paused for another 10 minutes before the VM was actually powered on. This even had implications on vCenter Orchestrator, including timeouts and failed workflows.

We spent an entire day trying to track the problem down and came to the conclusion that it had to be inside vCloud Director. But before we dove into log files, message queues etc., we decided to simply reboot the entire stack: BINGO! After the reboot the problem vanished.

Shutdown Process:

  1. vCO
  2. vCD
  3. vCD NFS
  4. VSM
  5. vCenter
  6. SSO
  7. DB

Then boot the stack in reverse order and watch vCloud Director powering on VMs within seconds 😉
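In other words, the boot order is just the shutdown order reversed. A trivial sketch (component names abbreviated from the list above):

```shell
# Reverse the documented shutdown order to obtain the boot order
shutdown_order="vCO vCD vCD-NFS VSM vCenter SSO DB"
boot_order=""
for component in $shutdown_order; do
  boot_order="$component $boot_order"   # prepending reverses the list
done
echo "Boot order: $boot_order"
```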

Import vCenter Operations (vCOps) as an OVF into a vCloud Director environment

Having a standard in the virtualization world like OVF or OVA is a really cool thing. Nevertheless, from time to time I stumble over some issues with it. This just happened while trying to import vCenter Operations into a vCloud environment at a customer site.

Even though you can import OVF files (unfortunately no OVA files; extract the OVA first, e.g. with ovftool or tar), it didn't work in the vCloud environment, since the vCOps vApp consists of two virtual machines and requires additional information (EULA, network, etc.) during the deployment process.
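Since an OVA is simply a tar archive wrapping the OVF descriptor, manifest and disk files, plain tar is enough to unpack it. A small sketch; demo.ova is a stand-in built on the spot, with a real download you would start at the extraction step:

```shell
# Build a stand-in OVA purely so this sketch is self-contained
printf '<Envelope/>' > demo.ovf
tar -cf demo.ova demo.ovf
rm demo.ovf

# The step that matters: unpack the OVA to obtain the OVF (and disks)
tar -xf demo.ova
ls demo.ovf
```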

So my next idea was to deploy the OVF in vCenter and then import it into vCloud Director afterwards.


After the import process, I wasn't able to start the VM either:


Unable to start vAPP ….., Invalid vApp properties: Unknown property vami.ip0.VM_2 referenced in property ip.


To solve this problem, I moved the two virtual machines (Analytics and UI) out of the vApp in vCenter and deactivated their vApp properties in the virtual machine settings.


If any error messages occur during this configuration step and a prompt asks you to refresh the browser, ignore it by clicking NO (this worked only in Chrome for me).


Now you can navigate to vCloud Director and import both virtual machines into your vApp,


and voilà: the vCenter Operations VMs boot in your vCloud environment.


Keep in mind that you might need to change the IP addresses of your vCenter Operations components. Just follow the instructions on virtuallyGhetto.

vCloud Director – Fenced Networks Explained

I recently stumbled across vCloud Director fenced networks again and wondered how they actually work. Reading the documentation did not help much:

Fencing allows identical virtual machines in different vApps to be powered on without conflict by isolating the MAC and IP addresses of the virtual machines.

This is pretty much all I could find in the vCloud Director User's Guide; not much of a deep dive, right? Then I thought "This is probably a vShield Edge feature, so I should check that documentation, too!" but I was disappointed: the document does not even contain the word "fenced" – damn! Instead, I found that vShield Edge can do load balancing for HTTP traffic – I might be writing about that soon. Looking at the description quoted above, I wondered "Doesn't a NAT-routed network do that, too?". I mean, without a doubt we could have the same IP addresses and MAC addresses behind two NAT-routed vApp networks, right? NATed to the organization network, this should not be a problem. Bottom line: the VMware documentation does not tell you how vApp network fencing actually works. So I did a little research, which I would like to share with you now:

What Does a Fenced vApp Network Actually Do?

The term “fenced” only means that the vApp is to some extent isolated from the rest of the network. Going by this definition, any vCloud Director network that is not a directly connected one could be called “fenced”, because a vShield Edge device isolates the vApp from other vApps and possibly the internet. This isolation covers the MAC address (outside the vApp, no other VM will see the MAC address of a VM on the inside) as well as, in the case of a NAT router, the IP address.

A fenced network in vCloud Director means having the same subnet in the vApp network as in the organization network. Well, that's not very special, because a directly connected vApp network does that, too, right? Yes, but this time there is a vShield Edge router in between!

Configuring vApp Network Fencing

Even the configuration of a fenced network is not very intuitive: before I started fiddling with the feature, I only knew the option existed but never actually saw it enabled in vCloud Director's web portal. To configure a vApp in fenced mode, you first need to directly connect the vApp network to the organization network.

[Screenshot: creating a bridged vApp network]

Then, on the next screen, the option "Fence vApp" will be enabled. Check "Fence vApp" and click "Next".

[Screenshot: selecting "Fence vApp"]

Now the vApp will be created and you will see "Fenced" in its connectivity column. Furthermore, a vShield Edge is deployed even though we chose to connect directly to the organization network!

[Screenshot: vApp connectivity column showing "Fenced"]

Behind the Scenes

Everyone with a networking background will feel a piercing pain in their chest at the idea of having the same subnet in front of and behind the same router! Don't worry, I felt it too! The next question I came up with after the pain faded away was "How on earth does that work?". Though I have a networking background, I had not come across this scenario yet. Well, I had not heard of "proxy ARP" at the time. Please check the article on ARP at Wikipedia in case you are not familiar with the ARP protocol before you continue reading.

Proxy ARP means responding to ARP requests in place of a different machine. In this case, the vShield Edge device responds to requests for MAC addresses behind its internal NIC. It then receives the Ethernet frames destined for the VM in the vApp network and forwards them out of its internal network interface.

[Figure: proxy ARP]

Of course, the same procedure happens in reverse for the response from the inner VM back to the outer machine.

I am the type who has to replicate functionality with a Linux system to fully understand what is going on and to be fully satisfied. If you would like to do that, too, check the second link below to see how it works!

  • http://geekafterfive.com/2012/04/24/deepdive-vcloud-vapp-networks/
  • http://www.sjdjweis.com/linux/proxyarp/
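For reference, replicating this on a Linux router essentially comes down to two kernel switches; a sketch of the relevant /etc/sysctl.conf entries (per-interface variants such as net.ipv4.conf.eth0.proxy_arp exist as well):

```
# /etc/sysctl.conf on the Linux box playing the vShield Edge role
net.ipv4.ip_forward = 1          # route packets between the two NICs
net.ipv4.conf.all.proxy_arp = 1  # answer ARP requests on behalf of the inner VMs
```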

Nesting ESXi on ESXi 5.1 in vCloud Director 5.1

Nesting hypervisors – especially ESXi – is becoming more and more popular with lab/testing and/or development in mind. I just set up such an environment using vCloud Director and found some bad, bad news:

We used vCloud Director 5.1 with vSphere 5.1 underneath, and everything running on Distributed vSwitch version 5.1, too (it has to be, as vCloud Director needs VXLAN). Deploying ESXi 5.1 in vCloud Director 5.1 has become quite easy, as everything needed to enable virtual hardware-assisted virtualization has made its way into the GUI.

Networking

But there are two issues with networking:

  1. The port group the virtual ESXi hosts connect to has to have promiscuous mode enabled.
  2. That same port group has to have forged transmits enabled, too!

While issue number one can be resolved by editing the vCloud Director database (read http://www.virtuallyghetto.com/2011/10/missing-piece-in-creating-your-own.html for more information), problem number two is very bad news. Why? Well, the latest version of the Distributed vSwitch rejects all three security policies by default. That means promiscuous mode, MAC address changes and forged transmits are all set to "reject". In earlier versions these were set to reject/accept/accept, remember!? (Refer to http://kb.vmware.com/kb/2030982 to see how the default settings evolved.) So forged transmits were already accepted.

As a result, editing the vCloud Director database to enable promiscuous mode on provisioned port groups is not enough anymore. Right now, the only solution is to reconfigure the port groups manually or via script every time a vApp with virtual ESXi hosts is started.

That really sucks, my friends! And I think we need a solution here quickly! So in case you know one, please share!

VM Configuration

SCSI Adapter

I followed the instructions on http://www.virtuallyghetto.com/2011/10/missing-piece-in-creating-your-own.html and executed the following SQL commands to add "ESXi 5.x" as an OS type in vCloud Director:

INSERT INTO guest_osfamily (family,family_id) VALUES ('VMware ESX/ESXi',6);
INSERT INTO guest_os_type (guestos_id,display_name, internal_name, family_id, is_supported, is_64bit, min_disk_gb, min_memory_mb, min_hw_version, supports_cpu_hotadd, supports_mem_hotadd, diskadapter_id, max_cpu_supported, is_personalization_enabled, is_personalization_auto, is_sysprep_supported, is_sysprep_os_packaged, cim_id, cim_version) VALUES (seq_config.NextVal,'ESXi 4.x', 'vmkernelGuest', 6, 1, 1, 8, 3072, 7,1, 1, 4, 8, 0, 0, 0, 0, 107, 40);
INSERT INTO guest_os_type (guestos_id,display_name, internal_name, family_id, is_supported, is_64bit, min_disk_gb, min_memory_mb, min_hw_version, supports_cpu_hotadd, supports_mem_hotadd, diskadapter_id, max_cpu_supported, is_personalization_enabled, is_personalization_auto, is_sysprep_supported, is_sysprep_os_packaged, cim_id, cim_version) VALUES (seq_config.NextVal, 'ESXi 5.x', 'vmkernel5Guest', 6, 1, 1, 8, 3072, 7,1, 1, 4, 8, 0, 0, 0, 0, 107, 50);

This sets the diskadapter_id value to 4, which refers to the LSI Logic SAS adapter. The problem with this adapter and virtual ESXi hosts is that disks on this controller appear as remote disks to ESXi:

[Screenshot: the disk appears as remote to ESXi]

This might not seem like a big problem, as ESXi still installs on the HDD it can see and boots properly. But when working with Host Profiles you will run into trouble: a Host Profile grabbed from this ESXi host will include this disk and will not be able to find it when you apply the profile to a different host. As a result, you would have to dig into the Host Profile and disable sub-profiles to get the other host compliant.

To avoid this problem, use LSI Logic Parallel instead. If you haven't already executed the big SQL statements above, execute these instead:

INSERT INTO guest_osfamily (family,family_id) VALUES ('VMware ESX/ESXi',6);
INSERT INTO guest_os_type (guestos_id,display_name, internal_name, family_id, is_supported, is_64bit, min_disk_gb, min_memory_mb, min_hw_version, supports_cpu_hotadd, supports_mem_hotadd, diskadapter_id, max_cpu_supported, is_personalization_enabled, is_personalization_auto, is_sysprep_supported, is_sysprep_os_packaged, cim_id, cim_version) VALUES (seq_config.NextVal,'ESXi 4.x', 'vmkernelGuest', 6, 1, 1, 8, 3072, 7,1, 1, 3, 8, 0, 0, 0, 0, 107, 40);
INSERT INTO guest_os_type (guestos_id,display_name, internal_name, family_id, is_supported, is_64bit, min_disk_gb, min_memory_mb, min_hw_version, supports_cpu_hotadd, supports_mem_hotadd, diskadapter_id, max_cpu_supported, is_personalization_enabled, is_personalization_auto, is_sysprep_supported, is_sysprep_os_packaged, cim_id, cim_version) VALUES (seq_config.NextVal, 'ESXi 5.x', 'vmkernel5Guest', 6, 1, 1, 8, 3072, 7,1, 1, 3, 8, 0, 0, 0, 0, 107, 50);

Should those entries already be in your database, execute this to change the diskadapter_id value from 4 to 3:

UPDATE guest_os_type SET diskadapter_id=3 WHERE display_name='ESXi 4.x';
UPDATE guest_os_type SET diskadapter_id=3 WHERE display_name='ESXi 5.x';

Don't forget to restart the vCD service after that:

$ service vmware-vcd restart

IP Assignment

Every virtual NIC attached to a vCloud Director-controlled VM is most likely going to be connected to a virtual network. Once connected, an IP address has to be assigned. This assignment can be done in one of the following ways:

  • Static – IP Pool
  • Static – Manual
  • DHCP

But where is the "None" option? Some reasons for having "None" as an option:

  1. What if I want to use NIC teaming on a vSwitch for the management network? In that case the same IP would be valid for both NICs, which cannot be configured.
  2. With ESXi, no NIC ever has an IP address configured directly on it, so none of the options above would apply!
  3. For VM networks, we probably use the NICs only to forward VM traffic. ESXi itself might not even have an IP address in that network.

So far, I have used DHCP for NICs that should not have an IP address at all, and "Static – Manual" for the first NIC of a NIC team that carries VMkernel port traffic. It works – but it's not perfect.

Hope that helped!

Accessing the vCloud Director Appliance’s embedded Oracle 11g R2 XE database

As you might know, the vCD appliance runs an embedded Oracle 11g R2 Express database for storing vCloud Director's data. This article on Duncan Epping's blog yellow-bricks.com reveals the access credentials for that database, but I always wondered how to actually access it!

Just recently, I realized that the default security settings of an automatically provisioned port group with VXLAN looked like this:

[Screenshot: port group security settings]

Everything rejected! That is – by the way – the new default on Distributed Switches since vSphere 5.1, but it is no good if we want to run virtual ESXi hosts on vCloud Director. The reason is that ESXi requires promiscuous mode.

If you want to run ESXi on vCloud Director, please refer to this and that in order to see the SQL statements to be executed to enable promiscuous mode and to make ESXi visible as a guest OS in the VM creation wizard.

I am now going to show you – and this is what this article is about – how to execute the SQL statements against the embedded Oracle 11g Express database:

Taking a look inside the appliance (log in with root as the username and Default0 as the password), we can find out how the vcd service connects to the database:

vcd:~ # cat /opt/vmware/vcloud-director/etc/global.properties
# Database connection settings
database.jdbcUrl = jdbc:oracle:thin:@127.0.0.1:1521\/xe
database.username = vcloud
database.password = DESrBZagGihwZPdqXu1z6Pmr85v43Eu5ktVgXG0FIWQ=
 
# Product display name
product.display_name = VMware vCloud Director
...
vcd:~ #

As you can see from the database.jdbcUrl line, the database service listens on port 1521 of the local loopback interface (127.0.0.1). A look at netstat confirms this:

vcd:~ # netstat -tulpen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name 
...
tcp        0      0 127.0.0.1:1521          0.0.0.0:*               LISTEN      1001       8553       3181/tnslsnr       
...
vcd:~ #

On the appliance itself there do not seem to be any command line tools capable of connecting to that Oracle service and executing SQL commands. But I found a Java-based tool called sqlline that comes with my installation of Ubuntu:

mathias@x220:~$ sudo apt-get install sqlline

For Windows, take a look at http://sqlline.sourceforge.net/ to obtain the software.

What you need to do is download the JDBC driver for Oracle from http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html. Be sure to pick the right distribution; for me this was the ojdbc6.jar file. I put the file into my /usr/share/java/ folder, which contains a lot of other JAR files, too, and these are available on the Java classpath once the JVM starts up.
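If you would rather not drop the jar into a system folder, pointing the classpath at it explicitly should work just as well; a sketch, with the path taken from my setup:

```shell
# Hypothetical location; adjust to wherever you stored ojdbc6.jar
export CLASSPATH=/usr/share/java/ojdbc6.jar:$CLASSPATH
echo "$CLASSPATH"
```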

There is still one more issue to overcome: the Oracle service listens on localhost only, and we don't have any connectivity to that interface and port from outside the appliance. The solution is an SSH tunnel. You can use PuTTY to create the tunnel, or the following command:

$ ssh -L 1521:127.0.0.1:1521 <IP of vCD Appliance>

Be sure to leave that SSH connection open. Otherwise the tunnel dies with it.

Now, execute sqlline on your work station:

mathias@x220:~$ sqlline -u 'jdbc:oracle:thin:@127.0.0.1:1521/xe' -n vcloud -p VCloud -d oracle.jdbc.driver.OracleDriver --verbose=true

You will see an error message similar to

Error: READ_COMMITTED and SERIALIZABLE are the only valid transaction levels (state=99999,code=17030)
java.sql.SQLException: READ_COMMITTED and SERIALIZABLE are the only valid transaction levels
 ...

but that is fine – it still worked for me. Now, to test the connection, you could for example type

0: jdbc:oracle:thin:@127.0.0.1:1521/xe> SELECT * FROM all_tables;

and you should see a list of all available tables.

If that worked, go ahead and execute the SQL commands described in the other articles I mentioned above.

Have fun!

vCloud Director Appliance stops working – out of disk space

Playing around with the vCloud Director appliance in my lab, I downloaded vApps to a CIFS share to make backups of my vApp templates. The first download worked perfectly – no complaints. The second download failed with a very vague message like "download failed" or something similar. Sorry for not knowing the exact wording, but I didn't expect this to become a blog post 😉

After trying a few more times, I decided to reboot the appliance. Nothing. vCloud Director didn't even come back up! A port scan showed that ports 80 and 443 were not listening:

mathias@x220:~$ nmap 10.173.10.22
 
Starting Nmap 6.00 ( http://nmap.org ) at 2013-03-24 14:31 CET
Nmap scan report for 10.173.10.22
Host is up (0.089s latency).
Not shown: 997 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
111/tcp  open  rpcbind
3389/tcp open  ms-wbt-server
 
Nmap done: 1 IP address (1 host up) scanned in 2.53 seconds
mathias@x220:~$

So, one more time it was necessary to take a look at the log files:

mathias@x220:~$ ssh root@10.173.10.22
# This is a dummy banner. Replace this with your own banner, appropriate for
# the VA
root@10.173.10.22's password: 
Last login: Sun Mar 24 09:22:57 2013
vcd:~ # cd /opt/vmware/vcloud-director/logs/
vcd:/opt/vmware/vcloud-director/logs #

In cell.log I found some good pieces of information:

java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(Unknown Source)
        at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(Unknown Source)
        at sun.nio.cs.StreamEncoder.implFlush(Unknown Source)
        at sun.nio.cs.StreamEncoder.flush(Unknown Source)
        at java.io.OutputStreamWriter.flush(Unknown Source)
        ...
log4j:ERROR Failed to flush writer,
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(Unknown Source)
        at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(Unknown Source)
        at sun.nio.cs.StreamEncoder.implFlush(Unknown Source)
        at sun.nio.cs.StreamEncoder.flush(Unknown Source)
        at java.io.OutputStreamWriter.flush(Unknown Source)
 ...

I cut off the rest of the Java stack trace for you, as the interesting part is in the very first line: "No space left on device". So obviously, for some reason, the disk had filled up, leaving no space for the creation of files necessary to completely start the application (it could even be the PID file that cannot find space on disk).

vcd:~ # df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        28G   27G  1.3M 100% /
udev            1.3G  108K  1.3G   1% /dev
tmpfs           1.3G  663M  594M  53% /dev/shm
/dev/sda1       128M   21M  101M  17% /boot
vcd:~ #

The df command shows that the root volume has filled up entirely! Now, let's find out why. I used the du command to find the heavily used folders:

vcd:/ # du -h --max-depth=1
0    ./proc
3.2G    ./u01
20K    ./srv
44M    ./var
4.0K    ./media
8.0K    ./mnt
565M    ./usr
133M    ./lib
15M    ./boot
4.0K    ./selinux
663M    ./dev
11M    ./lib64
22G    ./opt
264K    ./tmp
7.3M    ./etc
40K    ./home
152K    ./root
7.5M    ./bin
16K    ./lost+found
0    ./sys
8.7M    ./sbin
27G    .
vcd:/ #

As you can see, there is one extraordinarily big folder: /opt. So let's drill down to see where this comes from.

vcd:/ # cd /opt/
vcd:/opt # du -h --max-depth=1
8.0K    ./keystore
22G    ./vmware
22G    .
vcd:/opt # cd vmware/
vcd:/opt/vmware # du -h --max-depth=1
22G    ./vcloud-director
...
22G    .
vcd:/opt/vmware # cd vcloud-director/
vcd:/opt/vmware/vcloud-director # du -h --max-depth=1
...
22G    ./data
22G    .
vcd:/opt/vmware/vcloud-director # cd data/
vcd:/opt/vmware/vcloud-director/data # du -h --max-depth=1
...
22G    ./transfer
22G    .
vcd:/opt/vmware/vcloud-director/data #

The 22 GB come from the /opt/vmware/vcloud-director/data/transfer folder. The transfer folder is used for activities like uploading and downloading media and VMs to and from vCloud Director, and it filled up when I exported my vApps earlier:

vcd:/opt/vmware/vcloud-director/data/transfer # du -h --max-depth=1
...
22G    ./cb3f993c-ebee-4317-8067-d31ff2bf4bdb
22G    .
vcd:/opt/vmware/vcloud-director/data/transfer # cd cb3f993c-ebee-4317-8067-d31ff2bf4bdb/
vcd:/opt/vmware/vcloud-director/data/transfer/cb3f993c-ebee-4317-8067-d31ff2bf4bdb # ls -hl
total 22G
-rw------- 1 vcloud vcloud 8.1G Mar 24 05:41 vm-44998f9b-6ae0-4338-9f25-3f7026b4db75-disk-1.vmdk
-rw------- 1 vcloud vcloud 296M Mar 24 05:25 vm-5beaf927-2c35-4f6b-b60b-73112f8980fc-disk-0.vmdk
-rw------- 1 vcloud vcloud  11G Mar 24 05:25 vm-6d96a394-ab53-4ccd-bd5a-bce6ba33ee51-disk-0.vmdk
-rw------- 1 vcloud vcloud 1.3G Mar 24 05:04 vm-b0a03be7-8c51-4288-9340-60ce585c5783-disk-1.vmdk
-rw------- 1 vcloud vcloud 1.5M Mar 24 05:04 vm-b0a03be7-8c51-4288-9340-60ce585c5783-disk-2.vmdk
-rw------- 1 vcloud vcloud  92K Mar 24 05:04 vm-b0a03be7-8c51-4288-9340-60ce585c5783-disk-3.vmdk
-rw------- 1 vcloud vcloud 576M Mar 24 05:05 vm-b0a03be7-8c51-4288-9340-60ce585c5783-disk-4.vmdk
vcd:/opt/vmware/vcloud-director/data/transfer/cb3f993c-ebee-4317-8067-d31ff2bf4bdb #

In a subfolder I found a lot of VMDK files belonging to my exported virtual machines.

So what can we do? First, I decided to remove all files from the transfer folder:

vcd:~ # rm -fR /opt/vmware/vcloud-director/data/transfer/*

Next, we add a second hard disk of an appropriate size. After attaching the disk through the vSphere (Web) Client, we need to make Linux rescan for new disks:

vcd:~ # fdisk -l
 
Disk /dev/sda: 32.2 GB, 32212254720 bytes
255 heads, 63 sectors/track, 3916 cylinders, total 62914560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000d874
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048      272383      135168   83  Linux
/dev/sda2          272384     4483071     2105344   82  Linux swap / Solaris
/dev/sda3   *     4483072    62914559    29215744   83  Linux
vcd:~ #
vcd:~ # echo "- - -" > /sys/class/scsi_host/host0/scan
vcd:~ # echo "- - -" > /sys/class/scsi_host/host1/scan
vcd:~ # echo "- - -" > /sys/class/scsi_host/host2/scan
vcd:~ #
vcd:~ # fdisk -l
 
Disk /dev/sda: 32.2 GB, 32212254720 bytes
255 heads, 63 sectors/track, 3916 cylinders, total 62914560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000d874
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048      272383      135168   83  Linux
/dev/sda2          272384     4483071     2105344   82  Linux swap / Solaris
/dev/sda3   *     4483072    62914559    29215744   83  Linux
 
Disk /dev/sdb: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders, total 419430400 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
 
Disk /dev/sdb doesn't contain a valid partition table
vcd:~ #

Now, let's format the disk with a filesystem. I put the file system directly on the disk without creating any partition. That works, and I won't need to resize a partition should I ever decide to increase the capacity of the disk.

vcd:~ # mkfs.ext3 /dev/sdb

Finally, we need to mount the file system on the transfer folder. Open /etc/fstab with the vim text editor

vcd:~ # vim /etc/fstab

and add the following line:

/dev/sdb        /opt/vmware/vcloud-director/data/transfer/      ext3    defaults        0       0

Now mount the file system:

vcd:~ # mount -a
vcd:~ # mount
...
/dev/sdb on /opt/vmware/vcloud-director/data/transfer type ext3 (rw)
vcd:~ #

The last line shows that the file system was mounted successfully. Before we start vCloud Director, we have to make sure the user "vcloud" has access to the file system. Otherwise, we get

Error starting application (com.vmware.vcloud.vdc.impl.TransferServerSpoolAreaVerifier@4f2938f): Transfer spooling area is not writable: /opt/vmware/vcloud-director/data/transfer

in cell.log instead of

Successfully verified transfer spooling area: /opt/vmware/vcloud-director/data/transfer

So, set the ownership accordingly:

vcd:~ # chown -R vcloud:vcloud /opt/vmware/vcloud-director/data/transfer
vcd:~ #

Now let's try and start the vCloud Director service:

vcd:~ # /etc/init.d/vmware-vcd start
Starting vmware-vcd-watchdog:                                                                                                             done
Starting vmware-vcd-cell                                                                                                                  done
vcd:~ # /etc/init.d/vmware-vcd status
vmware-vcd-watchdog is running
vmware-vcd-cell is running
vcd:~ #

The last two lines in cell.log look a lot better now

Successfully bound network port: 80 on host address: 10.173.10.22
Successfully bound network port: 443 on host address: 10.173.10.22

and vCloud Director is back online!

Hope that helped!

vShield Manager fails to enable VXLAN on the cluster

Hi folks, I just experienced big trouble enabling VXLAN on a cluster. I dug deeper to find the issue and now would like to share my findings with you.

The Problem

I installed vShield Manager (VSM) in an HA/DRS cluster in order to get VXLAN running with vCloud Director 5.1. The "preparation" step in the "Network Virtualization" section of VSM's UI failed with no really helpful error message: "Not Ready".

First, I suspected problems with DHCP on the VLAN dedicated to VXLAN traffic, but then I realized something: in vSphere, I could see the port group being created, but no VMkernel ports were deployed. So the issue had to occur before IP addresses are configured via DHCP.

Easy Fix

I started a tail -f on the following files:

  • /var/log/vmkernel.log
  • /var/log/hostd.log
  • /var/log/esxupdate.log

In esxupdate.log I found something interesting:

Sorry we lost this figure due to an HDD crash. We are going to replace it soon!


From the log above, we can tell that the “preparation” step includes downloading and installing a VIB file that actually brings the VXLAN functionality into ESXi. The download happens from a URL on vCenter Server. The message states pretty clearly that the file could not be downloaded due to DNS resolution problems.

Fix: I had misconfigured the DNS server setting during the earlier ESXi host configuration.
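The DNS configuration can be verified directly on the affected host. A sketch, guarded so it degrades gracefully when not run in the ESXi Shell; the vCenter host name used here is a placeholder:

```shell
# check the DNS configuration of this ESXi host (esxcli exists only on ESXi)
if command -v esxcli >/dev/null 2>&1; then
    esxcli network ip dns server list
    # vcenter.example.local is a placeholder -- use your vCenter's FQDN
    nslookup vcenter.example.local || echo "DNS resolution failed"
else
    echo "esxcli not available -- run this in the ESXi Shell"
fi
```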

I also did some research on the web about the installation/preparation process and found other issues related to the procedure: on viktorious.nl it was a firewall-related problem. ESXi’s host firewall disallowed connecting to web servers from within the ESXi Shell. Enabling the “httpClient” firewall rule solved the problem.
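For that firewall case, the rule can be enabled from the ESXi Shell; a sketch, again guarded for environments where `esxcli` is not present:

```shell
# enable outbound HTTP connections from the ESXi Shell so the VIB download works
if command -v esxcli >/dev/null 2>&1; then
    esxcli network firewall ruleset set --ruleset-id=httpClient --enabled=true
    # confirm the ruleset is now enabled
    esxcli network firewall ruleset list | grep -i httpClient
else
    echo "esxcli not available -- run this in the ESXi Shell"
fi
```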

Hope that helped!

vCloud Director – VMRC Bandwidth Usage

Last week I was asked about the estimated bandwidth requirement of a VMRC-based console connection through a vCloud Director cell. Well, I did not know at the time, so I set up a little test environment. I want to share the results with you now.

In vSphere, the question of how much bandwidth a vSphere Client console window consumes is rather pointless. Unless we are talking about a ROBO (remote offices and branch offices) installation, console connections are made from within the company LAN, where bandwidth is much less of an issue.


Figure 1: Remote console connection with vSphere Client.

The fat dashed line indicates the connection made from vSphere Client directly to ESXi in order to pick up the image of a virtual machine’s console.

With vCloud Director things are a bit different: Customers have access to a web portal and create console connections to their VMs through the VMRC (virtual machine remote console) plug-in. Though the plug-in displays the console image, the connection to ESXi is not initiated by it. Instead the VMRC connects to the vCD cell’s console proxy interface. vCD then connects to ESXi. This means a vCD cell acts as a proxy for the VMRC plug-in.


Figure 2: Remote console through the vCloud Director web portal.

Of course, the browser could be located inside the LAN, but especially in public cloud environments this traffic will flow through your WAN connection.

Testing the Bandwidth

The bandwidth consumed by a single remote console connection depends on what is being done inside the VM. So, in my tests I monitored bandwidth in three different situations:

  1. writing some text in notepad
  2. browsing the internet
  3. watching a video

Of course, the configured screen resolution and color depth have to be considered, too. But this is not going to be a big performance evaluation; it is rather an attempt to give you – and myself – an impression and rough values to work with.

To get the actual values, I used the Advanced Performance Charts to monitor the console proxy NIC of my vCloud Director cell:


Figure 3: Network performance of a vCD cell during a VMRC connection.

I started the testing after 8 PM, so please ignore the spikes on the left. The first block of peaks after 8 PM is the result of writing text in Notepad. I did not use a script to simulate this workload, which is probably the reason why the values are not very constant. Towards the end, I reached a fairly high number of keystrokes per second – probably higher than an average value. The estimated average bandwidth is around 1400 KBps. After that, I started a YouTube video. The video was high resolution, but the player window remained small. Still, I reached an average of maybe 3000 KBps! Browsing a few web sites and scrolling the browser window seems to create a slightly lower amount of network I/O. Most likely, a realistic workload includes reading sections before scrolling, so the bandwidth consumption would be even lower than the measured average of – let’s say – 1600 KBps.

As we have seen, the protocol used for the VMRC connection is not a low-bandwidth implementation. When implementing your cloud, you should definitely keep that in mind. A single VMRC connection does not harm anyone, but several tens of concurrent connections might congest your WAN connection, depending on what you have there. A single customer could also influence the performance of another!
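To get a feel for the numbers, here is a back-of-the-envelope calculation. The per-session figure is the rough video value measured above; the number of concurrent sessions is purely an assumption:

```shell
# back-of-the-envelope: 30 concurrent VMRC sessions at ~3000 KBps each
conns=30                                # assumed number of concurrent consoles
per_conn_kbps=3000                      # rough per-session video figure from above
total_kbps=$((conns * per_conn_kbps))
total_mbps=$((total_kbps * 8 / 1000))   # KBps (kilobytes/s) to Mbit/s
echo "${total_kbps} KBps ~= ${total_mbps} Mbit/s"
```

That is roughly 720 Mbit/s of console traffic alone – easily enough to saturate a typical WAN uplink.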

How do we solve this? Well, if you have a problem with VMRC bandwidth, it is really a limitation of your WAN connection. All you can do from the vCloud Director side is set a limit on the maximum number of concurrent connections per VM:


Figure 4: Customer Policies: Limits

But this works only for connections to the same VM! A more professional solution would be an appliance placed in front of the vCD cells that performs traffic shaping per VMRC connection. Maybe your load balancer can do this!

© 2019 v(e)Xpertise
