Tag: Networking

Migration of a Distributed Switch to a new vCenter – Important Things to Know

I promised an article on this topic several times, and I finally found the time to do some further testing on an issue I first saw a few months ago.

With a customer, I discussed how to migrate to a newly created vCenter while keeping the ESXi hosts running with the old distributed switch. (There might be several reasons for this, e.g. migrating the vCenter and its database to a new OS, data loss and recovery problems, a P2V or V2P migration, etc.)

Starting out with vCenter 5.0, we were thinking about ways to achieve a seamless migration. We came up with four possible solutions:

  1. Migrate virtual machines from the vDS to a vSS, connect the ESXi servers to the new vCenter, create a new vDS and migrate the VMs back to it. Since this can get very complex and is also prone to mistakes in a larger environment, we looked for other possibilities.
  2. Automate those steps via scripts. Gabrie van Zanten has written about this and published a script for exactly this scenario.
  3. The third idea was to check whether we could export the vDS settings out of the vCenter database and import them into the new instance. Since I only work directly in databases when I see no other option, the last idea came to mind: using a new feature in vSphere 5.1.
  4. Upgrade the old vCenter, or install a new vCenter 5.1, and use the export/import feature of the web client as explained here.

This new feature was exactly what we were looking for. Unfortunately, it never really worked out in my environment.

In my opinion, the following process should have done what I wanted:

  1. Backup the vDS on the old vCenter
  2. Restore the vDS on the new vCenter (preserve the original distributed switch and port group identifiers)
  3. Disconnect and remove the ESXi hosts from the old vCenter
  4. Connect the ESXi to the new vCenter

Unfortunately, my results were the following:


Even though the import showed no errors and the vDS ID in net-dvs matched the ID in the backup file, I wasn't able to get the ESXi hosts and the vCenter back in sync (even a second restore of the vDS didn't help).
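
If you want to verify this yourself, the switch identifier can be read directly on an ESXi host. A minimal check, assuming SSH access to the host; the UUID near the top of the output should match the one stored in the vDS backup file:

# list the distributed switch configuration known to this ESXi host;
# the switch UUID appears in the first lines of the output
net-dvs -l | head -n 20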

After some troubleshooting I found out that it’s necessary to keep the right order in this migration process:

  1. Backup the vDS on the old vCenter
  2. Disconnect and remove all of the ESXi hosts on the old vCenter
  3. Connect all ESXi hosts to the new vCenter
  4. Restore the vDS on the new vCenter (preserve the original distributed switch and port group identifiers)

Keeping this order is the only way the migration works as expected. If you have done the migration the way I described it the first time, or if you don't migrate all ESXi hosts at the same time (which I can't recommend), you need to add the ESXi hosts to the recovered vDS manually.



Just make sure you select the same vmnics as were configured in the original vDS.


And finally the error was gone: migration successful!


All of these operations can be done without interrupting the VMs' services, so a seamless migration to a new vCenter can be done very easily once you keep the correct order in mind.

My First Autoneg Failure

Believe it or not: until yesterday I had never seen an auto-negotiation failure. I thought there might be others out there who haven't seen it happen yet, so I am sharing this with you now.

For those of you who don't know the term, here is a short explanation: when you connect a NIC to a switch, the switch port and the NIC negotiate speed and duplex mode. The speed can be 10, 100, 1000 or 10000 Mbit/s, while duplex can be either full or half. Sometimes this auto-negotiation fails, and then weird things happen.

Before I actually saw the issue happening myself, I thought the NIC would end up with something like 100/half instead of 1000/full. But in my case yesterday, it was a bit different:

Sorry we lost this figure due to an HDD crash. We are going to replace it soon!

As you can see, the NIC is configured for auto-negotiation and shows a link status of "Connected" and a proper duplex configuration, but the speed value seems a bit odd. I guess we haven't arrived at 65 Gbit NICs yet.

It was a shot in the dark, but when I forced the NIC to 1000/full, everything started working just fine.
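
In case you prefer to check and hard-code the setting from the ESXi shell instead of the vSphere Client, something along these lines should do it (vmnic0 is just a placeholder for the affected NIC):

# list all physical NICs with their negotiated speed and duplex settings
esxcli network nic list

# force the NIC to 1000 Mbit/s full duplex instead of auto-negotiation
esxcli network nic set -n vmnic0 -S 1000 -D full

# revert to auto-negotiation later on
esxcli network nic set -n vmnic0 -a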

Monitoring vSphere 5.1 with Nagios/Icinga – Part 1

Monitoring can and should be an important topic in many environments, and lately I see it being addressed with Nagios or Icinga very often. There are a few typical ways of monitoring a vSphere environment from Nagios/Icinga. In this series of articles, I am going to show you which options you have and discuss their pros and cons with regards to quantity – the amount of information you can gather – and quality with regards to scalability. In this and the following posts, I will assume a certain knowledge of and familiarity with Nagios/Icinga. This means I will not walk you through every single step of configuring the system to monitor the environment but rather point you in the right direction.

Overview of Monitoring

In this first part, I would like to introduce the most common interfaces for monitoring vSphere, just to give you an overview of how the information can be accessed.

What to monitor?

Before we start digging into the various interfaces and possibilities to monitor objects, we should think about what we actually want to monitor. Basically, this question comes down to "What components of the environment need monitoring?" We can categorize those components into three layers:

Sorry we lost this figure due to an HDD crash. We are going to replace it soon!

Infrastructure
  • Storage (performance, capacity, …)
  • Networking (switches, routers, …)
  • ESXi host hardware state
vSphere
  • ESXi hosts (config. changes, performance metrics, hardware state, …)
  • vCenter Server (config. changes, free resources)
  • vCenter database (service availability)
  • Virtual Machines (power state, reachability, performance metrics)
Services
  • service checks for web servers, database servers, remote desktop, etc.
  • service performance like response times
  • applies to vCenter Server and database, too

As you can see, you are going to monitor performance and service metrics in a lot of places. The difficult thing about monitoring performance is to define appropriate thresholds for warning and critical conditions. I recommend you have a look at Duncan’s great article on esxtop which lists a lot of metrics and their suggested thresholds (http://www.yellow-bricks.com/esxtop/) or drop the idea of monitoring performance from Nagios entirely and rely on vCenter Operations or similar solutions instead.

vSphere Interfaces for Monitoring

Now that we know what to monitor we can think about how to actually gather the information. VMware provides a few interfaces that could potentially be used for our purposes:

ICMP
ESXi hosts and vCenter Servers respond to ICMP ECHO requests. In case you don't know: ICMP ECHO request and response are commonly known as "ping". Not a very detailed way of monitoring, but it is something, and something is better than nothing, right?
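
A minimal sketch of such a check using the standard Nagios check_ping plugin (the plugin path and hostname are examples and will differ in your environment):

# warn at 100 ms RTT / 20% packet loss, go critical at 500 ms RTT / 60% loss
/usr/lib/nagios/plugins/check_ping -H esxi01.lab.local -w 100,20% -c 500,60%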

Secure Shell
ESXi hosts can be configured to start an SSH server on port 22 which allows connections from the root user. This can be used to remotely execute commands on the host and parse their output. The same applies to vCenter Server Appliance systems: based on Linux, they provide access via SSH, too.
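
A rough sketch of what this can look like, assuming key-based root login is set up (the hostname is an example):

# run a command on the ESXi host and inspect its output
ssh root@esxi01.lab.local 'esxcli hardware platform get'

# the same idea wrapped into a Nagios check via check_by_ssh
/usr/lib/nagios/plugins/check_by_ssh -H esxi01.lab.local -l root -C 'esxcli system version get'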

vSphere API
The vSphere API is a SOAP-based interface provided by ESXi hosts and vCenter Servers. It allows programmatic access to all vSphere objects, their states and configuration. The service listens on port 443 and can be accessed using one of the SDKs provided by VMware. For the purpose of monitoring from Nagios/Icinga, the SDK for Perl comes in very handy.
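
Before building real checks on top of one of the SDKs, you can at least verify that the API endpoint is reachable. A trivial sketch (vcenter.lab.local is a placeholder name):

# fetch the WSDL of the vSphere Web Services API; an HTTP 200 and some XML
# output indicate that the SDK endpoint is up
curl -k -s -o /dev/null -w '%{http_code}\n' https://vcenter.lab.local/sdk/vimService.wsdl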

SNMP
The Simple Network Management Protocol (SNMP) is implemented in all ESXi hosts. While ESXi hosts can be polled for SNMP data on UDP port 161 and be configured to send traps, vCenter Servers can only send traps.
ESXi servers support a large variety of MIBs, allowing a lot of information to be gathered and checked. As of version 5.1, VMware provides a total of 44 MIB files which can be downloaded from vmware.com.
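
A minimal sketch of polling an ESXi 5.1 host via SNMP; the community string and hostname are examples, and the esxcli syntax is from memory, so double-check it against your build:

# on the ESXi host: set a community string and enable the SNMP agent
esxcli system snmp set --communities public
esxcli system snmp set --enable true

# from the Nagios server: walk the VMware enterprise subtree (.1.3.6.1.4.1.6876)
snmpwalk -v 2c -c public esxi01.lab.local 1.3.6.1.4.1.6876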

CIM
The Common Information Model defines a standard way to access elements in an IT environment and the relations between components. Created by the DMTF (Distributed Management Task Force), CIM targets the unification of existing standards like IPMI, SNMP, etc. ESXi hosts run the sfcb service which listens on port 5989 for SSL connections. WSMAN (Web Services Management) clients can connect, authenticate and query information from the system.
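
A quick sketch of querying sensor data from the sfcb service with the sblim wbemcli tool (hostname and password are placeholders):

# enumerate the numeric sensors (temperatures, fan speeds, voltages, ...);
# -noverify skips SSL certificate verification, -nl prints one property per line
wbemcli -nl -noverify ei 'https://root:password@esxi01.lab.local:5989/root/cimv2:CIM_NumericSensor'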

Stay tuned for part 2 to see how to make use of the interfaces just discussed and find out about advantages of each.

Simulating a WAN Connection in the Lab – Part 2

In part 1 I showed you how to simulate a layer 3 WAN connection in your lab. To accomplish this, we simply configured Linux to be a router between two networks and put some traffic shaping on the connection to simulate a WAN line.

It is now time to cover layer 2 connections between two sites. The basic configuration is the same as in part 1: a Linux virtual machine with two NICs, each connected to one of the two simulated sites – in this case, port groups.

[Figure 1]

Connecting the Sites on Layer 2

Just to recall part 1: We connected two port groups / sites on layer 3 by placing a Linux VM as a router between the two networks. We had different IP subnets on each side (10.173.10.0/24 and 10.173.11.0/24) with the Linux VM owning an IP address in each network (10.173.10.1 and 10.173.11.1). For the other systems on each site that Linux VM served as the default gateway.

Connecting the sites on layer 2, we will configure the two port groups to be two different VLANs (just as in part 1) BUT to appear as a single VLAN! Sounds a bit confusing, right? Well, remember that we configured the VLANs to simulate two different geographical sites. Connected on layer 2, they will appear to be a single site, and isn't that exactly what we want to achieve? Of course it is! So, instead of a different IP subnet on each site / port group, we will have a single subnet spanning both sites.

[Figure 4]

Epic Fail: Linux Bridge

My first attempt to bridge the two VLANs using a Linux VM failed miserably. I configured a software bridge (like a vSwitch) inside the Linux OS and hooked it up to eth0 and eth1, the two virtual NICs of that VM. I set promiscuous mode to Accept on the port groups and enabled it on the vNICs. Everything should have worked, at least in theory. In reality, a bug (not sure who is responsible) seems to prevent this from working. I found this thread on the issue in the VMware Community forums: http://communities.vmware.com/thread/262520. It only took me 3 days to find out it wouldn't work – anything for the fame, right? This is what I tried and what would have been the much nicer solution:

[Figure 3]

The Solution: Proxy ARP

I remembered that vCloud Director – or rather vShield Edge – does something very similar, called a "vApp fenced network". In this scenario, a network is deployed for the vApp and hooked to an organization network via vShield Edge. BUT vShield Edge is not acting as a router here: we have the same subnet on both sides of the VSE appliance. I described how that works here some time ago: vCloud Director – Fenced Networks Explained. We can use that same technique of "proxy ARP" to connect our port groups:

Proxy ARP

In the figure above, you can see the basic concept of proxy ARP. The system (r1) sitting between the two switches (sw1 and sw2) responds to ARP requests from SRC for DST in place of DST. SRC will receive r1's left MAC address and store it as DST's MAC address in its ARP table. When SRC sends something to DST – for example a ping – the packet will be addressed with r1's MAC address but DST's IP address, so r1 will forward it to DST. The same concept applies in the other direction. Technically, this is more of a routing setup than a layer 2 connection, but it serves our purpose of having the same subnet on both sides of the WAN connection.

Configuring Linux for Proxy ARP

The Manual Way

Proxy ARP is a pretty tricky thing to configure, so it comes in handy to have a drawing of what we want to achieve:

[Figure 5]

  • network is 10.173.10.0/24
  • default gateway on 10.173.10.1
  • proxy ARP system on 10.173.10.2
  • site1-VM on 10.173.10.10
  • site2-VM on 10.173.10.20
  • site1-VM and default gateway on eth0's side of the Linux VM
  • site2-VM on eth1's side of the Linux VM

To make this work, we have to do some IP configuration, remove and add some rules in the routing table and finally enable proxy ARP and routing inside the Linux VM. The easiest way to do this is to edit /etc/network/interfaces. Putting the configuration there makes it persistent, too:

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
        address 10.173.10.2
        netmask 255.255.255.0
        network 10.173.10.0
        broadcast 10.173.10.255
        gateway 10.173.10.1
        up sysctl net.ipv4.conf.eth0.proxy_arp=1
        up ip route del 10.173.10.0/24 dev eth0
        up ip route add 10.173.10.1 dev eth0
        up ip route add 10.173.10.10 dev eth0

auto eth1
iface eth1 inet static
        address 10.173.10.2
        netmask 255.255.255.0
        network 10.173.10.0
        broadcast 10.173.10.255
        up sysctl net.ipv4.conf.eth1.proxy_arp=1
        up ip route del 10.173.10.0/24 dev eth1
        up ip route add 10.173.10.20 dev eth1

Now let's see what happens: note that eth0 and eth1 are both configured with the exact same IP address and network mask and, as a result, the same network and broadcast addresses. The default gateway is set for eth0 but not for eth1. Secondly, each interface is enabled for proxy ARP using the sysctl command. The configuration of the two interfaces eth0 and eth1 creates routing table entries like this:

10.173.10.0/24 dev eth0  ...
10.173.10.0/24 dev eth1  ...

They tell the system that every IP in the 10.173.10.0/24 subnet is behind eth0 AND eth1, which is simply wrong in two respects:

  1. Not every IP is behind eth0/eth1: some of them are behind eth0 while others reside behind eth1.
  2. No single IP address can be behind eth0 and eth1 at the same time.

Therefore, those wrong routing table entries have to be removed with the ip route del lines.

Finally, the system must be told which IP in the 10.173.10.0/24 subnet can be found behind which network interface. The routing table entries for this are established by the ip route add lines.

Now, set the net.ipv4.ip_forward flag to 1 in /etc/sysctl.conf and enable the configuration with sysctl -p – just as described in part 1.

Reboot, or try /etc/init.d/networking restart (I didn't test the latter). We want the configuration to be persistent across reboots anyway, so this is a good time to test that.
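
A quick way to verify the setup from site1-VM (addresses as in the drawing above): the ping should succeed, and the ARP cache should show the proxy ARP router's MAC address for site2-VM's IP instead of site2-VM's real MAC.

ping -c 2 10.173.10.20
arp -n 10.173.10.20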

The Automated Way

The bad news about the configuration above is that it is very static. Whenever a system is created or deleted on either side of the proxy ARP router, you will have to open a shell on the system and create or remove the corresponding routing table entries. Luckily, there is a way to automate this: parprouted. It not only adds and deletes entries in the routing table but also configures the whole proxy ARP setup for you. Your /etc/network/interfaces file will look a lot simpler:

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
        address 10.173.10.2
        netmask 255.255.255.0
        network 10.173.10.0
        broadcast 10.173.10.255
        gateway 10.173.10.1

auto eth1
iface eth1 inet static
        address 10.10.10.10
        netmask 255.255.255.0
        network 10.10.10.0
        broadcast 10.10.10.255

The IP address chosen for eth1 can be an arbitrary one. It does not have to be taken from any subnet the system is connected to, but it has to be set! Otherwise, parprouted will fail.

Now, all you have to do is execute

parprouted eth0 eth1

after every reboot of your system, or simply wrap it in an init script (the Debian package for parprouted does not include one, so we write our own):

/etc/init.d/parprouted

#! /bin/sh
# /etc/init.d/parprouted
INTERFACES="eth0 eth1"
DAEMON="/usr/sbin/parprouted"

case "$1" in
start)
        echo "Starting parprouted..."
        $DAEMON $INTERFACES
;;
stop)
        echo "Stopping parprouted..."
        killall parprouted
;;
*)
        echo "Usage: /etc/init.d/parprouted {start|stop}"
        exit 1
;;
esac
exit 0

Make the file executable with

chmod +x /etc/init.d/parprouted

and insert it into the boot process:

update-rc.d parprouted defaults

A warning about missing LSB tags will show up, but it can be ignored. Now – if not already done so – start parprouted using your init script:

/etc/init.d/parprouted start

or reboot the system.

DHCP Relay

Still, proxy ARP does not fully behave like a real layer 2 connection between the two port groups. What is missing is DHCP: right now, a system on the left side can send a DHCP request but will never receive a response if the DHCP server resides on the right side of the router. We need the proxy ARP router to “relay” DHCP traffic:

apt-get install isc-dhcp-relay

During the installation you should be asked about

  1. the address of the DHCP server and
  2. the interfaces to listen for DHCP requests on.

The resulting /etc/default/isc-dhcp-relay configuration file should have the following contents:

SERVERS="10.173.10.10"
INTERFACES="eth0 eth1"
OPTIONS=""
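
If you need to change these settings later, adjust the file and restart the relay; on Debian Squeeze that should simply be:

/etc/init.d/isc-dhcp-relay restart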

Traffic Shaping

WAN simulation – rate limiting and latency – can be set up in exactly the same way I described in part 1. Just download the wan-simulate script and execute it as described.

Okay folks, I know that was probably not the easiest thing to understand and configure, but I hope that with the instructions above and some minor adjustments to your environment you will be able to replicate this in your lab. I wasn't sure whether the configuration is complex enough to provide this as an appliance. Please let me know if an appliance would be of any use to you! Have fun!

Simulating a WAN Connection in the Lab – Part 1

I am currently working on a proof of concept setup in my lab to test DRBD across WAN connections, to see how it behaves and to analyze the impact of DRBD Proxy in WAN scenarios. But there are other situations in which you might want to simulate high-latency, low-bandwidth, lossy connections: VMware Site Recovery Manager (SRM), vSphere stretched clusters, VMware View to test PCoIP settings for connections from remote locations, etc.

So, what I did to work around the problem of not having a WAN line in my lab was to set up a Linux VM with Debian Squeeze and configure it as a link between two sites (port groups), shaping traffic down to the characteristics I wanted to see. There are two ways to do this: on layer 3 and on layer 2.

Either way, install a Debian Squeeze VM with two virtual NICs attached and connect each NIC to a different port group (they represent the two ends of the WAN connection). Please note that the two port groups are configured to be in different VLANs! This way, we can simulate physical separation of the two port groups / sites.

[Figure 1]

In this first part of the tutorial, I am going to show you how to do this on layer 3.

Connecting the Sites on Layer 3

"Layer 3" means having a different subnet on each site. In this case, the Linux VM will act as a router. The easiest way to do this is to configure the router VM as the default gateway in the respective subnets.

[Figure 2]

Preparing Debian to be a Router

First, edit the /etc/network/interfaces file to configure the correct IP addresses at each interface.

vim /etc/network/interfaces

It should look something like this:

auto lo
iface lo inet loopback

allow-hotplug eth0
iface eth0 inet static
        address 10.173.10.1
        netmask 255.255.255.0
        network 10.173.10.0
        broadcast 10.173.10.255
        gateway 10.173.10.1

allow-hotplug eth1
iface eth1 inet static
        address 10.173.11.1
        netmask 255.255.255.0
        network 10.173.11.0
        broadcast 10.173.11.255

Of course, adjust the IP addresses, network masks, network addresses and broadcast addresses according to your needs. Save the changes and type

/etc/init.d/networking restart

to have the settings become effective.

The next step is to enable IP forwarding on this system.

vim /etc/sysctl.conf

Find the line that reads

# net.ipv4.ip_forward=1

and remove the comment:

net.ipv4.ip_forward=1

Now run

sysctl -p

to apply the configuration changes.

Setting up Traffic Shaping

Traffic shaping with Linux can be accomplished using the tc command. It is quite a complex thing to do, but there are a few pretty good tutorials out there as well as code snippets. I used netem to simulate latency and tbf (token bucket filter) to put a rate limit on the link, and wrapped it all into this little script: wan-simulate. The script starts out like this and might require some changes from you:

#!/bin/bash

###############################
# CONFIGURE HERE
###############################
INT1_NAME="eth0"
INT2_NAME="eth1"
BW="1024kbit"
LOSS="0.1%"
DUPE="1%"
CORRUPT="0.1%"
RTT_MS="100"

Above, you can set the two network interfaces to the two sides of the virtual router, the desired maximum bandwidth, the lossiness of the link, the packet duplication and corruption rates as well as the target latency in milliseconds. To make the WAN simulation a bit more realistic, the latency will vary by 10% from packet to packet. When configuring the bandwidth, be aware of the units used by tc:

  • kbps – kilobytes per second
  • mbps – megabytes per second
  • kbit – kilobits per second
  • mbit – megabits per second
  • bps or a bare number – bytes per second
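
For reference, this is roughly what the script configures under the hood with netem and tbf. This is only a sketch, assuming the configured RTT is split evenly across the two interfaces; the exact parameters in wan-simulate may differ:

# add delay (half the RTT, varying by 10%), loss, duplication and corruption
tc qdisc add dev eth0 root handle 1:0 netem delay 50ms 5ms loss 0.1% duplicate 1% corrupt 0.1%

# attach a token bucket filter below netem to enforce the bandwidth limit
tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 1024kbit buffer 1600 limit 3000

# repeat both commands for eth1 so traffic is shaped in both directions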

To start the WAN simulation make the script executable

# chmod +x wan-simulate

and run

# ./wan-simulate start

To stop the simulation type

# ./wan-simulate stop

and execute

# ./wan-simulate status

to check the current configuration.

Verifying Line Characteristics

To test my configuration, I used ping to validate latency and iperf for bandwidth. First, let's do the testing without the WAN simulation:

root@site1:~# ping -c 4 10.173.11.10
PING 10.173.11.10 (10.173.11.10) 56(84) bytes of data.
64 bytes from 10.173.11.10: icmp_req=1 ttl=63 time=0.596 ms
64 bytes from 10.173.11.10: icmp_req=2 ttl=63 time=0.625 ms
64 bytes from 10.173.11.10: icmp_req=3 ttl=63 time=0.607 ms
64 bytes from 10.173.11.10: icmp_req=4 ttl=63 time=0.631 ms

--- 10.173.11.10 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2998ms
rtt min/avg/max/mdev = 0.596/0.614/0.631/0.033 ms
root@site1:~# iperf -c 10.173.11.10
------------------------------------------------------------
Client connecting to 10.173.11.10, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 10.173.10.10 port 43542 connected with 10.173.11.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec    702 MBytes    589 Mbits/sec
root@site1:~#

Now, I activate the WAN simulation and redo the testing:

root@site1:~# ping -c 4 10.173.11.10
PING 10.173.11.10 (10.173.11.10) 56(84) bytes of data.
64 bytes from 10.173.11.10: icmp_req=1 ttl=63 time=95.5 ms
64 bytes from 10.173.11.10: icmp_req=1 ttl=63 time=99.2 ms (DUP!)
64 bytes from 10.173.11.10: icmp_req=2 ttl=63 time=105 ms
64 bytes from 10.173.11.10: icmp_req=3 ttl=63 time=97.5 ms
64 bytes from 10.173.11.10: icmp_req=4 ttl=63 time=96.0 ms

--- 10.173.11.10 ping statistics ---
4 packets transmitted, 4 received, +1 duplicates, 0% packet loss, time 3001ms
rtt min/avg/max/mdev = 95.549/98.790/105.536/3.615 ms
root@site1:~# iperf -c 10.173.11.10
------------------------------------------------------------
Client connecting to 10.173.11.10, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 10.173.10.10 port 43543 connected with 10.173.11.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.4 sec  1.23 MBytes    992 Kbits/sec
root@site1:~#

This ping test was actually pretty amazing as it shows the very rare case of a duplicate packet! See the "DUP!" at the end of the third line of the ping output? It reports icmp_req=1 just like the line before it. This means the ICMP response packet was received twice!

In some cases, it might be desirable to connect two sites on layer 2 instead, in order to have a single subnet span multiple physical sites. I will soon be needing this in my lab, too, and I will share it with you as soon as possible in part 2.

PVLAN – A Widely Underutilized Feature

Hello folks, after some time of laziness I am now back with a short note on Private VLANs. I figured use cases for the Private VLAN feature are widely unknown in the field, and therefore PVLAN is probably one of the most underutilized features of vSphere out there. To avoid redundancy on the internet I don't want to explain PVLANs here, so please read up on the concept first in case you are not familiar with it yet.

Use Case 1: Saving VLAN IDs

This is probably the only use case people are usually familiar with. Here is the situation: a hosting provider needs to provide network connectivity to multiple customers / tenants. For security reasons, customers are to be isolated on layer 2: VLANs. With possible values ranging from 0 through 4095 for VLAN IDs, we can create up to 4096 VLANs. Some VLAN IDs will be used by the provider himself (storage, vMotion, FT, management, etc.). And though I see more and more switches supporting 4096 VLANs, many of them are still restricted to a much lower number, 256 for example. You get the point? Only a fraction of the possible VLAN IDs can actually be used, restricting the number of customers / tenants a hosting provider can support. Furthermore, using VLANs you will end up defining hundreds of subnets and reconfiguring your routers to support the newly created networks.

With PVLAN, a provider can put as many VMs as possible into the Isolated secondary PVLAN, using only a single VLAN ID while still separating hundreds of VMs from each other. In case a customer requires multiple VMs to talk to each other, that traffic could go over a router in a Promiscuous secondary PVLAN, or a Community could be created to take load off that router. But that would require another precious VLAN ID.

Use Case 2: DMZ

Yeah, you are reading this right! PVLANs can be used in a DMZ environment perfectly well! Think about it:

  1. In a Primary PVLAN the same IP subnet is to be used.
  2. PVLANs allow you to further control traffic flow on layer 2 without requiring routers or firewalls.

Doesn’t that sound like something you want in a DMZ?

In a DMZ environment, we usually have two firewalls: an outer one protecting the DMZ from the internet and an inner one protecting the organization network from the DMZ and the internet.

The outer firewall protects VMs by filtering traffic to applications not supposed to be reachable from the internet. Still, DMZ systems can be compromised. In that case, once an attacker has gained control over a certain system in the DMZ, he or she gets unfiltered access to all other systems in the same DMZ. No firewall will stop him or her from accessing systems on certain ports.

PVLANs can help to restrict that access! Put your DMZ routers/firewalls into the Promiscuous secondary PVLAN and create Communities for multi-tiered applications. This way, after breaking into a VM, the attacker will be contained in the Community secondary PVLAN, unable to send ANY traffic to systems that are not part of that application.

[Figure: PVLAN in a DMZ]

This dramatically reduces the attacker's number of possible targets for further attacks!

Use Case 3: Backup Network

From time to time, I see environments that use the old-school way of backing up systems: agents in the VMs, a dedicated backup network, every VM equipped with a second NIC attached to that backup network, and a central backup server. What happens here is that people wrap their heads around security for the production VLANs – maybe multiple of them – but then put all VMs together in a backup network that allows every VM to talk to every other VM in the company. That creates a huge broadcast domain and rolls out a red carpet for viruses and other malware to spread to other systems.

Now here is my suggestion: on that backup network, create a PVLAN. Configure the backup server(s) to be in the Promiscuous secondary PVLAN and put all VMs into Isolated. This way every backup agent will be able to connect to the backup server, but no VM will be able to talk to another VM!

[Figure: PVLAN backup network]

Use Case 4: Desktop Virtualization

In a VDI environment, a valid security concern is the ability of desktops to communicate directly with each other. That way, users can exchange data without using enterprise storage, and viruses could spread easily. With PVLAN, just put the desktop VMs into an Isolated PVLAN and place the central storage in the Promiscuous PVLAN. Done 🙂
[Figure: PVLAN in a VDI environment]

I hope that inspires you a bit and that I will see PVLANs more often in the future.

vSphere Ghost NICs

Only three days ago, I came across something I first considered to be a bug. It turned out it was normal behavior that I just hadn't seen yet. So I figured it might be worth posting:

The summary tab of a VM showed two network port groups and I thought, "Oops, that VM is supposed to be connected to a single network only!" I looked into the Edit Settings window, which only showed a single network adapter. That confused me a bit, to be honest!

I didn't really figure it out until I asked my colleague Fabian Lenz, who pointed me to snapshots: it turned out the VM had been connected to that other network with the ghost NIC. Then a snapshot was taken. After that, the NIC was replaced with a vmxnet3 adapter and connected to a different port group. With the vmxnet3 NIC connected to one network and the snapshot holding info on the long-removed e1000 connected to another, the summary tab showed both connections!

Hope that helps you save some time!

vCloud Director – VMRC Bandwidth Usage

Last week I was asked about the estimated bandwidth requirement for a VMRC-based console connection through a vCloud Director cell. Well, I did not know at the time, so I set up a little test environment. I now want to share the results with you.

In vSphere, the question of the bandwidth consumption of a vSphere Client console window is rather pointless. Unless we are talking about a ROBO (remote offices and branch offices) installation, console connections are made from within the company LAN where bandwidth is much less of an issue.


Figure 1: Remote console connection with vSphere Client.

The fat dashed line indicates the connection made from vSphere Client directly to ESXi in order to pick up the image of a virtual machine’s console.

With vCloud Director things are a bit different: customers have access to a web portal and create console connections to their VMs through the VMRC (Virtual Machine Remote Console) plug-in. Though the plug-in displays the console image, the connection to ESXi is not initiated by it. Instead, the VMRC connects to the vCD cell's console proxy interface, and vCD then connects to ESXi. This means a vCD cell acts as a proxy for the VMRC plug-in.


Figure 2: Remote console through the vCloud Director web portal.

Of course, the browser could be located inside the LAN, but especially in public cloud environments this traffic will flow through your WAN connection.

Testing the Bandwidth

The bandwidth consumed by a single remote console connection depends on what is being done inside the VM. So, in my tests I monitored bandwidth in three different situations:

  1. writing some text in notepad
  2. browsing the internet
  3. watching a video

Of course, the configured screen resolution and color depth have to be considered, too. But this is not meant to be a big performance evaluation but rather an attempt to give you – and myself – an impression and rough values to work with.

To get the actual values, I used the Advanced Performance Charts to monitor the console proxy NIC of my vCloud Director cell:


Figure 3: Network performance of a vCD cell during a VMRC connection.

I started the testing after 8 PM, so please ignore the spikes on the left. The first block of peaks after 8 PM is the result of writing text in Notepad. I did not use a script to simulate this workload, which is probably why the values are not very constant. Towards the end, I reached a fairly high number of keystrokes per second, probably higher than an average value. The estimated average bandwidth is around 1400 KBps. After that, I started a YouTube video. The video was high resolution but the player window remained small. Still, I reached an average of maybe 3000 KBps! Browsing a few web sites and scrolling the browser window seems to create a slightly lower amount of network I/O. A realistic workload most likely includes reading sections before scrolling, so the bandwidth consumption would be even lower than the measured average of – let's say – 1600 KBps.

As we have seen, the protocol used for the VMRC connection is not a low-bandwidth implementation. When implementing your cloud, you should definitely keep that in mind. A single VMRC connection does not harm anyone, but several tens of concurrent connections might congest your WAN connection, depending on what you have there. A single customer could also influence the performance of another!

How do we solve this? Well, if you have a problem with VMRC bandwidth, this is a limitation of your WAN connection. All you can do from the vCloud Director side is set a limit on the maximum number of concurrent connections per VM:


Figure 4: Customer Policies: Limits

But this works only for connections to the same VM! A more professional solution would include an appliance placed in front of the vCD cells that performs traffic shaping per VMRC connection. Maybe your load balancer can do this!
