IMO: Is SMP fault tolerance even useful? My view on it!

A few weeks ago Maish Saidel-Keesing wrote a post about Fault Tolerance for VMs with multiple vCPUs. He makes valid points in his argument, but I still want to give you a little bit of my own view on this topic (IMO).

With Fault Tolerance two VMs run nearly symmetrically on 2 different ESXi hosts, with one (primary) processing IO and the other one (secondary) dropping it. With the release of vSphere 6.0 VMware will support this feature for VMs with up to 4 vCPUs and 64 GB of memory. [More details here]

Let me try to summarize the outcome of Maish’s argument:

FT is not the big-deal feature, since it only protects against a hardware failure of the ESXi host, without any interruption in the service of the protected VM. It does NOT detect or deal with a failure at the operating system or application level.

So what Maish thinks we really need are cluster mechanisms at the application level, and if legacy applications don’t.

I would not disagree with this opinion in general. In an ideal world all applications would be stateless, scalable and protectable with a load-balancer in front of them. But it will take 1X or more years until all applications are built in such a new ‘modern’ way. We will not get rid of the legacy applications in the short term.

Within my last 4 years of being an instructor I have received one question nearly every time I delivered a vSphere class:

‘Can we finally protect our SMP-VMs now with Fault Tolerance? No?! Awww :(‘

So I would not say there is no need out there for this feature. Being involved in some bids last year, we very often had the requirement to deliver a system for automation solutions within large building complexes (airports, factories, etc.).

The software used in such domains is sometimes a legacy application par excellence (ironic), programmed with a paradigm from long before agile/RESTful/virtualization played a role in the tech world. Sometimes you can licence a cluster feature (and pay 10 times as much as for a 1-node licence); sometimes you can’t cluster it and need other ideas or workarounds to increase the availability.

Some bids were lost to competitors who were able to deliver solutions that can (on paper) tolerate a hardware outage without any service/session impact.

For me with SMP-FT typical design-considerations come into play:

  • How does the cluster work? Does it work on application/OS level or does it only protect against a general outage?
  • What were the failure/failover reasons in the past? (e.g. vCenter: in most cases where I had a failure, it was caused by a database problem [40%], an Active Directory / SSO problem [10%], a hardware failure [45%] or something else [5%]) -> A feature like FT would have protected against a large share of the failures experienced in the past. The same considerations apply to all kinds of applications (e.g. virtual load-balancer, Horizon View Connection Server etc.)
  • How much would a suitable solution cost to make, buy or update?

Sure, we need to get rid of legacy applications, but to be honest… this will be a very long road (the business decides and pays for it), and once we have reached the point where the legacy applications are gone, the next generation of legacy applications will be in place and need to be transformed (Docker?! 😉 ).

We should see FT as what it is: a new tool within our VMware toolkit to fit specific requirements and protect VMs (legacy/new ones) on a new level, with pros and cons (as always). IMO every tool / feature that gives us more opportunities to protect the IT is very welcome.

VMware Update to vSphere 5.5 and Horizon View 6.0 – vCenter service not working properly

A few days ago I received a mail from a former student of mine. They had updated their VMware environment to the latest vSphere 5.5U2 and afterwards Horizon View from 5.2 to 6.0.

From a procedural point of view everything seemed to have worked fine. But on a second look he realized that in the Horizon View Manager dashboard the vCenter was marked red (‘service is not working properly’) and pool operations were not working anymore.

vCenter service not working properly

From a systematic troubleshooting perspective I recommended that he check the connectivity between the Connection Server and the vCenter Server. OSI layers 1-4 were working well (the ports had not changed between the VMware versions either). For the layers above 4, I told him to check the ‘classical-access-logs’ for authentication problems and to verify that the service account has proper vCenter access and the correct permissions set within a role.

And voila –> the service user’s vCenter permission had been removed during the upgrade (-> all other permissions were still in place). Maybe a malfunction during the SSO / AD-LDS upgrade. Unfortunately I am not able to take a closer look and do a root-cause analysis of it.

Anyway! If you observe similar issues –> a) use a systematic approach to verify system communication or b) check the vCenter permissions directly.
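Option b) can be done quickly from PowerCLI. The following is only a minimal sketch, assuming you are already connected via Connect-VIServer; the account name DOMAIN\svc-view is a hypothetical placeholder for your View service account.

```powershell
# Assumption: an active Connect-VIServer session exists.
# 'DOMAIN\svc-view' is a placeholder, not a name from the original case.
$principal = 'DOMAIN\svc-view'

# List every permission the account holds in vCenter, with role and scope.
$permissions = Get-VIPermission -Principal $principal

if (-not $permissions) {
    Write-Warning "No vCenter permissions found for $principal"
} else {
    $permissions | Select-Object Principal, Role, Entity, Propagate
}
```

If the warning shows up after an upgrade, re-assign the role to the service account on the vCenter object and check the dashboard again.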

VMware VCAP5-DCD (Datacenter Design) – Exam experience and learning philosophies

To be honest, I didn’t want to write a blog post about my VCAP5-DCD exam experience, since there are soooo many good articles and posts already online. Anyway, a lot of people were asking me what resources I used to prepare for this exam to achieve the ….


First of all… thanks a lot to everyone else who has created posts about their experiences. I think I read all of the existing posts about the VCAP5-DCD exam.

This post is structured in the following way. If you are just interested in real facts… please go directly to the resources part 😉

  • What type of ‘learner’ am I
  • Why I did the exam
  • What resources are useful to pass the Exam
  • Personal hints and tips on doing the exam

What type of ‘learner’ am I?

I don’t even know if the expression learner exists in the English language. Anyway I believe that everyone who is extending his knowledge needs to find out HOW he is learning in the best way. I don’t just focus here on VMware, I try to be as general as possible in the following description.

My school career showed me that I am not a good learner in the traditional way. If I have to read a book of pure theory to which I have no practical relationship, I cannot focus on more than 2 pages. Even if I read 50 pages of the book, my mind was only really active for the first 3 minutes.

So what does this mean? I personally need a practical relationship to some ‘issues’/’events’ I experienced in my life. This experience can derive from the following:

1. having experienced something in the real world (“live the challenge”) -> Maximum personal involvement, but pretty expensive (time/cost consuming)

2. talking and listening to someone who had an experience of something similar (“feel the challenge”) -> Medium personal involvement, but it might be hard to find the right people in the correct domain (e.g. User groups, conventions, tech talks, vBeer, …)

3. reading from someone who had an experience (“read the challenge”) -> personal involvement is low, inexpensive within the world wide web.

The more personally I absorb a piece of information, the more I realize the challenge, and the better and more attentively I can read/learn new things.

This is exactly the reason why I love technical blogs. People describe things more concretely and relate them to their experiences in a very personal way. This is something technical documentation or a book typically does not do (of course there are exceptions).

Technical documentation and books are pretty good resources and very important as well, but in my case I need a personal experience first. Afterwards I can read the technical documentation much more attentively and with many more take-aways, since I am then aware of the topic and can think about concrete uses of the information.

Another important thing for me is the following. If I am confronted with a lot of new information over multiple days (WebEx, classroom teaching, breakout sessions), I personally need a few weeks to handle all this data.

When I am ‘attacked’ by a lot of information which I was not able to process (e.g. during the class), I need a break from those topics. Even though I am not actively working on those things, my brain seems to make progress on the data subconsciously (excuse the lack of scientific correctness). And in many cases something suddenly happens to me that I call ‘illumination’ … out of nothing, everything totally makes sense from one second to the next (no joke, from time to time some mathematical facts I never understood in school are suddenly illuminated in my mind 😉 you see, it might take a verrry long time…. next step… find out how to accelerate the illumination phase).

A third important fact worth mentioning is that I need pressure. Without time pressure, my efficiency typically decreases a lot.

To summarize it all…. What do I personally need to extend my knowledge in the best way? Personal involvement AND time for the illumination….

So let’s see how this all works out for the VCAP-DCD exam.

Why I did the exam

The why is always important. During my career I have met so many people with all kinds of environments, I have worked in a lot of projects and talked to so many experts, and there is one thing I realised pretty soon.

It is incredibly important to have a good architectural design of an IT system. And it is so easy to screw IT systems up if you don’t do it right. Since I work in the IT field as a professional (and not only as a geek/nerd who loves technology), I have always been impressed by people who work in a very structured/methodological way. Today (until SKYNET rises) IT systems support people AND/OR businesses. This means that an IT system needs to align with a business. If you only look at an IT system as a sum of technical best practices, you will probably have a great technical solution, but it will not be the best solution for the business itself.

Creating/collecting business requirements and designing/transforming this information into a solid technical solution (and probably even implementing it) is in my opinion a skill every architect should have.

I knew from several discussions, blogs and books that VMware’s highest certification (#VCDX) is exactly about proving this skill set. Since the VCDX is still a long-term goal for me, the VCAP-DCD exam was the right one to take.

So I decided to take the VCAP-DCD exam at the beginning of 2013. And I have found so many good excuses to postpone it month after month since then (projects, master thesis, …). Since I focus only on VMware in my job, I had already read the most common VMware literature that is recommended for any kind of vSphere-related exam (VMware vSphere Design, Clustering Technical Deep Dive, …). I was often involved in design tasks/creations within my job (projects/trainings/discussions), so the exam preparation was a kind of long-term preparation with many situations where suddenly the (knowledge-) illumination kicked in (ILLUMINATION).

As time was passing, in August 2014 I gave myself a deadline: I MUST pass the VCAP-DCD by December 2014 (PRESSURE). So I started to study more concretely along the blueprint… and to be honest… it was a really exciting AND effective way of learning, since I already had the personal involvement and practical relationship from all of these years.

What resources are useful to pass the Exam

Now I am getting concrete about the resources I used to learn for and pass the exam.

  • Exam blueprint : First of all, as for each VMware exam, the exam blueprint is the baseline for each kind of certification. I decided to take the 5.1 version because I had worked for almost one year in a very large vSphere 5.1 environment and had requested the exam authorization sometime in 2013.
  • Clustering Technical Deepdive & vSphere Design : IMO those 2 books are compulsory reading if you want to extend your knowledge in the vSphere field.
  • #vBrownbag VCAP5-DCD Video sessions : When I met Alastair Cooke (a big contributor to vBrownbag) during the vRockstar party at VMworld, I was not aware that one week later the video sessions would play a large part in my successfully passed exam. In those video sessions very good experts talk about every objective of the exam blueprint. A must-watch for everyone who wishes to take the VCAP-DCD exam. Nick Marshall has collected all video parts together on his blog. Personally I only focussed on those topics where I thought I had the least knowledge.
  • VMware VCAP-DCD51 Doc package: Jason Langer created a great collection of VCAP-DCD relevant documents and structured them according to the exam blueprint. Most of the files are official VMware documents that are probably outdated now if you want to take the 5.5 exam. But since the design methodologies (paper about the differences between conceptual vs. logical vs. physical design) are not pinned to a specific version, it is a really useful resource as well.
  • VMware Best practices Technical Whitepaper
  • Blogs Blogs Blogs about VCAP-DCD : I will not be able to mention all I have read, but just google VCAP-DCD and you will find a lot of entries. Everyone has had different experiences with the exam… some focus on the timing, some on the technical challenges in the exam and some on how they studied for it. Just a few links I had in my bookmark list:
  • Gather hands-on design and technology experience

I know a lot of architects who tell me that an architect does not need to know too many technical details about their solution. Honestly, I agree up to a point: an architect does not NEED to know every configured item in the physical design by heart. But at the same time I am sure that the more detailed knowledge an architect has, the better his design decisions will be. So it is a big advantage when studying for the DCD exam if you are a professional from a vSphere operational/administrative point of view. I would recommend everyone to do the VCAP-DCA exam first. It makes life and the learning much easier if you are very familiar with the technical details of a vSphere environment.

Personal hints and tips on doing the exam

A design exam is hard to study for, since this is something where personal experience plays an important role. If you have worked in the VMware design field for some time, know the technology very well and want to improve this knowledge, define a concrete time frame for when you want to do the exam. One month before the exam, start to use the resources mentioned above. Read them, understand them, and try to think about how these design methodologies and technical best practices would have changed your past projects. Try not to just learn those things by heart; understanding and illumination are the key to successfully passing the exam. The DCD is an exam where you will always reach a point where you do not know everything.

One more hint… stay calm… the technical implementation, especially of the design parts where you drag and drop items into a logical design, is pretty bad… The Flash application hung 3 times during my exam and I needed to talk to the Pearson administrator every time so that he could restart the system and the test. Those things are pretty, pretty annoying… but IMO if you can’t change things, make the best of it…. stay calm.. don’t get nervous and take a mental break.

Make sure that the items are connected to each other (if you move one item, all other connected items should move as well) BUT DON’T try to select all of them at the end… the system always crashed in my case…

I am not sure if that was an unlucky accident in my exam or is fixed in the 5.5 version… but I would not risk it anymore.

So everyone who is going to go the VCAP-DCD journey… good luck and if you want more information about it, feel free to ask me.


Killing me softly/hardly/forcely. How to kill a VM via #PowerCLI

Having observed some problems with VMs that were not able to be shutdown/powered-off properly via PowerCLI I tried to find a solution.

From time to time Shutdown-VMGuest didn’t work, and even a Stop-VM with the kill option was not working as expected. I knew that ESXTOP and ESXCLI have options to kill a VM process/world if there are no other options. But since I wanted to achieve this in PowerCLI, this blog post from the year 2011 gave me the correct hint.

We can use ESXCLI via PowerCLI to fulfil that task  😉 *whoopwhoop*.

I was missing a way to kill those worlds without authenticating directly against an ESXi host, and since ESXCLI and its namespaces have changed a little within the last years, I wanted to document how this can be achieved in vSphere 5.X.

First of all connect to the vCenter via

Connect-VIServer $vCenterFQDN

Get the VM-Object you are going to kill

$vm = Get-VM $nameOfTheVM

find the ESXi host the VM is running on

$esxi = Get-VMHost -VM $vm

and load the ESXCLI functionality

$esxcli = Get-EsxCli -VMhost $esxi

Now it’s time to extract the WorldID out of the ESXCLI VM PROCESS LIST data

$worldid = $esxcli.vm.process.list() | where{$_.DisplayName -eq $vm.Name} | Select WorldID

and kill the VM with the options soft, hard, force
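The kill call itself is missing from the steps above. Here is a minimal sketch of what it can look like, assuming the classic Get-EsxCli object from the previous steps (where the method takes the kill type first and the world ID second) and that $worldid still holds the object returned by Select WorldID:

```powershell
# Assumption: $esxcli and $worldid come from the steps above.
# Start with 'soft'; escalate to 'hard' and finally 'force'
# only if the VM process survives the gentler attempt.
$esxcli.vm.process.kill("soft", $worldid.WorldID)
```

Going through the types in that order gives the VMX process a chance to shut down cleanly before it is terminated the hard way.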


VOILA, the VM should definitely be killed by now. This ESXCLI command is not tracked by the VPXA, so no events about the ‘kill’ are written to the database. (With great knowledge comes great responsibility, right? ;-))

If you are running this command against a VM that is part of an HA cluster, the HA mechanism will restart the VM after the kill. In this scenario you need to disable the HA protection of the VM (so it is removed from the HA protected list) before you kill it:

$vm | Set-VM -HARestartPriority Disabled

I hope this information might be useful to some of you guys.

Please use the Code-Snippet here to see the fully-functional (Kill-VM.ps1) script.



CPU ready spikes & Host Power Management

We just observed some strange frequently occurring CPU ready spikes in our environment (screenshot).

cpu ready spikes

This effect occurred very frequently on each virtual machine. What caused it?

-> The Host Power Management mode which was balanced. After setting it to high-performance the spikes disappeared.

I know this might have to do with the specific hardware we are using, but since I hear about such effects from time to time: think about switching the power management mode on the ESXi host to high performance once you observe strange performance symptoms. IMO servers deserve high performance and nothing less 😉
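If you want to check or change the policy from PowerCLI instead of the client, something like the following should work. It is only a sketch, assuming an active vCenter connection; esx01.example.local is a placeholder host name, and the policy short names (‘static’ for High Performance, ‘dynamic’ for Balanced) are read from what the host itself reports.

```powershell
# Assumption: connected via Connect-VIServer; the host name is a placeholder.
$vmhost = Get-VMHost -Name 'esx01.example.local'
$powerSystem = Get-View $vmhost.ExtensionData.ConfigManager.PowerSystem

# Show the currently active policy (e.g. 'dynamic' = Balanced).
$powerSystem.Info.CurrentPolicy | Select-Object Key, ShortName

# Look up the 'static' (High Performance) policy on this host and activate it.
$highPerf = $powerSystem.Capability.AvailablePolicy |
    Where-Object { $_.ShortName -eq 'static' }
$powerSystem.ConfigurePowerPolicy($highPerf.Key)
```

Selecting the policy by its short name avoids hard-coding a numeric key.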

I will change my mind once I have measured the financial benefit that might come with the balanced power management mode. So if you have any concrete facts and numbers, please post them here.



IMO: #VMworld 2014 recap VMware EVO:RAIL (part 2)

This is part 2 of my IMO #VMworld wrap-up. Read about my thoughts on a new product called EVO:RAIL.

IMO: #VMworld 2014 recap on VMware NSX (part 1)

IMO: #VMworld 2014 recap VMware EVO:RAIL (part 2)

IMO: #VMworld 2014 recap VMware vCloud Air (part 3)

IMO: #VMworld 2014 recap vSAN and vVol (part 4)

IMO: #VMworld 2014 recap Automation & Orchestration (part 5)



EVO:RAIL is a pretty cool, so-called hyper-converged solution provided by VMware and partner vendors (DELL, EMC, Fujitsu, INSPUR, net one, SUPERMICRO, HP, Hitachi). In short, EVO:RAIL delivers a complete vSphere suite (including vCenter, vSAN, Enterprise+ & vRealize suite) bundled with 4 computing nodes, which from a technical perspective is ready to be productive in less than 30 minutes (the record at the EVO:RAIL challenge was <16 minutes).

Such a solution is something I thought about a long time ago (it was one of the outcomes of my master thesis on the software-defined datacenter in small/medium-sized enterprises), especially for small environments where the admins want to focus on operating the running systems (or better: delivering an IT service) rather than implementing, installing and configuring basic infrastructure (yeah, I know this is going to be a shift in the future for me as a trainer who delivers a lot of Install, Configure, Manage classes and did installations as part of my consultancy/implementation jobs).

IMO VMware made a very smart move by not taking on the role of a hardware vendor and instead cooperating with existing, well-known partners to deliver the solution, which is specified/managed by VMware via the EVO:RAIL engine. The established sales channels to customers and companies can be used. Especially small and medium-sized businesses will be attracted by this solution as long as the pricing/capex is affordable for them. From a business perspective this means the following: VMware delivers the software (vSphere, vRealize and the EVO engine) and the vendor delivers the hardware & support. The business-management (#beersherpa) guy inside of me says…. perfect… everyone sticks to their core competencies and bundles their power to bring a much better solution to the customer (one contact point for support, a completely integrated and supported virtualization stack, shortest implementation times).

I believe for the big x86 vendors this solution is just a next step in becoming a commodity. Isn’t the whole software-defined datacenter thing about decoupling software from hardware, creating/using a smart VMware controlled control plane and a commodity data plane which is responsible for the concrete data processing based on the control plane logic? We don’t or will not care anymore if the hardware (switch, storage, computing nodes) is HP, Cisco, Juniper, IBM, etc. We will care about the control plane.

With EVO:RAIL it will get even tougher for the hardware vendors to differentiate from each other, and the competition in the end can only be won on price (in the small/medium-sized market). I want to add that I missed the chance in the EVO:RAIL demo room to have a discussion about this topic from a vendor perspective (damn you VEEAM party 😉 ), so if you have done anything similar or have your own opinion, please comment on this post or contact me directly.

The use cases of EVO:RAIL can vary a lot (management clusters, DMZ, VDI, small production environments), and I believe that this product is a pretty good solution which will be triggered from a bottom-up perspective within the companies (I am referring to my bottom-up / top-down approach of bringing innovation into companies in the NSX post (link)). Administrators will love to reduce the setup time of a complete vSphere environment.

Especially for VDI solutions I can imagine a brilliant use case for EVO:RAIL, which means next step… VMware, please bundle the VMware Horizon View licence into EVO:RAIL and integrate the View setup into the EVO engine :-).

Useful links around EVO:RAIL:

Do not run TSM as a vCD Workload!

I just got called in for a Tivoli Backup troubleshooting. The symptoms seen were extremely strange:

The TSM proxy successfully connected to vCenter and a backup job could be started. In the vSphere Client we could see that the VM to be backed up was snapshotted. The next step would be to attach the VMDK to the TSM virtual machine, but instead it was attached to an entirely different VM 😀 Of course, the backup job failed.

Looking at the TSM VM, I found out it was part of a vApp deployed through vCenter Orchestrator and vCloud Director. I figured it was probably a bad idea to run the TSM proxy in a vCD vApp, for several reasons:

1. TSM is going to back up vCloud Director VMs and running that same backup server as a vCD VM itself seemed strange. Any scripts or similar to backup the entire vCD vApp workload would probably try to back up the TSM proxy, too.

2. TSM talks to vCenter, requesting the creation of snapshots and the attachment of VMDKs to itself. As a vCD VM, the VM is marked as controlled by vCD and any changes through the vSphere Client are not recommended. But exactly this would happen when a VMDK gets attached to TSM for backup.

So the first try was to clone the vCloud VM into an ordinary vCenter VM and shut the vApp down. Booom, works! We resolved the issue quickly but unfortunately, the actual technical cause for this is still unknown to us. So in case one of you knows what exactly was going on, please drop me a mail 🙂


VMware Single Sign On and Active Directory Logon Restrictions – LDAP Error Code 49

Have you ever had issues using Active Directory users with logon restrictions in your vSphere 5.1 environment? I think so, otherwise you probably wouldn’t have landed here 😉

Since I haven’t found any explanation of how to deal with this situation, I have tried to summarize it here.

We all know a good role concept in vSphere is a very important thing. That’s why we typically create our own roles and attach (service/personal) Active Directory users/groups to those specific roles.


So far, this is nothing new for us and not very problematic at all if SSO is properly configured to use Active Directory as an identity source.


Once we are forced to deal with strong security policies, we might run into some issues that our vCenter User is defined with a logon restriction.


From a security perspective it definitely makes sense to restrict the access for the Users (e.g. service-accounts) to specific Computers. From a logical point of view, having access to the vCenter (vc01 in the picture) should be enough. Unfortunately things in the IT world are not always that logical, which leads us to the fact that we can’t login to our vCenter after having configured a restriction.


Credentials are not valid? Sounds to me like a typo or a wrong password, but a closer look at the imsTrace.log (located at %ProgramFiles%/VMware/Infrastructure/SSOServer/logs) gave me another hint.


‘javax.naming.AuthenticationException: [LDAP: error code 49 – 80090308: LdapErr: DSID-0C0903A9, comment: AcceptSecurityContext error, data 531, v1db1 ];’

A short online search for LDAP error codes told me what error code 49 and data 531 mean exactly.


531 means: not permitted to logon at this workstation, even though we are permitted to log on against the vCenter Server.
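To save you the online search next time, the lookup can be scripted. This is just a small sketch: the table lists the commonly documented Active Directory sub-codes for LDAP error 49, and Get-LdapBindFailureReason is a hypothetical helper name I made up for this example.

```powershell
# Commonly documented AD sub-codes found in the "data" field of LDAP error 49.
$ldapBindSubCodes = @{
    '525' = 'user not found'
    '52e' = 'invalid credentials'
    '530' = 'not permitted to logon at this time'
    '531' = 'not permitted to logon at this workstation'
    '532' = 'password expired'
    '533' = 'account disabled'
    '701' = 'account expired'
    '773' = 'user must reset password'
    '775' = 'account locked out'
}

# Hypothetical helper: pull the sub-code out of a log line and translate it.
function Get-LdapBindFailureReason {
    param([string]$LogLine)
    if ($LogLine -match 'error code 49.*data\s+([0-9a-fA-F]+)') {
        $code = $Matches[1].ToLower()
        if ($ldapBindSubCodes.ContainsKey($code)) { return $ldapBindSubCodes[$code] }
        return "unknown sub-code $code"
    }
    return $null
}

$line = 'javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903A9, comment: AcceptSecurityContext error, data 531, v1db1 ]'
Get-LdapBindFailureReason -LogLine $line
```

You can feed it lines from the imsTrace.log, for example via Select-String, to get a readable reason for each failed bind.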

The only way to connect to the vCenter with this user is to add the 2 configured LDAP servers, which are used by Single Sign-On, to the user’s list of permitted computers as well.



Once this is done, we are able to logon with this user at our vCenter.


Nevertheless, for many organisations granting access to the Domain Controllers themselves is not a suitable solution. Unfortunately I haven’t found a satisfying workaround for this scenario so far. I’m not sure what the exact problem is, but it seems that the SSO service is just brokering the user credentials to one of the configured LDAP servers. If the user is not allowed to log on at this machine….beduumm…*failsound*…

The only way to deal with it is to go for a mixed environment of vCenter 5.1 and 5.5 components. The new SSO component in vSphere 5.5 is much more robust and works better than the old one, and according to VMware it is supported to use the SSO from 5.5 with vCenter 5.1.

I’m not a fan of such a mixture, but sometimes we have no other choice than to do it that way. Make sure to update SSO and to update the vSphere Web Client to the newest version of 5.5.

Once we have done the upgrade, we need to configure the new active directory identity source with the integrated windows authentication. In the first step, remove the old AD configuration setting. In the second step we add a new identity source and choose Active Directory (Integrated Windows Authentication).


And voila, we are able to logon to the vCenter with a user log on restricted only to the vCenter machine itself. That’s what we typically want to have.

As a summary, the only options we have at the moment for using logon-restricted Active Directory users with vSphere are…

  1. Using no restrictions for your vCenter user.
  2. Running a mixed vCenter infrastructure with vCenter 5.1 and 5.5 components.
  3. Upgrading directly to vCenter 5.5.

Since I struggled with this issue for several days, I hope this summary might save you some time during your vSphere implementation and the Active Directory integration.

Update Manager Error: Cannot create a ramdisk

We ran into a problem with Update Manager on vSphere 5.1 lately:

Cannot create a ramdisk of size 329MB to store
the upgrade image. Check if the host has
sufficient memory.

The help provided in vSphere documentation says

The ESXi host disk must have enough free space to store the contents of the installer DVD. The corresponding error code is SPACE_AVAIL_ISO.

Yeah, thanks 😀 That’s what the error message said in the first place 😀 But it might as well describe your exact problem, so please check it as a possible resolution.

Well, it didn’t do it for us. We had enough space … Hmmm … after a while of struggling, it turned out to be lockdown mode which caused the error! After disabling it (via DCUI, as vCenter failed with a different error message) VUM remediated the host like a charm 🙂 WTF!?
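If you want to know up front which hosts have lockdown mode enabled before you start a remediation, a quick PowerCLI check like this might help. It is a sketch under assumptions: you are connected to vCenter, Cluster01 is a placeholder cluster name, and on vSphere 5.x the host’s Config.AdminDisabled flag reflects lockdown mode.

```powershell
# Assumption: connected via Connect-VIServer; 'Cluster01' is a placeholder.
# On vSphere 5.x, Config.AdminDisabled is $true while lockdown mode is enabled.
Get-Cluster -Name 'Cluster01' | Get-VMHost |
    Select-Object Name, @{
        Name       = 'LockdownEnabled'
        Expression = { $_.ExtensionData.Config.AdminDisabled }
    }
```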

So far, it hasn’t reproduced on any other host, but let’s see what happens in the future 😉


vCenter SSO Update to 5.5 Fails (Error 1603) – What Else to Do?

Like many others I struggled a lot with updating to vSphere 5.5. During the installation of Single Sign-On the step of importing the lookupservice data failed and an automatic rollback started.

Even though this issue was recognized by VMware and has been dealt with in an update (KB 2060511), some people (like me) were still having problems with it (even though I checked the certificate and registry settings).

If it’s the case in your environment, please try the following:

Define JAVA_HOME as environment variable and point it to C:\Program Files\VMware\Infrastructure\jre
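One way to do that is from an elevated PowerShell session. A minimal sketch, using the path given above (adjust it if your vCenter is installed elsewhere):

```powershell
# Run from an elevated PowerShell session; the path is the one given above.
$jre = 'C:\Program Files\VMware\Infrastructure\jre'

# Set JAVA_HOME machine-wide so the installer and its helper scripts inherit it.
[Environment]::SetEnvironmentVariable('JAVA_HOME', $jre, 'Machine')

# Make it visible in the current session too, then check that java.exe is there.
$env:JAVA_HOME = $jre
Test-Path (Join-Path $env:JAVA_HOME 'bin\java.exe')
```

Processes that were already running will not see the new variable, so start the installer from a fresh session afterwards.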


Afterwards the installer should run properly. Keep in mind to delete the CSI folder in %ProgramData%/VMware/CSI as mentioned in the KB.

I came up with this idea since I ran into a similar issue while using the SSL Automation Tool. This tool called a script which was checking the nslookup-service. After calling this script manually I noticed that the missing JAVA_HOME variable was preventing the check script from executing, which in turn caused the SSL Automation Tool to fail.

Keep in mind to set the JAVA_HOME variable if you are using the vCenter automation tool and let me know if you are still not able to upgrade SSO after setting the environment variable.

© 2019 v(e)Xpertise
