Pretty excited to post this script as it was actually the
first PowerCLI script I had ever worked on.
To this day it is still set as a scheduled task and runs daily in my
production environment and works like a champ.
I knew PowerCLI was extremely powerful at that time, but had never taken
the time to look at the cmdlets and what they had to offer. It opened a whole new world as to how I
manage a large virtualized infrastructure, and has really saved me a ton of
time.
Anyway, back to the script and why I started it. We had recently deployed a fairly large
XenApp farm, provisioned through Provisioning Server (PVS), with vSphere 4 as the
backend hypervisor. We scaled out using VMs and
the solution overall seemed to work pretty well once we worked through some of
the pain points of migrating from a large non-provisioned Citrix
environment. However, we noticed that
every so often some of the guests would sort of lose their way back to the PVS
boxes and would no longer accept new terminal connections. Originally we would hard reset them, but then
we realized we were kicking off active sessions (which wasn't good).
This sort of went unnoticed for a while, and soon we had
several VMs in this state. Terminal
sessions would stay active, new sessions could not be established, and
console access would be pretty much locked.
Since these VMs stayed in this locked state, they were not subject to
the nightly/early-morning reboots that are scheduled through XenApp. This grew into a subset of our provisioned VMs that weren’t
usable, and we were not really aware of it. Not
a huge deal since we scaled out, but not something you want to continue.
One day I was working on one of the servers in this half-hung
state and noticed that the VMware Tools were shown as “Not Running” in
the vCenter console. Finally I had
something to “key” on to identify these servers. Then the wheels started spinning on how to
resolve this. I had recently attended a
VMware User Group meeting promoting this tool called the vEcoShell. I was amazed by the power it possessed. After realizing it was a framework for
running PowerCLI commands, I immediately dug in and started looking into how to
resolve my issue. I couldn't immediately
restart servers once they reported their VMware tools as “Not Running” because
it would kick legit users off. I also
didn’t want to continue to build up servers that were not getting regularly
restarted and losing the ability to accept new users.
The solution I came up with was to run a scheduled task to
restart these lost souls right after our scheduled early morning XenApp Farm
restarts. Here is the script:
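# Connect to vCenter (replace the placeholder with your server name)
Connect-VIServer YourvCenter

# Powered-on VMs in the PVS/XenApp cluster whose VMware Tools status contains "not"
$hungVMs = Get-Cluster -Name YourPVSXENAPPCLUSTER | Get-VM |
    Where-Object { $_.PowerState -eq "PoweredOn" } |
    ForEach-Object { Get-View $_.Id } |
    Where-Object { $_.Guest.ToolsStatus -like "*not*" }

# Hard reset each hung VM without prompting
foreach ($hungVM in $hungVMs) {
    Restart-VM -VM $hungVM.Name -Confirm:$false
}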
Shown above is a simple but effective script. It scans the entire cluster for
VMs that are powered on and whose VMware Tools status is -like “*not*”. The reason I chose
a -like comparison rather than an exact match is that there are two different VMware Tools
states I saw reported when the servers were hung.
“Not Running” and “Not Installed” are both valid states identifying
servers that were hung and not responding to our scheduled restarts. Using -like includes both states, which
works out great.
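To see why the wildcard matters, here is a small sketch that needs no vCenter connection. It assumes the tools status comes back as one of the vSphere API values toolsOk, toolsOld, toolsNotRunning and toolsNotInstalled; the list is hard-coded for illustration, and the variable names are mine, not part of the script.

```powershell
# Tools status values as named in the vSphere API (hard-coded for illustration)
$states = 'toolsOk', 'toolsOld', 'toolsNotRunning', 'toolsNotInstalled'

# -like is case-insensitive, so "*not*" matches the "Not" in both hung states
$matched = @($states | Where-Object { $_ -like '*not*' })

# An exact comparison only catches the single state you name
$exact = @($states | Where-Object { $_ -eq 'toolsNotRunning' })

"like:  $($matched -join ', ')"
"exact: $($exact -join ', ')"
```

The healthy states (toolsOk and toolsOld) contain no "not", so they fall through the filter untouched.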
After that, the script hard resets
each VM it found, and the reset still falls within the reboot
window. The reason I can hard reset
it… PVS brings it back to a pristine state FTW!
Please test this heavily before implementing.
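One low-risk way to start testing is to exercise the filter logic offline before it ever touches a real cluster. The sketch below is only an illustration: Get-HungVMName is a hypothetical helper name, and the objects are hand-built stand-ins shaped like Get-View results (a Name plus Guest.ToolsStatus), not real vCenter data.

```powershell
# Hypothetical helper: returns the names of VMs whose tools status contains "not"
function Get-HungVMName {
    param([object[]]$VMViews)
    $VMViews |
        Where-Object { $_.Guest.ToolsStatus -like '*not*' } |
        ForEach-Object { $_.Name }
}

# Stand-in objects with made-up names; real ones would come from Get-View
$fakeViews = @(
    [pscustomobject]@{ Name = 'XA01'; Guest = [pscustomobject]@{ ToolsStatus = 'toolsOk' } }
    [pscustomobject]@{ Name = 'XA02'; Guest = [pscustomobject]@{ ToolsStatus = 'toolsNotRunning' } }
    [pscustomobject]@{ Name = 'XA03'; Guest = [pscustomobject]@{ ToolsStatus = 'toolsNotInstalled' } }
)

# Only the VMs that would be reset should be listed
$toReset = @(Get-HungVMName -VMViews $fakeViews)
$toReset
```

Once the filter behaves as expected, a dry run against the real cluster is possible by adding -WhatIf to Restart-VM, which reports what would be reset without actually resetting anything.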