VMworld 2014 recap

Sitting at the airport on my way back from a very successful VMworld week, I thought I would put some of my thoughts into a quick post.  I am just going to briefly mention my highlights.  Some of these are existing products and nothing new to most people, and a few are new.

vCAC
  • As they say, it is the future, and the future is looking good.
  • vCloud Director will only be available as a service provider offering in the near future…
Application director 
RaaS (Recovery as a service) 
vCloud Air everything
  • vRealize Air Automation
EVO:RAIL
NSX
  • Virtualize the network, with all the new features including third-party integration. The most exciting is micro-segmentation from a security perspective.
Hands on Lab
  • This was amazing and a great success.  I look forward to running through all those lab sessions in the future.
AirWatch
  • 8th most taken lab at VMworld and received a lot of attention!

vCOPS alert: analytics resource: number of resources exceeds supported limit

Received the following admin alert in vCOPS:

analytics resource: number of resources exceeds supported limit

This is because the default maximum number of resources for vCOPS is 10,000.
That was not enough for our environment, so we had to increase the maximum.  The way vCOPS handles resources also contributed to this alert, since by default it does not delete non-existent resources.  In another post I provide information on how to change this behaviour:
http://virtualrealization.blogspot.com/2014/08/vcops-admin-alerts-for-vin-adapter.html

Solution:

Add the following line to this file on the analytics server:
/usr/lib/vmware-vcops/user/conf/analytics/advanced.properties

maxNumberOfResourcesSupported= 999999 (can set your own number here)

Restart the analytics server:

  • SSH into the analytics server and log in as the admin user
  • vcops-admin restart
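
Rather than editing the file by hand, the change can be scripted. The following is a minimal sketch and not an official VMware procedure: set_property is a hypothetical helper name, and it assumes a POSIX shell with grep, sed and tail available on the analytics server. It replaces an existing line for the key, or appends one if the key is missing.

```shell
#!/bin/sh
# Hypothetical helper: set key=value in a Java-style properties file.
# Assumes the key and value contain no characters special to sed ("|", "&").
set_property() {
    file=$1
    key=$2
    value=$3
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        # Key already present: rewrite that line (a .bak copy is kept).
        sed -i.bak "s|^[[:space:]]*${key}[[:space:]]*=.*|${key}=${value}|" "$file"
    else
        # Make sure the file ends with a newline before appending.
        if [ -n "$(tail -c 1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

On the analytics server it could be called as set_property /usr/lib/vmware-vcops/user/conf/analytics/advanced.properties maxNumberOfResourcesSupported 999999, followed by the restart.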

vCOPS: admin alerts for VIN adapter: resources do not receive data from this adapter resource

I was getting a lot of admin alarms within vCOPS for VIN adapters not being able to receive data for resources.

VIN adapter instance: 200 resources do not receive data from this adapter resource

I believe the primary reason is that my vCloud environment is so dynamic, with users constantly deleting and creating VMs, that vCOPS and VIN cannot keep up, and non-existent resources are causing the alerts.

By default vCOPS does not delete non-existent resources, but I found KB 2020638, which explains how to change the schedule on which vCOPS deletes old objects.

Resolution:

I changed the following in this file on the analytics VM:  /usr/lib/vmware-vcops/user/conf/controller/controller.properties

deleteNotExisting = true
deletionSchedulePeriod = 12
deletionPeriodInHours = 6
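
Before restarting, it can be worth confirming that the three settings are in the file as intended. A minimal sketch, assuming a POSIX shell with grep -E; show_deletion_settings is a hypothetical helper name, not part of vCOPS.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) deletion settings
# from a controller.properties file passed as the first argument.
show_deletion_settings() {
    grep -E '^[[:space:]]*(deleteNotExisting|deletionSchedulePeriod|deletionPeriodInHours)[[:space:]]*=' "$1"
}
```

Called as show_deletion_settings /usr/lib/vmware-vcops/user/conf/controller/controller.properties, it should print exactly three lines. Fewer means a setting is missing or commented out.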

Restart the analytics server:

  • SSH into the analytics server and log in as the admin user
  • vcops-admin restart

This only solved part of the problem for me; I also had to disable access for the VIN user account to the vCloud Director cluster so that it does not discover those VMs.
I cover this in another post, “VIN guest operating system management: limited permissions for vcenter server”.

Links:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2020638

Snapshot consolidation failed with error

The virtual machine was showing the following alert:

Virtual machine disk consolidation is needed.  Virtual machine consolidation needed status

After trying to consolidate the virtual machine snapshots I received the following error:

Status: An error occurred while consolidating disks: Could not open/create change tracking file

Cannot complete the operation because the file or folder ds:///vmfs/volumes/*.vmdk already exists

Troubleshooting:
  • SSH to the ESXi host and browse to the volume where the VM resides.
  • In the VM folder I found flat files for each of the VMDK disks, but the VM did not show that any snapshots existed.
  • Tried to vMotion the VM and received the same error.
  • Tried to Storage vMotion the VM and received the same error.
  • Tried creating a new snapshot and deleting it again, but still could not consolidate afterwards.
  • Found KB 2013520, which describes the same problem, but for committing a snapshot that already exists.
The issue seems to be caused by corrupted CTK files.  These files are associated with each delta disk and flat file of the virtual machine.
Resolution:
  • Power off the VM
  • Create a temp folder in the VM folder on the datastore
  • Move all the CTK files into the temporary folder.  The file names look like “*-ctk.vmdk”
  • Right-click the VM and select Snapshot -> Consolidate
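
The two file-moving steps can be scripted from an SSH session on the ESXi host. This is a minimal sketch, assuming a POSIX shell; stash_ctk_files and the ctk-backup folder name are hypothetical, and the VM should already be powered off before it is run.

```shell
#!/bin/sh
# Hypothetical helper: move every *-ctk.vmdk file in a VM folder into a
# ctk-backup subfolder, leaving descriptor and flat files untouched.
stash_ctk_files() {
    vm_dir=$1
    mkdir -p "$vm_dir/ctk-backup" || return 1
    for f in "$vm_dir"/*-ctk.vmdk; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        [ -e "$f" ] || continue
        mv "$f" "$vm_dir/ctk-backup/"
    done
}
```

It would be called with the VM folder as its only argument, for example stash_ctk_files /vmfs/volumes/<datastore>/<vm-folder>, where both placeholders need to be replaced with your own names.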
On a side note:
After some further investigation I found that there was a very large snapshot on this VM (> 150 GB), and committing that snapshot seems to have corrupted the CTK files.
CTK file description
Always nice learning something new each day 🙂

Each virtual disk has a small descriptor file (*.vmdk) that describes the disk’s characteristics/attributes, while the actual ‘data’ is stored in the *-flat.vmdk file.

Like in our case, if you have multiple disks attached to a VM, then you’ll have multiple descriptor files that link to their actual disks.  You can view the contents of a descriptor file, which provides information such as chain IDs, the type of VMDK, its data disk, hardware version, etc.
The *-ctk.vmdk files are used for changed block tracking (CBT), and VMware has a good KB to read on this:
Links:

Datastore unmount error: Cannot unmount volume because file system is busy

Just this week I ran into this issue while trying to unmount some stale datastores from vCenter Server.

Troubleshooting:

SSH into the ESXi host and browse the datastore.
On the datastore I found files with names matching:
vsantraces*

I found KB 2069171, which describes this problem: the VSAN module stores the traces needed to debug VSAN-related problems on the datastore, and this keeps a lock on it.

Resolution:

Since we are not using VSAN, I had to disable the VSAN trace service on the ESXi host to remove the datastore.

  • To verify open files for vsantrace:
    • lsof | grep vsantraced | grep volumes
  • Run command to stop services:
    • /etc/init.d/vsantraced stop
  • Perform a Refresh for Storage
  • Unmount the datastore
  • Run command to start the services: 
    • /etc/init.d/vsantraced start

If, however, you are using VSAN, then you need to change the trace location with the following command:
esxcli vsan trace set -p

To verify open files for vsantrace
lsof | grep vsantraced | grep volumes

On a side note:

Initially the unmount of the datastore still did not succeed, and this was due to my own SSH session, which had the datastore open 🙂 Close the SSH session before trying to unmount.

You could potentially still have problems with the unmount in the following scenarios:
  • Scratch for host is configured on the problem datastore. Modify the ScratchConfig value and change it to another datastore. For more information, see Creating a persistent scratch location for ESXi 4.x and 5.x (1033696).
  • Coredump is configured to write to a file in the problem datastore on an ESXi 5.5 host. For more information, see Move the coredump to a File in the VMware vSphere 5.5 Documentation Center.
    • Run following command to check for paths:
      • esxcli system coredump file list
    • Verify the UUID and compare to datastore name.
    • Run following command to remove the coredump files. 
      • esxcli system coredump file remove --force
Links:

VIN guest operating system management: limited permissions for vcenter server

vCenter Infrastructure Navigator has a single account which it uses to access VMs.
This is set within vCenter Server’s infrastructure navigator screen.

I wanted to limit which datacenters inside vCenter Server VIN would be able to see, as well as the functions this account can perform.

Resolution:

Create a guest operating system management role within vCenter Server with minimal privileges for VIN:

  • Navigate to administration
  • Select roles
  • Click add roles
  • Enable the following with checkbox:
    • Virtual machine > Interaction -> Guest operating system management by VIX API 
    • Virtual machine > Interaction -> Console interaction
  • Provide role name
  • On the vCenter entity root level click manage tab
  • Select permissions and add
  • Select the user and assign the newly created limited VIN role.

This will provide the necessary privileges to enable the discovery process for the selected user.

In my case I also did not want this service account to view my vCloud Director datacenter, so I added the user to the datacenter permissions with the “no access” role.

From the inventory menu select Infrastructure Navigator.
Select the Settings tab.
Here you can now set the new user account.
Make sure to enable access to VMs.

vCOPS – Custom UI LDAP error – "One or more users already exist and haven’t been imported"

The regular vcops-vsphere web GUI was easy and straightforward to configure for LDAP authentication since it uses the vCenter privileges.

However, the vCOPS custom web GUI uses its own configuration for LDAP authentication. I am also using sAMAccountName for the Username Field in the LDAP settings.  This value is not offered as an option and has to be typed in manually.

After setting up LDAP I tried to import the same users but received the following error message:
“One or more users already exist and haven’t been imported”
The main problem here is that both web interfaces use the same useraccount table in the Postgres database. This causes duplicates, since the username to be added to vcops-custom is already in the database.
A strange observation is that the user accounts created by vcops-vsphere don’t show up under the “Not Grouped” group name.
Resolution:
Reviewing the useraccounts in database shows both username and \username.
VMware support provides a detailed KB on how to look up the users in the Postgres database and rename the accounts with duplicate usernames.
For vCenter Operations 5.0.x: 
  • Open SSH to Analytics VM and run command:
    • # su postgres
    • # psql alivevm
  • Run these commands at the psql prompt to export the user account table:
    • alivevm=> \o /tmp/useraccount.csv
    • alivevm=> SELECT userid, username, description FROM useraccount ORDER BY userid;
  • Exit the psql session with \q:
    • alivevm=> \q
  • Run this command:
    • more /tmp/useraccount.csv
    • Review the useraccount.csv file and determine if there are any duplicate usernames and make note of the related userid(s). 
  • To disable the duplicate user accounts:  (This is only a rename of the useraccount)
    • Log in to the database again with su postgres and psql alivevm, as in the first step.
    • Run this update statement to rename the username of the user account
      • alivevm=> UPDATE useraccount SET username = username||'_disabled' WHERE userid in ('2','3','N');
      • Note: Replace the user ID in the IN statement (in this example, 2,3,N) with the user ID(s) of the duplicate user accounts you want to disable.
  • Run this command as the admin user on the UI VM to restart the vCenter Operations services:
    • admin@firstvm-external:> vcops-admin restart
The best way to mitigate this problem is to make sure that the users log in to vcops-vsphere with “\username” and not just their username.  This will allow the user with the username only to be added to vcops-custom.
This is only an issue when using sAMAccountName.  If using userPrincipalName, the user is imported with a username of “username@domainname”.
Link: