ESXi 4.1 configuration issues

I have been looking at an ESX 4.1 cluster recently in which one host was being a bit truculent. After a BIOS upgrade the host seemed to be behaving itself, so after a couple of weeks of stress testing with no issues I added it back into the cluster.

However, nothing would migrate to it and I eventually tracked down three separate issues (note to self):

  1. Moving hosts between clusters can confuse the standard switch configuration. You really need to re-create the standard switches each time you add a host to a cluster.
  2. An IP address was wrong (typo).
  3. The hash of the NFS datastore UUID was different. On the other hosts in the cluster the datastore had been added by IP address, and on this one it had been added by NFS server name. ESX essentially thought they were different datastores, as 4.1 uses a crude hash to create a UUID (I would have thought it should pick up a UUID from the datastore itself).
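
To see why the same export can end up looking like two datastores, here is an illustrative sketch. It does not use ESX's real hashing scheme (which this post does not show); cksum merely stands in for any hash computed over the server string and path exactly as typed. The addresses and the nfs_id name are made up for the example.

```shell
# Hypothetical stand-in for the datastore identity hash. ESX's actual
# algorithm is not known here; cksum is used purely for illustration.
nfs_id() {
    # $1: server as typed when the datastore was added, $2: exported path
    printf '%s:%s' "$1" "$2" | cksum | awk '{ print $1 }'
}

# Same export, added two different ways (both values are invented):
nfs_id 192.168.1.50 /share/iso
nfs_id qnap.example.local /share/iso
```

The two numbers differ even though both strings point at the same export, which is the trap: add the datastore the same way (all by IP or all by name) on every host.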

Caveat implementor.

The vSphere limit effect

VMware Limit Effect on Linux top

I believe the effects of limiting a guest’s CPU in vSphere are well understood, but I for one don’t like the way VMware implement this. I have just verified that the behaviour on ESXi 5.0.0 is the same as on 4.1 (which is only to be expected).

Guests get confused, is the bottom line. At least Linux does, which is particularly noticeable with “top”. In the screenshot above of a CentOS guest, I have limited the CPU to 1000 MHz using VirtualCenter. To me, it would be logical for the guest OS to be presented with a 1000 MHz CPU, but this is not the case. The output of /proc/cpuinfo shows that Linux thinks it is running on a 3.47 GHz i5. Is it really hard to do, or something?

The upshot is that when the guest is maxed out, as in the screenshot, top shows 100% us time (correct) but only 29% of the CPU. That is correct if you consider the pCPU, but not if, like me, you don’t think a guest should have any knowledge of the real hardware. I think CentOS should see a 1000 MHz vCPU, and top would then correctly display 100% CPU time.
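
The 29% figure appears to be simply the limit divided by the clock speed the guest believes it has. A minimal sketch of that arithmetic, using the 1000 MHz limit and the 3.47 GHz reported by /proc/cpuinfo (cpu_share is a made-up helper name, not a VMware tool):

```shell
cpu_share() {
    # $1: CPU limit in MHz, $2: clock speed the guest sees, in MHz
    awk -v limit="$1" -v clock="$2" \
        'BEGIN { printf "%.0f\n", limit / clock * 100 }'
}

cpu_share 1000 3470    # prints 29
```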

Must check this on other hypervisors (Xen and KVM) sometime.

P.S.
Saddened to hear of the death of Dennis Ritchie today. Someone who I can identify as having a large impact on my life, though I never knew or met him. Such is the world we live in. Eventually we will all exit(0).

Removing old vCD agent from ESXi

Just a note to myself. I picked up how to do this from http://vmwire.com/tag/esxi-vcd-agent-uninstall/ (thank you Hugo).

I was in the situation where a vCD install had been trashed by an IP re-address. When re-installing vCD, it couldn’t put its agent back on the hosts because the old one was still there.

I’ve now removed the old agent as described above, rebooted the host and “prepared” it from vCD.

 

Danger – Cloud at Work

No blogs for a while, I’m afraid, not due to lack of topics but lack of (priority of) time. I have recently started working on a major cloud project at a large investment bank in the City (of London). This is using vCloud Director and some other nascent products from VMware. All very interesting and challenging and I certainly hope to blog about it when I have the time!

Guest startup and vMotion

I completed an experiment on ESXi 4.0 to confirm what many people have known about guest startup and vMotion since VMware Infrastructure 3.x, which is that the two features don’t work together.

Guest A set to 1 on host X
Guest B set to 1 on host Y
Migrate B from Y to X
B appears in “Any Order”
Migrate B back to Y from X
B appears in “Any Order”

It seems the startup feature is host based and doesn’t know what to do if a guest arrives at or leaves a host. I will do a bit of digging and see where the startup data is stored: probably on the host somewhere (as opposed to in the guest config).

I think providing a solution where all the guests in a given datacentre have a defined startup sequence will require a VirtualCenter plug-in which stores that information in the database. It will need a “startup resolution” algorithm and a human interface to manage it. As far as I know, no-one has written such a thing, but it could be worthwhile.
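
As a very rough sketch of what a “startup resolution” step might do, the snippet below merges per-host settings into one datacentre-wide sequence, breaking ties by guest name. Everything here is hypothetical: the input format, the tie-break rule and the resolve_startup name are invented for illustration only.

```shell
resolve_startup() {
    # stdin: one "<guest> <order>" line per guest, gathered from all hosts.
    # Sort by the configured order (numeric), then by guest name to break
    # ties, and renumber so every guest gets a unique position.
    sort -k2,2n -k1,1 | awk '{ print NR, $1 }'
}

# Guests A and B were both set to 1 on their own hosts; C was set to 2.
printf '%s\n' 'guestB 1' 'guestA 1' 'guestC 2' | resolve_startup
```

A real plug-in would need a better tie-break than alphabetical order (dependencies between guests, for instance), but the point is that the clash has to be resolved somewhere above the individual host.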

QNAP NFS datastore etc.

Because I was using a lot of space on my local datastore disk, I decided to move the big iso folder to the QNAP using an NFS datastore. It’s not totally obvious and I referred to http://files.qnap.com/news/pressresource/product/How_to_set_up_QNAP_NAS_as_a_datastore_via_NFS_for_VMware_ESX_4.0_or_above.pdf to get it to work. The main non-obvious thing is that you need to prepend /share to the mountpoint to access the correct area on the QNAP.
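
For the record, the path rule can be written down as a tiny helper. This is only a sketch: qnap_export is an invented name and “iso” is an assumed share name, so substitute whatever the share is called on your QNAP.

```shell
# Hypothetical helper: turn a QNAP share name into the NFS export path.
qnap_export() {
    # ${1#/} strips one leading slash so "iso" and "/iso" both work
    printf '/share/%s\n' "${1#/}"
}

qnap_export iso    # prints /share/iso

# On the host the result would be passed to esxcfg-nas, for example:
#   esxcfg-nas -a -o <qnap-address> -s "$(qnap_export iso)" <datastore-label>
```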

To come back to the question about auto-power on of guests, having looked in an environment with several hosts, it is much clearer. Of course, every VM, regardless of its state, has a host (something I should have known). So when the instructions say “Display the virtual machine’s host in the inventory” that makes total sense. For a given host, all the VMs assigned to that host are there. Selecting “properties” from the Virtual Machine Startup/Shutdown panel allows you to change things around.

Incidentally, the “properties” link (and some similar ones in the vSphere client) is a bit over-engineered, as my first instinct is to drag and drop the guest VMs on the first panel. That functionality would not be hard to achieve and would be much more intuitive.

The remaining question is what happens to the power settings of a guest when it migrates to another host? I can guess that the “Manual Startup” and “Any Order” properties would migrate with the VM but what if a set of guests are all set to start as number 1 on three different hosts and they all get migrated to one host…?

I guess an experiment is called for…(I only have one machine!)

ESXi 4.1 auto guest power-on

This is not rocket science as there is an easy to find solution at http://communities.vmware.com/message/1602618. This changed from previous versions of vSphere where I think the option was in the VM. Makes much more sense now. I assume (because I have only got one host) that in a cluster of ESXi hosts all guests would be shown and the ones specified for power-on would be powered on on *this* host.

The online help for this topic says:

1 Display the virtual machine’s host in the inventory.

as the first line but I am not too sure what it means by “the virtual machine’s host” (if it does not have one).

For the record, my own picture is:

Power settings for guests.

And I will try an experiment with multiple hosts in the office lab tomorrow.

Obscenities

My brain is screaming obscenities because I have spent the last 2 hours trying to delete a datastore from ESXi 4. It’s a small 16G one (the one I was using to try and copy to a USB stick) which resides on a 500G internal SATA drive. I have been trying from the vSphere client and from the command line with more and more severe methods, culminating in “Reset System Configuration” from the console. Even that didn’t work, which only leaves me the option of re-burning my ESXi on USB stick.

The error is as follows:

Error from Delete Datastore

Don’t bother to ask me to try something. I’ve tried it all.

Poking about in ESXi4.1

Just looking about, waiting for inspiration, slightly guided by the impossible aim of making a USB stick into a datastore. Here’s one snippet from the log file:

Sep  9 17:24:40 vmkernel: 0:00:00:09.737 cpu0:4497)FSS: 3924: No FS driver claimed device 'mpx.vmhba32:C0:T0:L0:1': Not supported

And here’s another command I ran across which lists useful information about all your storage:

/var/log # esxcli corestorage device list
t10.ATA_____ST3500418AS_________________________________________5VM89S46
Display Name: Local ATA Disk (t10.ATA_____ST3500418AS_________________________________________5VM89S46)
Size: 476940
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/t10.ATA_____ST3500418AS_________________________________________5VM89S46
Vendor: ATA
Model: ST3500418AS
Revision: CC38
SCSI Level: 5
Is Pseudo: false
Status: on
Is RDM Capable: false
Is Local: true
Is Removable: false
Attached Filters:
VAAI Status: unknown
Other UIDs: vml.010000000020202020202020202020202035564d3839533436535433353030

mpx.vmhba33:C0:T0:L0
Display Name: Local USB Direct-Access (mpx.vmhba33:C0:T0:L0)
Size: 1896
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/mpx.vmhba33:C0:T0:L0
Vendor: Generic
Model: USB Flash Disk
Revision: 8.07
SCSI Level: 2
Is Pseudo: false
Status: on
Is RDM Capable: false
Is Local: true
Is Removable: true
Attached Filters:
VAAI Status: unknown
Other UIDs: vml.0000000000766d68626133333a303a30

mpx.vmhba32:C0:T0:L0
Display Name: Local USB Direct-Access (mpx.vmhba32:C0:T0:L0)
Size: 15318
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0
Vendor: Single
Model: Flash Reader
Revision: 1.00
SCSI Level: 2
Is Pseudo: false
Status: on
Is RDM Capable: false
Is Local: true
Is Removable: true
Attached Filters:
VAAI Status: unknown
Other UIDs: vml.0000000000766d68626133323a303a30

t10.ATA_____Hitachi_HDS721010CLA332_______________________JP2911HQ0MK9TA
Display Name: Local ATA Disk (t10.ATA_____Hitachi_HDS721010CLA332_______________________JP2911HQ0MK9TA)
Size: 953869
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/t10.ATA_____Hitachi_HDS721010CLA332_______________________JP2911HQ0MK9TA
Vendor: ATA
Model: Hitachi HDS72101
Revision: JP4O
SCSI Level: 5
Is Pseudo: false
Status: on
Is RDM Capable: false
Is Local: true
Is Removable: false
Attached Filters:
VAAI Status: unknown
Other UIDs: vml.01000000002020202020204a50323931314851304d4b395441486974616368

Actually, that looks like

esxcfg-scsidevs -l

in another format.
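
Since the listing is plain text, picking out the removable devices (the USB sticks here) is a one-liner. A rough sketch, assuming awk is available in the shell and that device records are separated by blank lines as in the output above; removable_devices is a name invented for this example.

```shell
removable_devices() {
    # Paragraph mode: each blank-line-separated block is one record and
    # each line within it is one field, so $1 is the device name line.
    awk 'BEGIN { RS = ""; FS = "\n" } /Is Removable: true/ { print $1 }'
}

# Trimmed, made-up sample in the same shape as the listing above.
printf '%s\n' \
    't10.ATA_disk' 'Is Local: true' 'Is Removable: false' '' \
    'mpx.vmhba32:C0:T0:L0' 'Is Local: true' 'Is Removable: true' |
    removable_devices
```

On the host itself the input would come from the real command: esxcli corestorage device list | removable_devices.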

I did try again to create a filesystem on my USB device with the usual result:

/sbin # vmkfstools -C vmfs3 /dev/disks/mpx.vmhba32:C0:T0:L0
Checking if remote hosts are using this device as a valid file system. This may take a few seconds...
Creating vmfs3 file system on "mpx.vmhba32:C0:T0:L0" with blockSize 1048576 and volume label "none".
/dev/disks/mpx.vmhba32:C0:T0:L0: Permission denied.  (Have you set the partition type to 0xfb?)
Error: Permission denied
/sbin # 

There doesn’t seem to be a way to make other filesystem types from the command line, but then, why would there be?

Moving swiftly on, there are plenty of other commands to play with, e.g.

 /sbin # esxcfg-info

prints hundreds of lines of config.

By my count there are 157 commands to explore in /bin and /sbin. I got that number by the following command:

/bin # find /bin /sbin -type f -perm +100 -exec ls -l {} \; | wc -l

Disabled ESXi 4.1 USB passthrough

I figured out a way to see both my USB sticks in the ESXi busybox shell. If you read my previous post you will know that the system enables passthrough of the non-boot USB devices for use by guests by default.

Whether or not there is a “right” way to do this I don’t know (the VMware KB and Google don’t turn up much of use).

So, going back to that mine of information, the messages file, I noticed that /sbin/chkconfig is run immediately before it states that USB passthrough is enabled. Running /sbin/chkconfig --list shows:

~ # /sbin/chkconfig --list
DCUI on
TSM-SSH on
TSM on
usbarbitrator on
lbtd on
storageRM on
sensord on
vprobed on
vobd on
wsman on
slpd on
sfcbd-watchdog on
sfcbd off
ntpd on
hostd on
iked off
lwiod off
netlogond off
lsassd off
~ #

So as a guess I changed usbarbitrator to off with the following command:

~ # /sbin/chkconfig usbarbitrator off
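
To confirm the change before rebooting, the same listing can be filtered for a single service. A small sketch, assuming awk is present in the busybox shell; service_state is a helper name made up here.

```shell
service_state() {
    # stdin: output of `chkconfig --list`; $1: service name to look up
    awk -v svc="$1" '$1 == svc { print $2 }'
}

# Sample input in the same two-column shape as the listing above.
printf '%s\n' 'TSM-SSH on' 'usbarbitrator off' 'ntpd on' |
    service_state usbarbitrator
```

On the host this would be: /sbin/chkconfig --list | service_state usbarbitrator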

Rebooting the system and looking again, I can now see both the devices: mpx.vmhba33 (the boot USB) and mpx.vmhba32 (my target datastore). The latter appears as a FAT partition, despite my trying to dd my vmfs3 filesystem over the top of it.
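
One way to see what the partition table claims is to read the type byte of the first MBR entry, which sits at offset 450 (446 for the start of the table plus 4 into the entry). FAT32 shows up as 0b or 0c, VMFS as fb. A sketch, assuming dd and od are available and the device uses an MBR partition table; part_type is an invented name, and the demo runs against a scratch image rather than a real device.

```shell
part_type() {
    # $1: device or image file; prints the first partition's type byte in hex
    dd if="$1" bs=1 skip=450 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n'
    echo
}

# Demo on a scratch image: one zeroed sector with type byte 0xfb written in.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null
printf '\373' | dd of="$img" bs=1 seek=450 conv=notrunc 2>/dev/null
part_type "$img"    # prints fb
rm -f "$img"
```

Against the stick itself the call would be: part_type /dev/disks/mpx.vmhba32:C0:T0:L0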

I will try more dd experiments at the next free slot.

At least I made some progress!