In a two-cell vCloud setup I had trouble booting the second cell while the first cell was already running. While investigating, I noticed that something was wrong with the permissions on the transfer directory. In the cell.log file I found the following error: “Transfer spooling area is not writable”. The transfer spooling area is the NFS share vCloud needs when working with two or more cells.
In my case the /opt/vmware/vcloud-director/data/transfer directory is an NFS share mounted on both cells and exported from a RedHat Linux host. In the /etc/exports file of the NFS host, the NFS directory is exported with the options rw and no_root_squash.
rw – By default, NFS will export the directory read-only. Quite often you might want to give write access too, for example when user home directories are being exported off a server.
no_root_squash – NFS exports directories with root_squash turned on. This means that root on the client machine will be mapped to the anonymous UID, commonly nobody. The result is that root on the client machine will not be able to access anything in the exported entry. The no_root_squash option prevents this behavior.
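To double-check the export on the NFS host, a small helper along these lines can be used. This is only a sketch: the function name export_has_opts is my own, and the path and client name in the comment are placeholders, not the values from my setup. It looks at the line that exports a given directory and checks that both options are present.

```shell
#!/bin/sh
# Sketch: succeed only if the given directory is exported with both
# rw and no_root_squash. export_has_opts is a hypothetical helper name.
# Example /etc/exports line (path and client name are placeholders):
#   /export/vcd-transfer  cell1.example.com(rw,no_root_squash)
export_has_opts() {
    exports_file=$1   # usually /etc/exports
    export_dir=$2     # directory as written in the first column
    awk -v dir="$export_dir" '
        $1 == dir {
            found = 1
            # Each option must appear inside a client option list.
            if ($0 !~ /[(,]rw[,)]/ || $0 !~ /[(,]no_root_squash[,)]/) bad = 1
        }
        END {
            if (found && !bad) exit 0
            exit 1
        }
    ' "$exports_file"
}
```

The check is deliberately simple: it does not verify every client on a line separately. Running exportfs -v on the NFS host shows the options that are actually in effect.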
When I logged in to the first cell and checked the contents of /opt/vmware/vcloud-director/data, I noticed that the “transfer” directory was owned by the user “vcloud” and group “vcloud”. However, on the second cell this directory was owned by the user “vcloudssh” and group “vcloudssh”. This was also the case for all subdirectories of the “transfer” directory.
First let me explain that when I install vCloud on RedHat, I do not permit root access through SSH and I do not permit the user vcloud to log on through SSH. For the SSH sessions I use the user vcloudssh, and this user has sudo permissions.
Now, how come there is this difference? Linux (of course) doesn’t use the user name for permissions, but the user ID that is coupled to the user name. Since the cells only use local user accounts, nothing keeps the mapping between user ID and user name in sync across the cells. It turned out that I had created the users vcloud and vcloudssh in a different order on the two cells. As a result, on cell 1 the vcloudssh user has user ID 500 and the vcloud user has user ID 501. On the second cell I had created the vcloud user first, which then got user ID 500, and the vcloudssh user got user ID 501. The /etc/passwd file proved this:
The first number is the user ID, the second is the group ID. As you can see, they are swapped between the two cells.
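To make that comparison without reading the whole passwd file, something like the following can be run on each cell. It is a minimal sketch; show_ids is a helper name I made up, and it relies only on the standard passwd layout where field 3 is the user ID and field 4 the group ID.

```shell
#!/bin/sh
# Print "name uid gid" for the given accounts from a passwd-format file.
# show_ids is a hypothetical helper; fields 3 and 4 are UID and GID.
show_ids() {
    passwd_file=$1
    shift
    for user in "$@"; do
        awk -F: -v u="$user" '$1 == u { print $1, $3, $4 }' "$passwd_file"
    done
}
# On each cell:
#   show_ids /etc/passwd vcloud vcloudssh
```

If the numbers printed for the same name differ between the cells, files on the NFS share will appear to belong to different users depending on which cell you look from. The id command (for example id vcloud) gives the same information for a single account.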
How to fix this?
Easiest way: Just re-install one of the cells. Most admins have done this too many times and can install vCloud blindfolded. But maybe you want to just fix it without re-installing. Then follow these steps.
First we need to know where both user accounts have ownership of files. I ran a ‘find’ command and had the result exported to a file:
- find / -user vcloud > /tmp/owned-by-vcloud.txt
- find / -user vcloudssh > /tmp/owned-by-vcloudssh.txt
Looking at the results, the vcloud user had ownership of:
- /opt/vmware/vcloud-director (and all subdirectories)
The user vcloudssh had ownership of:
- /home/vcloudssh (and all subdirectories)
- /opt/vmware/vcloud-director/data/transfer (and a few files in subdirectories)
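The files in /opt/vmware/vcloud-director/data/transfer and below should not be owned by vcloudssh at all. They only show up that way on this cell because the NFS share stores numeric IDs, and the ID that means vcloud on the first cell means vcloudssh here. Once the IDs are swapped they will map to vcloud again, so we can ignore the ownership of those files for now.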
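The result files from find can get long, so a quick summary per directory helps to see where an account owns files. The sketch below is one way to do that; summarize_owned is a made-up name, and it simply groups the paths by their first two components.

```shell
#!/bin/sh
# Collapse a list of absolute paths to their first two components and
# count how many entries fall under each, largest group first.
# summarize_owned is a hypothetical helper name.
summarize_owned() {
    cut -d/ -f1-3 "$1" | sort | uniq -c | sort -rn
}
# summarize_owned /tmp/owned-by-vcloud.txt
# summarize_owned /tmp/owned-by-vcloudssh.txt
```

Any directory in the summary that you did not expect is worth a closer look before you change the IDs.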
On the second cell I now edit the /etc/passwd file and change only the user ID and the group ID of both accounts, so the home directory and shell stay as they are. Edit the file with ‘vipw -s’ rather than opening it directly: vipw locks the file while you edit, so no other process can change it at the same time and corrupt it. (In the shadow-utils version of vipw, -s selects /etc/shadow instead of /etc/passwd; if the shadow file opens, run plain ‘vipw’ instead.) To save and exit, type “:wq”. The same goes for the /etc/group file: use ‘vigr -s’ and switch the group IDs for vcloud and vcloudssh.
After each file has been saved, you are asked whether you also want to edit the shadow file; you can answer no. To make sure the passwords are still correct, I ran the change password command for both accounts:
- passwd vcloud
- passwd vcloudssh
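When swapping two IDs by hand it is easy to change only one of them and end up with two accounts sharing the same number. A quick check for that could look like the sketch below. The helper name dup_ids is my own invention; it only relies on the ID being in the third field of both /etc/passwd and /etc/group.

```shell
#!/bin/sh
# List numeric IDs that appear more than once in column 3 of a
# passwd- or group-format file. dup_ids is a hypothetical helper name.
dup_ids() {
    cut -d: -f3 "$1" | sort -n | uniq -d
}
# dup_ids /etc/passwd    # UIDs used by more than one account
# dup_ids /etc/group     # GIDs used by more than one group
```

Empty output means every ID is used once. Some systems share an ID on purpose, so treat a hit as something to look at, not automatically as an error. The read-only checks pwck -r and grpck -r give a more thorough consistency report.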
Cell2 OLD situation:
Cell2 NEW situation:
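Next we’ll change the ownership of the /opt/vmware/vcloud-director directory and below. Files on the local disk keep their old numeric IDs, so after the swap /home/vcloudssh shows up as owned by vcloud; run the same command with vcloudssh:vcloudssh on /home/vcloudssh to hand it back.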
chown -R vcloud:vcloud /opt/vmware/vcloud-director
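To see what the chown will touch, or to confirm afterwards that nothing was missed, the files that are not owned by the expected user can be listed. This is a sketch with a helper name of my own (not_owned_by); it uses nothing more than find with a negated -user test.

```shell
#!/bin/sh
# List everything under a directory that is NOT owned by the given user
# (name or numeric ID). not_owned_by is a hypothetical helper name.
# Empty output means the whole tree belongs to that user.
not_owned_by() {
    find "$1" ! -user "$2" -print
}
# not_owned_by /opt/vmware/vcloud-director vcloud
# not_owned_by /home/vcloudssh vcloudssh
```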
After this step, both cells should list the same user ID and group ID for /opt/vmware/vcloud-director/data/transfer. Test it on both cells with:
ls -l /opt/vmware/vcloud-director/data/transfer
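Because ls -l shows names, and the names were exactly what differed between the cells, it can be clearer to compare the raw numbers. The following sketch prints them for a single path; owner_ids is a made-up helper name, and stat -c is the GNU coreutils form that RedHat ships.

```shell
#!/bin/sh
# Print the numeric "uid:gid" of a path, so both cells can be compared
# without depending on the local name mapping.
# owner_ids is a hypothetical helper name; stat -c is GNU coreutils.
owner_ids() {
    stat -c '%u:%g' "$1"
}
# owner_ids /opt/vmware/vcloud-director/data/transfer
```

The output should be identical on both cells. Running ls with -ln instead of -l also prints numeric IDs for every entry.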
Now it is time to start the vCloud service again:
service vmware-vcd start
And when you monitor /opt/vmware/vcloud-director/logs/cell.log, you should see the following line passing by as proof that it all worked out fine:
Successfully verified transfer spooling area: /opt/vmware/vcloud-director/data/transfer
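Instead of watching the log scroll by, the most recent verdict can be pulled out of it. The sketch below searches for the two messages quoted in this post and reports whichever appeared last; check_transfer_log is a helper name I made up, and the exact log format around those messages is not assumed.

```shell
#!/bin/sh
# Report the latest transfer-area status found in a cell log.
# check_transfer_log is a hypothetical helper; the two messages are the
# ones quoted in this post.
check_transfer_log() {
    last=$(grep -F -e 'Successfully verified transfer spooling area' \
                   -e 'Transfer spooling area is not writable' "$1" | tail -n 1)
    case $last in
        *'Successfully verified'*) echo ok ;;
        *'not writable'*)          echo not-writable ;;
        *)                         echo unknown ;;
    esac
}
# check_transfer_log /opt/vmware/vcloud-director/logs/cell.log
```

An answer of unknown just means neither message has been logged yet, which is normal while the cell is still starting.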