To USB or not to USB?
On Sunday evening there was a good discussion on Twitter about whether or not to boot ESXi from a USB stick. A number of arguments for and against were made, and as with many discussions, there is no real right or wrong. I decided to write this post to give my arguments on booting ESXi from a USB stick.
Booting from USB stick
I’m very much in favor of booting ESXi hosts from a USB stick. A USB stick is a very cheap medium and its power consumption is next to nothing, so it costs much less than an internal hard disk. Since ESXi only needs the USB stick while booting, it would be a waste of money and power to keep a hard disk running in these hosts instead. ESXi also only needs about 2GB, a fraction of the smallest hard disk you can buy today, so running from hard disk wastes a lot of disk space. In both hardware cost and power consumption, USB sticks come out well ahead of hard disks.
Installation of ESXi on USB stick
The installation procedure for a USB stick and a hard disk doesn’t differ much when timed for a single server, but USB sticks win when you have to quickly replace a failed disk or stick. When I install USB sticks in a customer’s environment, I make sure to have a few spare sticks. I always pre-install them with ESXi without running the configuration. In other words, when a host boots from such a stick, it picks up a DHCP address, I add it to vCenter, apply a host profile and I’m done. The big advantage of a USB install, to me, is that I can quickly pre-install a number of USB sticks using VMware Workstation on my desktop PC. When replacing a failed hard disk, I would have to do a full reinstall on the new disk and then apply a host profile. I do admit, though, that the difference in installation time is so small that it probably won’t be the deciding factor between USB and hard disk.
Reliability of USB sticks
When the topic of booting from USB comes up, I often hear people mention the low reliability of USB sticks. Unfortunately, I cannot find any good reports that prove the reliability (or unreliability) of USB sticks, and I haven’t encountered such problems myself yet, which is not to say nobody has. I do know that some HP DLxxx series servers (a long time ago) kept destroying every USB stick you fed them, so I won’t claim there have never been problems.
But even when a USB stick fails and has to be replaced, the impact is rather low. As I already wrote, replacing the failed stick with a new pre-installed one is very simple and takes little time. Replacing a failed hard disk takes much more time.
Since ESXi doesn’t use the USB stick after boot, a failure would only pose a problem at boot time, when the host is already in maintenance mode and your VMs have already been moved to other hosts. So the impact is very low.
Reliability in large environments
In large enterprise environments, I do see a challenge in having to replace failed USB sticks on a regular basis. In small environments where access to the datacenter is easy, it doesn’t hurt if the admin has to get out of his chair once in a while to replace a USB stick. But in larger datacenters, where you have to request physical access a day in advance, this can be a real problem. And of course, running 1,000 hosts with a 1% annual USB stick failure rate means about 10 trips to the datacenter each year. But then again, how high is the hard disk failure rate?
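The arithmetic behind that estimate is easy to redo for your own fleet. A minimal Python sketch, assuming failures are independent and happen at a constant annual rate (real rates vary by model, batch and age); the function names are hypothetical:

```python
def expected_replacements(hosts: int, annual_failure_rate: float) -> float:
    """Expected number of boot-device replacements per year.

    Assumes each device fails independently at the same constant annual rate.
    """
    if hosts < 0 or not 0.0 <= annual_failure_rate <= 1.0:
        raise ValueError("hosts must be >= 0 and the rate must be in [0, 1]")
    return hosts * annual_failure_rate


def prob_at_least_one_failure(hosts: int, annual_failure_rate: float) -> float:
    """Probability that at least one device in the fleet fails within a year."""
    return 1.0 - (1.0 - annual_failure_rate) ** hosts


# The scenario from the text: 1,000 hosts with a 1% annual USB stick failure rate.
visits = expected_replacements(1000, 0.01)       # about 10 replacements per year
at_least_one = prob_at_least_one_failure(1000, 0.01)
print(f"Expected replacements per year: {visits:.1f}")
print(f"Chance of at least one failure in a year: {at_least_one:.4%}")
```

Running the same functions with a hard disk failure rate gives the comparison the closing question asks for, keeping in mind that each replacement only turns into a separate datacenter visit if failures cannot be batched.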
Conclusion
Well, not a hard conclusion, because the failure rate of USB sticks compared to hard disks will be decisive. My experiences with USB sticks so far are very good, and I would always go for a USB stick unless someone can prove me wrong.
So, please respond in the comments with your experiences with USB stick and hard disk failures, or any other remarks you have.
Tip: also read Setting logfile location, swap file, SNMP and vmkcore partition in ESXi
Dell servers offer the option to boot from SD card, integrated into the system. Reliability is about the same.
We’ve had some bad experiences with HP and USB sticks too, as you mention. In our case it wasn’t the server; the stick itself was faulty. In any case, I’d advise anyone to keep a backup image or copy of the boot medium, just in case it burns out.
USB sticks, sure, why not? Even better, SSD drives that you keep ready and simply swap in when one fails follow the same principle, right?
Or SD cards for that matter. The fact is that it just doesn’t matter, as long as it works.
To say that it takes longer to replace a HDD, well, that depends on what your recovery plan is. You can just as easily have a replacement HDD ready for action, if you just plan for it.
Money-wise, USB keys and SD cards are a big win. Since ESXi is stateless and logs can be sent to a remote syslog server, there is really no need for an HDD or RAID1. This is just #useless.
Now, I would prefer that server vendors supply their own USB key or SD card. HP has a part number for a USB key with the HP version of ESXi. That’s just what I want!
Future is VMware Auto Deploy :)
1) ESXi will use the USB stick after boot, as it needs to write the config at some point as well.
2) Keep in mind that you cannot use a random USB key. It needs to be supported by either VMware or your hardware vendor.
Well, by “no writes to USB” I mean that ESXi is not writing to the USB stick constantly. A USB stick failure will very rarely bring down your host.
Unfortunately there is no HCL for USB sticks from VMware and I asked HP and Sun (recent customer has Sun) and they do have USB on their part list, but would also support a wide range of other USB sticks without having a firm list. To be sure, I usually go for the Kingston stick that VMware used to distribute ESXi at VMworld.
On top of that, as far as I know, scripted installs are not supported on USB, which means either a manual task or PowerCLI for part of the config, as not everything can be done with host profiles.
Also, local log files can be very useful, especially when the network fails…
Anyway, either is fine with me… As long as you consider the impact.
In VMware’s document “ESXi Installable and vCenter Server Setup Guide” for 4.1, page 14 lists the supported locations for installing and booting ESXi and, unlike in ESXi 4.0, USB is no longer listed.
Yes, it works, but booting ESXi 4.1 Installable from USB doesn’t seem to be supported.
The fact that it’s changed from 4.0 makes me think this is not a documentation error.
Great points. My entire 24-host environment runs ESXi on USB. Never had a single problem, and I use 1GB or 2GB USB sticks on all deployments. I run Dell R610 hosts with built-in 1GB SD cards, along with some older hosts with SanDisk 4GB flash drives. Never had an issue. Love it!!
Hi Gabe, nice post as always. I’m not sure if you saw it but I wrote a post on ESXi storage options last week – http://www.vreference.com/2011/02/10/stateless-diskless-and-feckless/. I think it’s a very interesting design consideration these days, and it is not that well documented how ESXi uses local storage. I’d love to see VMware come up with a whitepaper explaining the details more closely and what they consider best practise.
Hi Nigel
Interesting comment, considering IBM ships ESXi 4.1 on USB keys from the factory… Maybe the doc is out of date?
Duncan, as an official VMware person on this thread, can you get clarification of the supported config and whether there is a typo in the document Nigel references, please?
Thanks
David
My understanding is that VMware backs up the ESXi configuration to disk every hour (I assume the same holds true for USB). I’m aware of one company that changed this behavior so that backups were performed every 5 minutes.
In most cases, shipping logs to a syslog server works fine. However, when you have network failures, the logs won’t make it to the syslog server and logging locally is required. Without the required logs, VMware won’t be able to root cause the issue. I have a blog post coming up on this as I have run into it in the past.
Jas
We had a two-thirds failure rate over 3 years with the HP Xen embedded hypervisor on the HP-provided USB sticks, so it’s nowhere near 1% per year. We average about 1.5% HDD failures per year, but thanks to RAID those are completely non-disruptive: they don’t require an after-hours datacenter visit, and they don’t require uncabling and opening the system. Dell’s solution with redundant SD cards on the R710/R810 is definitely the way forward, since it combines the advantages of cheap solid state with RAID =)
One of my customers ran into a very interesting issue recently. They used SD cards in their HP systems that were not on the HP HCL.
VMs ended up running on hosts where the hosts and hostd thought the VMs were powered off after a vMotion, even though the VMs were actually running (we could log on to them) and esxtop showed them as running as well. Very scary stuff. We opened a case, and the issue turned out to be caused by some symlinks not being created on the SD cards. VMware tech support told us they had seen this two more times. This is what they sent us:
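To compare these figures on the same basis, the cumulative two-thirds failure over 3 years can be converted into an annual rate. A small Python sketch, assuming every stick has the same, independent chance of failing each year (a simplification); the function name is hypothetical, and the comparison ignores that the RAID-protected disks fail without disruption:

```python
def annualized_failure_rate(fraction_failed: float, years: float) -> float:
    """Turn a cumulative failure fraction over `years` into a constant annual rate.

    Assumes the surviving fraction after n years is (1 - rate) ** n.
    """
    if not 0.0 <= fraction_failed <= 1.0 or years <= 0:
        raise ValueError("fraction_failed must be in [0, 1] and years must be > 0")
    surviving = 1.0 - fraction_failed
    return 1.0 - surviving ** (1.0 / years)


usb_afr = annualized_failure_rate(2 / 3, 3)  # roughly 0.31, i.e. about 31% per year
hdd_afr = 0.015                              # the ~1.5% per year quoted for hard disks
print(f"USB: {usb_afr:.1%}/year, HDD: {hdd_afr:.1%}/year, "
      f"about {usb_afr / hdd_afr:.0f}x higher for USB")
```

Under this assumption the USB sticks in that comment failed at roughly 30% per year, about twenty times the quoted hard disk rate and far above the 1% used in the post.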
Engineering have not completed their investigation but the errors in the logs indicate the problem was creating the symbolic links on the SD card. This task was failing and causing the VM to go into an unknown state. We have seen 2 other cases of this issue and with the other cases the customers were also running ESXi 4.x with SD/USB storage for the OS. Replacing the device resolved the issue for all instances that we have seen so far.
They have an internal KB article on this which still has to be reviewed before it gets published.
http://kb.vmware.com/kb/1033591
We had to shut down all affected VMs from within the guest OS and start them again to resolve this. vMotioning them to another host did not work, since vCenter also thought the VMs were switched off and tried to do a cold migration, which failed because the files were still locked.
We had to manually compare the esxtop output on all hosts with the info in vCenter or a direct VI client connection to find out whether and where VMs were actually running. It took us about two weeks to get everything sorted. Very scary stuff.
This issue made me reconsider the use of SD cards and USB sticks in production environments.
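That manual cross-check is essentially a reconciliation of two inventories. A minimal Python sketch, assuming you have already collected which VMs esxtop shows running on each host and what vCenter reports; the function name, input format and sample data are hypothetical, and only the `poweredOn`/`poweredOff` strings follow vSphere's power-state naming:

```python
from typing import Dict, List, Tuple


def find_mismatches(
    observed: Dict[str, str],
    inventory: Dict[str, Tuple[str, str]],
) -> List[str]:
    """Report VMs whose observed state disagrees with vCenter's view.

    observed:  VM name -> host where the VM is actually running (e.g. gathered
               from esxtop on every host); VMs not listed are assumed to be off.
    inventory: VM name -> (registered host, power state) as reported by vCenter.
    """
    problems = []
    for vm, (host, state) in sorted(inventory.items()):
        actual_host = observed.get(vm)
        if actual_host is None and state == "poweredOn":
            problems.append(f"{vm}: vCenter says poweredOn on {host}, but it is not running")
        elif actual_host is not None and state != "poweredOn":
            problems.append(f"{vm}: vCenter says {state}, but it is running on {actual_host}")
        elif actual_host is not None and actual_host != host:
            problems.append(f"{vm}: vCenter places it on {host}, but it runs on {actual_host}")
    # VMs that are running somewhere but that vCenter does not know about at all.
    for vm in sorted(set(observed) - set(inventory)):
        problems.append(f"{vm}: running on {observed[vm]} but unknown to vCenter")
    return problems


# Hypothetical sample data: web01 is the "ghost" case described above.
observed = {"web01": "esx02", "db01": "esx01"}
inventory = {
    "web01": ("esx01", "poweredOff"),
    "db01": ("esx01", "poweredOn"),
    "app01": ("esx03", "poweredOn"),
}
report = find_mismatches(observed, inventory)
for line in report:
    print(line)
```

Collecting the two inputs is still the hard part; the sketch only makes the comparison step repeatable once the data is in hand.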
Hi David,
What IBM ships on the USB key is almost certainly “ESXi Embedded” rather than “ESXi Installable”. HP and Dell do the same. In HP’s case they do list a part number for a USB drive they support with their HD/SD/USB install image. It’s about $100 for a 2GB USB drive! ( https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=HPVM06 )
What the difference is between the Embedded and Installable versions, other than a few pre-loaded drivers and management bits, isn’t clear, so it’s also unclear why one is supported on USB and the other isn’t.
It’s probably down to VMware not wanting to support ESXi booting from shonky, straight-outta-Ebay USB keys that have previously been dipped in coffee. If you buy ESXi Embedded on a certified USB drive from your server manufacturer you’re almost certainly paying a large premium over the plain media.
I will have a blog article out this week on the ESXi Chronicles blog around “Design Decisions” for USB/Local/BFS.
For a lab, a USB solution is ideal for testing.
For the enterprise, a diskless host solution is interesting for several reasons: it’s cheaper (though not by much); it uses less power (at least two disks per host are usually needed, although, to be honest, new 2.5″ disks don’t consume that much); and it leaves more room for RAM (some vendors have changed their layout to have only a few disk bays and more memory banks).
But which kind of diskless solution? IMHO, USB/SD is simpler than boot from LAN for small and medium environments (for large ones, boot from LAN or some future solution is probably better).
About the HCL, I ask (at least) that the hardware vendor support the entire hardware; for this reason, IMHO, I prefer a vendor solution, like an SD slot with the vendor’s own card (although the first versions of ESXi Embedded that I saw all used the vendor’s internal USB stick).
As for availability, there are now also some solutions with dual SD cards (similar to a RAID-1 configuration)… But a scheduled dd could also be a way :)
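A scheduled dd copy amounts to reading the boot device end to end into an image file. A minimal Python equivalent, assuming a Linux machine where the stick shows up as a raw block device (the `/dev/sdX` path and the function names are hypothetical placeholders); it also records a SHA-256 checksum so a later copy can be verified:

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 4 * 1024 * 1024  # read 4 MiB at a time, comparable to dd's bs=4M


def copy_and_hash(src: BinaryIO, dst: BinaryIO) -> str:
    """Copy src to dst in fixed-size chunks; return the SHA-256 of the copied bytes."""
    digest = hashlib.sha256()
    while True:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:
            break
        digest.update(chunk)
        dst.write(chunk)
    return digest.hexdigest()


def image_device(device_path: str, image_path: str) -> str:
    """Roughly `dd if=device_path of=image_path bs=4M`, plus a checksum.

    Reading a raw block device such as a hypothetical /dev/sdX needs root.
    """
    with open(device_path, "rb") as src, open(image_path, "wb") as dst:
        return copy_and_hash(src, dst)


# Self-check with in-memory data standing in for a USB stick (9 MiB spans several chunks).
fake_stick = io.BytesIO(b"\xab" * (9 * 1024 * 1024))
image = io.BytesIO()
checksum = copy_and_hash(fake_stick, image)
print(checksum)
```

Restoring is the same copy in the other direction, from the image file onto a spare stick. Imaging a stick while a running host is writing to it can capture an inconsistent state, so taking the copy while the host is in maintenance mode or powered off is the safer assumption.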
I’m certainly pro embedded media, even after going through the two HP USB recalls (green key, then black key). Besides the lower cost from having no spinning drives or SSDs, the more I can treat the ESXi host like an appliance, the better. I am lucky to use HP, so I do have a certified solution and can run the updated image with CIM providers. I’ve been 100% converted for over 2 years, on 300+ systems.
A word of warning: we have had ESXi 3.5 kill USB sticks because of how often it writes logs, which eventually wears the stick out. Unfortunately this is a slow death, and it typically rears its ugly head with the host going to not responding and taking its VMs with it. 4.x has proven to be much better, since ESXi now uses a RAM disk.
I run scheduled nightly config backups using PowerCLI in case we need to restore.
We run ESXi on Dell R710s from the SD card. One thing that was not obvious to me was setting up the scratch partition. It took me a long time to find any info on it. Also, I could never figure out how to put the scratch space on the SD card, so we set it up on a LUN, which has worked fine for us.
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_esxi_e_vc_setup_guide.pdf
Prior to setting up scratch space we would run out of room and not be able to generate log bundles or run updates.
I am working on getting a statement specifically around this DOC issue. The following KB though states that it is supported to use ESXi Installable: http://kb.vmware.com/kb/1010574
Remind me: can you put the vmkcore partition on the USB stick? I don’t think you can. That means you have to put the vmkcore partition on a disk somewhere. I think by default the USB install would put it on local storage… Technically you can run ESX without a vmkcore, but it’s not a supported configuration. No vmkcore, no dumps…
So your article is quite dated now. Are you still on the USB bandwagon, or have you found any faults with this method?
My preferred methods are:
1- Auto Deploy. See my blog section on that
2- USB
If those aren’t possible:
3- local disk
4- boot from SAN
I know this is an old thread, but there you go!
RAID on SD cards? How? Never seen it! I always thought it was a single point of failure.
Hi, I know some Cisco UCS rack-mount servers have internal SD cards in a RAID1 configuration. We have delivered these to customers.