Everybody talks about the benefits of Transparent Page Sharing in vSphere environments and how it reduces the amount of memory needed in your vSphere host. Thanks to Transparent Page Sharing, memory overcommit in a production environment has become mainstream.
Not long ago I wrote these two posts:
– “Memory overcommit in production” http://www.gabesvirtualworld.com/memory-overcommit-in-production-yes-yes-yes/ which explains how Transparent Page Sharing works and how to use it in production.
– Another post of mine explains more on memory compression and how ESX starts ballooning and swapping when there is memory contention, see: http://www.gabesvirtualworld.com/memory-management-and-compression-in-vsphere-4-1/.
While writing that post I started thinking about the impact of Large Pages and Transparent Page Sharing on memory usage, and how this makes it more difficult in your day-to-day admin job to monitor the real free memory on the ESX host. This could therefore lead to fewer VMs per host.
Let me explain what the problem is.
No Transparent Page Sharing with Large Pages
When Large Pages are enabled in the guest OS and the VM is running on a host that supports Large Pages, ESX will not perform Transparent Page Sharing on the VM’s memory.
As you can read in my post on memory compression, memory management in ESX works as follows: after 4K pages are written to physical memory, they are indexed and Transparent Page Sharing (TPS) makes sure only unique pages are stored in physical memory. Should there be memory pressure on the host, ESX will use the balloon driver inside the VMs to reclaim idle memory. If that is not enough to reduce the memory pressure, ESX 4.1 will try to compress the 4K pages, and if a compressed page is smaller than 2K, it will be stored in the VM's compression cache (part of the VM's memory). Otherwise the page will be swapped out to disk. This cache inside the VM's memory is normally limited to 10% of the VM's memory. If a compressed page doesn't fit in the VM's compression cache, the page is decompressed and swapped out to disk.
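To make the compress-or-swap decision above a bit more tangible, here is a minimal Python sketch. It is only an illustration of the rules described in this post, not how ESX is implemented: zlib stands in for whatever compression ESX uses, and the function name, the fixed 2K cache slot and the `cache_used` bookkeeping are my own assumptions.

```python
import zlib
from typing import Tuple

PAGE_SIZE = 4096            # small page size in bytes
COMPRESS_THRESHOLD = 2048   # a page must compress down to 2K to be cached
CACHE_FRACTION = 0.10       # compression cache is limited to 10% of VM memory


def reclaim_page(page: bytes, cache_used: int, vm_memory: int) -> Tuple[str, int]:
    """Decide what happens to one 4K page under memory pressure.

    Returns the action ("compress" or "swap") and the updated cache usage.
    Hypothetical model: every cached page occupies one 2K slot.
    """
    cache_limit = int(vm_memory * CACHE_FRACTION)
    compressed = zlib.compress(page)  # zlib is a stand-in, not ESX's algorithm
    fits_threshold = len(compressed) <= COMPRESS_THRESHOLD
    fits_cache = cache_used + COMPRESS_THRESHOLD <= cache_limit
    if fits_threshold and fits_cache:
        return "compress", cache_used + COMPRESS_THRESHOLD
    # Page does not compress well enough, or the cache is full: swap to disk.
    return "swap", cache_used


if __name__ == "__main__":
    one_gib = 1024 ** 3
    zero_page = bytes(PAGE_SIZE)  # highly compressible page
    print(reclaim_page(zero_page, cache_used=0, vm_memory=one_gib))
```

A page that compresses well lands in the cache as long as there is room; once the 10% limit is reached, the same page ends up being swapped.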
With Large Pages however, ESX will not use Transparent Page Sharing on these Large Pages, since chances are small that multiple 2MB pages are identical to each other, so little would be saved. ESX does however still build an index based on 4K pages, and when there is memory pressure and ballooning kicks in, those 2MB pages are broken into 4K pages that Transparent Page Sharing can then share.
Monitoring your memory usage
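The difference between comparing whole 2MB pages and comparing the 4K pages inside them can be sketched in a few lines of Python. This is a toy model under my own assumptions: the function names are made up, SHA-1 is just a convenient hash for the example, and real TPS also does a full compare after a hash match.

```python
import hashlib
from collections import Counter

SMALL_PAGE = 4 * 1024
LARGE_PAGE = 2 * 1024 * 1024


def split_large_page(large_page: bytes) -> list:
    """Break one 2MB page into its 512 small 4K pages."""
    return [large_page[i:i + SMALL_PAGE]
            for i in range(0, len(large_page), SMALL_PAGE)]


def pages_saved_by_sharing(pages) -> int:
    """Count pages that could be collapsed if identical pages are stored once."""
    hashes = Counter(hashlib.sha1(p).digest() for p in pages)
    return sum(count - 1 for count in hashes.values())


if __name__ == "__main__":
    # Two 2MB pages that differ in a single byte.
    a = bytes(LARGE_PAGE)
    b = bytes(LARGE_PAGE - 1) + b"\x01"

    # Compared as whole large pages, nothing can be shared.
    print(pages_saved_by_sharing([a, b]))

    # Broken up into 4K pages, almost everything can be shared.
    small_pages = split_large_page(a) + split_large_page(b)
    print(pages_saved_by_sharing(small_pages))
```

One differing byte is enough to make two 2MB pages unique, while the same data cut into 4K pieces leaves only a single unique small page. That is the reason sharing only becomes effective after the Large Pages are broken up.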
The result of this behavior is that it becomes more difficult to see how much free memory you have in your environment. Keep in mind that normally not all of the Guest memory is stored in Large Pages. Read this post from my “Open Line” colleague Dominique Hermans on Large Pages for Windows http://techdom.nl/.
But what potentially can happen is that with a number of VMs running on a host, the shared memory ratio is very low and vCenter will start warning about high memory usage on the host. Just before you order a new host, you notice that adding extra VMs to the host doesn't change its memory usage, because adding these VMs forced the Large Pages to be broken down and, with Transparent Page Sharing kicking in, the memory savings became bigger.
An important metric in esxtop that can give you an idea of how much extra memory could potentially be won is, as Duncan explained, the COWH value (Copy on Write Pages hints: the amount of memory in MB that is potentially shareable).
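As a rough illustration of how you could use COWH before deciding to buy hardware, here is a small Python sketch. The function names and the numbers in the example are purely hypothetical; you would fill in the consumed memory and COWH values you read from vCenter and esxtop yourself. Keep in mind that COWH is only a hint, so the result is an optimistic lower bound and not a prediction.

```python
def estimate_after_sharing(consumed_mb: float, cowh_mb: float) -> float:
    """Optimistic lower bound on consumed host memory (MB) if all memory
    hinted by COWH were actually shared after Large Pages are broken up."""
    if consumed_mb < 0 or cowh_mb < 0:
        raise ValueError("memory values must be non-negative")
    return max(consumed_mb - cowh_mb, 0.0)


def usage_percent(consumed_mb: float, total_mb: float) -> float:
    """Consumed memory as a percentage of total host memory."""
    return 100.0 * consumed_mb / total_mb


if __name__ == "__main__":
    # Hypothetical host: 64000 MB total, 57600 MB consumed, COWH of 16000 MB.
    total, consumed, cowh = 64000, 57600, 16000
    best_case = estimate_after_sharing(consumed, cowh)
    print(f"reported usage:  {usage_percent(consumed, total):.0f}%")
    print(f"best-case usage: {usage_percent(best_case, total):.0f}%")
```

With these made-up numbers a host that looks 90% full could, in the best case, drop to 65% once sharing kicks in, which is exactly the kind of gap the standard tools don't show you.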
Real World impact?
So how big is the ‘issue’ I’m telling you about? Well, it depends very much on what kind of VMs you are running. Remember that the OS is rarely stored in Large Pages. Only applications running on top of the OS benefit from and use Large Pages.
When running 10 VMs with MSSQL on one host, chances are that you have quite some memory stored in large pages and none of this application memory will be shared. So the impact is quite big, and it will be much more difficult to see the real memory usage than when just one or two applications are using large pages.
In these scenarios you have to watch your real memory usage very carefully and check the esxtop value COWH before ordering a new host. With this post “Large Pages, Transparent Page Sharing and how they influence the consolidation ratio” I want to make clear that the consolidation ratio is influenced when you use your normal tools and only look in vCenter to check your memory usage. Using many applications with Large Pages enabled needs better tools.
In short, using Large Pages is a trade-off between memory savings and performance benefits. Will the use of Large Pages really bring any benefit to your application? Did you ever test it?
Edit: Seeing some reactions and having a chat with Duncan, I see I might not have been completely clear.
When saying “Large Pages is a trade-off between memory savings and performance benefits”, I’m not saying that using Large Pages hurts performance! Large Pages perform better than small pages if the Guest OS and application can handle them. But when using Large Pages, monitoring your memory usage with the standard tools gives you the wrong impression of the memory load on your host.
When saying “Will the use of Large Pages really bring any benefit to your application” I’m questioning whether you should use Large Pages for every application that can handle them. A SQL Server using 48GB of RAM will certainly benefit from Large Pages; doing so for a small SQL Server might not bring the performance benefit, but does give you unclear sight of real memory usage.
Edit 2: Forbes Guthrie wrote a reply blogpost that maybe says even better what I was trying to say :-) Read it at: Larges Pages – a problem of perception and measurement.
Also read these replies from Frank and Duncan on this post:
Frank Denneman: RE: IMPACT OF LARGE PAGES ON CONSOLIDATION RATIOS
Duncan Epping: Re: Large Pages (@gabvirtualworld @frankdenneman @forbesguthrie)
Extra info to read:
Good discussion on VMware communities: http://communities.vmware.com/thread/211585?start=15&tstart=0
Large Page Performance white paper: http://www.vmware.com/files/pdf/large_pg_performance.pdf
How cool is TPS?: http://www.yellow-bricks.com/2011/01/10/how-cool-is-tps/
How many pages can be shared if Large Pages are broken up?: http://www.yellow-bricks.com/2010/11/07/how-many-pages-can-be-shared-if-large-pages-are-broken-up/
KB Article 1020524 (TPS and Nehalem): http://www.yellow-bricks.com/2010/05/27/kb-article-1020524-tps-and-nehalem/
Just for completeness sake: http://frankdenneman.nl/2011/01/re-impact-of-large-pages-on-consolidation-ratios/
I guess the main conclusion is something I disagree with. Using Large Pages to back small pages has proved itself many times, I have conducted multiple performance tests in past with regards to for instance XenApp and Large Pages increased usercount with roughly 10-15% and CPU was generally less spikey as well.
It might slightly complicate monitoring but as Frank also says vSphere is perfectly capable of handling Large Page and breaking them up if and when required due to memory pressure.
Let me clarify that by the way, COWH shows you the amount of memory in MB that potentially can be collapsed when Large Pages that are containing Small Pages are broken up to Small Pages.
There is a difference in type of pages and how they are backed. I am preparing a post on that as we speak.
“doing so for a small SQL Server might not bring the performance benefit, but does give you unclear sight of real memory usage.”
That is not entirely correct. Even with “small on large” you will benefit from less TLB misses!
My response:
http://www.yellow-bricks.com/2011/01/26/re-large-pages-gabvirtualworld-frankdenneman-forbesguthrie/