Memory Overcommit… without the Commitment

Speaker: Dan Magenheimer
Oracle Corporation
2008
Memory Overcommit without the Commitment (Xen Summit 2008) -Dan Magenheimer
Overview
•What is overcommitment?
•aka oversubscription or overbooking
•Why (and why not) overcommit memory?
•Known techniques for memory overcommit
•Feedback-directed ballooning
CPU overcommitment

One 4-CPU physical server
Four underutilized 2-cpu virtual servers

Xen supports CPU overcommitment (aka "consolidation")
I/O overcommitment

One 4-CPU physical server with a 1Gb NIC
Four underutilized 2-cpu virtual servers, each with a 1Gb NIC

Xen supports I/O overcommitment
Memory overcommitment???

One 4-CPU physical server w/4GB RAM
Four underutilized 2-cpu virtual servers, each with 1GB RAM

SORRY! Xen doesn't support memory overcommitment!
Why doesn't Xen overcommit memory?
•Memory is cheap – buy more
•"When you overbook memory excessively, performance takes a hit"
•Most consolidated workloads don't benefit much
•Overcommit requires swapping – and Xen doesn't do I/O in the hypervisor
•Overcommit adds lots of complexity and latency to important features like save/restore/migration
•Operating systems know what they are doing and can use all the memory they can get
Why doesn't Xen overcommit memory?
•Memory is cheap – buy more… except when you're out of slots or need BIG DIMMs
•"When you overbook memory excessively, performance takes a hit"… yes, but true of overbooking CPU and I/O too; sometimes tradeoffs have to be made
•Most consolidated workloads don't benefit much… but some workloads do!
•Overcommit requires swapping – and Xen doesn't do I/O in the hypervisor… only if black-box swapping is required
•Overcommit adds lots of complexity and latency to important features like save/restore/migration… some techniques maybe… we'll see…
•Operating systems know what they are doing and can use all the memory they can get… but an idle or lightly-loaded OS may not!
Why should Xen support memory overcommitment?
•Competitive reasons
"VMware Infrastructure's exclusive ability to overcommit memory gives it an advantage in cost per VM that others can't match" *
•High(er) density consolidation can save money
•Sum of guest working sets is often smaller than available physical memory
•Inefficient guest OS utilization of physical memory (caching vs. "hoarding")

* E. Horschman, Cheap Hypervisors: A Fine Idea -- If you can afford them, blog posting, http://blogs.vmware.com/virtualreality/2008/03/cheap-hyperviso.html
Problem statement
Oracle OnDemand businesses (both internal and external):
•would like to use Oracle VM (Xen-based)
•but use memory overcommit extensively

The Oracle VM team was asked… can we:
•implement memory overcommit on Xen?
•get it accepted upstream?
Memory Overcommit Investigation
•Technology survey
•understand known techniques and implementations
•understand what Xen has today and its limitations
•Propose a solution
•OK to place requirements on guest
•e.g. black-box solution unnecessary
•soon and good is better than late and great
•phased delivery OK if necessary
•e.g. Oracle Enterprise Linux now, Windows later
•preferably high bang for the buck
•e.g. 80% of value with 20% of cost
Techniques for memory overcommit
•Ballooning
•Content-based page sharing
•VMM-driven demand paging
•Hot-plug memory add/delete
•Ticketed ballooning
•Swapping entire guests

Black-box or gray-box*… or white-box?

* T. Wood, et al. Black-box and Gray-box Strategies for Virtual Machine Migration, In Proceedings NSDI '07. http://www.cs.umass.edu/~twood/pubs/NSDI07.pdf
WHAT IF…?

Operating systems were able to:
•recognize when physical memory is not being used efficiently and communicate relevant statistics
•surrender memory when it is underutilized
•reclaim memory when it is needed

And Xen/domain0 could balance the allocation of physical memory, just as it does for CPU/devices?

…. Maybe this is already possible?!?
Ballooning (gray-box)
•In-guest device driver
•"steals" / reclaims memory via guest in-kernel APIs
•e.g. get_free_page() and MmAllocatePagesForMdl()
•Balloon inflation increases guest memory pressure
•leverages guest native memory management algorithms
•Xen has ballooning today
•mostly used for domain0 autoballooning
•has problems, but recent patch avoids worst*
•VMware and KVM have it today too

Issues:
•driver must be installed
•not available during boot
•reclaim may not be fast enough; potential out-of-memory conditions

Currently implemented by: Xen, VMware, KVM

* J. Beulich, [PATCH] linux/balloon: don't allow ballooning down a domain below a reasonable limit, xen-devel archive, http://lists.xensource.com/archives/html/xen-devel/2008-04/msg00143.html
Content-based page sharing (black-box)
•One physical page frame used for multiple identical pages
•sharing works both intra-guest and inter-guest
•hypervisor periodically scans for copies and "merges"
•copy-on-write breaks share
•Investigated on Xen, but never in-tree* **
•measured savings of 4-12%
•VMware*** has had it for a long time, KVM soon ****

Issues:
•Performance cost of discovery scans, frequent share set-up/tear-down
•High complexity for relatively low gain

Currently implemented by: VMware (KVM soon)

* J. Kloster et al, On the feasibility of memory sharing: content based page sharing in the xen virtual machine monitor, Technical Report, 2006. http://www.cs.aau.dk/library/files/rapbibfiles1/1150283144.pdf
** G. Milos, Memory COW in Xen, Presentation at Xen Summit, Nov 2007. http://www.xen.org/files/xensummit_fall2007/18_GregorMilos.pdf
*** C. Waldspurger. Memory Resource Management in VMware ESX Server, In Proceedings OSDI '02. http://www.usenix.org/events/osdi02/tech/walspurger/waldspurger.pdf
**** A. Kivity, Memory overcommit with kvm, http://avikivity.blogspot.com/2008/04/memory-overcommit-with-kvm.html
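The scan-and-merge idea above can be sketched as a toy simulation. All names here are hypothetical; a real implementation operates on machine page frames and remaps guest page tables, with copy-on-write to break a share on write:

```python
import hashlib

def scan_and_merge(frames):
    """Toy content-based page sharing: map identical page contents
    to a single canonical frame index (copy-on-write not shown)."""
    seen = {}      # content hash -> canonical frame index
    mapping = {}   # frame index -> canonical frame index
    for i, data in enumerate(frames):
        h = hashlib.sha1(data).hexdigest()
        mapping[i] = seen.setdefault(h, i)
    saved = len(frames) - len(seen)   # frames freed by sharing
    return mapping, saved

# Four "pages", two identical (e.g. zero pages across guests)
frames = [b"\0" * 4096, b"code", b"\0" * 4096, b"data"]
mapping, saved = scan_and_merge(frames)
# frames 0 and 2 now share one canonical frame; one frame is freed
```

The per-page hashing is why the slide flags "performance cost of discovery scans": every candidate frame must be read and hashed on each pass, and real implementations also byte-compare on hash match before sharing.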
Demand paging (black-box)
•VMM reclaims memory and swaps to disk
•VMware has this today
•used as last resort
•randomized page selection
•Could potentially be done on Xen via domain0

Issues:
•"Hypervisor" must have disk/net drivers
•"Semantic gap"* → double paging

Currently implemented by: VMware, KVM

* P. Chen et al. When Virtual is Better than Real. In Proceedings HOTOS '01. http://www.eecs.umich.edu/~pmchen/papers/chen01.pdf
Hotplug memory add/delete (white-box)
•Essentially just ballooning with:
•larger granularity
•less fragmentation
•potentially unlimited maximum memory
•no kernel data overhead for unused pages

Issues:
•Not widely available yet (for x86)
•Larger granularity
•Hotplug delete requires defragmentation

J. Schopp et al, Resizing Memory with Balloons and Hotplug, Ottawa Linux Symposium 2006, https://ols2006.108.redhat.com/reprints/schopp-reprint.pdf

Currently implemented by:
"Ticketed" ballooning (white-box)
•Proposed by Ian Pratt*
•A ticket is obtained when a page is surrendered to
the balloon driver
•Original page can be retrieved if Xen hasn’t given
the page to another domain
•Similar to a system-wide second-chance buffer
cache or unreliable swap device
•Never implemented (afaik)
* http://lists.xensource.com/archives/html/xen-devel/2008-05/msg00321.html
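The ticket idea can be sketched as a tiny simulation (names are hypothetical; a real version would live in the hypervisor's page allocator, and the "unreliable" part is the whole point — a redeemed ticket may come back empty):

```python
class TicketedBalloon:
    """Toy model of ticketed ballooning: surrendered pages remain
    retrievable until the hypervisor reallocates them elsewhere."""
    def __init__(self):
        self.store = {}       # ticket -> page contents
        self.next_ticket = 0

    def surrender(self, page):
        t = self.next_ticket  # guest keeps the ticket
        self.next_ticket += 1
        self.store[t] = page
        return t

    def reallocate_any(self):
        # Hypervisor hands an idle surrendered page to another domain
        if self.store:
            self.store.pop(next(iter(self.store)))

    def redeem(self, ticket):
        # Returns the original page, or None if it was given away --
        # like an unreliable swap device / second-chance buffer cache
        return self.store.pop(ticket, None)

bal = TicketedBalloon()
t1 = bal.surrender(b"page A")
t2 = bal.surrender(b"page B")
bal.reallocate_any()   # page A goes to another domain; t1 is now void
```

A guest using this would treat redeemed-as-None like a clean-page miss: refetch from disk, exactly as a second-chance cache behaves.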
Whole-guest swapping (black?-box)
•Proposed by Keir Fraser*
•Forced save/restore of idle/low-priority guests
•Wake-on-LAN-like mechanism causes restore
•Never implemented (afaik)

Issues:
•Very long latency for guest resume
•Very high system I/O overhead when densely overcommitted

* http://lists.xensource.com/archives/html/xen-devel/2005-12/msg00409.html
Observations
•Xen balloon driver works well
•recent patch avoids O-O-M problems
•works on hvm if pv-on-hvm drivers present
•ballooning up from "memory=xxx" to "maxmem=yyy" works (on pvm domains)
•ballooned-down domain doesn't restrict creation of new domains
•Linux provides lots of memory-status information
•/proc/meminfo and /proc/vmstat
•Committed_AS is a decent estimator of current memory need
•Linux does OK when put under memory pressure
•rapid/frequent balloon inflation/deflation just works… as long as remaining available Linux memory is not too small
•properly configured Linux swap disk works when necessary; obviates need for "system-wide" demand paging
•Xenstore tools work for two-way communication even in hvm
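The Committed_AS estimator mentioned above can be read straight out of /proc/meminfo; a minimal parser, shown here against a canned sample rather than a live system:

```python
def meminfo_kb(text, field):
    """Extract a field (in kB) from /proc/meminfo-style text."""
    for line in text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    raise KeyError(field)

# Sample /proc/meminfo excerpt (values are illustrative)
sample = """MemTotal:       524288 kB
MemFree:         81920 kB
Committed_AS:   262144 kB
"""
committed = meminfo_kb(sample, "Committed_AS")   # 262144
# On a live guest:
#   meminfo_kb(open("/proc/meminfo").read(), "Committed_AS")
```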
Proposed Solution:
Feedback-directed ballooning (gray-box)
Use relevant Linux memory statistics to control balloon size
•Selfballooning:
•Local feedback loop; immediate balloon changes
•Eagerly inflates balloon to create memory pressure
•No management or domain0 involvement
•Directed ballooning:
•Memory stats fed from each domain U to domain0
•Policy module in domain0 determines balloon size, controls memory pressure for each domain (not yet implemented)
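A directed-ballooning policy (not yet implemented, per the above) might apportion host memory in proportion to each domain's reported Committed_AS. A hypothetical sketch of such a domain0 policy module:

```python
def directed_targets(host_kb, reported):
    """Toy domain0 policy: split host memory among domains in
    proportion to each guest's reported Committed_AS (in kB)."""
    total_need = sum(reported.values())
    if total_need <= host_kb:
        # No overcommit pressure: everyone gets what it asked for,
        # and the surplus is split evenly
        surplus = (host_kb - total_need) // len(reported)
        return {d: need + surplus for d, need in reported.items()}
    # Overcommitted: scale every domain's share down proportionally
    return {d: need * host_kb // total_need
            for d, need in reported.items()}

# 2GB host, three guests whose reported needs sum to 4GB:
targets = directed_targets(2 * 1024 * 1024,
                           {"dom1": 1536 * 1024,
                            "dom2": 1536 * 1024,
                            "dom3": 1024 * 1024})
```

In practice the stats would arrive via xenstore (as the slides describe for stat reporting) and the targets would be pushed back to each guest's balloon driver.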
Implementation:
Feedback-directed ballooning
•No changes to Xen or domain0 kernel or drivers!
•Entirely implemented with user-land bash scripts
•Self-ballooning and stat reporting/monitoring only (for now)
•Committed_AS used (for now) as memory estimator
•Hysteresis parameters, settable to rate-limit balloon changes
•Minimum memory floor enforced to avoid O-O-M conditions
•same maxmem-dependent algorithm as recent balloon driver bugfix
•Other guest requirements:
•Properly sized and configured swap (virtual) disk for each guest
•HVM: pv-on-hvm drivers present
•Xenstore tools present (but not for selfballooning)
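The actual implementation is bash, but the core of the feedback loop — drive the balloon target toward Committed_AS, rate-limited by hysteresis and clamped to a minimum floor — can be sketched as follows (function and parameter names are illustrative, not the script's):

```python
def next_target(current_kb, committed_as_kb, floor_kb,
                up_hysteresis=1, down_hysteresis=10):
    """One step of a selfballooning feedback loop: move the guest's
    memory target toward its estimated need (Committed_AS), quickly
    on the way up, slowly on the way down, never below the floor."""
    goal = max(committed_as_kb, floor_kb)
    if goal > current_kb:
        # Under pressure: grow fast to avoid O-O-M
        step = (goal - current_kb) // up_hysteresis
    else:
        # Surplus: shrink gently, closing 1/down_hysteresis of the
        # gap per interval
        step = -((current_kb - goal) // down_hysteresis)
    return current_kb + step

# Guest using 256MB of a 1GB allocation, 100MB floor:
t = next_target(1048576, 262144, 102400)
# shrinks by (1048576 - 262144) // 10 = 78643 kB this interval
```

Each interval the new target would be written to the balloon driver's control file (the exact path varies by kernel), so a larger down-hysteresis trades reclaim speed for stability.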
Feedback-directed Ballooning Results

•Overcommit ratio
•7:4 w/default configuration (7 512MB loaded guests, 2GB phys memory)
•15:4 w/aggressive config (15 512MB idle guests, 2GB phys memory)
•for pvm guests, arbitrarily higher due to "maxmem="
•Preliminary performance (Linux kernel make after make clean, 5 runs, mean of middle 3)
[Table: kernel-make performance with ballooning Off vs. Self; columns: ballooning, Memory (MB) from 2048 down to the enforced minimum, User (sec), Sys (sec), Elapsed (sec), Major page faults, Down Hysteresis]

Selfballooning costly for large-memory domains (27% slower) but barely noticeable for smaller-memory domains (3% slower)
[Screenshot: domain0 with monitoring tool and xentop showing memory overcommitment]
Future Work
•Domain0 policy module for directed ballooning
•some combination of directed and self-ballooning??
•Improved feedback / heuristics
•Combine multiple memory statistics, check idle time
•Prototype kernel changes ("white-box" feedback)
•Better "idle memory" metrics
•Benchmarking in real world
•More aggressive minimum memory experiments
•Windows support
Conclusions
•Xen does do memory overcommit today!
•Memory overcommit has some performance impact
•but still useful in environments where high VM density is more important than max performance
•Lots of cool research directions possible for "virtualization-aware" OS memory management