Evaluation of X32 ABI for Virtualization and Cloud
Jun Nakajima, Intel Corporation
Wednesday, August 29, 2012

DESCRIPTION

Now that Linux 3.4 has x32 ABI support, we think it is the right time to evaluate the x32 ABI for virtualization and the cloud. The x32 ABI provides a model within the x86-64 psABI that uses a 32-bit address space while offering 16 64-bit integer registers (8 more than the conventional 32-bit x86 ABI), 8 additional SSE registers, and so on. We look at the performance improvements and memory savings gained by x32 applications, and discuss how x32 can help Xen and the cloud.

Page 1: Evaluation of X32 ABI for Virtualization and Cloud

Evaluation of X32 ABI for Virtualization and Cloud

Jun Nakajima

Intel Corporation

Page 2: Evaluation of X32 ABI for Virtualization and Cloud

Agenda

• What is X32 ABI?
• Benefits for Virtualization & Cloud
• Evaluation of X32
• Summary

Page 3: Evaluation of X32 ABI for Virtualization and Cloud

x32 ABI Basics

• x86-64 ABI but with 32-bit longs and pointers
• 64-bit arithmetic
• Fully utilize modern x86
• 8 additional integer registers (16 total)
• 8 additional SSE registers
• SSE for FP math
• 64-bit kernel is required
  - Linux kernel 3.4 has support for x32

Same memory footprint as x86 with advantages of x86-64. No hardware changes are required.
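The "32-bit longs and pointers, 64-bit arithmetic" combination can be checked from C itself. The following is a minimal sketch, not part of the original slides; the helper names `data_model_name` and `square64` are made up for illustration. Built with GCC's `-m32` or `-mx32` the first function should report ILP32, and with `-m64` it should report LP64, assuming the toolchain and libraries for that ABI are installed.

```c
/* Name of the C data model the compiler targets, judged only by the
 * sizes of int, long and pointers. Helper names are hypothetical. */
static const char *data_model_name(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32"; /* i386 and x32 */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";  /* x86-64 */
    return "other";
}

/* 64-bit arithmetic is available in every model through long long.
 * The result wraps modulo 2^64 because the type is unsigned. */
static unsigned long long square64(unsigned long long v)
{
    return v * v;
}
```

The difference between i386 and x32 is not visible in these sizes; it shows up in the generated code, where x32 can keep a `long long` in a single 64-bit register.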

Page 4: Evaluation of X32 ABI for Virtualization and Cloud

ABI Comparison

                      i386        x86-64      x32
Integer registers     6           15          15
FP registers          8           16          16
Pointers              4 bytes     8 bytes     4 bytes
64-bit arithmetic     No          Yes         Yes
Floating point        x87         SSE         SSE
Calling convention    Memory      Registers   Registers
PIC prologue          2-3 insn    None        None

Page 5: Evaluation of X32 ABI for Virtualization and Cloud

The x32 Performance Advantage

• In the order of expected contribution:
  - Efficient Position Independent Code
  - Efficient Function Parameter Passing
  - Efficient 64-bit Arithmetic
  - Efficient Floating Point Operations

X32 is expected to give a 10-20% performance boost for C, C++

Page 6: Evaluation of X32 ABI for Virtualization and Cloud

Efficient Position Independent Code

    extern int x, y, z;
    void foo () { z = x * y; }

i386 psABI

        call __i686.get_pc_thunk.cx
        addl $_GLOBAL_OFFSET_TABLE_, %ecx
        movl y@GOT(%ecx), %eax
        movl x@GOT(%ecx), %edx
        movl (%eax), %eax
        imull (%edx), %eax
        movl z@GOT(%ecx), %edx
        movl %eax, (%edx)
        ret

    __i686.get_pc_thunk.cx:
        movl (%esp), %ecx
        ret

x32 psABI

        movl x@GOTPCREL(%rip), %edx
        movl y@GOTPCREL(%rip), %eax
        movl (%rax), %eax
        imull (%rdx), %eax
        movl z@GOTPCREL(%rip), %edx
        movl %eax, (%rdx)
        ret

X32 PIC code is shorter and faster

Page 7: Evaluation of X32 ABI for Virtualization and Cloud

Efficient Function Parameter Passing

    void bar (int x, int y, int z);
    void foo (int x, int y, int z) { bar (y, z, x); }

i386 psABI

        subl $28, %esp
        movl 32(%esp), %eax
        movl %eax, 8(%esp)
        movl 40(%esp), %eax
        movl %eax, 4(%esp)
        movl 36(%esp), %eax
        movl %eax, (%esp)
        call bar
        addl $28, %esp
        ret

x32 psABI

        subl $8, %esp
        movl %edi, %eax
        movl %esi, %edi
        movl %edx, %esi
        movl %eax, %edx
        call bar
        addl $8, %esp
        ret

X32 passes parameters in registers

Page 8: Evaluation of X32 ABI for Virtualization and Cloud

Efficient 64-bit Integer Arithmetic

    extern long long x, y, z;
    void foo () { z = x * y; }

i386 psABI

        movl x+4, %edx
        movl y+4, %eax
        imull y, %edx
        imull x, %eax
        leal (%edx,%eax), %ecx
        movl y, %eax
        mull x
        addl %ecx, %edx
        movl %eax, z
        movl %edx, z+4
        ret

x32 psABI

        movq x(%rip), %rax
        imulq y(%rip), %rax
        movq %rax, z(%rip)
        ret

X32 provides very efficient 64-bit integer support (3 instructions vs. 10 instructions).
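To see why the i386 version needs so many instructions, it may help to write the same decomposition in C. The sketch below is an illustration only (the function name `mul64_from_halves` is not from the slides): it computes the low 64 bits of a product using nothing wider than a 32x32 to 64-bit multiply, which is the work the i386 listing performs with two `imull`, one `mull`, and the additions.

```c
#include <stdint.h>

/* Low 64 bits of a * b built from 32-bit halves, following the shape
 * of the i386 sequence. Hypothetical helper for illustration. */
static uint64_t mul64_from_halves(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Cross products only reach the upper 32 bits of the result, so
     * they can be computed modulo 2^32 (imull, imull, leal). */
    uint32_t cross = a_hi * b_lo + b_hi * a_lo;

    /* Widening multiply of the low halves (mull). */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* Fold the cross products into the upper half (addl). */
    return low + ((uint64_t)cross << 32);
}
```

On x32 and x86-64 the compiler can replace all of this with a single 64-bit multiply, which is where the instruction-count difference comes from.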

Page 9: Evaluation of X32 ABI for Virtualization and Cloud

Efficient Floating Point Operation

    extern double bar;
    double foo () { return bar * bar; }

i386 psABI

        subl $12, %esp
        movsd bar, %xmm0
        mulsd %xmm0, %xmm0
        movsd %xmm0, (%esp)
        fldl (%esp)
        addl $12, %esp
        ret

x32 psABI

        movsd bar(%rip), %xmm0
        mulsd %xmm0, %xmm0
        ret

X32 doesn’t use X87 to return FP values

Page 10: Evaluation of X32 ABI for Virtualization and Cloud

Agenda

• What is X32 ABI?
• Benefits for Virtualization & Cloud
• Evaluation of X32
• Summary

Page 11: Evaluation of X32 ABI for Virtualization and Cloud

Characteristics of Virtualization

• TLB misses in guests can be more expensive
  - Need to walk through both guest page tables and HAP (Hardware Assisted Paging) page tables
• Utilization of cache can be higher because of additional components
  - Hypervisor, driver domains
  - Other VMs
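The cost of a guest TLB miss under hardware-assisted paging can be estimated with a simple count. The sketch below is not from the slides; it uses the commonly cited worst-case model for a two-dimensional page walk and ignores paging-structure caches and nested TLBs, so real hardware usually does better. The function names are hypothetical.

```c
/* Worst-case memory references for one TLB miss under nested paging.
 * Each of the g guest table pointers is a guest-physical address that
 * needs an h-level host walk before the guest entry can be read, and
 * the final guest-physical address needs one more host walk:
 *   g * (h + 1) + h  ==  (g + 1) * (h + 1) - 1                        */
static unsigned nested_walk_refs(unsigned guest_levels, unsigned host_levels)
{
    return (guest_levels + 1) * (host_levels + 1) - 1;
}

/* Without virtualization, a walk reads one entry per level. */
static unsigned native_walk_refs(unsigned levels)
{
    return levels;
}
```

With 4-level guest tables and 4-level HAP tables this model gives 24 references against 4 natively, which is why reducing TLB and cache pressure in the guest matters more under virtualization.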

Page 12: Evaluation of X32 ABI for Virtualization and Cloud

Advantage of x32 in Virtualization

• Over x86
  - Generic performance advantages of x32
  - Better cache utilization
    • Fewer instructions, passing arguments in registers
• Over x86-64
  - Efficient guest page table structures
    • Only 4GB virtual address space
      – A single PML4 entry and 4 consecutive PDP (page-directory-pointer) entries
  - Better cache utilization
    • 4-byte pointers
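The page-table figure for a 4GB address space can be reproduced with a little arithmetic. The sketch below is an illustration under two assumptions that are not stated in the slides: 4KB pages with the standard 4-level x86-64 layout, and an address range that starts at an aligned boundary. The names `entries_needed` and the `*_SHIFT` constants are made up for this example.

```c
#include <stdint.h>

/* Bytes of virtual address space covered by one entry at each level
 * of 4-level x86-64 paging, expressed as a power of two. */
enum {
    PML4_SHIFT = 39, /* 512 GB per PML4 entry */
    PDPT_SHIFT = 30, /* 1 GB per PDP entry */
    PD_SHIFT   = 21, /* 2 MB per page-directory entry */
    PT_SHIFT   = 12  /* 4 KB per page-table entry */
};

/* Entries at one level needed to map `span` contiguous bytes starting
 * at an aligned address, rounding up. */
static uint64_t entries_needed(uint64_t span, unsigned shift)
{
    uint64_t per_entry = (uint64_t)1 << shift;
    return (span + per_entry - 1) / per_entry;
}
```

For a span of 4GB this yields 1 PML4 entry and 4 PDP entries, so the top two levels of every x32 process touch only a handful of adjacent entries, which is friendlier to caches during guest and HAP walks than a sparse 47-bit layout.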

Page 13: Evaluation of X32 ABI for Virtualization and Cloud

Advantage of x32 in Cloud

• Performance/memory is crucial in Cloud
  - 32-bit address space is still sufficient for many apps
  - Use memory for data, not for 8-byte pointers
• Use unified (64-bit) kernel
  - Use x32 ABI for 32-bit apps
  - Traditional x86 and x86-64 apps can run as well

Add x32 support to the list of OS images for Cloud
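The "memory for data, not for 8-byte pointers" point is easy to quantify for pointer-heavy structures. The following sketch is an illustration, not a measurement from the talk; `struct node` and `list_node_size` are hypothetical names, and the calculation assumes a 4-byte `int` and natural alignment.

```c
#include <stddef.h>

/* A typical pointer-heavy structure: two links and a small payload. */
struct node {
    struct node *prev;
    struct node *next;
    int value;
};

/* Size of struct node for a given pointer size, assuming a 4-byte int,
 * natural alignment, and padding up to the largest member alignment. */
static size_t list_node_size(size_t ptr_size)
{
    size_t payload = 4;
    size_t align = ptr_size > payload ? ptr_size : payload;
    size_t size = 2 * ptr_size + payload;
    return (size + align - 1) / align * align;
}
```

Under these assumptions a node takes 12 bytes with 4-byte pointers and 24 bytes with 8-byte pointers, so the same list needs half the memory and half the cache lines when built for x32 instead of x86-64.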

Page 14: Evaluation of X32 ABI for Virtualization and Cloud

How should we use x32 in Xen?

• x32 apps in HVM Linux
  - Xen doesn’t need to know about x32
• x32 apps in PV Linux
  - Performance with x32 can regress because of 64-bit PV issues
  - Should be run in HVM container
• x32 PV Linux
  - 32-bit PV Linux with more registers used
  - Practically insignificant

Run x32 apps in HVM

Page 15: Evaluation of X32 ABI for Virtualization and Cloud

Agenda

• What is X32 ABI?
• Benefits for Virtualization & Cloud
• Evaluation of X32
• Summary

Page 16: Evaluation of X32 ABI for Virtualization and Cloud

Configurations

• VM
  - HVM (Linux) with 1GB/4GB memory
  - VCPUs (1, 2)
  - i386, x32, x86-64 apps
• Simple Web Applications
  - mysql-server (5.1.52) (run remotely, 64-bit)
  - apache (2.2.22), php (5.4.4), varnish
    • Compiled for i386, x32, x86-64 ABI

Page 17: Evaluation of X32 ABI for Virtualization and Cloud

Simple Web Apps (1)

$ ab -c $concurrency -n $requests http://$guestip/drupal/index.php

[Figure: two bar charts of requests per second (higher is better) versus concurrency/requests for i386, x86_64, and x32. Panel 1: No Varnish, X-Drupal-Cache enabled; 1GB, 2 VCPUs; concurrency/requests of 100/1000 and 10/1000; y-axis from 0 to 250. Panel 2: Varnish enabled, X-Drupal-Cache enabled; 1GB, 1 VCPU; concurrency/requests of 1000/100000; y-axis from 8000 to 8250.]

Page 18: Evaluation of X32 ABI for Virtualization and Cloud

Simple Web Apps (2)

$ ab -c $concurrency -n $requests http://$guestip/drupal/index.php

[Figure: bar chart of requests per second (higher is better) versus concurrency/requests for x86_64 and x32. No Varnish, X-Drupal-Cache enabled; 4GB, 2 VCPUs; concurrency/requests of 100/1000 and 10/1000; y-axis from 180 to 230.]

Page 19: Evaluation of X32 ABI for Virtualization and Cloud

Issues Found

• Some apps are not x32 clean
  - Code that tests __amd64__ or __x86_64__ and then assumes 64-bit long and pointers
  - Assembly code that assumes 64-bit long and pointers
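The first issue above usually comes down to which preprocessor macro a program tests. The sketch below is an illustration rather than code from any of the applications evaluated; `ABI_NAME`, `abi_name`, and `pointer_bits` are hypothetical names. It relies on GCC defining both `__x86_64__` and `__ILP32__` when compiling for x32, so `__x86_64__` alone does not imply 8-byte longs or pointers.

```c
#include <stdint.h>

/* Classify the x86 ABI being compiled for. x32 defines __x86_64__
 * too, so it must be told apart by __ILP32__. */
#if defined(__x86_64__) && defined(__ILP32__)
#  define ABI_NAME "x32"
#elif defined(__x86_64__)
#  define ABI_NAME "x86-64"
#elif defined(__i386__)
#  define ABI_NAME "i386"
#else
#  define ABI_NAME "unknown"
#endif

static const char *abi_name(void)
{
    return ABI_NAME;
}

/* x32-clean way to hold a pointer in an integer: use uintptr_t
 * instead of assuming that long (or a 64-bit type) matches it. */
static uintptr_t pointer_bits(const void *p)
{
    return (uintptr_t)p;
}
```

Code written this way sizes itself from the types rather than from the architecture macro, which is the property the slide calls "x32 clean". Hand-written assembly needs a separate audit, since the preprocessor cannot catch 64-bit pointer loads there.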

Page 20: Evaluation of X32 ABI for Virtualization and Cloud

Agenda

• What is X32 ABI?
• Benefits for Virtualization & Cloud
• Evaluation of X32
• Summary

Page 21: Evaluation of X32 ABI for Virtualization and Cloud

Summary

• Generic advantages of x32 apps
  - For details and status of the x32 ABI, see:
    • https://sites.google.com/site/x32abi/
• Virtualization and Cloud can take more advantage of x32 apps
• Preliminary results look promising
  - x32 was always better than i386 or x86-64 with simple Web apps*
• Build and test more Web apps

*: Intel internal measurements