Performance Optimization on Huawei Public and Private Cloud

Jinsong Liu <[email protected]>, Lei Gong <[email protected]>


Agenda

• Optimization for LHP

• Balance scheduling

• RTC optimization


LHP (Lock Holder Preemption)

• More pronounced in virtualization, where a lock holder can lose its CPU through

– vCPU scheduling

– Task preemption

• Consequences

– Other vCPUs waiting to acquire the same lock may be blocked from making progress

– Synchronization latency increases

– Performance degrades
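A back-of-the-envelope model shows why LHP hurts so much. The sketch below is a simplification that assumes every waiter spins for as long as the lock stays held and ignores mitigations such as PLE; the function name and the example numbers (a 2 us critical section, a 3 ms host time slice) are hypothetical.

```python
def lhp_wasted_spin_us(critical_section_us, preempted_us, waiters):
    """Rough estimate of CPU time burned by spinning waiters (hypothetical model).

    Assumes each waiting vCPU spins for the whole time the lock is held, and that
    preempting the lock holder simply adds its descheduled time to the hold time.
    """
    hold_time_us = critical_section_us + preempted_us
    return waiters * hold_time_us


# 3 waiters on a 2 us critical section, without and with the holder being
# descheduled for a hypothetical 3 ms host time slice while holding the lock.
print(lhp_wasted_spin_us(2, 0, 3))
print(lhp_wasted_spin_us(2, 3000, 3))
```

Under these assumptions, a millisecond-scale preemption turns microseconds of spinning into milliseconds, which is the waste the following techniques try to avoid.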


LHP (Lock Holder Preemption)

• How can LHP be solved or alleviated?

– PLE (Pause Loop Exiting)

– DLHS (Delay Lock Holder Scheduling)

– Co-scheduling

– Balance scheduling


PLE

• Hardware support

– Configured through the VMCS

• Optimization for lock waiters

– Proactively triggers a VM exit when a vCPU spins too long

– Avoids wasting vCPU cycles on futile spinning

• Pros

– Supported upstream

• Cons

– Choosing appropriate values for ple_gap and ple_window is difficult

• The values must be adjusted per workload

– Finding an appropriate vCPU to yield to is difficult
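To make the roles of ple_gap and ple_window concrete, the PAUSE-loop check can be modeled in software. The sketch below is a simplified reading of the Intel SDM description: PAUSEs separated by more than ple_gap start a new loop, and a loop lasting longer than ple_window causes a VM exit. Times are abstract cycle counts and the function names are hypothetical. The second function is a simplified round-robin directed-yield choice, loosely modeled on KVM's kvm_vcpu_on_spin(), which skips the spinning vCPU itself and prefers vCPUs that were preempted while runnable.

```python
def ple_exit_time(pause_times, ple_gap, ple_window):
    """Return the time of the first PAUSE that would trigger a PLE VM exit, or None.

    Simplified model: PAUSEs no more than ple_gap apart belong to one spin loop;
    a loop whose duration exceeds ple_window causes a VM exit.
    """
    loop_start = prev = None
    for t in pause_times:
        if prev is None or t - prev > ple_gap:
            loop_start = t                 # first PAUSE of a (new) spin loop
        elif t - loop_start > ple_window:
            return t                       # spun too long: exit to the hypervisor
        prev = t
    return None


def pick_yield_target(me, last_boosted, preempted):
    """Round-robin directed yield: first preempted vCPU after last_boosted, skipping me."""
    n = len(preempted)
    for step in range(1, n + 1):
        i = (last_boosted + step) % n
        if i != me and preempted[i]:
            return i
    return None


spin = list(range(0, 10_000, 10))          # a tight loop: one PAUSE every 10 cycles
print(ple_exit_time(spin, ple_gap=128, ple_window=4096))
print(pick_yield_target(me=0, last_boosted=1, preempted=[False, False, True, True]))
```

The model also shows the tuning dilemma: a small ple_window exits on short, legitimate spins, while a large one lets waiters burn more cycles before the hypervisor can react.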


DLHS (Delay Lock Holder Scheduling)

• Background & preconditions

– Lock holders usually run in interrupt-disabled contexts

– Locks are normally held only briefly

– Hardware support (e.g., Intel VT-x)

• Interrupt-window exiting

– Software support

• hrtimer, …


DLHS (Delay Lock Holder Scheduling)

• Solution

– Set a grace period for a lock holder (LH) before descheduling it

– If a vCPU is an LH

• Start an hrtimer, and

• Enable interrupt-window exiting in the VMCS

– If the hrtimer expires

• Clear interrupt-window exiting

• Proceed with descheduling the vCPU

– The vCPU is judged to have released the lock when

• A PLE exit occurs, or

• An interrupt-window exit occurs

– Then

• Cancel the hrtimer

• End the grace period

• Deschedule the vCPU immediately
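The grace-period logic above can be sketched as a small event-driven state machine. This is a hypothetical model, not the authors' patch: the hrtimer becomes a deadline field, the VMCS interrupt-window control becomes a boolean, and a vCPU is assumed to be a likely lock holder when the guest has interrupts disabled at preemption time (following the background slide). The class, method and exit-reason names and the 50 us grace period are illustrative.

```python
class DlhsVcpu:
    """Hypothetical model of the DLHS grace period for a single vCPU."""

    def __init__(self, grace_ns):
        self.grace_ns = grace_ns
        self.in_grace = False
        self.timer_deadline_ns = None      # stands in for the hrtimer
        self.irq_window_exiting = False    # stands in for the VMCS control bit
        self.descheduled = False

    def request_preempt(self, now_ns, guest_irqs_disabled):
        """The host scheduler wants to preempt this vCPU."""
        # Assumption: interrupts disabled in the guest => probably holding a lock.
        if guest_irqs_disabled:
            self.in_grace = True
            self.timer_deadline_ns = now_ns + self.grace_ns   # start the hrtimer
            self.irq_window_exiting = True                    # exit once IRQs are re-enabled
        else:
            self.descheduled = True

    def _finish_grace(self):
        self.in_grace = False
        self.timer_deadline_ns = None       # cancel (or retire) the hrtimer
        self.irq_window_exiting = False     # clear interrupt-window exiting
        self.descheduled = True             # let the scheduler preempt the vCPU now

    def on_timer(self, now_ns):
        """hrtimer callback: the grace period ran out, so preempt anyway."""
        if self.in_grace and now_ns >= self.timer_deadline_ns:
            self._finish_grace()

    def on_vm_exit(self, reason):
        """A PLE or interrupt-window exit suggests the lock has been released."""
        if self.in_grace and reason in ("ple", "interrupt_window"):
            self._finish_grace()


vcpu = DlhsVcpu(grace_ns=50_000)
vcpu.request_preempt(now_ns=0, guest_irqs_disabled=True)
print(vcpu.descheduled)            # still inside the grace period
vcpu.on_vm_exit("interrupt_window")
print(vcpu.descheduled)
```

Both exit paths end the grace period the same way; they differ only in whether the vCPU is preempted early (lock presumably released) or at the deadline (lock still held, so further delay is not worth it).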

DLHS performance

Figure: Hackbench results (CPU overcommit 1:3) over runs 1 to 30 for VM1, VM2 and VM3, with the DLHS patch (VM1-patched, VM2-patched, VM3-patched) and without it; unit: seconds, lower is better.


Co-scheduling & Balance scheduling

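Figure: co-scheduling versus balance scheduling, each showing a guest's three vCPUs placed on three pCPUs.

• Co-scheduling: run all vCPUs of the guest simultaneously, at time X

• Balance scheduling: disperse all vCPUs of the guest across pCPUs as much as possible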


Co-scheduling

• CPU fragmentation

– Reduces CPU utilization

– Delays vCPU execution

• Priority inversion

– Degrades I/O performance

Figure: timeline from T0 to T4 of vCPU0 and vCPU1 on pCPU 0 and pCPU 1, with other tasks (xxx) interleaved and an I/O event on vCPU1.
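CPU fragmentation can be illustrated with a toy model of strict co-scheduling: in each time slot a VM runs only if enough pCPUs are free for all of its vCPUs at once, so a smaller set of free pCPUs stays idle from that VM's point of view. The function name and the slot data are hypothetical.

```python
def strict_coschedule(free_pcpus_per_slot, nr_vcpus):
    """Return (slots in which the VM ran, idle pCPU-slots) under strict co-scheduling."""
    ran = idle = 0
    for free in free_pcpus_per_slot:
        if free >= nr_vcpus:
            ran += 1
            idle += free - nr_vcpus   # leftover pCPUs after placing the whole gang
        else:
            idle += free              # fragmentation: too few pCPUs, none can be used
    return ran, idle


# A 3-vCPU VM facing 4, 2, 3 and 1 free pCPUs in four consecutive slots.
print(strict_coschedule([4, 2, 3, 1], nr_vcpus=3))
```

In the second and fourth slots the VM cannot run at all even though pCPUs are free, which both lowers utilization and delays the VM's vCPUs.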


Balance scheduling

• Balances sibling vCPUs across pCPUs

– Without requiring the vCPUs to be scheduled at precisely the same time

• How?

– A per-VM bitmap records all pCPUs used by the VM's vCPUs

– Scheduler adjustments

• Enqueue & dequeue

• Migration, find_idle_cpu, select_task_rq, etc.
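The bitmap idea can be sketched as follows, assuming one bit per pCPU that is set when one of the VM's vCPUs is enqueued there and cleared on dequeue. When a pCPU is chosen for a waking vCPU, pCPUs whose bit is set are skipped and the least-loaded remaining one is picked. The least-loaded tie-breaker, the fallback when every pCPU is marked, and all names are assumptions of this sketch, not the actual changes to select_task_rq and related functions.

```python
class BalanceVM:
    """Per-VM bitmap: bit i is set while one of the VM's vCPUs is queued on pCPU i."""

    def __init__(self):
        self.used = 0

    def enqueue(self, cpu):
        self.used |= 1 << cpu

    def dequeue(self, cpu):
        # Assumes at most one sibling vCPU per pCPU, which is what the policy aims for.
        self.used &= ~(1 << cpu)


def select_pcpu(vm, rq_load):
    """Pick a pCPU for a waking vCPU of vm, avoiding pCPUs that already host a sibling."""
    cpus = range(len(rq_load))
    candidates = [c for c in cpus if not (vm.used >> c) & 1]
    if not candidates:
        candidates = list(cpus)   # more vCPUs than pCPUs: stacking cannot be avoided
    return min(candidates, key=lambda c: rq_load[c])   # least-loaded remaining pCPU


vm = BalanceVM()
vm.enqueue(0)
vm.enqueue(2)
# pCPU 0 is the idlest, but it already hosts a sibling, so pCPU 3 is chosen.
print(select_pcpu(vm, [0, 3, 1, 2]))
```

Unlike co-scheduling, nothing here forces siblings to run at the same instant; it only keeps them off the same runqueue, so one stalled sibling cannot delay another on the same pCPU.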


Performance evaluation

• Workload:

– Pushserver in Huawei Private Cloud

– Continuous testing for 24 hours

• Results: proportion of building chains under 10 ms (higher is better)

– With balance scheduling: 93.50%

– Without balance scheduling: 70%

– 1:1 vCPU pinning: 95.30%


RTC on KVM

• Windows guests use the RTC as a clock event device

• RTC emulation in QEMU uses three timers

– rtc_periodic_timer

• Generates periodic interrupts

• Programmable according to the interrupt rate

– rtc_update_timer

• Generates alarm interrupts

• Occur from once per second to once per day

– rtc_coalesced_timer

• Generates compensation interrupts

• Slews ticks lost for various reasons

• The problem worsens as VM density increases

• Pain points

– Some operations must hold the BQL (Big QEMU Lock)

– Context switching between user space and kernel space

– Interrupt injection from user space

– Performance degradation

• Increased latency

• Reduced Windows guest density
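For reference, the periodic rate that rtc_periodic_timer has to emulate is selected by the rate-select bits (RS3..RS0) of register A on an MC146818-compatible RTC with a 32.768 kHz time base. The helper below encodes that mapping; its name is illustrative. At 1024 Hz (RS = 6) the emulated RTC must deliver roughly 1024 interrupts per second per VM, so any per-interrupt cost in user space (BQL, context switches, injection) is multiplied accordingly.

```python
RTC_BASE_HZ = 32768  # MC146818 time base with a 32.768 kHz crystal


def rtc_periodic_hz(reg_a):
    """Periodic-interrupt frequency selected by register A, or None if disabled."""
    rs = reg_a & 0x0F          # rate-select bits RS3..RS0
    if rs == 0:
        return None            # periodic interrupt disabled
    if rs <= 2:
        rs += 7                # RS = 1 and 2 alias to 256 Hz and 128 Hz
    return RTC_BASE_HZ >> (rs - 1)


for reg_a in (0x26, 0x23, 0x2F):
    hz = rtc_periodic_hz(reg_a)
    print(hex(reg_a), hz, "Hz,", 1e6 / hz, "us period")
```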


RTC optimizations on KVM

• Minimize the influence of the BQL

– Place the RTC memory region outside the BQL

• Use irqfd to inject interrupts

• Hyper-V features

– Hyper-V clock, …

– Fewer reads/writes of I/O ports

• Fewer accesses to I/O ports 0x70/0x71

• Move RTC emulation into the hypervisor

– Inject interrupts in KVM

– Reduce context switching

– But

• Larger attack surface

• Incompatible with new features, such as split irqchip

• Optimize the RTC compensation scheme


RTC compensation solution

• Slew RTC ticks directly in the hypervisor

• Count the coalesced interrupts

– Increment the count when injecting an RTC interrupt fails

– Adjust the count when the RTC interrupt rate changes

• Inject pending coalesced interrupts after the EOI handler

– No separate timer is needed

– More timely

– Throttle the injection rate if there are too many coalesced interrupts

• Live migration support

– Save the coalesced interrupt count on the source side

– Restore it on the destination side

– Both KVM and QEMU need to be patched
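The compensation scheme above can be sketched as a small bookkeeping object. This is a hypothetical model, not the authors' KVM/QEMU patch: the names, the per-tick reinjection cap used for throttling, and the rule for rescaling the count on a rate change (keep the total lost time constant, flooring to whole ticks) are assumptions.

```python
class RtcCompensation:
    """Hypothetical model of slewing lost RTC ticks inside the hypervisor."""

    def __init__(self, rate_hz, max_reinject_per_tick=4):
        self.rate_hz = rate_hz
        self.max_reinject_per_tick = max_reinject_per_tick  # throttle (assumed policy)
        self.coalesced = 0               # ticks the guest has not received yet
        self._reinjected_this_tick = 0

    def on_tick(self, injected):
        """Periodic tick; `injected` is False when the interrupt could not be delivered."""
        self._reinjected_this_tick = 0
        if not injected:
            self.coalesced += 1

    def on_rate_change(self, new_hz):
        """Rescale the backlog so the lost time, not the tick count, is preserved."""
        # coalesced / old_hz seconds == new_coalesced / new_hz seconds (floored)
        self.coalesced = self.coalesced * new_hz // self.rate_hz
        self.rate_hz = new_hz

    def on_eoi(self):
        """Guest finished an RTC interrupt; return True if a coalesced tick is injected now."""
        if self.coalesced == 0:
            return False
        if self._reinjected_this_tick >= self.max_reinject_per_tick:
            return False                 # throttled: wait for the next periodic tick
        self.coalesced -= 1
        self._reinjected_this_tick += 1
        return True

    def save(self):
        """State to transfer on live migration (source side)."""
        return {"rate_hz": self.rate_hz,
                "max_reinject_per_tick": self.max_reinject_per_tick,
                "coalesced": self.coalesced}

    @classmethod
    def restore(cls, state):
        """Rebuild the state on the destination side."""
        rtc = cls(state["rate_hz"], state["max_reinject_per_tick"])
        rtc.coalesced = state["coalesced"]
        return rtc


rtc = RtcCompensation(rate_hz=1024, max_reinject_per_tick=2)
for delivered in (False, False, False, True):
    rtc.on_tick(delivered)
print(rtc.coalesced)                       # 3 lost ticks
print([rtc.on_eoi() for _ in range(3)])    # third reinjection is throttled
migrated = RtcCompensation.restore(rtc.save())
print(migrated.coalesced)
```

Driving reinjection from the EOI path means a lost tick is replayed as soon as the guest can take another interrupt, while the per-tick cap keeps a large backlog from flooding the guest.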


Optimization evaluation

Figure: before optimization vs. after optimization.


Thank You!