Resource Monitoring for cgroup Hierarchies Kanaka Juvva, Intel Corporation


Agenda

• Motivation

• Multi-core Cache Hierarchies

• Application Domains

• Intel Xeon’s Shared Resource Monitoring Technology

• Implementation

• Use Cases

• cgroups

• Hierarchical Monitoring

Motivation (Why Monitoring?)

Moore’s Law drives exponential growth in compute and data

• Big Data proliferation
• Dynamic data centers
• Cloud
• HPC

Multi-core is the horsepower behind the information highway

• Linear scalability for CPU-bound workloads
• Contention for shared resources impacts scalability

Multi-core cache hierarchies

• Cache hierarchies help performance and mitigate contention
• Data orchestration also improves application scalability

[Figure: multiple core(s) blocks and MEMORY]

Motivation (Contd.)

Intel® Xeon® processor Shared Resource Monitoring

• LLC Occupancy
• Memory Bandwidth

Monitoring at multiple levels

o Thread
o Process
o APP/Workload/cgroup/Container
o VMM
o System

• Quantify Performance

[Figure: cores C1 to C4, each with L1/L2 caches; L3 labeled "LLC Occupancy"; Memory labeled "Memory Bandwidth"; monitored entities APP, VM, CONTAINER]

Intel Xeon’s Shared Resource Monitoring

Incarnation of Intel Xeon CPUs

• Platform/Shared Resource Monitoring: Broadwell-EP and Broadwell-EX
• Grantley and Brickland codenames

Broadwell-EP Microarchitecture Features

o Cache Monitoring Technology
o Memory Bandwidth Monitoring
o QuickPath Interconnect: 2 QPI 1.1 channels
o Cores Per Socket: Up to 22
o Threads Per Socket: Up to 44
o Sockets: 2
o Last Level Cache (L3): ~55MB

[Figure: BROADWELL-EP block diagram showing cores, shared cache, 2 QPI 1.1 links, 4 channels of DDR4, 40 lanes of PCIe* 3.0, and DMI2]

Intel Xeon’s Shared Resource Monitoring Technology (Contd.)

Resource monitoring capability is present in each logical processor. Two monitoring technologies are supported:

• Cache Monitoring Technology (CMT): allows an OS, hypervisor or VMM to monitor the usage of cache by applications
• Memory Bandwidth Monitoring Technology (MBM): allows an OS, hypervisor or VMM to monitor bandwidth from one level of the cache hierarchy to the next. Here that level is the L3 cache, which is backed directly by system memory

Either feature or both can be present in a given CPU model. They have some common sub-features:

• Linux (the OS) can check for the presence of the above features via a CPUID feature bit
• Details of each sub-feature are reported via CPUID leaves and sub-leaves
• A mechanism for Linux/OS or VMM to set a software-defined ID, known as the Resource Monitoring ID (RMID), for each software thread
• A mechanism to monitor cache and bandwidth stats on a per-RMID basis
• Linux/OS can read back the collected stats, such as L3 occupancy and memory bandwidth, for an RMID
• Different event types distinguish the resource metric, e.g. LLC_OCCUPANCY, TOTAL_BW, LOCAL_BW
• Software configures the event selection MSR (IA32_QM_EVTSEL) to specify the event or metric to be reported
• RDMSR and WRMSR instructions are used to read and write the monitoring MSRs
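The event-select and read-back steps can be sketched in Python. This is a minimal illustration, not the kernel driver: it only builds the value that would be written to IA32_QM_EVTSEL and decodes a value that would be read back from the counter MSR. The MSR addresses, event IDs, and bit positions are assumptions taken from the Intel SDM description of shared resource monitoring and should be verified for the target CPU. The helper names `encode_evtsel` and `decode_ctr`, and the `upscale` parameter (the factor the SDM says CPUID leaf 0xF reports), are hypothetical.

```python
# Assumed from the Intel SDM; verify against the manual for your CPU model.
IA32_QM_EVTSEL = 0xC8D  # event selection MSR (named on the slide)
IA32_QM_CTR = 0xC8E     # counter read-back MSR (not named on the slide)

# Event IDs for the three metrics named on the slide (assumed SDM values).
EVENT_IDS = {"LLC_OCCUPANCY": 0x01, "TOTAL_BW": 0x02, "LOCAL_BW": 0x03}


def encode_evtsel(event, rmid):
    """Return the 64-bit value to write to IA32_QM_EVTSEL.

    Assumed layout: event ID in the low bits, RMID starting at bit 32.
    """
    if rmid < 0:
        raise ValueError("RMID must be non-negative")
    return (rmid << 32) | EVENT_IDS[event]


def decode_ctr(raw, upscale=1):
    """Decode a raw IA32_QM_CTR value into a scaled count.

    Assumed layout: bit 63 = error, bit 62 = data unavailable,
    bits 61:0 = counter value. Returns None when data is unavailable.
    """
    if raw & (1 << 63):
        raise ValueError("monitoring hardware reported an error")
    if raw & (1 << 62):
        return None
    return (raw & ((1 << 62) - 1)) * upscale


if __name__ == "__main__":
    # Value a driver would WRMSR before reading the counter for RMID 5.
    print(hex(encode_evtsel("LOCAL_BW", 5)))
    # Scaled count for a raw counter of 100 with an upscaling factor of 64.
    print(decode_ctr(100, upscale=64))
```

Keeping the encode and decode steps as pure functions makes the bit layout easy to check without MSR access, which requires ring 0 or the msr driver.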

Implementation

Adoption into Linux Kernel

• https://lkml.org/lkml/2016/3/21/176
• https://github.com/kanakajuvva

Linux User Space APIs

• perf_event API
o perf_event_open()
o struct perf_event

• Performance Monitoring Unit
o Fill and register struct pmu
o Callback functions to start, read and stop stats

Perf events are exported to userland

• LOCAL_BW
o perf stat -e intel_cqm/local_bw/ -p <pid of my_application>

• TOTAL_BW
o perf stat -e intel_cqm/total_bw/ -p <pid of my_application>

• LLC_OCCUPANCY
o perf stat -e intel_cqm/llc_occupancy/ -p <pid of my_application>
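The three perf invocations above differ only in the event name, so a monitoring script can build them from one helper. The sketch below is illustrative: `perf_stat_argv` is a hypothetical name, and the event names are copied from the slide, so the names exported by a given kernel may differ.

```python
import shlex

# Event names as written on the slide; treat them as an assumption, since
# the names a particular kernel exports under the intel_cqm PMU may differ.
SLIDE_EVENTS = ("llc_occupancy", "total_bw", "local_bw")


def perf_stat_argv(event, pid, pmu="intel_cqm"):
    """Build the argv for `perf stat -e <pmu>/<event>/ -p <pid>`."""
    if event not in SLIDE_EVENTS:
        raise ValueError(f"unknown event: {event!r}")
    pid = int(pid)
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return ["perf", "stat", "-e", f"{pmu}/{event}/", "-p", str(pid)]


if __name__ == "__main__":
    # Print the command line for each event; pass the list to
    # subprocess.run() to actually launch perf on a supported system.
    for name in SLIDE_EVENTS:
        print(shlex.join(perf_stat_argv(name, 1234)))
```

Returning a list rather than a string lets the caller hand it to subprocess.run() directly without shell quoting concerns.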

Use Cases

Monitoring an Application

• cpumem benchmark
o Consumes constant memory bandwidth of ~13.7 GB/sec

[Chart: Total BW MB/Sec; y-axis from 13000 to 13900]

[Chart: Local BW MB/Sec; y-axis from 13780 to 13835]

Use Cases (Contd.)

Monitoring a Multi-threaded Application

• cqm benchmark
o 8 to 48 threads

[Chart: CQM Benchmark - 8, 12, 24 and 48 Threads; Local BW (MB/Sec) vs. Time (Secs); series: 8 Threads, 12 Threads, 24 Threads, 48 Threads]

[Chart: CQM Benchmark - 8, 12, 24 and 48 Threads; Total BW (MB/Sec) vs. Time (Secs); series: 8 Threads, 12 Threads, 24 Threads, 48 Threads]

Use Cases (Contd.)

Monitoring a VM

• Light VM running Linux, 3 scenarios
o VM Boot
o VM Idle
o An APP running in VM

[Chart: VM Boot; Local BW (MB/Sec) vs. Time (Secs)]

[Chart: VM Idle; Local BW (MB/Sec) vs. Time (Secs)]

Monitoring VM (Contd.)

[Chart: FWTS APP; Local BW (MB/Sec) vs. Time (Secs)]

cgroups

• Linux mechanism(s) to manage and monitor resources for processes

• cgroups blend very well with containers

• MBM and CMT counters are read per RMID
o cgroup perf_event subsystem is used by the Linux perf tool for reading stats
o Thread-level stats are maintained in the driver
o Obtain cgroup membership for each thread
o cgroup -> RMID(s) mapping

• LLC Occupancy / Memory BW consumed by a cgroup = ∑ Resource consumed for each RMID in cgroup
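The per-cgroup sum above can be sketched as a lookup over the cgroup -> RMID(s) mapping. This is a minimal illustration with made-up numbers. The function name `cgroup_usage` and both dictionaries are hypothetical and do not reflect the driver's data structures.

```python
def cgroup_usage(cgroup_to_rmids, rmid_counts, cgroup):
    """Sum the per-RMID readings for every RMID mapped to one cgroup.

    cgroup_to_rmids: dict mapping a cgroup name to an iterable of RMIDs.
    rmid_counts: dict mapping an RMID to its latest reading
                 (e.g. LLC occupancy in bytes, or bandwidth in MB/sec).
    RMIDs with no reading yet contribute 0.
    """
    return sum(rmid_counts.get(rmid, 0) for rmid in cgroup_to_rmids[cgroup])


if __name__ == "__main__":
    # Illustrative values only: two cgroups, three RMIDs.
    mapping = {"web": [1, 2], "db": [3]}
    readings = {1: 4096, 2: 8192, 3: 1024}
    print(cgroup_usage(mapping, readings, "web"))
```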

Hierarchical Monitoring

cgroups organize processes hierarchically

• Form a tree structure
• cgroup v1 and cgroup v2

• Process belongs to at least one cgroup
• All threads in a process belong to the same cgroup in v2
• Traverse from parent to child to list the threads

• Resource consumed by parent cgroup = ∑ (over cgroups in its subtree) Resource consumed for each cgroup

= ∑ (over cgroups in its subtree) ∑ (over RMIDs in each cgroup) Resource consumed for each RMID

Current MBM/CMT design is transparent to the version of the hierarchy
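The double sum for a parent cgroup can be sketched as a recursive walk from parent to child. This is an illustration under stated assumptions: the `Cgroup` class and `hierarchical_usage` function are hypothetical names, the readings are made up, and the sketch assumes the parent's own RMIDs count toward its total along with those of all descendants.

```python
from dataclasses import dataclass, field


@dataclass
class Cgroup:
    """Hypothetical tree node: a cgroup, its RMIDs, and its child cgroups."""
    name: str
    rmids: list = field(default_factory=list)
    children: list = field(default_factory=list)


def hierarchical_usage(cgroup, rmid_counts):
    """Total for a cgroup: its own RMIDs plus every descendant's RMIDs.

    Inner sum: readings of the RMIDs mapped to this cgroup.
    Outer sum: recursion over the child cgroups (parent-to-child traversal).
    """
    own = sum(rmid_counts.get(rmid, 0) for rmid in cgroup.rmids)
    return own + sum(hierarchical_usage(c, rmid_counts) for c in cgroup.children)


if __name__ == "__main__":
    # Illustrative tree: root -> {a -> {a1}, b}
    a1 = Cgroup("a1", rmids=[4])
    a = Cgroup("a", rmids=[2], children=[a1])
    b = Cgroup("b", rmids=[3])
    root = Cgroup("root", rmids=[1], children=[a, b])
    readings = {1: 10, 2: 20, 3: 30, 4: 40}
    print(hierarchical_usage(root, readings))
```

Because the walk only follows parent-to-child links, the same code applies whether the tree was built from a v1 or a v2 hierarchy.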

Conclusions

• Intel Xeon’s Platform Resource Monitoring

• Linux Implementation

• Use Cases and Example Scenarios

• Monitoring for Linux cgroups

Q & A

Thank you