
Configuring a compute node for NFV

Network Innovation & Virtualisation Global CTO Unit

Antonio López Gracia
[email protected]

9 Jun 2015

2. Configuring a compute node for NFV

HW & SW environment

BIOS setup

Installation of OS and required SW packages

IOMMU IOTLB cache support

Enabling IOMMU

Enabling hugepages

CPU isolation

Deactivating KSM

Enabling SR-IOV

Pre-provision of Linux bridges

Additional configuration to allow access from openvim

Compute node configuration in special cases

Available automation scripts in OpenMANO github

3. HW & SW environment

Hardware:

Two-socket servers with Intel Xeon E5 processors (Ivy Bridge or Haswell architecture)

- Recommended: at least 4 cores per socket

- Recommended: at least 64 GB RAM per host

- Lab: HP DL380 Gen9 and Dell R720/R730 servers …

Data plane: 10 Gbps NICs supported by DPDK, equally distributed between NUMA nodes

- Lab: HP 560 SFP+ and NICs from the Intel Niantic and Fortville families

Control plane: 1 Gbps NICs

Software:

64-bit OS with KVM, qemu and libvirt (e.g. RHEL 7, Ubuntu Server 14.04, CentOS 7) and a kernel that supports huge-page IOTLB caching in the IOMMU

- Lab: RHEL 7.1

4. BIOS setup

Enter the BIOS and ensure that all virtualization options are active:

- Enable all Intel VT-x (processor virtualization) and VT-d (PCI passthrough) options if present. Sometimes they are grouped together simply as "Virtualization Options"

- Enable SR-IOV if present as an option

- Verify that the processors are configured for maximum performance (no power savings)

- Enable hyperthreading (recommended)

If the virtualization options are active, the following command should give a non-empty output:

$ egrep "(vmx|svm)" /proc/cpuinfo

If hyperthreading is active, the following command should give a non-empty output:

$ egrep ht /proc/cpuinfo
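As a complementary check (a simple sketch using util-linux's lscpu, which is installed on RHEL 7; output labels may vary slightly between versions), look for "Virtualization: VT-x" and "Thread(s) per core: 2" with hyperthreading enabled:

$ lscpu | grep -i -e "virtualization" -e "thread(s) per core"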

5. Installation of OS and required SW packages

Install RHEL 7.1 with the following package groups (kickstart %packages section):

%packages
@base
@core
@development
@network-file-system-client
@virtualization-hypervisor
@virtualization-platform
@virtualization-tools

Install the following additional packages:

$ sudo yum install -y screen virt-manager ethtool gcc gcc-c++ xorg-x11-xauth xorg-x11-xinit xorg-x11-deprecated-libs libXtst guestfish hwloc libhugetlbfs-utils libguestfs-tools policycoreutils-python

6. IOMMU IOTLB cache support

Use a kernel with support for huge-page IOTLB caching in the IOMMU.

This support is included from vanilla kernel 3.14 onwards. If you are using an older kernel, you should update it.

Some distribution kernels have backported this requirement. Find out whether the kernel of the distribution you are using has this support:

- The RHEL 7.1 kernel (3.10.0-229.el7.x86_64) meets the requirement

- RHEL 7.0 requires a specific kernel upgrade to meet the requirement. You can upgrade the kernel as follows:

$ wget http://people.redhat.com/~mtosatti/qemu-kvm-take5/kernel-3.10.0-123.el7gig2.x86_64.rpm
$ sudo rpm -Uvh kernel-3.10.0-123.el7gig2.x86_64.rpm --oldpackage
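As a quick sanity check (standard tooling, nothing OpenMANO-specific), print the running kernel version and compare it against the versions listed above:

$ uname -r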

7. Enabling IOMMU

Enable the IOMMU by adding the following to the grub command line:

intel_iommu=on
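After rebooting with this option (the grub update itself is covered in a later slide), the kernel log can confirm that the IOMMU was actually enabled; this is a generic check and the exact messages vary with the kernel version:

$ dmesg | grep -i -e DMAR -e IOMMU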

8. Enabling hugepages (I)

Enable 1 GB hugepages by adding the following to the grub command line:

default_hugepagesz=1G hugepagesz=1G

The number of huge pages can also be set at grub:

hugepages=24 (reserves 24 GB)

Or with a oneshot service that runs on boot (for early memory allocation):

$ sudo vi /usr/lib/systemd/system/hugetlb-gigantic-pages.service

[Unit]

Description=HugeTLB Gigantic Pages Reservation

DefaultDependencies=no

Before=dev-hugepages.mount

ConditionPathExists=/sys/devices/system/node

ConditionKernelCommandLine=hugepagesz=1G

[Service]

Type=oneshot

RemainAfterExit=yes

ExecStart=/usr/lib/systemd/hugetlb-reserve-pages

[Install]

WantedBy=sysinit.target

9. Enabling hugepages (II)

Then create the script that sets the number of huge pages:

$ sudo vi /usr/lib/systemd/hugetlb-reserve-pages

#!/bin/bash

nodes_path=/sys/devices/system/node/

if [ ! -d $nodes_path ]; then

echo "ERROR: $nodes_path does not exist"

exit 1

fi

reserve_pages()

{

echo $1 > $nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages

}

# This example reserves 12 pages of huge pages on each numa node

reserve_pages 12 node0

reserve_pages 12 node1

And enable the service:

$ sudo chmod +x /usr/lib/systemd/hugetlb-reserve-pages
$ sudo systemctl enable hugetlb-gigantic-pages

Recommended best practice: reserve 4 GB per NUMA node to run the OS and use all other system memory for 1 GB huge pages

Mount huge pages in /etc/fstab (note that the redirection must run with root privileges, e.g. via tee):

$ echo "nodev /mnt/huge hugetlbfs pagesize=1GB 0 0" | sudo tee -a /etc/fstab
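Once the system has been rebooted, the reservation can be verified per NUMA node through the standard kernel interfaces (/proc/meminfo and sysfs, not specific to this setup):

$ grep Huge /proc/meminfo
$ cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages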

10. CPU isolation

Isolate CPUs so that the host OS is restricted to run only on some cores, leaving the others to run VNFs exclusively

Recommended best practice: run the OS on the first core of each NUMA node, by adding the isolcpus field to the grub command line:

isolcpus=1-9,11-19,21-29,31-39

The exact CPU numbers depend on the CPU numbering presented by the host OS. In the previous example, CPUs 0, 10, 20 and 30 are excluded because CPU 0 and its sibling 20 correspond to the first core of NUMA node 0, and CPU 10 and its sibling 30 correspond to the first core of NUMA node 1.

Running this awk script suggests the value to use in your compute node:

$ gawk 'BEGIN{pre=-2;} ($1=="processor"){pro=$3;} ($1=="core" && $4!=0){ if (pre+1==pro){endrange="-" pro} else{cpus=cpus endrange sep pro; sep=","; endrange="";}; pre=pro;} END{printf("isolcpus=%s\n",cpus endrange);}' /proc/cpuinfo

isolcpus=2-35,38-71
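If you prefer to inspect the topology by hand before trusting the suggested value, lscpu can print the CPU/core/socket/NUMA mapping, and sysfs shows the hyperthread siblings of a given CPU (standard util-linux and kernel interfaces):

$ lscpu -p=CPU,CORE,SOCKET,NODE | grep -v '^#'
$ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

The first core of each NUMA node and its hyperthread siblings are the CPUs to leave out of isolcpus.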

11. Dedicated resource allocation

[Diagram: two CPU sockets connected by QPI, each with its own memory and I/O devices; cores, memory and I/O devices are either used by the host OS and hypervisor, left unused, or dedicated to individual VMs (VM 1 to VM 5)]

• CPUs: not oversubscribed, isolated from host OS

• Memory: huge pages

• I/O devices: passthrough, SR-IOV
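For illustration only: openvim performs this pinning automatically when it deploys a VM, but the underlying mechanism can be reproduced by hand with libvirt. The domain name vnf1 is hypothetical; CPUs 2 and 38 are an isolated core and its hyperthread sibling in the isolcpus=2-35,38-71 example above:

$ virsh vcpupin vnf1 0 2
$ virsh vcpupin vnf1 1 38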

12. Activating grub changes for IOMMU, huge pages and isolcpus

In RHEL 7 / CentOS 7:

$ sudo vi /etc/default/grub

GRUB_TIMEOUT=5
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/swap crashkernel=auto rd.lvm.lv=rhel/root rhgb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G isolcpus=2-35,38-71"
GRUB_DISABLE_RECOVERY="true"

Update grub (BIOS):

$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Update grub (EFI):

$ sudo grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

- Don't forget to reboot the system.

- After boot, check that all options were applied:

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=/dev/mapper/rhel_nfv105-root ro rd.lvm.lv=rhel_nfv105/swap crashkernel=auto rd.lvm.lv=rhel_nfv105/root rhgb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G isolcpus=2-35,38-71

13. Deactivating KSM (Kernel Same-page Merging)

KSM enables the kernel to examine two or more already running programs and compare their memory. If any memory regions or pages are identical, KSM merges the identical memory pages into a single page, which is then marked copy-on-write. If the contents of the page are modified by a guest virtual machine, a new page is created for that guest virtual machine.

KSM has a performance overhead which may be too large for certain environments or host physical machine systems.

KSM can be deactivated by stopping the ksmtuned and ksm services. Stopping the services deactivates KSM but does not persist across reboots:

# service ksmtuned stop

Stopping ksmtuned: [ OK ]

# service ksm stop

Stopping ksm: [ OK ]

Persistently deactivate KSM with the chkconfig command. To turn off the services, run the following commands:

# chkconfig ksm off

# chkconfig ksmtuned off
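On RHEL 7 the same result can also be achieved with systemd directly (a sketch, assuming the ksm and ksmtuned units shipped with the qemu-kvm packages are present):

$ sudo systemctl stop ksmtuned ksm
$ sudo systemctl disable ksmtuned ksm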

14. Enabling SR-IOV

How SR-IOV is enabled depends on the NIC used

For Intel Niantic and Fortville NICs, the number of VFs is set by writing to the sriov_numvfs file of the PF:

echo X > /sys/bus/pci/devices/<PF pci address>/sriov_numvfs

Recommended best practice: set the number of VFs per PF by using udev rules:

$ cat /etc/udev/rules.d/pci_config.rules
ACTION=="add", KERNEL=="0000:05:00.0", SUBSYSTEM=="pci", RUN+="/usr/bin/bash -c 'echo 8 > /sys/bus/pci/devices/0000:05:00.0/sriov_numvfs'"
ACTION=="add", KERNEL=="0000:05:00.1", SUBSYSTEM=="pci", RUN+="/usr/bin/bash -c 'echo 8 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs'"
ACTION=="add", KERNEL=="0000:0b:00.0", SUBSYSTEM=="pci", RUN+="/usr/bin/bash -c 'echo 8 > /sys/bus/pci/devices/0000:0b:00.0/sriov_numvfs'"

Blacklist the ixgbevf module by adding the following to the grub command line:

modprobe.blacklist=ixgbevf

This must be done after adding this host to openvim, not before. The reason for blacklisting this driver is that it causes the VLAN tags of broadcast packets not to be removed properly when they are received on an SR-IOV port.
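After a reboot (or after re-triggering the udev rules), the VF creation can be verified through sysfs and lspci; the PCI address below is the one used in the example rules:

$ cat /sys/bus/pci/devices/0000:05:00.0/sriov_totalvfs
$ cat /sys/bus/pci/devices/0000:05:00.0/sriov_numvfs
$ lspci | grep -i "virtual function"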

15. Pre-provision of Linux bridges (I)

OpenMANO relies on Linux bridges to interconnect VMs when there are no high performance requirements for I/O. This is the case of control plane VNF interfaces, which are expected to carry a small amount of traffic.

A set of Linux bridges must be created on every host. Every Linux bridge must be attached to a physical host interface with a specific VLAN. In addition, an external management switch must be used to interconnect those physical host interfaces. Bear in mind that the host interfaces used for data plane VM interfaces will be different from the host interfaces used for control plane VM interfaces.

Currently the OpenMANO configuration uses 20 bridges, named virbrMan1 to virbrMan20, with VLAN tags 2001 to 2020 respectively, to interconnect VNF elements

Another bridge, called virbrInf with VLAN tag 1001, is used to interconnect the physical infrastructure (hosts, switches and management VMs such as openMANO itself, in case it runs virtualized)

16. Pre-provision of Linux bridges (II)

To create a bridge in RHEL 7.1, two files must be defined in /etc/sysconfig/network-scripts:

$ cat /etc/sysconfig/network-scripts/ifcfg-virbrMan1

DEVICE=virbrMan1

TYPE=Bridge

ONBOOT=yes

DELAY=0

NM_CONTROLLED=no

USERCTL=no

$ cat /etc/sysconfig/network-scripts/ifcfg-em2.2001

DEVICE=em2.2001

ONBOOT=yes

NM_CONTROLLED=no

USERCTL=no

VLAN=yes

BOOTPROTO=none

BRIDGE=virbrMan1

The host interface (em2 in the example), the name of the bridge (virbrMan1) and the VLAN tag (2001) can be different. In case you use a different name for the bridge, you should take it into account in 'openvimd.cfg'
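To activate and verify the new bridge (a sketch, assuming the legacy network service is in use with NM_CONTROLLED=no and that bridge-utils is installed):

$ sudo systemctl restart network
$ brctl show virbrMan1
$ ip -d link show em2.2001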

17. Additional configuration to allow access from openvim (I)

Uncomment the following lines of /etc/libvirt/libvirtd.conf to allow external connection to libvirtd:

unix_sock_group = "libvirt"
unix_sock_rw_perms = "0770"
unix_sock_dir = "/var/run/libvirt"
auth_unix_rw = "none"

Create and configure a user for openvim access. A new user must be created to access the compute node from openvim. The user must belong to the libvirt group, and other users must be able to access its home directory:

# create a new user
$ sudo useradd -m -G libvirt <user>

# or modify an existing user
$ sudo usermod -a -G libvirt <user>

# allow other users to access /home/<user>
$ sudo chmod +rx /home/<user>
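As a quick check (plain libvirt usage, not an openvim command), log in as <user> on the compute node and verify that it can reach libvirtd through the local socket:

$ virsh -c qemu:///system list --all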

18. Additional configuration to allow access from openvim (II)

Copy the ssh key of openvim into the compute node. From the machine where openvim is running (not from the compute node), run:

openvim $ ssh-keygen   # needed to generate ssh keys, if not done before
openvim $ ssh-copy-id <user>@<compute host>

After that, ensure that you can access the compute host from openvim directly, without a password prompt:

openvim $ ssh <user>@<compute host>

Create a local folder for image storage and grant access from openvim:

Images will be stored in a remote shared location accessible by all compute nodes. This can be an NFS file system, for example. The VNF descriptions will contain a path to images stored in this folder. Openvim assumes that images are stored there and copied to a local file system path at virtual machine creation. The remote shared configuration is outside the scope of the compute node configuration, as it is required only by the VNF descriptors.

19. Additional configuration to allow access from openvim (III)

A local folder must be created (in the default configuration we assume /opt/VNF/images) where the deployed VM images will be copied, and access must be granted to the libvirt group on an SELinux-enabled system. In the automation script we assume that "/home" has more disk space than "/", so a link to a folder under the user's home is created:

$ mkdir -p /home/<user>/VNF_images

$ rm -f /opt/VNF/images

$ mkdir -p /opt/VNF/

$ ln -s /home/<user>/VNF_images /opt/VNF/images

$ chown -R <user> /opt/VNF

# SElinux management

$ semanage fcontext -a -t virt_image_t "/home/<user>/VNF_images(/.*)?"

$ cat /etc/selinux/targeted/contexts/files/file_contexts.local |grep virt_image

$ restorecon -R -v /home/<user>/VNF_images
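To confirm that the SELinux label was applied (standard coreutils/SELinux tooling, nothing specific to OpenMANO), the output of the following command should show the virt_image_t type:

$ ls -Zd /home/<user>/VNF_images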

20. Compute node configuration in special cases (I)

Datacenter with different types of compute nodes:

In a datacenter with different types of compute nodes, it might happen that compute nodes use different interface naming schemes. In that case, you can take the most common interface naming scheme as the default one, and apply an additional configuration on the compute nodes that do not follow the default naming scheme.

In order to do that, you should create a hostinfo.yaml file inside the local image folder (typically /opt/VNF/images). It contains entries with the form:

openvim-expected-name: local-iface-name

For example, if openvim defines a network using macvtap over the physical interface em1 (macvtap:em1) but in this compute node the interface is called eno1, create a local-image-folder/hostinfo.yaml file with this content:

em1: eno1
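If the compute node differs in more than one interface, the same file can simply hold one mapping per line; the names below are hypothetical examples, not values from a real deployment:

$ cat /opt/VNF/images/hostinfo.yaml
em1: eno1
em2: eno2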

21. Compute node configuration in special cases (II)

Compute nodes in a development workstation:

If a normal workstation is used to develop VNFs (as in this training), some of the compute node requirements should not be configured, as VNF performance is not a goal in that scenario.

In order to get a working development environment:

• Do not configure huge pages, as it would take memory away from the development environment

• Do not configure isolcpus, as it would take CPUs away from the development environment

• Do not configure SR-IOV interfaces, as 10 Gbps data plane interfaces won't normally be available

22. Available automation scripts in OpenMANO GitHub

These scripts automate all operations from the previous slides, following the Telefonica NFV Reference Lab recommended best practices.

https://github.com/nfvlabs/openmano/blob/master/scripts/configure-compute-node-RHEL7.1.sh

- Customizes RHEL 7.1 compute nodes

- Prepared to work with the following network card drivers:

- tg3 driver for management interfaces

- ixgbe and i40e drivers for data plane interfaces

https://github.com/nfvlabs/openmano/blob/master/scripts/configure-compute-node-develop.sh

- For development workstations, without isolcpus, huge pages or data plane interfaces