Configuring a compute node for NFV
Network Innovation & Virtualisation Global CTO Unit
Antonio López Gracia
9 Jun 2015
2. Configuring a compute node for NFV
HW & SW environment
BIOS setup
Installation of OS and required SW packages
IOMMU IOTLB cache support
Enabling IOMMU
Enabling hugepages
CPU isolation
Deactivating KSM
Enabling SR-IOV
Pre-provision of Linux bridges
Additional configuration to allow access from openvim
Compute node configuration in special cases
Available automation scripts in OpenMANO github
3. HW & SW environment
Hardware:
Servers with Xeon E5-based Intel processors with Ivy Bridge or Haswell
architecture and 2 sockets
- Recommended at least 4 cores per socket
- Recommended at least 64 GB RAM per host
- Lab: HP DL380 Gen9 and Dell R720/R730 servers …
Data plane: 10Gbps NICs supported by DPDK, equally distributed between NUMA nodes
- Lab: HP 560 SFP+, and NICs from the Intel Niantic and Fortville families
Control plane: 1Gbps NICs
Software:
64-bit OS with KVM, qemu and libvirt (e.g. RHEL 7, Ubuntu Server 14.04, CentOS 7), with kernel support for huge-page IOTLB caching in the IOMMU
- Lab: RHEL 7.1
4. BIOS setup
Enter the BIOS and ensure that all virtualization options are active:
- Enable all Intel VT-x (processor virtualization) and VT-d (PCI passthrough) options if present. Sometimes they are grouped together simply as "Virtualization Options"
- Enable SR-IOV if present as an option
- Verify processors are configured for maximum performance (no power savings)
- Enable hyperthreading (recommended)
If virtualization options are active, the following command should give a non-empty output:
$ egrep "(vmx|svm)" /proc/cpuinfo
If hyperthreading is active, the following command should give a non-empty output:
$ egrep ht /proc/cpuinfo
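As an additional cross-check (a suggestion, not part of the original slides), the kernel logs whether the BIOS exposes the VT-d DMAR tables that PCI passthrough relies on:
$ dmesg | grep -e DMAR -e IOMMU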
5. Installation of OS and required SW packages
Install RHEL7.1 with the following options:
%packages
@base
@core
@development
@network-file-system-client
@virtualization-hypervisor
@virtualization-platform
@virtualization-tools
Install the following packages:
$ sudo yum install -y screen virt-manager ethtool gcc gcc-c++ xorg-x11-xauth xorg-x11-xinit xorg-x11-deprecated-libs libXtst guestfish hwloc libhugetlbfs-utils libguestfs-tools policycoreutils-python
6. IOMMU IOTLB cache support
Use a kernel with support for huge-page IOTLB caching in the IOMMU.
This support is included from vanilla kernel 3.14 onwards. If you are using an older kernel, you should update it.
Some distribution kernels have backported this feature.
Find out if the kernel of the distribution you are using has this support:
RHEL 7.1 kernel (3.10.0-229.el7.x86_64) meets the requirement
RHEL 7.0 requires a specific upgrade to support the requirement. You can upgrade the kernel as follows:
$ wget http://people.redhat.com/~mtosatti/qemu-kvm-take5/kernel-3.10.0-123.el7gig2.x86_64.rpm
$ sudo rpm -Uvh kernel-3.10.0-123.el7gig2.x86_64.rpm --oldpackage
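After rebooting into the new kernel, confirm the running version (the expected output below assumes the RPM installed above):
$ uname -r
3.10.0-123.el7gig2.x86_64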
8. Enabling hugepages (I)
Enable 1G hugepages by adding the following to the grub command line:
default_hugepagesz=1G hugepagesz=1G
The number of huge pages can also be set at grub:
hugepages=24 (reserves 24GB)
Or with a oneshot service that runs on boot (for early memory allocation):
$ sudo vi /usr/lib/systemd/system/hugetlb-gigantic-pages.service
[Unit]
Description=HugeTLB Gigantic Pages Reservation
DefaultDependencies=no
Before=dev-hugepages.mount
ConditionPathExists=/sys/devices/system/node
ConditionKernelCommandLine=hugepagesz=1G
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/lib/systemd/hugetlb-reserve-pages
[Install]
WantedBy=sysinit.target
9. Enabling hugepages (II)
Then set the number of huge pages:
$ sudo vi /usr/lib/systemd/hugetlb-reserve-pages
#!/bin/bash
nodes_path=/sys/devices/system/node/
if [ ! -d $nodes_path ]; then
echo "ERROR: $nodes_path does not exist"
exit 1
fi
reserve_pages()
{
echo $1 > $nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages
}
# This example reserves 12 1GB huge pages on each NUMA node
reserve_pages 12 node0
reserve_pages 12 node1
And enable the service:
$ sudo chmod +x /usr/lib/systemd/hugetlb-reserve-pages
$ sudo systemctl enable hugetlb-gigantic-pages
Recommended best practice: reserve 4GB per NUMA node to run the OS and use all other system memory for 1GB huge pages
Create the mount point and mount huge pages in /etc/fstab:
$ sudo mkdir -p /mnt/huge
$ echo "nodev /mnt/huge hugetlbfs pagesize=1GB 0 0" | sudo tee -a /etc/fstab
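After reboot, a quick way to verify the reservation (a suggested check, not from the original slides) is to look at the global counters and the per-NUMA-node 1GB page counts:
$ grep Huge /proc/meminfo
$ cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages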
10. CPU isolation
Isolate CPUs so that the host OS is restricted to run only on some cores, leaving the others to run VNFs exclusively.
Recommended best practice: run the OS only on the first core of each NUMA node, by adding the isolcpus field to the grub command line:
isolcpus=1-9,11-19,21-29,31-39
The exact CPU numbers depend on the CPU numbers presented by the host OS. In the previous example, CPUs
0, 10, 20 and 30 are excluded because CPU 0 and its sibling 20 correspond to the first core of NUMA node 0,
and CPU 10 and its sibling 30 correspond to the first core of NUMA node 1.
Running this awk script suggests the value to use on your compute node:
$ gawk 'BEGIN{pre=-2;} ($1=="processor"){pro=$3;} ($1=="core" && $4!=0){ if (pre+1==pro){endrange="-" pro} else{cpus=cpus endrange sep pro; sep=","; endrange="";}; pre=pro;} END{printf("isolcpus=%s\n",cpus endrange);}' /proc/cpuinfo
isolcpus=2-35,38-71
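As a cross-check of the awk output (a suggestion, not from the original slides), lscpu shows the CPU-to-NUMA-node mapping and sysfs lists the hyperthread siblings of CPU 0, all of which should stay outside isolcpus:
$ lscpu | grep "NUMA node"
$ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list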
11. Dedicated resource allocation
[Diagram: two CPU sockets connected by QPI, each with its own memory and I/O devices; the cores are split between the host OS/hypervisor and VMs 1-5, with some cores left unused]
• CPUs: not oversubscribed, isolated from host OS
• Memory: huge pages
• I/O devices: passthrough, SR-IOV
12. Activating grub changes for iommu, huge pages and isolcpus
In RHEL 7 / CentOS:
$ sudo vi /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=rhel/swap crashkernel=auto rd.lvm.lv=rhel/root rhgb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G isolcpus=2-35,38-71"
GRUB_DISABLE_RECOVERY="true"
Update grub (BIOS):
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
Update grub (EFI):
$ sudo grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
Don't forget to reboot the system.
After boot, check that all options were applied:
$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=/dev/mapper/rhel_nfv105-root ro rd.lvm.lv=rhel_nfv105/swap crashkernel=auto rd.lvm.lv=rhel_nfv105/root rhgb quiet intel_iommu=on default_hugepagesz=1G hugepagesz=1G isolcpus=2-35,38-71
13. Deactivating KSM (Kernel Same-page Merging)
KSM enables the kernel to examine two or more already running programs and compare their memory. If any memory regions or pages are identical, KSM reduces multiple identical memory pages to a single page. This page is then marked copy-on-write. If the contents of the page are modified by a guest virtual machine, a new page is created for that guest virtual machine.
KSM has a performance overhead which may be too large for certain
environments or host physical machine systems.
KSM can be deactivated by stopping the ksmtuned and ksm services. Stopping the services deactivates KSM, but this does not persist after restarting:
# service ksmtuned stop
Stopping ksmtuned:                            [  OK  ]
# service ksm stop
Stopping ksm:                                 [  OK  ]
Persistently deactivate KSM with the chkconfig command. To turn off the services, run the following commands:
# chkconfig ksm off
# chkconfig ksmtuned off
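On RHEL 7 the same can be achieved with systemctl, which is what the service and chkconfig commands above redirect to anyway: stop the units now and disable them so they stay off across reboots:
# systemctl stop ksmtuned ksm
# systemctl disable ksmtuned ksm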
14. Enabling SR-IOV
SR-IOV enabling depends on the NIC used.
For Intel Niantic and Fortville NICs, the number of VFs enabled is defined by writing to the PF's sriov_numvfs file:
echo X > /sys/bus/pci/devices/<PF pci address>/sriov_numvfs
Recommended best practice: set the number of VFs per PF by using udev rules:
$ cat /etc/udev/rules.d/pci_config.rules
ACTION=="add", KERNEL=="0000:05:00.0", SUBSYSTEM=="pci", RUN+="/usr/bin/bash -c 'echo 8 > /sys/bus/pci/devices/0000:05:00.0/sriov_numvfs'"
ACTION=="add", KERNEL=="0000:05:00.1", SUBSYSTEM=="pci", RUN+="/usr/bin/bash -c 'echo 8 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs'"
ACTION=="add", KERNEL=="0000:0b:00.0", SUBSYSTEM=="pci", RUN+="/usr/bin/bash -c 'echo 8 > /sys/bus/pci/devices/0000:0b:00.0/sriov_numvfs'"
…
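To verify that the udev rules took effect after a reboot (a suggested check, not on the original slide; the PCI address is the first PF from the example above):
$ cat /sys/bus/pci/devices/0000:05:00.0/sriov_totalvfs
$ cat /sys/bus/pci/devices/0000:05:00.0/sriov_numvfs
$ lspci | grep -i "Virtual Function"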
Blacklist the ixgbevf module by adding the following to the grub command line. This must be done after adding this host to openvim, not before. The reason for blacklisting this driver is that it causes the VLAN tags of broadcast packets not to be removed properly when they are received on an SR-IOV port:
modprobe.blacklist=ixgbevf (on the grub boot line)
15. Pre-provision of Linux bridges (I)
OpenMANO relies on Linux bridges to interconnect VMs when there are no
high performance requirements for I/O. This is the case of control plane VNF
interfaces that are expected to carry a small amount of traffic.
A set of Linux bridges must be created on every host. Every Linux bridge must be attached to a physical host interface with a specific VLAN. In addition, an external management switch must be used to interconnect those physical host interfaces. Bear in mind that the host interfaces used for data plane VM interfaces will be different from the host interfaces used for control plane VM interfaces.
Currently the OpenMANO configuration uses 20 bridges named virbrMan1 to virbrMan20, using VLAN tags 2001 to 2020 respectively, to interconnect VNF elements.
Another bridge, called virbrInf with VLAN tag 1001, is used to interconnect the physical infrastructure (hosts, switches and management VMs such as openMANO itself, in case it runs virtualized).
16. Pre-provision of Linux bridges (II)
To create a bridge in RHEL 7.1, two files must be defined in /etc/sysconfig/network-scripts:
$ cat /etc/sysconfig/network-scripts/ifcfg-virbrMan1
DEVICE=virbrMan1
TYPE=Bridge
ONBOOT=yes
DELAY=0
NM_CONTROLLED=no
USERCTL=no
$ cat /etc/sysconfig/network-scripts/ifcfg-em2.2001
DEVICE=em2.2001
ONBOOT=yes
NM_CONTROLLED=no
USERCTL=no
VLAN=yes
BOOTPROTO=none
BRIDGE=virbrMan1
The host interface (em2 in the example), the name of the bridge (virbrMan1) and the VLAN tag (2001) can be different. If you use a different name for the bridge, take it into account in 'openvimd.cfg'.
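Since the 20 bridge definitions differ only in their index and VLAN tag, they can be generated with a small loop. This is only a sketch, assuming em2 as the control plane interface and the default virbrMan1-virbrMan20 / VLAN 2001-2020 naming; run it as root and adapt the interface name to your host:
#!/bin/bash
# Generate ifcfg files for virbrMan1..virbrMan20 on VLANs 2001..2020 over interface em2
IFACE=em2
for i in $(seq 1 20); do
  vlan=$((2000 + i))
  # Bridge definition
  cat > /etc/sysconfig/network-scripts/ifcfg-virbrMan$i <<EOF
DEVICE=virbrMan$i
TYPE=Bridge
ONBOOT=yes
DELAY=0
NM_CONTROLLED=no
USERCTL=no
EOF
  # VLAN sub-interface attached to the bridge
  cat > /etc/sysconfig/network-scripts/ifcfg-$IFACE.$vlan <<EOF
DEVICE=$IFACE.$vlan
ONBOOT=yes
NM_CONTROLLED=no
USERCTL=no
VLAN=yes
BOOTPROTO=none
BRIDGE=virbrMan$i
EOF
done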
17. Additional configuration to allow access from openvim (I)
Uncomment the following lines of /etc/libvirt/libvirtd.conf to allow external connection to libvirtd:
unix_sock_group = "libvirt"
unix_sock_rw_perms = "0770"
unix_sock_dir = "/var/run/libvirt"
auth_unix_rw = "none"
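After editing libvirtd.conf, restart the daemon so the new socket settings take effect (a standard step not shown on the slide):
$ sudo systemctl restart libvirtd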
Create and configure a user for openvim access. A new user must be created to access the compute node from openvim. The user must belong to the libvirt group, and other users must be able to access its home directory:
# create a new user
$ sudo useradd -m -G libvirt <user>
# or modify an existing user
$ sudo usermod -a -G libvirt <user>
# Allow other users to access /home/<user>
$ sudo chmod +rx /home/<user>
18. Additional configuration to allow access from openvim (II)
Copy the ssh key of openvim into the compute node. From the machine where openvim is running (not from the compute node), run:
openvim $ ssh-keygen      # needed to generate ssh keys if not done before
openvim $ ssh-copy-id <user>@<compute host>
After that, ensure that you can log in from openvim to the compute host directly, without a password prompt:
openvim $ ssh <user>@<compute host>
Create a local folder for image storage and grant access from openvim:
Images will be stored in a remote shared location accessible by all compute nodes, for example an NFS file system. The VNF descriptors will contain a path to images stored in this folder. Openvim assumes that images are stored there and copies them to a local file system path at virtual machine creation. The configuration of the remote shared location is outside the scope of the compute node configuration, as it is only required by the VNF descriptors.
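As an illustration only (server name and paths are hypothetical, not part of the original slides), such a shared location could be an NFS export mounted wherever the image repository is needed, e.g. via an /etc/fstab entry like:
nfs-server.example.com:/export/VNF_images_repo  /mnt/VNF_images_repo  nfs  defaults  0 0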
19. Additional configuration to allow access from openvim (III)
A local folder must be created (in the default configuration we assume /opt/VNF/images) where the deployed VM images will be copied, and access must be granted to the libvirt group on an SELinux system. The automation script assumes that "/home" has more disk space than "/", so a link to a folder under the user's home is created:
$ mkdir -p /home/<user>/VNF_images
$ rm -f /opt/VNF/images
$ mkdir -p /opt/VNF/
$ ln -s /home/<user>/VNF_images /opt/VNF/images
$ chown -R <user> /opt/VNF
# SELinux management
$ semanage fcontext -a -t virt_image_t "/home/<user>/VNF_images(/.*)?"
$ cat /etc/selinux/targeted/contexts/files/file_contexts.local |grep virt_image
$ restorecon -R -v /home/<user>/VNF_images
20. Compute node configuration in special cases (I)
Datacenter with different types of compute nodes:
In a datacenter with different types of compute nodes, it might happen that compute
nodes use different interface naming schemes. In that case, you can take the most
used interface naming scheme as the default one, and make an additional
configuration in the compute nodes that do not follow the default naming scheme.
In order to do that, create a hostinfo.yaml file inside the local image folder (typically /opt/VNF/images). It contains entries of the form:
openvim-expected-name: local-iface-name
For example, if openvim contains a network using macvtap to the physical interface em1 (macvtap:em1), but in this compute node the interface is called eno1, create a local-image-folder/hostinfo.yaml file with this content:
em1: eno1
21. Compute node configuration in special cases (II)
Compute nodes in a development workstation
If a normal workstation is used to develop VNFs (as in this training), some of the compute node requirements should not be configured, as VNF performance is not a goal.
In order to get a working development environment:
• Do not configure huge pages, as it would subtract memory from the development environment
• Do not configure isolcpus, as it would subtract CPUs from the development environment
• Do not configure SR-IOV interfaces, as 10Gbps data plane interfaces normally won't be available
22. Available automation scripts in OpenMANO github
Automates all operations from the previous slides with the Telefonica NFV Reference Lab recommended best practices:
https://github.com/nfvlabs/openmano/blob/master/scripts/configure-compute-node-RHEL7.1.sh
Customizes RHEL7.1 on compute nodes
Prepared to work with the following network card drivers:
- tg3 driver for management interfaces
- ixgbe and i40e drivers for data plane interfaces
https://github.com/nfvlabs/openmano/blob/master/scripts/configure-compute-node-develop.sh
For development workstations, without isolcpus, huge pages or data plane interfaces