CON-TAIN-ERS

SiteGround Tech TeamBuilding


● chroot
● namespaces
● cgroups

Control Groups

What do we have?

● cpuset - whole cores and CPU mapping
● cpuacct - CPU cycle accounting
● cpu - less than whole-core granularity
● memory - limits and accounting
● blkio - limits and accounting
● net_cls - network classification
● net_prio - network priority
● freezer + checkpoint/restore - migration

General structure

● tasks
  – attach a task (thread) and show the list of threads
● cgroup.procs
  – show the list of processes

# mount -t cgroup none /cgroups

# mount -t cgroup -o cpuset cpuset /cg/cpuset

How to use them?

● Create cgroup
# mkdir /cgroup/GRP

● Prepare minimum limits
# echo 0-2 > /cgroup/GRP/cpuset.cpus
# echo 0-1 > /cgroup/GRP/cpuset.mems

● Add a process to a cgroup:
# echo PID > /cgroup/GRP/tasks

● Verify that a process is in the cgroup
# grep PID /cgroup/GRP/tasks
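The four steps above (create, set minimum limits, attach, verify) can be sketched in Python. This is a rough sketch, not a tool we ship: `setup_cgroup` and its parameters are hypothetical names, and the demo runs against a scratch directory so it works without root. Pointed at a real cgroup v1 mount it would do the same writes as the shell commands.

```python
import os
import tempfile


def setup_cgroup(root, name, cpus, mems, pid):
    """Create a cgroup directory, set minimum cpuset limits, attach a PID.

    Returns True if the PID is listed in the group's tasks file afterwards.
    """
    group = os.path.join(root, name)
    os.makedirs(group, exist_ok=True)            # mkdir /cgroup/GRP

    # cpuset wants both values set before any task can be attached.
    for filename, value in (("cpuset.cpus", cpus), ("cpuset.mems", mems)):
        with open(os.path.join(group, filename), "w") as f:
            f.write(value + "\n")                # echo 0-2 > cpuset.cpus

    tasks = os.path.join(group, "tasks")
    with open(tasks, "w") as f:
        f.write(f"{pid}\n")                      # echo PID > tasks

    with open(tasks) as f:                       # grep PID tasks
        return str(pid) in f.read().split()


if __name__ == "__main__":
    # Scratch directory stands in for the cgroup mount (assumption for the demo).
    with tempfile.TemporaryDirectory() as scratch:
        print(setup_cgroup(scratch, "GRP", "0-2", "0-1", os.getpid()))
```

The order matters: the limits are written before the PID, which mirrors the "prepare minimum limits" step.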

cpuset

● Physical CPU & memory limits
  – cpuset.cpus - list of allowed CPUs
  – cpuset.mems - list of allowed memory nodes
  – cpuset.cpu_exclusive - 0/1, are the CPUs exclusive to this group?
  – cpuset.mem_exclusive - 0/1, are the memory nodes exclusive to this group?

Documentation/cgroups/cpusets.txt
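The values in cpuset.cpus and cpuset.mems use a list format such as `0-2` or `0-1,4`. A small sketch of how that format expands into individual IDs (`parse_cpu_list` is a hypothetical helper name, not part of any kernel tooling):

```python
def parse_cpu_list(text):
    """Expand a cpuset list such as "0-2,4" into a sorted list of ints."""
    ids = set()
    for part in text.strip().split(","):
        if not part:
            continue                      # empty file or stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            ids.update(range(int(first), int(last) + 1))   # inclusive range
        else:
            ids.add(int(part))
    return sorted(ids)


if __name__ == "__main__":
    print(parse_cpu_list("0-2"))      # [0, 1, 2]
    print(parse_cpu_list("0-1,4"))    # [0, 1, 4]
```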

CPU accounting

● CPU usage combined for all CPUs (in nanoseconds)
● CPU usage per CPU (in nanoseconds)
● per CPU and user/system (in USER_HZ)

● Documentation/cgroups/cpuacct.txt

CPU

● CPU scheduler limits (CONFIG_CGROUP_SCHED)
  – cpu.shares
  – cpu.cfs_quota_us: in microseconds
  – cpu.cfs_period_us: in microseconds (default 100ms)
  – cpu.stat: exports throttling statistics
      nr_throttled: Number of times the group has been throttled/limited.
      throttled_time: The total time duration (in nanoseconds) for which entities of the group have been throttled.
● Documentation/scheduler/sched-bwc.txt
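To make the cpu.stat counters easier to read, here is a minimal sketch that parses the file contents and reports how often the group was throttled. It assumes the usual `name value` layout and the `nr_periods` counter described in sched-bwc.txt; the sample numbers are made up for illustration.

```python
def parse_cpu_stat(text):
    """Turn 'name value' lines from cpu.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            stats[fields[0]] = int(fields[1])
    return stats


def throttled_ratio(stats):
    """Fraction of enforcement periods in which the group was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


if __name__ == "__main__":
    # Invented sample values, not taken from a real server.
    sample = "nr_periods 200\nnr_throttled 50\nthrottled_time 1500000000\n"
    stats = parse_cpu_stat(sample)
    print(throttled_ratio(stats))               # 0.25
    print(stats["throttled_time"] / 1e9)        # 1.5 (seconds)
```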

CPU examples

q - quota, p - period (values in ms)

[Figure: four cores, CPU 0 to CPU 3, filled according to the quota]

q: 500, p: 500
q: 1000, p: 500
q: 1500, p: 500
q: 2000, p: 500

# echo 250000 > cpu.cfs_quota_us
# echo 500000 > cpu.cfs_period_us

q: 250, p: 500
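The quota/period pairs above boil down to one division: quota divided by period is the number of CPUs' worth of time the group may use per period. A short sketch (`cpu_limit` is a hypothetical helper; a quota of -1 means no limit is set):

```python
def cpu_limit(quota_us, period_us):
    """Return the CPU allowance in cores, or None when the quota is unlimited."""
    if quota_us < 0:
        return None                   # cpu.cfs_quota_us = -1: no bandwidth limit
    return quota_us / period_us


if __name__ == "__main__":
    print(cpu_limit(250000, 500000))     # 0.5  (q: 250, p: 500)
    print(cpu_limit(500000, 500000))     # 1.0  (q: 500, p: 500)
    print(cpu_limit(2000000, 500000))    # 4.0  (q: 2000, p: 500)
```

So the `echo 250000` / `echo 500000` pair limits the group to half a core, while q: 2000 with p: 500 allows four full cores.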

memory

Only Memory
● memory.usage_in_bytes
  – show current res_counter usage for memory
● memory.limit_in_bytes
  – set/show limit of memory usage
● memory.failcnt
  – show the number of times memory usage hit the limit

Memory + Swap
● memory.memsw.usage_in_bytes
● memory.memsw.limit_in_bytes
● memory.memsw.failcnt

memory

Kernel Memory limits
● memory.kmem.limit_in_bytes
  – set/show hard limit for kernel memory
● memory.kmem.usage_in_bytes
  – show current kernel memory allocation
● memory.kmem.failcnt
  – show the number of times kernel memory usage hit the limit

blkio

● blkio.weight
  – allowed range 10 - 1000
  – we use 500

● blkio.throttle.io_serviced

/ cgroup - 100% I/O

/lxc - 90% I/O

/lxc/c120 - 50% I/O, from the 90% in /lxc, for each container

/            1024
|- lxc/       900
|  |- c120    450
|  |- c121    450
|  |- c122    450
|  |- c123    450

So each container can get only 50% of the total I/O of the LXC cgroup.
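The percentages on these slides come from dividing a cgroup's weight by its parent's weight. The sketch below only reproduces that slide arithmetic with the weights shown above; it is not a model of how the kernel I/O scheduler splits bandwidth between siblings that compete at the same time. `parent_of` and `share_of_parent` are hypothetical helper names.

```python
# Weights as shown on the slide.
WEIGHTS = {
    "/": 1024,
    "/lxc": 900,
    "/lxc/c120": 450,
    "/lxc/c121": 450,
    "/lxc/c122": 450,
    "/lxc/c123": 450,
}


def parent_of(path):
    """Return the parent cgroup path, or None for the root."""
    if path == "/":
        return None
    head = path.rsplit("/", 1)[0]
    return head or "/"


def share_of_parent(path, weights):
    """Slide arithmetic: this group's weight divided by its parent's weight."""
    parent = parent_of(path)
    if parent is None:
        return 1.0
    return weights[path] / weights[parent]


if __name__ == "__main__":
    for path in ("/", "/lxc", "/lxc/c120"):
        print(path, round(share_of_parent(path, WEIGHTS) * 100), "%")
```

With these numbers /lxc comes out at roughly 88% of the root (the slide rounds to 90%) and each container at 50% of /lxc.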

Network

● Adding a network class to each cgroup so you can later limit it with tc
  – Documentation/cgroups/net_cls.txt
● Prioritizing network traffic on an interface
  – Documentation/cgroups/net_prio.txt

Freezer + CRIU

● freezer.state
  – THAWED
  – FREEZING
  – FROZEN
● freezer.self_freezing
  – 0 (thawed) / 1 (frozen)
● freezer.parent_freezing
  – 0 if no ancestor is frozen, 1 otherwise
● CRIU - Checkpoint and Restore In Userspace

Linux Namespaces

Why do we need that?

What namespaces do we have?

● UTS namespace
● User namespace
● PID namespace
● IPC namespace
● Mount namespace
● Network namespace
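One way to see which namespaces a process belongs to is to read the symlinks under /proc/PID/ns; each link target looks like `uts:[4026531838]`, and two processes share a namespace when the number matches. A minimal sketch, assuming a Linux /proc; the helper names and the sample inode number are only illustrative.

```python
import os
import re

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target like 'uts:[4026531838]' into (type, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self"):
    """Map each entry in /proc/<pid>/ns to its namespace inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result


if __name__ == "__main__":
    for name, inode in namespaces_of().items():
        print(name, inode)
```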

UTS namespace

● Hostname
kernel.hostname = lxc1

● Domainname
kernel.domainname = sgvps.net

[Figure: the host namespace and three new namespaces]

User namespace

User authentication and mapping files:
● /etc/passwd
● /etc/group
● /etc/shadow

- What if we want to create a user called pesho, but such a user already exists?

- What if we want to create user joan with UID 1005, but there is already a user pesho with UID 1005?
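User namespaces answer the UID clash with a mapping: /proc/PID/uid_map holds lines of the form `inside-start outside-start length`, so the same UID inside two containers lands on different UIDs on the host. A rough sketch of that translation; the map values below are invented for illustration and `parse_id_map` / `to_host_id` are hypothetical names.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside, outside, count) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, count = map(int, fields)
            entries.append((inside, outside, count))
    return entries


def to_host_id(ns_id, id_map):
    """Translate an ID seen inside the namespace to the host ID, or None."""
    for inside, outside, count in id_map:
        if inside <= ns_id < inside + count:
            return outside + (ns_id - inside)
    return None                       # unmapped ID


if __name__ == "__main__":
    # Hypothetical maps for two containers.
    container_a = parse_id_map("0 100000 65536\n")
    container_b = parse_id_map("0 200000 65536\n")
    print(to_host_id(1005, container_a))   # joan  -> 101005 on the host
    print(to_host_id(1005, container_b))   # pesho -> 201005 on the host
```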

IPC namespace

Unix/Linux IPCs

- unix domain sockets

- shared memory

- semaphores

- message queues

/proc/PID/fd/

|- 3 -> socket:[3537]


key        shmid      owner    perms bytes    nattch
0x0052e2c1 1139834880 postgres 600   37879808 4

Network namespace

- IP

- IPv6

- Routing

- TCP

- UDP

- SCTP

- DCCP

- RDS

● Having a separate loopback device for a process
● Or simply test the MySQL server on the same IP
● Completely different routing for a process

Mount namespace

the most complex one...

having only one / is a problem...

- at around 22000 mounts everything on your machine starts to lag... no matter how many cores or ram you have :(

- having a different /proc/mounts per process would be nice and very interesting to implement... :)

PID namespace

Migration of processes between machines (CRIU)

It allows you to have two or more processes running with the same PID.

PID - is the PID on the host machine

NSPID - is the PID that the process sees

PID   NSPID
1421  5420   ssh-agent
1730  5420   xchat
1756  5420   firefox
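On kernels that expose the `NSpid` field in /proc/PID/status (Linux 4.1 and later), the host PID and the PID seen inside the namespace can be read from one line: the leftmost value is the outermost PID, the rightmost the innermost. A minimal sketch with a made-up status excerpt that reuses the firefox row above; `ns_pids` is a hypothetical helper name.

```python
def ns_pids(status_text):
    """Return the PIDs from the NSpid line, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(value) for value in line.split()[1:]]
    return []                         # field missing on older kernels


if __name__ == "__main__":
    # Invented excerpt of /proc/1756/status for illustration.
    sample = "Name:\tfirefox\nPid:\t1756\nNSpid:\t1756\t5420\n"
    host_pid, *inner = ns_pids(sample)
    print(host_pid, inner[-1])        # 1756 5420
```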

QUESTIONS

The NEW Backup system


Avatar Design

[Figure: Avatar Master, Host Servers, Backup Servers]


Schedule backup jobs


Start backups

Each backup server has a limit of maximum simultaneous jobs:
- max jobs
- max backups
- max restores
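How the three limits interact is not spelled out on the slide. The sketch below is one plausible reading, stated as an assumption: max jobs caps backups and restores together, while max backups and max restores cap each kind separately. All names here (`Limits`, `can_start`) are hypothetical and not taken from Avatar.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    max_jobs: int        # assumed: cap on backups + restores combined
    max_backups: int     # assumed: cap on concurrent backups
    max_restores: int    # assumed: cap on concurrent restores


def can_start(kind, running_backups, running_restores, limits):
    """Decide whether a backup server may accept one more job of this kind."""
    if kind not in ("backup", "restore"):
        raise ValueError(f"unknown job kind: {kind!r}")
    if running_backups + running_restores >= limits.max_jobs:
        return False
    if kind == "backup":
        return running_backups < limits.max_backups
    return running_restores < limits.max_restores


if __name__ == "__main__":
    limits = Limits(max_jobs=4, max_backups=3, max_restores=2)
    print(can_start("backup", 3, 0, limits))    # False: backup slots are full
    print(can_start("restore", 3, 0, limits))   # True: one job slot is left
```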


Report status

Each backup reports a lot of things:
- thinpool data usage
- mounted df output
- LV df output
- archive_size
- broken dbs
- remote_addr
- user IP
- exit_code
- caller_pid
- interface_type
- last_progress

Layered backups

File

Physical Volume

Volume Group

ThinPool

Logical Volume

Snapshot6

Snapshot5

Snapshot4

Snapshot3

Snapshot2

Snapshot1

Snapshot0

Loop mount

Backup Server Structure

/sdb/avatar on /var/backups type none (rw,bind)

# ls /var/backups/siteground200.com/

total 33333656

-rw------- 1 root root 32212254720 Jul 22 04:03 camerafi

-rw------- 1 root root 32212254720 Jul 22 01:36 celticc1

-rw------- 1 root root 32212254720 Jul 22 00:57 citecang

-rw------- 1 root root 32212254720 Jul 21 20:24 ecoshea5

[root@smallvault1 /]#

Backup Server Structure

# losetup -f /var/backups/siteground200.com/exaera30

# losetup -a

/dev/loop0: [0811]:909901835 (/var/backups/siteground200.com/exaera30)

# vgchange -K -ay

2 logical volume(s) in volume group "exaera30" now active

# lvs

LV         VG       Attr       LSize  Pool      Origin Data%  Meta%
1437516546 exaera30 Vwi-a-t--- 30.00g coregroup        2.09
coregroup  exaera30 twi-a-t--- 29.82g                  2.10   1.54

#

Backup Server Structure

[root@smallvault1 /]# mount /dev/exaera30/1437516546 /mnt/...

[root@smallvault1 /]# ls -l /mnt/exaera30/1437516546

total 40

drwxr-xr-x  5 root root  4096 Jul 21 17:09 configs
drwxr-xr-x  3 963  959   4096 Dec 23  2014 etc
drwx--x--x 14 963  959   4096 Dec 23  2014 home
drwx------  2 root root 16384 Jul 21 17:09 lost+found
drwxr-x---  9 963  959   4096 Feb 29  2012 mail
drwxr-xr-x  2 root root  4096 Jul 21 17:09 mysql
drwxr-xr-x  2 root root  4096 Jul 21 17:09 pgsql

[root@smallvault1 /]#

Account Backup/Restore

● Configuration
  – Extractor scripts
  – Intractor scripts
● Files
● Mails
● SQLs
  – MySQL, mysqldump
  – PgSQL, pg_dump

Full server restore

[Figure: Avatar Master, Host Server, Backup Server]

Report status

account 1

ns1 & ns2 restore here

account 3

Web Interface?

● Ammm...

SOON :)