Infrastructure as code might be literally impossible part 2


infrastructure as code might be literally impossible

part 2

joe damato packagecloud.io

part 1

bit.ly/impossible-infra

hi, i'm joe
i like computers

i once had a blog called timetobleed.com

@joedamato

packagecloud.io
@packagecloudio

follow along

blog.packagecloud.io

hi

disclaimer

infrastructure as code might be impossible because nothing works.

cognitive load

too much stuff

coping strategies

coping w cognitive load

copy & paste configs

stackoverflow

BTW: this is actually part of another talk I'm working on called

Programmers should get paid more & work less

anw

the problem is so pronounced that in some cases it's impossible to do seemingly simple tasks

some examples then some thoughts.

Today's cool stories
1. SSL
2. APT
3. Linux Networking
4. Linux Threading (maybe)
5. Python packaging (maybe)

SSL

SSL is important

agreed?

Ubuntu & Debian

don’t agree

SSL doesn't work on Debian / Ubuntu

anw

LOL gnutls, who cares?

apt-get! git! curl! ngIRCd!

well, actually you should use

OpenSSL

I like rabbits.

3. All advertising materials mentioning features or use of this software must display the following…

6. Redistributions of any form whatsoever must retain the following…

OpenSSL says…

GPL says…

6. ….You may not impose any further restrictions on the recipients' exercise of the rights granted herein.

These two licenses are not compatible.

in other words

software licenses force you to use a particular SSL library with a very painful bug.

greetings

(not sayin that OpenSSL is bug free)

(but, am sayin NSS and gnutls have less mindshare)

btw

(hi)

OK but I don't care about SSL, I use GPG.

NO. plz stop.

anw

APT

file compression is important

agreed?

Ubuntu & Debian

don’t agree

(more about hash sum mismatch later)

in other words

APT bug when decompressing XZ files makes it impossible to install software reliably

this is unfortunate due to the slow release cycle of Debian/Ubuntu updates

“SO easy, that type of work can be done over the weekend”

-o Acquire::CompressionTypes::Order::=gz

… OK … hopefully that repo has gzip'd metadata or it's gonna be a real short trip
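btw the usual full incantation is something like this (just a sketch; it assumes your apt honors Acquire::CompressionTypes and that the repo actually publishes gzip'd indexes):

apt-get update -o Acquire::CompressionTypes::Order::=gz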

anw

hash sum mismatch

have you seen it?

do you know what it means?

do you know why it happens?

what it means

happens all the time…

how could that happen?

one of (at least) 3 ways

1. stale cache between client/server
2. XZ decompression bug
3. apt race condition

how to avoid each
1. better HTTP headers… or use SSL… but like gnutls?? lol
2. don't generate XZ archives
3. ?????? race condition ??????

APT race

how it happens

1. Download + cache Release file
2. repo owner updates repo
3. Download Packages files
4. Compare checksums from the (stale) Release file against the Packages files
5. hash sum mismatch
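a toy sketch of that sequence in python, with a plain dict standing in for the repo (this is not apt code, it just shows why comparing against the stale, cached Release file blows up):

import hashlib

def sha256(data):
    return hashlib.sha256(data).hexdigest()

# a hypothetical in-memory "repo": the Release file records the checksum of Packages
repo = {"Packages": b"Package: foo\nVersion: 1.0\n"}
repo["Release"] = sha256(repo["Packages"])

# 1. client downloads + caches the Release file
cached_release = repo["Release"]

# 2. repo owner updates the repo (Packages and Release change together upstream)
repo["Packages"] = b"Package: foo\nVersion: 1.1\n"
repo["Release"] = sha256(repo["Packages"])

# 3. client downloads the (now newer) Packages file
packages = repo["Packages"]

# 4. client compares the fresh Packages file against the stale Release checksums
if sha256(packages) != cached_release:
    print("E: Hash Sum Mismatch")  # 5.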

this means…

it is impossible to

1. update your repository without breaking clients

2. generate consistent mirrors of other repositories

this is bad!!!!!

but i've done all of these before and never had a problem?

congrats you got lucky!!!!!

so, wait, joe, are you saying that APT metadata is inherently racy?

yes!

and ubuntu agrees

OK so APT repos and the tools you use to generate them are fundamentally racy

so now what?

Acquire-by-hash

Acquire-by-hash
• Mechanism for downloading metadata by its hash sum
• Server should keep "a few" older copies of metadata around
• Prevents the race condition from happening

Acquire-by-hash
• Added in APT 1.2.0
• Ubuntu Xenial and newer
• Debian Stretch and newer
• not supported by reprepro
• not supported by aptly
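if your apt and your repo both support it, turning it on per-source looks roughly like this (a sketch; example.com is a placeholder, check sources.list(5) on your release for the exact option):

deb [by-hash=yes] https://example.com/debian stretch main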

only one way to get working, consistent, not racy APT metadata

use packagecloud.io

Linux Networking

Full networking writeup

literally 90 pages

literally everything about linux networking

literally available here: http://bit.ly/linux-networking

summary

[random os] has a better/faster/leaner/whatever networking stack than linux

lots and lots and lots of copy paste coping

question

an answer

an other answer

yet an other answer

and on and on and on and on…

no one even knows what these values mean

(p. much no one knows what these values mean)

example

netdev_max_backlog

similarish explanations

what does it actually mean tho?

if
• driver calls netif_receive_skb (likely)
• and RPS is disabled (default)

Then

it doesn't do anything.

literally nothing. it’s not even checked.

if

• driver calls netif_rx (unlikely)
• or RPS is enabled (rare for most ppl)

then data is queued to a backlog whose length is limited by netdev_max_backlog
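if you want to check whether that backlog even matters on your box, here's a minimal sketch (assumes Linux /proc; in /proc/net/softnet_stat each row is one CPU and the second hex column is the drop counter for that backlog queue):

# read the sysctl and the per-CPU softnet_stat counters
with open("/proc/sys/net/core/netdev_max_backlog") as f:
    print("netdev_max_backlog =", f.read().strip())

with open("/proc/net/softnet_stat") as f:
    for cpu, line in enumerate(f):
        fields = line.split()
        # fields are hex: [0] packets processed, [1] dropped because the backlog was full
        print("cpu%d processed=%d dropped=%d" % (cpu, int(fields[0], 16), int(fields[1], 16)))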

coping strategies abound

here's a coping strategy i think is fine

curl | sudo bash

you aren't reading all of the chef/puppet source so what's the difference?

(hi, be mad)

too damn hard to understand how a computer works

on that note…

Linux Threading

“threads are slow”

“context switches are expensive/slow/…”

a 7-year-old bugfix for XFree86 broke threads on Linux

Story Time

TLS segment selectors
XFree86 Modules

Story Time

mmap MAP_32BIT

June 29, 2001

“This adds a new mmap flag to force mappings into the low 32bit address space. Useful e.g. for XFree86's ELF loader or linuxthreads' thread local data structures.”

Nov 11, 2002

“532. Fixed module loader to map memory in the low 32bit address space on x86-64 (Egbert Eich).”

Story Time

ELF small code model 31bit mapping

Jan 4, 2003

“Make MAP_32BIT for 64bit processes only map in the first 31bit, because it is usually used to map small model code. This fixes the X server crashes. Some cleanups in this area.”

So: MAP_32BIT is actually MAP_31BIT

Mar 4, 2003

/* For Linux/x86-64 we have one extra requirement: the stack must be in the first 4GB. Otherwise the segment register base address is not wide enough. */

glibc

May 9, 2003

/* We prefer to have the stack allocated in the low 4GB since this allows faster context switches. */

glibc

justification for MAP_32BIT in glibc changed

Aug 13, 2008

“Pardo” report

https://lkml.org/lkml/2008/8/12/423

“Pardo” report

pardo filled the 31bit 1GB space with thread stacks.

subsequent allocations were doing a linear search for a free address on the kernel side.

MAP_STACK is added.

(it does nothing)

June 29, 2001: MAP_32BIT added to kernel

Nov 11, 2002: XFree86 updated to use MAP_32BIT

time or w/e

Jan 4, 2003: MAP_32BIT updated for ELF small code

Feb 12, 2003: wrmsr slowness reported

Mar 4, 2003: MAP_32BIT added to glibc

May 9, 2003: MAP_32BIT retry added to glibc

Aug 13, 2008: “Pardo” report
Aug 13, 2008: MAP_STACK
Aug 15, 2008: glibc updated

a few questions

how did we get here?

question

legacy code backward compat

an thought

free open source doesn’t exist

an thought

why so much copy-paste coping?

question

necessary complexity

an thought

lack of time

an thought

an aside:

but, why is there no time?

i don't know, but could it be that efficiency gains are captured by management instead of engineering?

or could it be that…

working software systems aren't economically viable for 99% of companies?

hence why no one found that threading bug for 5 years?

working software given complex requirements is expensive

how much did you pay for your Linux?

?
packagecloud.io
@packagecloudio

Python Packaging

3 types of python packages

1. source distributions (sdists)
2. eggs
3. wheels

some …interesting… behavior with [-_.]

setup(name='hi_automacon', …

setup(name='hi-automacon', …

setup(name='hi.automacon', …

what do you think happens?

“There are only two hard things in Computer Science: cache invalidation and naming things.”

(literally unknown)

hi_automacon

setup.py: hi_automacon
metadata: hi-automacon
sdist: hi_automacon-1.0.tar.gz
egg: hi_automacon-1.0-py2.7.egg
wheel: hi_automacon-1.0-py2-none-any.whl

OK SO: wheels and eggs leave "_" in the filename but translate it in the metadata to "-"

…. but not sdists

OK OK OK OK OK OK OK OK

that's fine not a big deal

weekend work and all that

hi-automacon

setup.py: hi-automacon
metadata: hi-automacon
sdist: hi-automacon-1.0.tar.gz
egg: hi_automacon-1.0-py2.7.egg
wheel: hi_automacon-1.0-py2-none-any.whl

OK SO: wheels and eggs translate "-" to "_" in the filename but leave it in the metadata

…. but not sdists

package name    file name     metadata
dash            underscore    dash
underscore      underscore    dash

wheels and eggs only

sdists are WYSIWYG affff
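fwiw the underscore-ing in the wheel filename is the PEP 427 escaping rule (runs of anything that isn't alphanumeric or "." become a single "_"); a minimal sketch of that rule, not the actual setuptools code:

import re

def escape(name):
    # PEP 427 wheel filename escaping
    return re.sub(r"[^\w\d.]+", "_", name)

print(escape("hi-automacon"))  # hi_automacon
print(escape("hi_automacon"))  # hi_automacon
print(escape("hi.automacon"))  # hi.automacon ("." is left alone)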

hi.automacon

weird

• everything has '.' in it
• file names and metadata for all package types

let’s curl against PyPI….

django-allauth

curl https://pypi.python.org/simple/django-allauth/

HTTP 200

OK OK OK OK OK OK OK OK

curl https://pypi.python.org/simple/django_allauth/

HTTP 302

< Location: /simple/django-allauth

OK OK OK OK OK OK OK OK

curl https://pypi.python.org/simple/django.allauth/

HTTP 200

(hi)

and now what happens if we try mixing the case?

lol maybe next time.
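(fwiw: the name normalization the /simple/ index converges on is now written down in PEP 503; a minimal sketch of that rule:)

import re

def normalize(name):
    # PEP 503: collapse runs of "-", "_", "." into a single "-" and lowercase
    return re.sub(r"[-_.]+", "-", name).lower()

for name in ("django-allauth", "django_allauth", "django.allauth"):
    print(name, "->", normalize(name))  # all three -> django-allauth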
