@EdMcBane
Francesco Degrassi
Enthusiastic yet pragmatic Lean Software Developer.
Uppish and cynical nihilist from time to time.
Lean Software Development and team coaching
Continuous Delivery, High availability, performance
Security sensitive & high uncertainty domains
The challenge
● Primary European client
● Innovative service for the consumer market
● Large userbase (200K+ users)
● Very high request rate
● Low latency requirement (<< RTT)
Make your assumptions explicit
and keep testing them
#1 Make your assumptions explicit
and keep challenging them
#2 Performance &
High Availability are not extra features
#3 Do not reinvent
the wheel
...but keep things simple
SO_REUSEPORT
For TCP, SO_REUSEPORT allows multiple listener sockets to be bound to the same port.
Received packets are distributed across the sockets bound to that port using a 4-tuple hash.
With SO_REUSEPORT the distribution is uniform.
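A minimal sketch of the idea, assuming Linux and Python's standard socket module. The helper names make_listener and make_listener_group are hypothetical, not from the talk; in a real deployment each listener would typically live in its own worker process.

```python
import socket


def make_listener(host: str, port: int, backlog: int = 128) -> socket.socket:
    """Create a TCP listener that other SO_REUSEPORT sockets can share a port with."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Must be set BEFORE bind(), and on every socket that shares the port.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock


def make_listener_group(host: str, port: int, count: int) -> list:
    """Open `count` listeners on one port (port 0 lets the OS pick a free one)."""
    first = make_listener(host, port)
    bound_port = first.getsockname()[1]
    rest = [make_listener(host, bound_port) for _ in range(count - 1)]
    return [first] + rest
```

Usage: listeners = make_listener_group("127.0.0.1", 8080, 4), then hand one socket to each worker and let the kernel spread incoming connections across them.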
LESS(1) General Commands Manual LESS(1)
NAME less - opposite of more
SYNOPSIS less -? less --help less -V less --version less [-[+]aABcCdeEfFgGiIJKLmMnNqQrRsSuUVwWX~] [-b space] [-h lines] [-j line] [-k keyfile] [-{oO} logfile] [-p pattern] [-P prompt] [-t tag] [-T tagsfile] [-x tab,...] [-y lines] [-[z] lines] [-# shift] [+[+]cmd] [--] [filename]... (See the OPTIONS section for alternate option syntax with long option names.)
DESCRIPTION
LESS IS similar to MORE (1), but has many more features. Less does not have to read the entire input file before starting, so with large input files it starts up faster than text editors like vi (1). Less uses termcap (or terminfo on some systems), so it can run on
Manual page less(1) line 1 (press h for help or q to quit) .
TCP_TW_RECYCLE
Enable fast recycling TIME-WAIT sockets. Default value is 0. It should not be changed without advice/request of technical experts.
With it enabled, Linux will drop any segment from a remote host whose timestamp is not strictly bigger than the latest timestamp recorded for that host.
TCP_TW_RECYCLE + NAT = MADNESS
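Why NAT makes this painful can be shown with a toy model. This is not kernel code: PerHostTimestampFilter is a hypothetical class that only mimics the rule quoted above, and the IP address and timestamp values are made up for illustration.

```python
class PerHostTimestampFilter:
    """Toy model: remember the latest TCP timestamp seen per remote IP."""

    def __init__(self) -> None:
        self._latest = {}  # remote IP -> latest accepted timestamp

    def accept(self, remote_ip: str, tsval: int) -> bool:
        """Return False (segment dropped) unless tsval is strictly bigger."""
        latest = self._latest.get(remote_ip)
        if latest is not None and tsval <= latest:
            return False
        self._latest[remote_ip] = tsval
        return True


def simulate_nat() -> list:
    """Two clients with unrelated timestamp clocks share one public IP."""
    nat_ip = "203.0.113.7"      # documentation-range address, illustrative only
    server = PerHostTimestampFilter()
    return [
        server.accept(nat_ip, 5_000_000),  # client A: long uptime, high timestamp
        server.accept(nat_ip, 1_000),      # client B: just booted, low timestamp
        server.accept(nat_ip, 5_000_001),  # client A again
    ]
```

In this model client B is dropped even though it did nothing wrong: the server sees one host where there are really two.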
#5 High Availability is much more than just redundancy
Failure impact
● Redundant hardware
● Redundant software components
But there’s more!
● Graceful degradation
● Incremental rollouts
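Graceful degradation, one of the ways to limit failure impact listed on this slide, can be sketched as "serve the last known good answer when the dependency fails". DegradingCall is a hypothetical helper, and falling back to a stale value is an assumption about what degraded service means for a given system.

```python
from typing import Callable, Generic, Tuple, TypeVar

T = TypeVar("T")


class DegradingCall(Generic[T]):
    """Wrap a call so that failures return the last good value instead of raising."""

    def __init__(self, primary: Callable[[], T], default: T) -> None:
        self._primary = primary
        self._last_good = default  # served until the first success

    def __call__(self) -> Tuple[T, bool]:
        """Return (value, degraded); degraded is True when the fallback was used."""
        try:
            self._last_good = self._primary()
            return self._last_good, False
        except Exception:
            # Assumption: a stale answer is better than no answer.
            return self._last_good, True
```

The degraded flag is returned on purpose, so the caller can count and alert on degraded responses rather than hide them.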
Failure frequency
But then also:
● proven technology
● high quality hardware
● automation (to avoid errors)
Time to recover
● Effective monitoring
○ realtime
○ reliable
○ understandable
○ thorough
○ meaningful
○ actionable
● Rollback / rollforward
● Automation (for speed)
...but be prepared to improvise
● In house experience
● Developers on call
● Drills (chaos monkeys)
Processes designed for ordinary times
are not resilient in a crisis and need to be changed.
#7 Monitoring is essential
… and we can do way better
No one size fits all
● “Monitor everything”, like “100% test coverage”, is a nice slogan.
● Each environment requires a slightly different solution
● Balance between data availability, cost and ability to keep it actionable
We are doing logging wrong
● Unstructured
● Inconsistent
● Poor defaults
● Complex, obscure components
● A huge waste of computing power
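One possible answer to "unstructured" and "inconsistent" is to emit one JSON object per log line. This is a sketch using only Python's standard logging module; JsonFormatter, build_logger and the "fields" convention are hypothetical names chosen here, not something the talk prescribes.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with a fixed set of core keys."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Assumed convention: callers pass extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str, stream) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` and nowhere else."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def demo() -> dict:
    """Log one event to an in-memory stream and parse it back."""
    stream = io.StringIO()
    log = build_logger("demo", stream)
    log.info("request_served", extra={"fields": {"request_id": "r-123", "latency_ms": 4}})
    return json.loads(stream.getvalue())
```

Because every line parses as JSON with the same core keys, downstream tools can filter on request_id or latency_ms without per-service regular expressions.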
We need a complete overview
● Logs
● Metrics
● Alerts
● Together, coherent, cross-referenced
“Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so.”
Douglas Adams
Thanks!
@EdMcBane
fdegrassi@gmail.com
francesco.degrassi@optionfactory.net
http://www.optionfactory.net/blog