A discussion of the technical problems that resulted in adopting Python as a first-class language within Netflix's cloud environment
Python Through the Back Door
CodeMash 2014
Roy Rapoport @royrapoport rsr@netflix.com
www.linkedin.com/in/royrapoport
A Word About Me
• About 20 years in technology
• Systems engineering, networking, software development, QA, release management
• Time at Netflix: 1655 days (4y:6m:11d)
• Before at Netflix: Service Delivery in IT/Ops, troubleshooter, Builder of Python Things[tm]
• Current role: Insight Engineering
• Real-Time Operational Insight
Stories We Tell
• Technical Problems
• Howler Monkey
• Alerting
• Python As a First-Class Language
• Culture and People
Problem Python People
“Netflix Company Profile Now via self service*
People
Go to your favorite Python REPL and type the following:
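import re
import requests

# Fetch the investor-relations page as text.
content = requests.get("http://ir.netflix.com").text
# Replace HTML non-breaking-space entities so the literal spaces in the pattern match.
content = content.replace("&nbsp;", " ")
# Capture the subscriber count ("over N") and the country count ("in N").
p = re.compile(r".*over (\d+) .*in (\d+)", re.S)
m = p.match(content)
print("Netflix is the world's leading Internet "
      "subscription service for enjoying TV and movies, "
      "with more than {} million subscribers in {} "
      "countries.".format(m.group(1), m.group(2)))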
*No whining. Remember that you’ll never again need to wait for me to update this slide like you had to wait for database access when you started your last job.”
- Jay Zarfoss, http://www.slideshare.net/zarfide
Design Your Culture for Desired Outcomes
1. Speed of innovation
2. Availability
3. Cost
People
Design For What’s Important
Freedom and Responsibility
Hire Smart Experienced People
Set them Loose
Watch Magic Happen
People
Policies
Raise your hand if you love them
People
Policies (How They Usually Work)
People
Policies (How They Usually Work)
11/27/2006 “Sorry, but the standard monitor...is the HP 17" flat panel. I actually told a director last week that they couldn't have a 19" for a new office so I am not picking on just you.”
People
Policies (How They Usually Work)
6/18/2007
“There is a request for quantity 2 17” flat panels. We have received direction from the CIO that no one will have more than 1 flat panel monitor. I just wanted to let you know that there will only be one monitor ordered ... The 17” is our only standard except for Legal.”
People
Policies (How They Usually Work)
•Prescriptive
•Inflexible
•Determined by others
•Slow to change
People
Policies @nflx
People
Policies @nflx
01/30/2013, 15:22 PST
I'd like to request a 15” MBP w/ Retina Display. I don't know how much you guys care about CPU specs -- it looks like the bump from 2.3GHz to 2.6GHz is reasonably priced at only about $100, so if it works for you that'd be nice. 16GB RAM and at least 512GB drive.
01/31/2013
12:00 PST: “Forwarded to IT Purchasing to provide a quote to Roy for the requested configuration.”
13:33 PST: “Requesting quote from vendor”
15:32 PST: “Attached is the quote, please approve and I’ll place order”
15:46 PST: “Thanks for the rapid response. Please order.”
15:52 PST: “Ordered. PO #...”
People
Policies @nflx
• Descriptive
• As flexible as we are
• Describe what we choose to do/get
• Evolve quickly
People
The Before Time
Dozens of SSL Certificates
Decentralized
Kept Expiring
Hilarity would ensue
Amazon Resources
“No Preset Limit”
You know when you hit it
Hilarity would ensue
Problem
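A minimal sketch of the expiring-certificate problem above, not the tool described in the talk. It assumes a hypothetical inventory that maps a certificate name to its notAfter timestamp in the format that Python's ssl module reports; the function names and the 30-day warning threshold are illustrative assumptions.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Return whole days until a certificate's notAfter timestamp.

    not_after uses the format found in getpeercert()["notAfter"],
    for example "Jan  5 09:34:43 2030 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return int((expires_at - now) // SECONDS_PER_DAY)


def certs_to_nag(inventory, warn_days=30, now=None):
    """Return (name, days_left) pairs expiring within warn_days, soonest first.

    inventory is a hypothetical {name: not_after} mapping.
    """
    expiring = []
    for name, not_after in inventory.items():
        days_left = days_until_expiry(not_after, now)
        if days_left <= warn_days:
            expiring.append((name, days_left))
    return sorted(expiring, key=lambda pair: pair[1])
```

Centralizing the inventory is the point: once every certificate's expiry date lives in one place, deciding whom to nag is a short loop.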
The Before Time
• Well-developed Developer Ecosystem
• Discovery
• DB Client
• Credentials Management
• Memory Object Cache
• Server Infrastructure
• Telemetry
• You wanted that for Java, right?
Python
The Before Time
• Just moved from IT/Ops
• Formally tasked with SSL cert issue as quarterly goal
• Limits issue “tacked” on
• Happily hackily Pythonic
• Didn’t know Java
Presenter Selfie
Problem Python
Architecture
[Diagram labels: ELB, EC2, Filesystem, IP Range, DNS Domain, Cassandra, Certificate, Nagger, CherryPy]
Problem
7/10/2011 Ready for beta
Persistence
• Started with SimpleDB
• Then Cassandra
• Drove creation of …
• import Discovery
• import Cassandra
• And a design error
Python
Abstraction
Python
• “The process of separating ideas from specific instances of those ideas at work.”
• Some abstraction: Good
• Too much abstraction burns your tongue*
• Known bug
* Mixed metaphor is mixed
Architecture
Problem
Architecture
Problem
Alerting
Problem
• Enterprise IT Solution
• Managed by the Enterprise IT Alerting People
• File Tickets
• Send alerts to NOC
• Completely separate from telemetry system
Copyright USAID Microlinks. CC Attribution 2.0
Alerting
Problem
• Enterprise IT Solution
• Managed by the Enterprise IT Alerting People
• File Tickets
• Send alerts to NOC
• Completely separate from telemetry system
Copyright: http://www.flickr.com/photos/s_w_ellis
CC Attribution 2.0 License
Alerting
Problem
•Already had a good telemetry system
•Outsourced notification to PagerDuty
•No alert routing (and deduplication)
Monitoring Alerting Notification
Alerting
People
•Space crunch
•New cube mate: @jedberg
•One Month Deadline
Alerting
Problem
[Diagram labels: alerting api, Central Alert Gateway, Pager Duty, Amazon SES, Atlas]
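A rough sketch of the missing piece named earlier, alert routing and deduplication, sitting between a telemetry system and the notification channels. This is not the gateway from the talk: the class name, the severity-based routing table, and the five-minute suppression window are all assumptions, and the notifier callables stand in for whatever would actually page or email someone.

```python
import time


class AlertGateway:
    """Route alerts to notifiers and suppress repeats of the same alert key."""

    def __init__(self, routes, dedup_window=300.0, clock=time.time):
        self.routes = routes              # hypothetical {severity: [notifier callables]}
        self.dedup_window = dedup_window  # seconds during which repeats are dropped
        self.clock = clock                # injectable so the logic can be tested
        self._last_sent = {}              # alert key -> time of last notification

    def submit(self, key, severity, message):
        """Deliver one alert; return how many notifiers were called (0 if suppressed)."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.dedup_window:
            return 0  # duplicate inside the window: nobody gets woken up twice
        notifiers = self.routes.get(severity, [])
        for notify in notifiers:
            notify(key, message)
        if notifiers:
            self._last_sent[key] = now
        return len(notifiers)
```

Keeping the routing table as plain data means a team can change who gets notified without touching the gateway code.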
Let’s Wake Someone Up (Livecoding for Fun and Profit)
But Now We Need…
Python
•import Discovery.publish
•import EVCache
•import EpicMetrics
•import Archaius
•import Asgard.Registry
•import AKMS
AKMS?
Python
In [1]: import AKMS
In [2]: ak = AKMS.AKMS("RoyWasHere")
In [3]: ak.keys()
Out[3]: ['MLQBAYLLDIGXPBQB', 'eMr+Mdhv+E4xD+paPCxXF+']
In [4]: a, s = ak.keys()
In [5]: s3_object = boto.connect_s3(a, s)
In [6]: ak = AKMS.AKMS("RoyWasHere", version=2)
In [7]: ak.keys()
Out[7]: ['yn[…]G', 'rV[…]bKfSUHDSA', 'reallyLongStringElided']
In [8]: ak.expiration
Out[8]: 1389165118
In [9]: a, s, s2 = ak.keys()
In [10]: s3_object = boto.connect_s3(a, s, security_token=s2)
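Because the version 2 keys in the session above carry an expiration (epoch seconds), a caller probably wants to re-fetch them before they lapse. The following is a sketch under assumptions, not part of AKMS: RefreshingKeys, its margin parameter, and the fetch callable are hypothetical, and fetch stands in for whatever returns a fresh key list together with its expiration time.

```python
import time


class RefreshingKeys:
    """Cache temporary credentials and re-fetch them shortly before they expire."""

    def __init__(self, fetch, margin=300, clock=time.time):
        self._fetch = fetch    # hypothetical callable -> (keys, expiration_epoch_seconds)
        self._margin = margin  # refresh this many seconds before expiration
        self._clock = clock    # injectable so the logic can be tested
        self._keys = None
        self._expiration = 0

    def keys(self):
        """Return cached keys, fetching new ones if none are cached or expiry is near."""
        if self._keys is None or self._clock() >= self._expiration - self._margin:
            self._keys, self._expiration = self._fetch()
        return self._keys
```

A long-running service would call keys() each time it builds a connection, so a token that expires mid-run is replaced without a restart.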
So AKMS
People
•Server more paranoid than most
•Making Python library was a pain
•Remember Jay?
•High lateral trust
•Prioritization autonomy
•Never ask for permission
Lateral Trust
People
•Humans are good game players
•What are the rules?
•Zero-sum games: I want you to lose
•Stack ranking
•Fixed bonus / raise pools
Lateral Trust @nflx
People
•No fixed pools for anything
•No ranking (at all)
•Reviews != raises
•Smart people generally make good decisions
•Global optimization
Subordinate Trust @nflx
People
•Focus on results
•Unleash employees
•Encourage disagreement
•Accept dissent
•Job #1: Attract and retain world-class talent
Manager Trust @nflx
People
•Question, question, question
•Drive for context, not decisions
•Nobody is above questioning
Field of Dreams
Python
•Turned out I wasn’t the only one
•Striking the right balance between MVP and future growth (maybe)
•And if it hadn’t … it’d still have been the right choice
A Virtuous Cycle
Python
•Requirement for high impact
•No process for permission
•Unorthodox language choice
•Lateral support for development
•Increased adoption
•…
•Profit!*
People Problem
* (or at least a new standard)
http://bit.ly/netflixcmpython
Tell me what you think. You know you want to.
Attributions
http://www.flickr.com/photos/watchsmart/
http://www.flickr.com/photos/yaketyyakyak/
Pem Dorjee Sherpa