INFRASTRUCTURE FOR DECISION MAKERS
Questions for a better architecture
Eric Lubow @elubow #ddsea15
PERSONAL VANITY
๏ CTO of SimpleReach
๏ Co-Author of Practical Cassandra
๏ Skydiver, Mixed Martial Artist, Motorcyclist, Dog Dad (IG: @charliedognyc), NY Giants fan
SIMPLEREACH
๏ Identify the best content
๏ Use engagement metrics
๏ Stream processing ingest
๏ Many metrics, time sliced
๏ Multiple data stores
What do you mean infrastructure?
WHO IS MAKING THESE DECISIONS?
๏ Architects
๏ CTOs
๏ Lead Developers
๏ Developers
๏ Basically everyone
YOU WOULDN’T BUILD SOFTWARE WITHOUT PLANNING
FIRST, SO WHY WOULD YOU BUILD AN ARCHITECTURE
WITHOUT PLANNING?
REALITY OF THE SITUATION
๏ Architectures get built ad hoc
๏ Pieces tend to be built as needed and not always thought out
๏ Many lead developers don’t have a lot of architecture experience
๏ We don’t live in a perfect world and are usually time bound
๏ Product needs to be built and we’ll figure out the rest later (technical debt)
What are we actually going to talk about today?
FRAMEWORK FOR BUILDING
๏ Hardware
๏ Cloud
๏ Databases
๏ Message Systems
๏ Scale/Scaling
๏ Costs
๏ Compliance
๏ Development ease
๏ Authentication
๏ Developer / Operational Capabilities
๏ Available Support
๏ Monitoring / Instrumentation
๏ Testing / Staging / QA
๏ Repeatability of Systems
๏ Safety nets
๏ Pressure valves
๏ Administration ease
๏ Authorization
WHY SHOULDN’T I LEAVE RIGHT NOW?
REASONS TO LISTEN
๏ Unsexy talks can have good information
๏ Understanding these concepts can save lots of technical debt
๏ There are lessons learned from not knowing which questions to ask
๏ I’m kind of entertaining
๏ In case I’m not entertaining, I’ll use some entertaining pictures
๏ I’m going to tell you a story
HOW DID SIMPLEREACH GET FROM …
TO …
[Diagram labels: Platform; Router/Load Balancer/Config/Authentication; Business/Application/Translation/Data Access; eight SERVICE boxes; Redshift]
WHY ARE FRAMEWORKS IMPORTANT?
๏ Allow people to use a common language when discussing or solving problems
๏ Allow a common toolset for solving problems
๏ Simplify difficult tasks
๏ Every language has frameworks: Ruby/Rails, Python/Django, JavaScript/Ember.js
๏ Attempt to answer the questions:
  ๏ How should I do this?
  ๏ Is this a good idea?
  ๏ Is this the right tool?
BASIC QUESTIONS
๏ Where is this going to live?
๏ How do I get data in?
๏ How am I going to store the data?
๏ How do I move data around?
๏ How should data look coming out?
๏ How do I get data out?
๏ How do I know if something is wrong?
๏ How do I maintain/scale/build?
WHERE IS THIS GOING TO LIVE?
๏ Is this going on the cloud? Amazon, Google, Azure, Rackspace?
๏ Do you need to be in a data center?
๏ Are APIs important?
๏ What kind of distribution of services / fault tolerance needs to be available?
๏ What kind of SLAs do you need to meet (100% uptime)?
HOW DO I GET DATA IN?
๏ Build apps that follow the same paradigm
๏ POST data to an end point
๏ Consume off a queue
๏ Use message systems for queueing
๏ Message aggregation for efficiency
๏ Message sampling for throttles
๏ Try to avoid talking directly to a database from client facing applications
๏ Write your own client driver to talk to your architecture
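The aggregation and sampling bullets above can be sketched in a few lines. This is only an illustration, assuming events are plain dicts; the names `aggregate` and `sample` and the example URLs are hypothetical and not taken from the talk.

```python
import random
from typing import Iterable, Iterator, List


def aggregate(events: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group single events into batches so each downstream write carries many events."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def sample(events: Iterable[dict], rate: float, rng: random.Random) -> Iterator[dict]:
    """Pass through roughly `rate` of the events and drop the rest (a simple throttle)."""
    for event in events:
        if rng.random() < rate:
            yield event


# Hypothetical events as a client might POST them to an ingest end point.
events = [{"url": f"https://example.com/{i}", "type": "pageview"} for i in range(10)]
batches = list(aggregate(events, batch_size=4))
sampled = list(sample(events, rate=0.5, rng=random.Random(0)))
```

Batching trades a little latency for fewer writes; sampling trades completeness for a bounded load. Which trade is acceptable depends on the metric being collected.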
HOW AM I GOING TO STORE THE DATA?
CHOOSING A DATABASE IS EASY, #AMIRITE
๏ What’s the latest cool technology?
๏ What is my data volume?
๏ What are my query patterns?
๏ Is my data (un)structured?
๏ Will data remain consistent?
๏ Am I read heavy or write heavy?
๏ Am I batch loading data?
๏ Is eventually consistent data ok?
๏ Can I have a DR plan?
๏ Legal/compliance requirements?
๏ Are there experts/enterprise support?
๏ What’s the community like?
๏ Easy to administer?
๏ Tooling, monitoring, language support?
๏ Cloud or iron?
๏ High volume ingestion or batch loading?
๏ Fault tolerance?
๏ Open source vs enterprise system?
๏ Employee learning curve vs. learning cost?
HOW DO I MOVE DATA AROUND?
ROAD METAPHOR:
๏ Messages = Cars
๏ Message System = Highway / Roads
๏ Database = Parking Lot
๏ Cache = Cell Phone Lot
๏ Commerce/Industry = Worker/Consumer/Analyzer
๏ Enrichment = Gas Station
MESSAGE SYSTEMS ARE MY FAV
๏ Only recently starting to become part of important discussions
๏ Provide consistent interfaces between disparate systems
๏ Clients can have minimal architecture knowledge
๏ Everyone can speak the same language (JSON, please not XML)
๏ Allow for high availability
๏ Help minimize the cost of downtime
๏ Control data flow patterns
๏ Make [horizontal] scaling easier
๏ Enrichment/in-stream modifications of data
๏ Instrument and monitor data states between systems
NSQ
nsq.io
๏ Distributed and de-centralized topology
๏ At-least-once delivery guaranteed
๏ Multi-cast style message routing
๏ Simple to configure and deploy
๏ Allows for zero-downtime maintenance windows
๏ Ephemeral channels for testing data
๏ Channel sampling
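To make the multi-cast and sampling bullets concrete, here is a toy in-memory model, not the NSQ client API. It assumes the usual NSQ semantics that every channel on a topic receives a copy of each message while consumers on the same channel share the work. The `Topic` class, its round-robin delivery, and the per-channel `sample_rate` are simplifications invented for this sketch. The `#ephemeral` suffix follows NSQ's naming convention for channels that are not persisted, which the toy does not model.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List


class Topic:
    """Toy model: each channel gets every message; one consumer per channel handles it."""

    def __init__(self, rng: random.Random) -> None:
        self._channels: Dict[str, List[Callable[[str], None]]] = defaultdict(list)
        self._next: Dict[str, int] = defaultdict(int)
        self._sample_rate: Dict[str, float] = {}
        self._rng = rng

    def subscribe(self, channel: str, handler: Callable[[str], None],
                  sample_rate: float = 1.0) -> None:
        self._channels[channel].append(handler)
        self._sample_rate[channel] = sample_rate

    def publish(self, message: str) -> None:
        for channel, handlers in self._channels.items():
            if self._rng.random() >= self._sample_rate[channel]:
                continue  # message sampled out for this channel
            index = self._next[channel] % len(handlers)  # round-robin within the channel
            self._next[channel] += 1
            handlers[index](message)


archive, worker_a, worker_b, debug = [], [], [], []
topic = Topic(random.Random(0))
topic.subscribe("archive", archive.append)
topic.subscribe("metrics", worker_a.append)   # two consumers share the
topic.subscribe("metrics", worker_b.append)   # "metrics" channel
topic.subscribe("debug#ephemeral", debug.append, sample_rate=0.0)
for i in range(4):
    topic.publish(f"event-{i}")
```

Adding a new consumer type is then a matter of subscribing on a new channel; producers do not change.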
HOW SHOULD DATA LOOK COMING OUT?
๏ Agree on a data format?
๏ XML, JSON, Avro
๏ Again, please don’t use XML
๏ HATEOAS - heavy lift but decent client support
๏ What metadata should be sent with the response?
๏ How can unnecessary calls to an API be mitigated?
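One possible answer to the last three questions is sketched below: a JSON envelope that carries metadata and a HATEOAS-style `links` section, plus an `ETag` so that a client holding a current copy gets a `304` instead of the full body. The envelope keys (`data`, `meta`, `links`), the `build_response` helper, and the example URL are assumptions made for illustration, not the format used in the talk.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


def build_response(resource: dict, self_url: str,
                   if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a resource, honoring If-None-Match."""
    body = {
        "data": resource,
        "meta": {"version": "1"},      # metadata sent alongside the payload
        "links": {"self": self_url},   # hypermedia link the client can follow
    }
    payload = json.dumps(body, sort_keys=True)
    etag = '"' + hashlib.sha256(payload.encode("utf-8")).hexdigest() + '"'
    headers = {"Content-Type": "application/json", "ETag": etag}
    if if_none_match == etag:
        return 304, headers, ""        # client copy is current; skip the body
    return 200, headers, payload


resource = {"url": "https://example.com/story", "pageviews": 42}
status, headers, payload = build_response(resource, "/v1/content/1")
cached_status, _, cached_body = build_response(
    resource, "/v1/content/1", if_none_match=headers["ETag"]
)
```

The conditional request still costs a round trip, so it reduces payload size rather than call count; client-side cache lifetimes would be needed to avoid the call entirely.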
HOW DO I GET DATA OUT?
๏ Monolithic service architecture
๏ REST interface through a single URL to ask for data?
๏ Many micro-service end points?
๏ HTTP / RPC / Thrift
๏ JSON API / HATEOAS / custom
๏ How many libraries need to be written, tested and maintained?
And now back to our story…
SIMPLEREACH CONTEXT
๏ 100 million URLs
๏ 300 million Tweets
๏ 50k - 100k events per second (tens of billions of events per day)
๏ 200 GB of new data per hour
๏ 700 TB of total data (10 TB per month)
๏ 10 TB of hot data
๏ 2-3 TB of daily log data
๏ Excludes all monitoring data
[Diagram labels: Solr; Vertica + Cassandra; Vertica; Mongo]
STREAM-BASED DATA COLLECTION
[Diagram labels: Internet, Edge, Internal API, Solr, C*, Mongo, Redis, Vertica, API, Fire Hose, App, Consumers, Queue]
NEED FOR SPEED
๏ Concurrency
๏ Compiled code is much faster
๏ Statically typed languages make for fewer unexpected error situations
๏ Still speaks every other interchange language
๏ Cleaner code
MICROSERVICES: THE NEW HOTNESS!
Pros
๏ Fine grained, clearly scoped services
๏ Break 1 thing != break #allthethings
๏ Better fault isolation
๏ Easier to create throttles/release valves
๏ Better able to monitor more granularly
๏ Made everyone more devopsy
MICROSERVICES: THE NEW HOTNESS?
Cons
๏ Strict micro-service setups have large database overheads
๏ Testing/deployments are more complex
๏ More general overhead
๏ Slow down developer time
๏ Service discovery
HYBRID MICRO-SERVICE / SHARED LIBRARY
[Diagram labels: Platform; Router/Load Balancer/Config/Authentication; Business/Application/Translation/Data Access; eight SERVICE boxes; Redshift]
GENERIC SERVICE AND DATA FLOW
[Diagram labels: router, auth, four service stacks (NSQ, Application Layer, Business Logic, Data Access Layer), Redshift, logstash]
SMART ROUTER
๏ Handles service state and service registry/discovery information
๏ Canonical reference for all things platform
๏ Prevents older versions of services from re-appearing
๏ Highly available proxy application
๏ Has burst-able capacity to mitigate DoS
๏ Auto-scaling tier
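A minimal sketch of the registry side of such a router follows, assuming services announce themselves with a name, an address, and a version tuple. The `ServiceRegistry` class, its methods, and the addresses are hypothetical; the talk does not show how SimpleReach implemented this.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


class ServiceRegistry:
    """Tracks live instances and refuses registrations older than the newest version seen."""

    def __init__(self) -> None:
        self._latest: Dict[str, Version] = {}
        self._instances: Dict[str, Dict[str, Version]] = {}

    def register(self, service: str, address: str, version: Version) -> bool:
        latest = self._latest.get(service)
        if latest is not None and version < latest:
            return False  # an older build is trying to re-appear; reject it
        self._latest[service] = version
        self._instances.setdefault(service, {})[address] = version
        return True

    def lookup(self, service: str) -> List[str]:
        """Return addresses of instances running the newest registered version."""
        latest = self._latest.get(service)
        instances = self._instances.get(service, {})
        return sorted(addr for addr, ver in instances.items() if ver == latest)


registry = ServiceRegistry()
first = registry.register("metrics", "10.0.0.1:8000", (1, 2, 0))
second = registry.register("metrics", "10.0.0.2:8000", (1, 3, 0))
stale = registry.register("metrics", "10.0.0.3:8000", (1, 2, 0))
routable = registry.lookup("metrics")
```

Routing only to the newest version is one policy among several; a rolling deploy might instead keep the previous version routable until the new one passes health checks.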
BUSINESS LOGIC LAYER
๏ Contains thicker macro services
๏ Aggregates common features and functionality
๏ Permissioning/throttling/access restrictions
๏ Centrally handling trigger events
๏ Exposing various API end points
๏ Orchestrating calls to the DAL
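The throttling bullet could be implemented in many ways; a token bucket is one common choice and is used here purely as an example, not as the approach described in the talk. The `TokenBucket` class is hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, then refills at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_second=1.0)
decisions = [
    bucket.allow(now=0.0),  # burst: first request
    bucket.allow(now=0.0),  # burst: second request
    bucket.allow(now=0.0),  # bucket empty, request throttled
    bucket.allow(now=1.0),  # one token refilled after a second
    bucket.allow(now=1.0),  # empty again
]
```

In a real service `now` would come from a monotonic clock, and one bucket would typically be kept per client or per API key.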
DATA ACCESS LAYER
๏ Responsible for CRUD
๏ Houses many of the data models
๏ Responsible for balancing throughput of data in/out of databases
๏ Minimize the number of DB connections by using pooling
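The pooling bullet can be illustrated with the standard library alone. This sketch uses SQLite as a stand-in database and a hypothetical `ConnectionPool` class; a production data access layer would more likely rely on the pooling built into its database driver.

```python
import os
import queue
import sqlite3
import tempfile
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: callers borrow a connection and always hand it back."""

    def __init__(self, database: str, size: int) -> None:
        self._pool: "queue.Queue[sqlite3.Connection]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout: float = 5.0):
        conn = self._pool.get(timeout=timeout)  # blocks while every connection is in use
        try:
            yield conn
        finally:
            self._pool.put(conn)

    def available(self) -> int:
        return self._pool.qsize()


path = os.path.join(tempfile.mkdtemp(), "demo.db")
pool = ConnectionPool(path, size=2)

with pool.connection() as conn:
    conn.execute("CREATE TABLE events (url TEXT, count INTEGER)")
    conn.execute("INSERT INTO events VALUES (?, ?)", ("https://example.com/a", 3))
    conn.commit()

with pool.connection() as conn:
    total = conn.execute("SELECT SUM(count) FROM events").fetchone()[0]
```

Because the pool size is fixed, the number of open database connections stays bounded no matter how many callers the layer serves.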
HYBRID MICRO-SERVICE / SHARED LIBRARY
[Diagram labels: WebApp 1, WebApp 2, Python App, Go App, Proxy/Router, Platform, three Ingestion Streams, Redshift]
SMILEY HAPPY PEOPLE
HOW DO I KNOW IF SOMETHING IS WRONG?
๏ Testing
๏ Monitoring
๏ Instrumentation
๏ No pull requests w/o instrumentation
๏ No pull requests w/o monitoring
๏ Build dashboards
DASHBOARD #ALLTHETHINGS
WHAT SHOULD I MONITOR/INSTRUMENT?
๏ Frequency
๏ Error rates
๏ Success rates
๏ Request Volume
๏ Message Counts
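As a rough sketch of what instrumenting for these numbers might look like, the decorator below counts requests, successes, and errors and records call durations. The in-process `metrics` counter, the `instrumented` decorator, and `parse_event` are hypothetical; a real system would ship these values to a metrics backend rather than keep them in memory.

```python
import time
from collections import Counter
from functools import wraps

metrics: Counter = Counter()
timings: dict = {}


def instrumented(name: str):
    """Count requests, successes, and errors for a function and time each call."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            metrics[f"{name}.requests"] += 1
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception:
                metrics[f"{name}.errors"] += 1
                raise
            else:
                metrics[f"{name}.successes"] += 1
                return result
            finally:
                timings.setdefault(name, []).append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("parse_event")
def parse_event(raw: str) -> int:
    return int(raw)


for raw in ["1", "2", "oops"]:
    try:
        parse_event(raw)
    except ValueError:
        pass  # the error has already been counted by the decorator

error_rate = metrics["parse_event.errors"] / metrics["parse_event.requests"]
```

Rates such as `error_rate` are what a dashboard or alert would watch; the raw counters stay cheap to update on every call.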
HOW DO I MAINTAIN/SCALE/BUILD?
๏ Already discussed monitoring/instrumentation
๏ Making sure you can maintain architecture is the same as ensuring you can maintain code
๏ Have easy to use, flexible deployment systems
๏ Keep an audit trail
๏ Make processes repeatable and systematic
๏ Configuration management
๏ Automation (event based when possible)
๏ Easy enough to add and maintain but difficult to break
If you want to increase innovation, you need to lower the cost of failure.
Joi Ito, MIT Media Lab
WHAT JUST HAPPENED
๏ A little architecture knowledge is a good thing
๏ Don’t start out with complexity
๏ Build what you need with growth in mind
๏ Make sure you have the basics covered
๏ Might be something to the micro-service hype
๏ Monitor everything
๏ Allow customizations and innovations
QUESTIONS IN LIFE ARE GUARANTEED,
ANSWERS AREN’T.
Eric Lubow
@elubow
Data Day Seattle
#ddsea15