Mircoservices, dev ops and Engineering best practices at Wix.com

Preview:

Citation preview

@aviranm

Aviran MordoHead of

Microservices, DevOps & Engineering at Wix.com

www.linkedin.com/in/aviran

@aviranm

http://www.aviransplace.com

@aviranm

@aviranm

Wix In NumbersOver 87M users

Static storage is >2Pb of data

3 data centers + 2 clouds (Google, Amazon)

2B HTTP requests/day

1200 people work at Wix

@aviranm

Over 200 Microservices on Production

@aviranm

Microservices - What Does it Take

@aviranm

How to Get There? (Wix’s journey)

http://gpstrackit.com/wp-content/uploads/2013/11/VanishingPointwRoadSigns.jpg

@aviranmhttp://p1.pichost.me/i/11/1339236.jpg

In the Beginning of Time

About 6 years ago

@aviranm

The Monolithic GiantOne monolithic server that handled everything

Dependency between features

Changes in unrelated areas caused deployment of the whole system

Failure in unrelated areas will cause system wide downtimeLighttpd

(file serving) MySQLDB

Wix(Tomcat)

@aviranm

Breaking the System Apart

https://upload.wikimedia.org/wikipedia/commons/6/67/Broken_glass.jpg

@aviranm

@aviranm

Concerns and SLA

Many feature request

Lower performance requirement

Lower availability requirement

Write intensive

Edit websites

Not many product changes

High performance

High availability

Read intensive

View sites, created by Wix editor

@aviranm

Mono-Wix

Phase 1

@aviranm

Extract Public Service

Editor service (Mono-Wix) Public service

@aviranm

Divide and Conquer

Editor service Public service

Guideline: No runtime, deployment or data dependency

@aviranm

Separation by Product LifecycleDecouple architecture => Decouple teams

Deployment independence

Areas with frequent changes

Editor service Public service

@aviranm

Separation by Service LevelScale independently Use different data storeOptimize data per use case (Read vs Write)Run on different datacenters / clouds / zonesSystem resiliency (degradation of service vs. downtime)Faster recovery time

Editor service Public service

@aviranmhttp://blogs.adobe.com/captivate/2011/03/training-adding-interactivity-to-elearning-courses-with-adobe-captivate-5.html/time-to-learn-clock

@aviranm

Service Boundary

@aviranm

Separation of DatabasesCopy data between segments

Optimize data per use case (read vs. write intensive)

Different data stores

Public serviceEditor serviceCopy necessary data

@aviranm

Serialization

@aviranm

Serialization / ProtocolBinary?JSON / XML / Text?HTTP?

Public serviceEditor service ?

@aviranm

Serialization / Protocol - TradeoffsReadability?

Performance?Debug?Tools?Monitoring?Dependency?

Public serviceEditor service ?

@aviranm

API Transport/Protocol

@aviranm

How to Expose an APIREST?RPC?SOAP?

Public serviceEditor service ?

@aviranm

Wix’s Choices RESTHTTP

Public serviceEditor service

BinaryJSON-RPCHTTP

@aviranm

API Versioning

@aviranm

API Versioning

Public serviceEditor serviceBackward compatibility

Maybe here

API Schema /v1/v2YAGNI

@aviranm

Service Discovery

@aviranm

Service Discovery

Public serviceEditor serviceConfiguration (DNS+LB)

Zookeeper?Consul?Etcd?Eureka?

YAGNI

@aviranm

Resilience

@aviranm

What does the Arrow Mean?

Public serviceEditor service

Failure Point

Every Network Hop Reduces Availability

@aviranm

Failure Points = Network I/O

Public serviceEditor service

Failure Point

Retry policyCircuit breakerThrottlers

Be careful – you may cause downtime

Retry only on idempotent operations

@aviranm

Degradation of Service

Public serviceEditor service

Failure Point

Feature killer (Killer feature)FallbacksSelf healing

@aviranm

Testing

@aviranm

Test a Distributed System (at Wix)

Public serviceEditor service

Unit TestIntegration TestServer E2EAutomation

Client

@aviranm

Distributed Logging

@aviranm

Build visibility into service

@aviranm

What is the Right Size of a Microservice?

@aviranm

The Size of a Microservice is the Size of the Team That is

Building it.

“Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations”Conway, Melvin

@aviranm

How many Engineers can Work on a Single Project ?

http://cdn.wp.sunmotors.co.uk/get/2014/03/cars.28.620x413.jpg

@aviranm

Microservices = Small TeamsSmall Teams = Small Rooms

@aviranm

@aviranm

Ownership

@aviranm

Team Work

Microservice is owned by a team

You build it – you run it

No microservice is left without a clear owner

Microservice is NOT a library – it is a live production system

@aviranm

Culture that ROQS

ROQS

esponsibility

wnership

uality

haring@aviranm

@aviranm

What Is The Common Denominator? Product manager

Project manager

QA

Operations

DBA

Develperscan dothese jobs

@aviranm

@aviranm

Developer

Product

QA

Management

Operation

BI

Dev Centric Culture – Involve The Developer Product definition (with product)

Development

Testing (with QA engineers)

Deployment / Rollback (with operations)

Monitoring / BI (with BI team)

DevOps – to enable deployment and rollback, fully automated

Support Circle

@aviranm

What did you Learn from Just 2 Services

● Service boundary● Monitoring infrastructure● Serialization format● Synchronous communication protocol (HTTP/Binary)● Asynchronous (queuing infra)● Service SLA● API definition (REST/ RPC / Versioning)● Data separation● Deployment strategy● Testing infrastructure (integration test, e2e test)● Compatibility (backwards / forward)

@aviranm

Continue to Extract More Microservices

@aviranm

HTML Editor

Flash Editor

MSM

Private Media

Public Media

Editor Segment Public Segment

Premium Services

List DB

App Builder

App Store

App Market

Dashboard

Mailer

TimeZone

Public HTML API

Public API (Flash)

MSP

Public Server

HTML Renderer

HTML SEO Renderer

Flash Renderer

Flash SEO Renderer

Sitemap Renderer

Robots.txt Renderer

User Server

Template Viewer

ContactsHUBActivity

Site Members

Store MgrComments

Snapshoter

User Pref

Feed Me

Shout-out Hotels

PETRI

Site Pref

Dist LoggerSlicer

eCom Renderer

eCom Cart

eCom Checkout

eCom Catalog

eCom Orders

Payment Facade

Account Info

HTML API

HTML Embeder

BlogMobile

Mostly writes

2 Data centers

Db active-standby (preferably active-active)

Performance < 2s 99%

Serves mostly site builders

Uptime > 99.9

Mostly reads

>2 Data centers

Db active-active(-active)

Performance < 500ms 99%

Serves mostly site viewers

Uptime > 99.99

@aviranm

When to Extract a New Microservice

@aviranm

Microservice or Library?

Do I create deployment dependency?

What is DevOps overhead (managing middleware) ?

Who owns it?

Does it have its own development lifecycle?

Does it fit the scalability / availability concerns?

Can a different team develop it?

I need time zone from an IP address

@aviranm

Microservice has Ops, Library is Only Computational

@aviranm

Which Technology Stack to Use

@aviranm

Free to Chose?Microservices gives the freedom to use a different technology stacks.Enables innovation

@aviranm

Default to the Stack You Know how to Operate.

@aviranm

Innovate on Non Critical Microservices and Take Full Responsibility for its Operation.

@aviranmhttp://wallpaperbeta.com/dogs_kiss_noses_animals_hd-wallpaper-242054/

@aviranm

Why Microservices

Scale engineering

Development Velocity

Scale system ?

@aviranm

Microservices is the First Post DevOps Architecture

@aviranm

Every Microservice is a Overhead

@aviranm

Wix’s Company Model

Company

Guild

Guild

Company

Gang GangGang Gang

Product Product

Cross-Engineering Teams

A Guild is a group of people that share expertise, knowledge, tools, code and practice

Company focus on large segmentHas all the resources it needs to be independentPeople within the company are aligned with the Guilds

@aviranm

Production State Changes Every 5 min

@aviranm @aviranm

@aviranm @aviranm

Quality = Better Engineers

Better Engineers = Professional Growth

Professional Growth = Investment in People

@aviranm

Guild Day

Engineers work 4 days with their company

Thursday is Guild day.

Developers conduct quality enhancing activities with the Guild.

@aviranm

Guild Day Goals Builds cross-team relationships

Shares knowledge

Assimilate the culture

Lesson learned

Continuous improvement

Promotes innovation

Professional development@aviranm

@aviranm

Guild Day Schedule 10:00-11:00 Open Space11:00-11:15 Break11:15-11:40 Project spotlight11:45-13:00 Tech talk or Workshop

@aviranm

@aviranm

Open Space

@aviranm

Post MortemAny production downtime is investigatedUnderstand the causeLesson learnedAI to avoid same issue in the futureFindings are published in dapulse (public forum)

@aviranm

Guild Week – Games of GangsOne week each quarterPair programming with a person from another company

Enhancing infrastructureBuilding toolsHelping companiesWork on open sourceGoal #1 – Improve engineering skills and

quality

@aviranm

FastFeatures

BetterQuality

@aviranm

Thank You

@aviranm

Q&A@aviranm

http://www.aviransplace.com

www.linkedin.com/in/aviran

Aviran MordoHead of Engineering

http://engineering.wix.com

@WixEng