How Wix is doing continuous delivery and our Dev-Centric culture to support that
Wix Dev-Centric Culture
Aviran Mordo, Head of Back-End Engineering @ Wix
@aviranm
http://www.linkedin.com/in/aviran
Wix In Numbers
• Over 45,000,000 users
– >1M new users/month
• Static storage is >800TB of data
– >1.5TB new files/day
• 3 Data centers + 2 Clouds (Google, Amazon)
– ~300 servers
• >700M HTTP requests/day
• ~600 people work at Wix
– Of which ~ 200 in R&D
Traditional Dev Pipeline
Product Dev QA Operations
Waterfall
• Long development cycle
• Time waste (wait)
• Late feedback
• Hard to fix
• 1-2 releases a year
SCRUM
Lean
Agile
SCRUM
XP
SCRUM != Agile
Jan 2014
1,250 deployments (production changes) per month
Double the velocity from last year
Every 9 minutes production changes its state (during working hours)
Do You Have The Guts To Deploy 60 Times A Day?
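The headline numbers (1,250 a month, 60 a day, every 9 minutes) agree with each other under a plausible working calendar. A quick check, assuming about 21 working days a month and a 9-hour working day; both values are assumptions, not figures from the slides:

```python
deployments_per_month = 1250   # from the slide
working_days_per_month = 21    # assumption, not from the slides
working_hours_per_day = 9      # assumption, not from the slides

# Average deployments on a working day.
per_day = deployments_per_month / working_days_per_month

# Average gap between two production changes during working hours.
minutes_between = working_hours_per_day * 60 / per_day

print(round(per_day), round(minutes_between))  # prints "60 9"
```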
Let's Go Back In Time
Where We Were
• We were working in traditional waterfall
• With fear of change
– It is working, why touch it?
– Uploading a release means downtime and bugs!
• With low product quality
• With slow development velocity
• With a traditional enterprise development lifecycle
– Three months of “VERSION” development and QA
– Six months of crisis mode, cleaning bugs and stabilizing the system
Taiichi Ohno
Lean Product development
“Top 5 Most-Used Commands in Microsoft Word
• Paste
• Save
• Copy
• Undo
• Bold
These five commands account for around 32% of the total command use in Word.
Paste itself accounts for more than 11% of all commands used, and has more than twice as much usage as the #2 entry on the list, Save.
Beyond the top 10 commands, the curve flattens out considerably. The percentage difference in usage between the #100 command ("Accept Change") and the #400 command ("Reset Picture") is about the same in difference between #1 and #11 ("Change Font Size")”
Scaling challenges – Product
• Product Minimum Viable Product (MVP)
– Does the MVP meet your product standards?
• What about tooltips, help, first-time UX, etc.?
– How to define a product that can be developed in a day?
– And that can win in an A/B test…
To Be Implemented
Get out of thought land
• The law of failure
– Most new “its” will fail even if they are flawlessly executed
• Invest less, get less attached, and it is easier to admit failure
– Data beats opinions: let the customer decide
Make sure you are building the right “it” before you build it right
Quick Feedback
Continuous Delivery
Risk
• Waterfall - minimize number of deployments
• CD - minimize number of changes and impact in $$
Risk = #deployments × chance of something going wrong (~ number of changes) × impact of something going wrong in $$
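To see how the two strategies trade off the terms of this formula, here is a small worked sketch. Every number in it is hypothetical and chosen only for illustration; the slides give no such figures. The function name `expected_risk` is also an assumption.

```python
def expected_risk(deployments: int, failure_chance: float, impact_dollars: float) -> float:
    """Risk = #deployments * chance of something going wrong * impact in $$."""
    return deployments * failure_chance * impact_dollars

# Hypothetical waterfall year: 2 large releases, each carrying many changes,
# so each is likely to break something and is expensive when it does.
waterfall = expected_risk(deployments=2, failure_chance=0.5, impact_dollars=100_000)

# Hypothetical CD year: many tiny releases, each carrying few changes,
# so each is unlikely to break and cheap to detect and roll back.
continuous = expected_risk(deployments=10_000, failure_chance=0.005, impact_dollars=500)

print(f"waterfall: ${waterfall:,.0f}  continuous delivery: ${continuous:,.0f}")
```

Under these made-up inputs the much larger deployment count is outweighed by the lower chance and impact per deployment, which is the trade the slide describes.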
Small Development Iterations
• No Waterfall
• No Scrum
• No Iterations
• No long documents
• Build something small
• When it is ready, deploy it
– Measure it
– Then fix it
– Again
– And again, until Dev, Product and Customers are happy
• Then start changing it
– Again, as a small change
Product/Dev/QA/Ops boundaries are going down
What Is The Common Denominator?
• Product manager
• Project manager
• QA
• Operations
• DBA
Developers can do these jobs
CD is culture & mindset
• Trust the developers
– Empower developers to change production
– The developer knows their system best
• Automation as a default choice
– No more “is it worth automating?”
– Everything should be automated
• Welcome to the twilight zone
– Product/Dev/QA boundaries are going down
– Everyone needs to care about everything
– Less formality: Corridor - In, Meeting Room - Out
Dev Centric Culture – Involve The Developer
• Product definition (with product)
• Development (with architect)
• Testing (with QA developers)
• Deployment / Rollback (with operations)
• Monitoring / BI (with BI team)
• DevOps – to enable deployment and rollback, fully automated
Developer
Product
QA
Management
Operation
BI
Continuous Delivery – Key points
• Abandon the “VERSION” paradigm – move to a feature-centric methodology
• Make small and frequent releases as soon as possible
• Automate everything – TDD/CI/CD
• Measure everything
– A/B test every new feature
– Monitor real KPIs (business, not CPU)
• Deploy without downtime
Test Driven Development
• No new code is pushed to Git without being fully tested
– We currently have around 10,000 automated tests
• Before fixing a bug, first write a test to reproduce the bug
• Cover legacy (untested) systems with integration tests
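The "test first, then fix" rule for bugs can be sketched as follows. The function `parse_price` and the bug it had are hypothetical, invented for this illustration; the tests are plain functions that a runner such as pytest would pick up.

```python
def parse_price(text: str) -> float:
    """Hypothetical function under test: parse a price such as "1,250.50"."""
    # The fix: strip thousands separators before converting.
    # Before the fix this was float(text), which raised ValueError on "1,250.50".
    return float(text.replace(",", ""))

def test_price_with_thousands_separator():
    # Written first, to reproduce the reported bug. It failed until the fix landed
    # and now stays in the suite as a regression test.
    assert parse_price("1,250.50") == 1250.50

def test_plain_price_still_works():
    # Guards the behaviour that already worked before the fix.
    assert parse_price("99") == 99.0
```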
What people think of TDD
• TDD slows down development
• With TDD we write more code (product + test code).
• TDD has no significant impact on quality
TDD Actual impact on development
• We develop products faster
• Removes fear of change
• Easier to enter someone else’s project
• Do we still need QA? (Yes, they code automation tests)
– We don’t have QA for back-end applications
• Writing a feature is 10-30% slower, with 45-90% fewer bugs
• 50% faster to reach production
• Considerably less time to fix bugs
Refactoring
Is Refactoring Rework? Absolutely NOT!
• Refactoring is the outcome of learning
• Refactoring is the cornerstone of improvement
• Refactoring builds the capacity to change
• Refactoring doesn’t cost, it pays
Refactoring
• Refactor from inside out
– Small iterations with tests
– Refactor small methods
– Make sure the tests don’t break
– Deploy often
• Re-write from the outside in
– Write from scratch (one piece at a time)
– Code duplication sometimes needed (temporary)
– Protected by Feature Toggle
Before refactoring, make sure everything is covered with tests
– Legacy code is usually covered by integration (IT) tests
Feature Toggles
Code branch
FT opened? Yes: New Code / No: Old Code
Usage example
Simple “if” statement in your code
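A minimal sketch of such an “if” guard. The toggle store, the toggle name `new_checkout` and both checkout functions are hypothetical stand-ins; a real system would read toggles from configuration or an experiment service.

```python
# Hypothetical in-memory toggle store.
FEATURE_TOGGLES = {"new_checkout": False}

def is_open(toggle_name: str) -> bool:
    """Unknown toggles count as closed, so new code stays dark by default."""
    return FEATURE_TOGGLES.get(toggle_name, False)

def old_checkout(cart):
    return sum(cart)

def new_checkout(cart):
    # Illustrative new behaviour: apply a 10% discount.
    return round(sum(cart) * 0.9, 2)

def checkout(cart):
    # The feature toggle is just a branch between new and old code.
    if is_open("new_checkout"):
        return new_checkout(cart)
    return old_checkout(cart)
```

With the toggle closed, `checkout` behaves exactly as before, so the new code can ship to production without being used.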
Feature Toggles
• Everyone develops on the Trunk
• Every piece of code can get to production at anytime
Feature Toggle to the rescue
• Unused new code can go to production – no harm done
• Operational new code goes with a guard – use new or old code by feature toggle
DB Schema Changes Without Downtime
• Adding columns
– Use another table linked by primary key
– Use a blob field for schema flexibility
• Removing fields
– Stop using them. Do not do any DB schema changes
New DB schema with data migration
• Plan a lazy migration path controlled by feature toggle
1. Write to old / Read from old
2. Write to both / Read from old
3. Write to both / Read from new, fallback to old
4. Write to new / Read from new, fallback to old
5. Eagerly migrate data in the background
6. Write to new / Read from new
• Backward compatibility is a must
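The six steps above can be sketched with two dictionaries standing in for the old and new schemas. The class and function names (`Phase`, `MigratingStore`, `migrate_in_background`) are assumptions made for this sketch, and the phase would in practice be driven by a feature toggle rather than set by hand.

```python
from enum import IntEnum

class Phase(IntEnum):
    OLD_ONLY = 1             # write old / read old
    DUAL_WRITE_READ_OLD = 2  # write both / read old
    DUAL_WRITE_READ_NEW = 3  # write both / read new, fall back to old
    NEW_WRITE_FALLBACK = 4   # write new / read new, fall back to old
    # Step 5 is the background job below; reads and writes behave as in step 4.
    NEW_ONLY = 6             # write new / read new

class MigratingStore:
    def __init__(self, old: dict, new: dict, phase: Phase):
        self.old, self.new, self.phase = old, new, phase

    def write(self, key, value):
        if self.phase <= Phase.DUAL_WRITE_READ_NEW:
            self.old[key] = value
        if self.phase >= Phase.DUAL_WRITE_READ_OLD:
            self.new[key] = value

    def read(self, key):
        if self.phase <= Phase.DUAL_WRITE_READ_OLD:
            return self.old.get(key)
        if key in self.new:
            return self.new[key]
        if self.phase < Phase.NEW_ONLY:
            return self.old.get(key)  # lazy fallback while data is still moving
        return None

def migrate_in_background(store: MigratingStore) -> None:
    """Step 5: eagerly copy whatever the lazy path has not touched yet."""
    for key, value in store.old.items():
        store.new.setdefault(key, value)
```

Because every phase can still read what the previous phase wrote, the toggle can be moved back one step at any point without losing data.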
Feature Toggle Strategies (gradual expose users)
• Company employees
• Specific users or group of users
• Percentage of traffic
• By GEO
• By Language
• By user-agent
• User Profile based
• By context (site id or some kind of hash on site id)
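The “percentage of traffic” and “hash on site id” strategies combine naturally. A minimal sketch, assuming SHA-256 as the hash and 100 buckets; the slides do not say which hash or bucket count is used, and the function names are hypothetical.

```python
import hashlib

def bucket(site_id: str, buckets: int = 100) -> int:
    """Map a site id to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(site_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets

def is_exposed(site_id: str, percentage: int) -> bool:
    """Open the toggle for roughly `percentage` percent of sites."""
    return bucket(site_id) < percentage
```

Since the bucket depends only on the site id, a site that sees the feature at 10% keeps seeing it when exposure grows to 50%.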
Feature Toggle Override
• By specific server
– Used to test system load
– New database flows/migration
– Refactoring that may affect performance and memory usage
• By URL parameter
– Enable internal testing
– Product acceptance
– Faking GEO
• By FT cookie value
– Testing
– When working with API on a single page application
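One way the URL-parameter and cookie overrides could be resolved is sketched below. The precedence (URL parameter, then cookie, then the default) and the convention that the toggle name is used directly as the parameter and cookie name are assumptions for this sketch, not something the slides specify.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlparse

def resolve_toggle(name: str, url: str, cookie_header: str, default: bool) -> bool:
    """Return the toggle state for one request, honouring overrides."""
    # 1. URL parameter override, e.g. ?newEditor=true
    query = parse_qs(urlparse(url).query)
    if name in query:
        return query[name][0] == "true"

    # 2. Cookie override, e.g. Cookie: newEditor=true
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if name in cookie:
        return cookie[name].value == "true"

    # 3. No override: fall back to the regular toggle state.
    return default
```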
Wix PETRI
A/B Tests
A/B Test
• Every new feature is A/B tested
• We open the new feature to a % of users
– Define KPIs to check if the new feature is better or worse
– If it is better, we keep it
– If worse, we check why and improve
– If we find flaws, the impact is just for % of our users (kind of Feature Toggle)
An interesting side effect on product
• How many times did you have the conversation “what is better”?
– Put the menu on top / on the side
• Well, how about building both and A/B testing?
Marking users with toss value in a cookie
• Anonymous user
– Toss is randomly determined
– Cannot guarantee a persistent experience if the user changes browser
• Registered user
– Toss is determined by the user ID
– Guarantees toss persistency across browsers
– Allows setting additional tossing criteria (for example, new users only)
– Only use this for sections where the user has to be authenticated
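A minimal sketch of the two kinds of toss. The cookie is modelled as a plain dictionary, and the cookie key format, the use of SHA-256 and the function names are all assumptions for illustration.

```python
import hashlib
import random

def toss_for_registered(user_id: str, test_name: str) -> str:
    """Derive the toss from the user ID, so it is the same in every browser."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).digest()
    return "B" if digest[0] % 2 else "A"

def toss_for_anonymous(cookies: dict, test_name: str, rng=random) -> str:
    """Toss randomly once, then remember the result in this browser's cookie."""
    key = f"toss_{test_name}"
    if key not in cookies:
        cookies[key] = rng.choice(["A", "B"])
    return cookies[key]
```

A different browser starts with an empty cookie jar, which is why the anonymous toss cannot promise a persistent experience.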
• Do not mix anonymous and registered tests
• A/B test a percentage of users with optional filters
– New users only (registered users only)
– By language
– By GEO
– By browser – user-agent – OS
– Any other criteria you have on your users
A/B Test Features
• A/B Test Override
– Allows setting the value of a test for validation
– Helps support experience what users are experiencing
• Override methods
– Via URL parameter
– Via cookie
• Start/Stop Test
• Pause tests
• Bots always get “A”
NOT !!!
Gradual Deployment
• Assume two components
• We shut down one and install the new version on it. It is not active yet
• Do self test
• Activate the new server if it passes the self test
• Continue deploying the other servers, a few at a time, checking each one with self test
[Diagram: a pool of servers running A 1.1 and B 1.1; B servers are upgraded to B 1.2 one at a time while the others keep running 1.1]
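The gradual deployment steps above can be sketched as a loop. The callbacks `install`, `self_test` and `activate` are hypothetical hooks into whatever deployment tooling is in use, and the batch size is an arbitrary choice.

```python
from typing import Callable, Iterator, List

def batches(servers: List[str], batch_size: int) -> Iterator[List[str]]:
    """First a single server, then the rest a few at a time."""
    yield servers[:1]
    for i in range(1, len(servers), batch_size):
        yield servers[i:i + batch_size]

def gradual_deploy(servers: List[str],
                   install: Callable[[str], None],
                   self_test: Callable[[str], bool],
                   activate: Callable[[str], None],
                   batch_size: int = 2) -> None:
    for batch in batches(servers, batch_size):
        for server in batch:
            install(server)  # new version installed, server not active yet
        for server in batch:
            if not self_test(server):
                # Stop here: the remaining servers still run the old version.
                raise RuntimeError(f"self test failed on {server}; deployment stopped")
            activate(server)
```

Stopping at the first failed self test limits the damage to one inactive server while the rest of the pool keeps serving traffic.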
Self Test / Post Deployment Test
After each server deployment run a self test before deploying the next server.
• Checking server configuration and topology
– Make sure database is accessible (DB connection string)
– Is the schema the one I expect
– Access required local resources (data files, other config files, templates, etc.)
– Access remote resources
– RPC / REST endpoints reachable and operational
• Server will refuse requests unless it passes the self test
• Allow a way to skip self test (and continue deployment)
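A minimal sketch of a server that gates traffic on its self test. The `Server` class, the check names and the response strings are hypothetical; real checks would open the database connection, read the config files and call the remote endpoints.

```python
from typing import Callable, Dict, List

class Server:
    def __init__(self, checks: Dict[str, Callable[[], bool]], skip_self_test: bool = False):
        self.checks = checks
        self.skip_self_test = skip_self_test  # escape hatch to continue a deployment
        self.ready = False
        self.failures: List[str] = []

    def run_self_test(self) -> bool:
        """Run every check; a check that raises counts as a failure."""
        self.failures = []
        for name, check in self.checks.items():
            try:
                ok = check()
            except Exception:
                ok = False
            if not ok:
                self.failures.append(name)
        self.ready = self.skip_self_test or not self.failures
        return self.ready

    def handle(self, request: str) -> str:
        # Refuse requests until the self test has passed (or was skipped).
        if not self.ready:
            return "503 Service Unavailable"
        return f"200 OK: {request}"
```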
Tools - App-info – Self Test
Backward and Forward compatible
• Assume two components
• We release a new version of one
• Now Rollback the other…
[Diagram: components A and B running mixed versions (1.0, 1.1, 1.2) side by side while one is upgraded and the other is rolled back]
A story on Wix time machine
Time machine event =
• Deployment capabilities: “no click” deployment
– Dozens of services, 130+ servers over 3 data centers
• Backward and forward compatibility in an extreme field test case
– Mixed versions of services / DB with no service downtime
• Empowerment
– The power we give to individuals
• Risk taking and embracing failure
CD – prepare to invest…..
• Dev infrastructure - Refactor, Refactor, Refactor
• Testing infrastructure & know-how
• Deployment infrastructure & tools
• Automation, Automation, Automation
• Monitoring (business and technical)
– Hundreds of aspects; using thresholds is a must
– Monitor business KPIs
– Internal & external
– Endless tuning & learning
How does it work – CD Practices
• Test driven development
• Small Development Iterations
• Backwards and Forwards compatible
• Gradual Deployment & Self-Test
• Feature Toggle
• A/B Testing
• Exception Classification
• Production visibility
Tools - App-info - Dashboard
Tools - App-info – Running Experiments
Tools – Monitoring - New Relic
Tools – Frying Pan
Tools – Lifecycle To Rule Them All
Where are we today?
• We have rewritten our Flash editor product as an HTML5 editor
– In just 4 months
• Introduced Wix 3rd party applications (developers API)
– In just 6 weeks
• We are easily replacing significant parts of our infrastructure
• And we are doing ~50 releases a day!
• Production state changes every 9 minutes.
Aviran Mordo
@aviranm
http://www.linkedin.com/in/aviran
http://www.aviransplace.com
Read more:
The Road To Continuous Delivery: http://goo.gl/K6zEK
Dev-Centric Culture: http://goo.gl/0Vo70t