BT14 Concurrent Session 11/13/2014 3:00 PM
"Data Driven Software Testing: The New, Lean Approach"
Presented by:
Ken Johnston Microsoft
Brought to you by:
340 Corporate Way, Suite 300, Orange Park, FL 32073 888‐268‐8770 ∙ 904‐278‐0524 ∙ [email protected] ∙ www.sqe.com
Ken Johnston is the principal engineering manager for the Microsoft Operating Systems Data Integration and Insights team. Since joining Microsoft in 1998, Ken has filled many other roles, including test lead, test manager, and group program manager. Recently he has worked on Bing Data Quality and Measurements, Cosmos (the Microsoft big data platform), and the Windows Apps Store. For two and a half years, Ken served as the Microsoft director of test excellence. He is a frequent speaker, blogger, and author on big data, software testing, and online services development. Contact Ken on Twitter @rkjohnston.
11/25/2014
Data Driven Software Quality
The Lean Approach to Not Testing
Ken Johnston, Principal Data Science Manager
Twitter: @rkjohnston
Blog: http://blogs.msdn.com/kenj
Email: [email protected]
LinkedIn: http://linkedin.com/in/rkjohnston
@rkjohnston #BSCADC
About Ken
Data Scientist in Data Driven Outcomes (D2O)
Office Live, WebApps, Office Online
Cosmos, AutoPilot, Local, Shopping
Next is Data Driven Quality
EaaSy –Everything as a Service “yes!”
Write Books and Blog on Occasion
This is a talk about Change, Big Change
• No Test Plans
• Fewer Test Cases
• Less Test Automation
• Releasing with Lower Initial Quality
About this Presentation
• Big Data + Agile are Driving Change
• Data Driven Quality Framework
• Analysis and Insights
• Taking Action
• Minimum Viable Quality
• Designing for DDQ and Mitigating Risk
[Figure: “Big Data” Search Trends]
Big Data and Agile
The coming changes are being driven by Big Data and the rate of product release
Massive amounts of Internal Data
Engineering Data
• Test pass/fail results
• Bug counts
• Code Complexity
• Code Coverage
• Code Churn
• Performance
• Reliability
But Opinion Still Reigns
• The HiPPO: Highest Paid Person’s Opinion
Big Data Insights are Real
Improving Confidence Intervals
Here’s a Classic Story
• Cocoa butter lotion
• A large purse
• Zinc and magnesium supplements
• A bright blue rug
What would you do with this information?
But what if the expectant mother was an underage minor living at home?
Predictive Modeling is Real
Microsoft Bing Launches Predictions
Predictions Lab - https://www.prediction.microsoft.com/
2014 Elections with CNN - http://blogs.bing.com/search/2014/10/21/bing-and-ie-team-with-cnn-for-elections-2014/
Dancing With the Stars - http://blogs.bing.com/search/2014/09/17/who-is-going-to-win-dancing-with-the-stars/
Uses Social, Search, and Betting Websites
Big Data and Quality is Happening
Improving IE Quality of Experience in a Dynamic Web
The Problem: Sites break, bad experience for our customers
• Huge engineering investment to stay on top of the ever changing Web
• Traditionally evaluated through manual testing, bug reporting, and escalations
• Public telemetry unstructured, very poor signal‐noise ratio
Solution: Get more data!
• “Report website problems”, new feature added in April servicing release
• Enables users to provide semi‐structured issue reports to Microsoft—URLs are structured, comments are freeform
• Preserve user Privacy (URLs and min data)
Process and Normalize Data
We used a SARIMA model (Seasonal Autoregressive Integrated Moving Average):
Supports observed seasonality in weekday/weekend reports, and non‐stationary mean as volume of reports increases over time.
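The "seasonal" and "integrated" parts of SARIMA both rest on differencing, which the sketch below illustrates. It is not the model the IE team fitted: the data are synthetic, the weekly pattern values are made up, and the autoregressive and moving-average terms (normally estimated with a statistics library) are left out.

```python
# Synthetic daily report counts (illustrative only, not data from the talk):
# a rising trend plus a repeating weekday/weekend pattern.
WEEKLY_PATTERN = [30, 28, 27, 29, 31, 12, 10]  # Mon..Sun, hypothetical values


def synthetic_reports(days):
    """Build a series with a linear trend and a 7-day seasonal cycle."""
    return [100 + 2 * t + WEEKLY_PATTERN[t % 7] for t in range(days)]


def difference(series, lag):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


reports = synthetic_reports(28)

# Seasonal differencing with a 7-day lag removes the weekday/weekend cycle.
deseasonalized = difference(reports, lag=7)

# First differencing then removes the remaining trend (non-stationary mean).
stationary = difference(deseasonalized, lag=1)
```

On real telemetry the differenced series would still carry noise, and that residual structure is what the autoregressive and moving-average terms are meant to capture.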
Introduction to Data Driven Quality Framework (DDQF)
Traditional Testing
• Test Planning
• Test Cases and Automation
• Test passes
• Defect Management
• Sign Off/Release
DDQF Cycle
• What could go Wrong
• Instrumentation
• Early Release
• Data Analysis
• Rolling Releases
[Diagram: DDQF cycle of Asking the Right Questions, Data Acquisition, Analysis, Take Action, Release]
• DDQF is an iterative cycle
• Roots in DMAIC (Define, Measure, Analyze, Improve, Control)
• Less up front certainty and more iterative
• Release is the key to managing risk. Release is more than Control, it is also the break.
Asking the Right Questions
The Heart of the Tester: Ruthless Curiosity
Establishing the fundamental questions about quality is critical
• Mean Time Between Failures (MTBF)
• Launch Time
• Performance Metrics
• Hang Time
• Service Up Time and Availability
• Page Load Time
• Page Dwell Time
• Sessions per UU
• Engagement/Usage Time
• Feature Engagement
  o Discoverability
  o Return rate
• Click Through Rate (CTR)
• Hover Time
• Quick Back
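Two of the metrics above reduce to simple ratios. The sketch below uses the common textbook definitions (operating time divided by failure count for MTBF, clicks divided by impressions for CTR); the function names and sample numbers are illustrative and not taken from the talk.

```python
def mean_time_between_failures(uptime_hours, failure_count):
    """MTBF = total operating time / number of failures.

    Returns None when no failure was observed, since the ratio is undefined.
    """
    if failure_count == 0:
        return None
    return uptime_hours / failure_count


def click_through_rate(clicks, impressions):
    """CTR = clicks / impressions, or 0.0 when nothing was shown."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Hypothetical month of telemetry for one service.
mtbf = mean_time_between_failures(uptime_hours=720, failure_count=3)
ctr = click_through_rate(clicks=45, impressions=1500)
```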
To Measure Customer Product Satisfaction you need Scenarios
What is a Scenario?
A scenario is a clear and succinct description of a specific experience and customer benefit that the product is designed to deliver.
It is a finite set of product aspects which can be measured and evaluated from the customer’s perspective.
Can I connect to a WiFi hotspot easily
• Discoverability
• Trust
• Negotiate the connection
Can I file my expense report
• Does the software know me and keep my profile
• Did it save my default currency
Scenario at a Glance
Think about the User
• What does the user need to do
• What do they want to do
• How do alternative software products do it
• Can we take steps out of the way
• What would delight the User
[Diagram: User has a goal; Software Makes Goal Achievable; Each Step is a Scenario; Time to Task Completion]
Questions Drive Instrumentation
Instrument your Code
• Scenario Start
• Key Steps
• Lost Loops
• Scenario Complete
• Success Ratios
• Time to Task Completion
• Minutes of Usage
[Diagram: User has a goal; Software Makes Goal Achievable; Each Step is a Scenario]
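A minimal sketch of what the instrumentation points above might look like in code. The `ScenarioLog` class, the event names, and the expense-report sessions are all hypothetical; timestamps are passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioLog:
    """Collects (session, event, timestamp) tuples for one scenario."""
    events: list = field(default_factory=list)

    def record(self, session, event, timestamp):
        self.events.append((session, event, timestamp))

    def success_ratio(self):
        """Share of started sessions that reached 'complete'."""
        started = {s for s, e, _ in self.events if e == "start"}
        completed = {s for s, e, _ in self.events if e == "complete"}
        if not started:
            return 0.0
        return len(started & completed) / len(started)

    def times_to_completion(self):
        """Seconds from 'start' to 'complete' for each finished session."""
        starts = {s: t for s, e, t in self.events if e == "start"}
        return {s: t - starts[s]
                for s, e, t in self.events
                if e == "complete" and s in starts}


# Hypothetical "file my expense report" scenario.
log = ScenarioLog()
log.record("s1", "start", 0.0)
log.record("s1", "step:enter_amount", 4.0)
log.record("s1", "complete", 9.5)
log.record("s2", "start", 1.0)
log.record("s2", "step:enter_amount", 6.0)  # user abandons here

ratio = log.success_ratio()
durations = log.times_to_completion()
```

Recording the key steps as well as start and complete is what lets later analysis see where abandoned sessions stopped.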
Data Acquisition
Two types of data to acquire
• Active = synthetic
• Passive = organic
Active for services only?
• Client: is the service there?
Staged Data Acquisition - Netflix
1B API requests per day
Canary Deployment
Staged Data Acquisition - Facebook
• Dogfood
• In prod, no users (except internal ones)
• Some servers in Production
• World-wide deployment
• Feature light-up
Staged Data Acquisition - Outlook
• Filtering and aggregation at client: Be kind to the client
• Pipeline to collect and process data: Make it easy
Staged Data Acquisition
[Diagram: Feature Crew, Outlook Team, MS Office Team, Microsoft Customers; Scale Validation]
Staged Data Acquisition
Service
• Stage 1: In prod, no users
• Stage 2: Dogfood
• Stage 3: Some servers in prod
• Stage 4: Some more servers in prod
• Stage 5: World-wide prod
Deployment Validation, Service Validation, Scale Validation
Real-time service quality
Product (client, on-prem server)
• Stage 1: Partial or whole product team
• Stage 2: Dogfood
• Stage 3: Technology Adoption Programs (TAP)
• Stage 4: Some clients in production
• Stage 5: All Customers
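One way to read the staged rollout is as a gate that promotes a build only while production data looks healthy. The sketch below assumes a single error-rate signal and a made-up 1% threshold; neither the signal nor the threshold comes from the talk, and a real gate would likely look at several metrics.

```python
# Stage names follow the service column of the slide.
STAGES = [
    "Stage 1: In prod, no users",
    "Stage 2: Dogfood",
    "Stage 3: Some servers in prod",
    "Stage 4: Some more servers in prod",
    "Stage 5: World-wide prod",
]


def next_stage(current, error_rate, max_error_rate=0.01):
    """Return the index of the stage to run next.

    Promote one stage when the observed error rate is within the
    (hypothetical) threshold; otherwise hold at the current stage.
    """
    if error_rate > max_error_rate:
        return current
    return min(current + 1, len(STAGES) - 1)


promoted = next_stage(current=0, error_rate=0.001)  # healthy: move on
held = next_stage(current=1, error_rate=0.05)       # unhealthy: hold
```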
Feedback
Analysis and Insights
Good Data lets you ask Questions
[Diagram: User has a goal; Software Makes Goal Achievable; Each Step is a Scenario; Analysis measures]
• Success and Failure Ratios
• Are we good enough
• Is Customer Engagement up
• Is time to task completion down
• Did we get enough user feedback
Production Data is Real Data
Power of Production Data
• Real users
• Multiple environments
• End to end
• Scale & geo‐diversity
Huge Impact
Keep your eye on the target
The goal is not to get a bull's-eye every time
The goal is to get the data and Learn
Taking Action
• Change the Software
• Change the Experience
• Change the expected outcome
A/B testing
• Controlled experimentation
• Usage data on different experiences
Combine into more complex scenarios
• How did user get to shopping cart checkout?
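Comparing usage data from two experiences usually comes down to asking whether the difference in a success ratio is larger than chance. The sketch below uses a standard two-proportion z-test; the talk does not say which test was used, and the checkout counts are invented for illustration.

```python
import math


def two_proportion_z(success_a, total_a, success_b, total_b):
    """Two-sided z-test for a difference between two success ratios.

    Returns (z, p_value). A is the control, B is the treatment.
    """
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    variance = pooled * (1 - pooled) * (1 / total_a + 1 / total_b)
    if variance == 0:
        raise ValueError("no variation in outcomes; test is undefined")
    z = (p_b - p_a) / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical checkout completions: control vs. new experience.
z, p_value = two_proportion_z(200, 1000, 250, 1000)
significant = p_value < 0.05
```

The 0.05 cutoff is a convention, not a recommendation from the talk; teams running many experiments at once often use stricter thresholds.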
We then Re-Evaluate
[Diagram: User has a goal; Software Makes Goal Achievable; Each Step is a Scenario; Analysis measures]
• Success and Failure Ratios
• Are we good enough
• Is Customer Engagement up
• Is time to task completion down
• Did we get enough user feedback
• What should we Change Next
Designing and Releasing for DDQ
[Diagram: The world of possible testing; How much we tested at Microsoft then; Typical Industry Level of Testing]
Say NO to BUFT
Minimum Viable Quality
[Chart: vertical axis is Increasing Test Investment, running from Under Tested at the bottom to Possible Testing at the top]
• Possible Testing: Waste, excessive automation and excessive testing that does not find any meaningful bugs.
• Minimum Viable Release Quality: MVQ for all users but still use a rolling release process. Fix final few critical bugs after release. Rich instrumentation identifies remaining critical to fix bugs in the shipped code.
• Limited Release: MVQ for sub-set of users. Beta Users, Enthusiasts, Flighting
• Under Tested: frequent rollbacks, limited user engagement, strong negative customer feedback, bad press
MVQ focuses on minimizing up front testing
Rich telemetry from production shifts testing and validation into production.
Speed is your friend because…
Code Churn Example 1
• Code churn is cumulative
• Maximum point of instability is at end of milestone
• Imagine this as part of a larger multi-layered project
[Diagram: Layer 1, Layer 2, Layer 3 across a six week coding milestone]
• Tightly coupled layers
• Long stabilization phase
• Complicated end-to-end integration
Sim-ship increases risk
Code Churn Example 2 (Continuous Deployment)
Rapid release cadence (weekly or daily)
Max Risk is Production
[Diagram: Layer 1, Layer 2, Layer 3, through Layer N]
• Risk per release decreases because of more incremental change
• You still must be careful of Risk within Production but…
• Total risk over time can be less with incremental change
User Segmentation
Organizing Users by profile and Risk Tolerance
User Segmentation Approaches
• Profile Based
  o Usage behaviors: new vs. power users
  o Browser type
  o Connection Type
  o Device and Device OS
• Opted in
  o Users Segment themselves
  o Opting in indicates risk tolerance
Balancing Speed and Risk with Rings
• Ring 4: Everyone (no desire for risk)
• Ring 3: External Beta Users & NDA
• Ring 2: Company
• Ring 1: My Team
• Ring 0: Buddy Build (risk tolerance is highest)
Red Line demarks disclosure risk and possible loss of patent rights
The AutoPilot Watchdog Model
Servers have 3 states they can be in
• Healthy
• Failure mode
• Probation
Watchdogs report on server health
Repair Service has 3 actions
• Kill and restart a failing service
• Re-boot the server
• Re-image the server
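The three states and three repair actions can be sketched as a small state machine. This is an illustration only, not AutoPilot's actual logic: the escalation order (restart, then reboot, then re-image) and the rule that a repaired server sits in probation until a healthy report arrives are assumptions, as are all the names.

```python
from enum import Enum


class State(Enum):
    HEALTHY = "healthy"
    FAILURE = "failure"
    PROBATION = "probation"


# Assumed escalation order: cheapest repair first.
REPAIR_ACTIONS = ["restart service", "reboot server", "re-image server"]


class Server:
    def __init__(self, name):
        self.name = name
        self.state = State.HEALTHY
        self.repair_attempts = 0


def report(server, healthy):
    """Apply one watchdog report to the server's state."""
    if not healthy:
        server.state = State.FAILURE
    elif server.state is State.PROBATION:
        # A healthy report while on probation clears the server.
        server.state = State.HEALTHY
        server.repair_attempts = 0


def repair(server):
    """Pick the next repair action for a failed server, or None."""
    if server.state is not State.FAILURE:
        return None
    index = min(server.repair_attempts, len(REPAIR_ACTIONS) - 1)
    server.repair_attempts += 1
    server.state = State.PROBATION
    return REPAIR_ACTIONS[index]
```

Usage: call `report(server, healthy)` for each watchdog result and `repair(server)` from the repair service; a server that keeps failing while on probation moves down the list of actions.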
Watchdogs are not just for Services
• Watchdogs built into apps
• 3 states for an App
  o Kill and restart app sending report
  o Re-boot the device and re-launch
  o Fail back to LKG (last known good)
• Devices are trickier
  o Build logic into the device so that if it loses connectivity it can self correct.
  o Instead of factory settings why not auto fail back to LKG
Generic Service Stack
[Diagram: Production Traffic follows the Default Path through Service UX Front Door, Service Auth/Identity, Layer A vCurrent, Layer B vCurrent, and Service Layer C (Persistent Data Store)]
• Front door servers for logging and access management
• UX rendering layers
• Identity or authentication layers
• Persistent data layers
Runtime Flags Example 1: Side-by-Side Deployments
Runtime Flags
• Flags direct traffic through the stack
• Used to test vNext before full release
[Diagram: Production Traffic follows the Default Path through Service UX Front Door, Service Auth/Identity, Layer A vCurrent, Layer B vCurrent, and Service Layer C (Persistent Data Store). Test or Forked Traffic follows a Runtime path through Layer A vCurrent and Layer B vNext.]
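The side-by-side idea can be sketched as a lookup that starts from the default path and lets per-request flags override individual layers. The flag format, the layer keys, and the `resolve_path` function are hypothetical; the slides do not show how the flags are actually carried or evaluated.

```python
# Default path for production traffic (layer names follow the slide).
DEFAULT_PATH = {"layer_a": "vCurrent", "layer_b": "vCurrent"}


def resolve_path(request_flags):
    """Return the version of each layer a request should be routed to.

    Flags for layers that do not exist in the stack are ignored, so an
    unknown flag can never send production traffic somewhere unexpected.
    """
    path = dict(DEFAULT_PATH)
    for layer, version in request_flags.items():
        if layer in path:
            path[layer] = version
    return path


production = resolve_path({})                  # no flags: default path
forked = resolve_path({"layer_b": "vNext"})    # test traffic tries vNext
```

Because only flagged requests leave the default path, vNext can run beside vCurrent on the same stack without affecting ordinary users.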
Runtime Flags Example 2: N Test Environments
[Diagram: Production Traffic follows the Default Path through Service UX Front Door, Service Auth/Identity, Layer A vCurrent, Layer B vCurrent, and Service Layer C (Persistent Data Store). Test Case Traffic and Checkin Tests follow Runtime paths through Layer A DevBox, Layer A vCurrent, and Layer B Test Cluster.]
Apps as a Service: Facebook
How Facebook secretly redesigned its iPhone app with your help
…a system for creating alternate versions… within the native app. The team could then turn on certain new features for a subset of its users, directly,
…a system of "different types of Legos... and see the results on the server in real time."
From article on The Verge by Dieter Bohn September 18, 2013
That Was a lot of Content
• Big Data + Agile are Driving Change
• Data Driven Quality Framework
• Analysis and Insights
• Taking Action
• Minimum Viable Quality
• Designing for DDQ and Mitigating Risk
[Figure: “Big Data” Search Trends]
Thank You