LinkedIn.com/in/LarryMaccherone

If you build it, they will come... or will they?

Measuring Delivered Value
Is it an unachievable snipe hunt or can it really be done?


Shoeless Joe Jackson

“Is this heaven?...”

“No, it's Iowa...”

“Iowa? Is there a heaven?...”

“Oh yeah. It's the place where dreams come true.”

~ Field of Dreams


Why not trust our gut?

• Surprising results
  Would you have guessed that the Nigerian scam artists whose opening emails have the WORST grammar make the MOST money?

• Human bias

• Confidence that you are not missing opportunities

These are not new reasons, but Lean Startup techniques are largely QUALITATIVE

What I’m talking about today is decidedly QUANTITATIVE

Larry Maccherone


Software Development Performance Index (SDPI) metrics

1. Productivity - Throughput

2. Predictability - Variation in Throughput

3. Responsiveness - Time in Process (TiP)

4. Quality - Defect density/aging
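
As a rough illustration of the first three metrics, here is a minimal sketch. It assumes each work item is a hypothetical `(start_date, done_date)` pair, that throughput is counted per calendar month, and that "variation" means the coefficient of variation. None of these choices is confirmed by the SDPI research itself.

```python
from datetime import date
from statistics import mean, median, pstdev


def throughput_by_month(items):
    """Productivity: count of completed items per (year, month) of the done date."""
    counts = {}
    for _start, done in items:
        key = (done.year, done.month)
        counts[key] = counts.get(key, 0) + 1
    return counts


def throughput_variation(counts):
    """Predictability: coefficient of variation of throughput (lower is steadier)."""
    values = list(counts.values())
    return pstdev(values) / mean(values)


def time_in_process_days(items):
    """Responsiveness: median number of days between start and done."""
    return median((done - start).days for start, done in items)


# Hypothetical sample data, for illustration only.
items = [
    (date(2024, 1, 1), date(2024, 1, 11)),
    (date(2024, 1, 5), date(2024, 1, 25)),
    (date(2024, 2, 1), date(2024, 2, 4)),
]
monthly = throughput_by_month(items)
print(monthly, throughput_variation(monthly), time_in_process_days(items))
```

The median is used for Time in Process because a few very slow items would otherwise dominate a mean; that is a design choice of this sketch.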


SDPI Research Findings

Teams with low WiP have up to:
● 4x better Quality
● 2x faster Time to market
● But potentially 34% worse Productivity

Stable teams result in up to:
● 60% better Productivity
● 40% better Predictability

Dedicated teams:
Teams made up of people who only work on that one team have double the Productivity

Team size:
● Smaller teams have better Productivity
● Larger teams have better Quality

Iteration length:
● Teams using two-week iterations have the best balanced performance
● Longer iterations correlate with higher Quality
● Shorter iterations correlate with higher Productivity and Responsiveness

Ratio of testers to developers:
● More testers lead to better Quality
● More testers lead to worse Productivity and Responsiveness
● Teams with no testers have:
  ● The best Productivity
  ● Almost as good Quality
  ● But much wider variation in Quality

And more on Kanban vs Scrum, co-location, Big bang agile, Gender diversity, etc.


Software Development Performance Index (SDPI) metrics

1. Productivity - Throughput

2. Predictability - Variation in Throughput

3. Responsiveness - Time in Process (TiP)

4. Quality - Defect density/aging

Never got a chance to incorporate at Rally...

5. Customer/stakeholder satisfaction – NPS (Pendo)

6. Employee happiness/engagement (AgilityHealth)

7. Build-the-right-thing - ??? (Pendo)

8. Code quality - Static analysis (Comcast)


Approach to SDPI research


Types of signals

Outcome signal
• Evidence of an outcome you are trying to effect
• SDPI Examples: Throughput, Defect density, Time in Process (TiP)
• Plotted on the y-axis

Lever signal
• Evidence of some decision you have made or influenced
• SDPI Examples: Simultaneous WiP, Ratio of testers to developers
• Plotted on the x-axis


Correlation of lever signals to outcome signals


Nuance - Controlling for other possible causes


Nuance - Overlap in spread (among other nuances)


Approach to “build-the-right-thing” research


Three capabilities:

1. Analytics on visitor data
  If you are a SaaS software vendor, you would buy Pendo to gain insight into how visitors use your product

2. Guides
  In-app education for your visitors on how to use your app. Think dialog bubbles

3. Polls
  Survey your visitors. Net Promoter Score (NPS)? Did you like this feature? What’s your role? Target a particular segment (visitor or account cohort)


A little more context

1. Captures all browser events with no explicit coding
  Enables analysis going back in time as soon as events are “tagged”
  Enables research on even untagged features

2. Powerful segmentation

3. Three critical product manager capabilities for the cost of only one product and only one browser agent integration (cost, performance, resource consumption, etc.)

4. Combines the use of analytics with guides
  Example: As a product manager, I want to interview visitors that have used feature X at least 5 times to get their feedback. A guide can use that metric and prompt only those visitors to sign up for an interview slot.
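
The "used feature X at least 5 times" example amounts to a simple segment over usage events. The sketch below is not Pendo's API. It assumes events arrive as hypothetical `(visitor_id, feature_id)` pairs and only shows the selection logic.

```python
from collections import Counter


def visitors_to_prompt(events, feature_id, min_uses=5):
    """Return the visitors who used `feature_id` at least `min_uses` times.

    `events` is an iterable of (visitor_id, feature_id) pairs (hypothetical shape).
    """
    uses = Counter(visitor for visitor, feature in events if feature == feature_id)
    return {visitor for visitor, count in uses.items() if count >= min_uses}


# Hypothetical events: only "ann" crosses the threshold for feature "X".
events = [("ann", "X")] * 5 + [("bob", "X")] * 4 + [("cat", "Y")] * 9
print(visitors_to_prompt(events, "X"))
```

A guide would then be shown only to the returned visitors.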


Data model (assumes SaaS B2B application)

• Account – Another business paying you (or in trial) for use of your SaaS product
  Example metric: % of last 30 days that account used your product

• Visitor – A user at an account
  Example metric: Weekly/monthly visitor churn

• Page – A page in your app
  Example metric: % of users that utilize this page

• Feature – A feature in your app
  Example metric: % of users that utilize this feature
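
To make the data model concrete, here is a minimal sketch of two of the example metrics. The `Event` record and its field names are assumptions made for illustration, not the actual Pendo schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One tagged usage event (hypothetical shape)."""
    account_id: str
    visitor_id: str
    page_id: str
    feature_id: Optional[str]  # None when the event is not tagged to a feature
    day: date


def pct_days_active(events, account_id, as_of, window_days=30):
    """% of the last `window_days` days (ending at `as_of`) with any account usage."""
    start = as_of - timedelta(days=window_days - 1)
    active_days = {
        e.day for e in events
        if e.account_id == account_id and start <= e.day <= as_of
    }
    return 100.0 * len(active_days) / window_days


def pct_visitors_using_feature(events, feature_id):
    """% of all visitors seen in `events` who used `feature_id` at least once."""
    everyone = {e.visitor_id for e in events}
    users = {e.visitor_id for e in events if e.feature_id == feature_id}
    return 100.0 * len(users) / len(everyone) if everyone else 0.0
```

The same pattern would apply to the page metric by filtering on `page_id` instead of `feature_id`.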


Confusing Meta Meta

• Pendo uses Pendo to measure its own accounts/visitors

• My research started with this data and that drives most of the examples in this talk
  This can be very confusing because the same word will show up twice. Example: the “feature” in Pendo that lets our customers look at data about a “feature” in their app becomes the “feature feature” in Pendo’s use of Pendo.

• However, the primary goal is to offer these capabilities to Pendo customers


Research data

• Terabytes of data

• “Tagged” to page, feature, visitor, and account

Not yet considered but have potential access to:

• Personas of visitor or account

• Net Promoter Score (NPS) survey results


“Build-the-right-thing” signals


Outcome signals

• Churn vs Renew (requires a full renewal cycle)

• Win vs Loss (can gather much more rapidly)

• (later) Net promoter score (NPS)


Win/Loss and Renew/Churn outcome signals


Criteria for good outcome signals

• Can be extracted without human input using simple heuristics:
  • 15 consecutive months* (configurable) of usage => Renewed; 9 months* of use followed by no use => Churn
  • First 4 months* of use => Win; 4 months* since first use and most recent month no use => Loss

• May get better with human input

* Month = 30 days
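
The heuristics above can be sketched as follows. The slide leaves some details open, so this sketch makes assumptions that are labeled in the comments: usage is first reduced to one boolean per complete 30-day month since first use, "Renewed" is checked before "Churn", and accounts matching neither rule are reported as "Undetermined". All function names are hypothetical.

```python
def monthly_usage(active_days, observed_days, month_days=30):
    """One boolean per complete month: was there any usage in that month?

    `active_days` holds day offsets since first use (0 = first day of use).
    `observed_days` is how many days of history exist for the account.
    """
    months = observed_days // month_days  # assumption: ignore a partial last month
    used = [False] * months
    for day in active_days:
        index = day // month_days
        if index < months:
            used[index] = True
    return used


def longest_run(usage):
    """Length of the longest streak of consecutive months with usage."""
    best = run = 0
    for used in usage:
        run = run + 1 if used else 0
        best = max(best, run)
    return best


def classify_renew_churn(usage, renew_months=15, churn_months=9):
    # Assumption: a long enough streak counts as Renewed even if use stopped later.
    if longest_run(usage) >= renew_months:
        return "Renewed"
    # Assumption: "9 months of use" means 9 used months in total, and
    # "followed by no use" means the most recent month had no usage.
    if usage and sum(usage) >= churn_months and not usage[-1]:
        return "Churn"
    return "Undetermined"


def classify_win_loss(usage, win_months=4):
    if len(usage) < win_months:
        return "Undetermined"  # not enough history yet
    if all(usage[:win_months]):
        return "Win"
    if not usage[-1]:
        return "Loss"
    return "Undetermined"


print(classify_win_loss(monthly_usage([0, 31, 65, 95], observed_days=120)))
```

Making the month counts parameters mirrors the "(configurable)" note on the slide.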


“Lever” (input) Signals


Lots of alternatives, but which are strong signals?


Lever signal strength

A weak/moderate signal A strong signal


Signal strength analysis


Results for Pendo (Win/Loss)

Strong levers for Win/Loss
1. Weekly new visitors
2. Use of the “feature” feature
3. Use of the “page” feature
4. Days active

Weak levers for Win/Loss
1. All features except “pages” and “features”
  Could be much stronger if I can control for visitor persona and/or account type. Suspect there are a few key features for each persona/account type that would be highly predictive
2. Use of the “guide” feature
  Matches Pendo sales experience
3. Visitor change (churn)


Results for Pendo (Renew/Churn)

Strong levers for Renew/Churn
1. Weekly new visitors
  #1 strongest for Win/Loss
2. Event rate
  #5 strongest for Win/Loss
3. Time on site
  #6 strongest for Win/Loss

Weak levers for Renew/Churn
1. Use of “page” feature
  #5 weakest for Win/Loss
2. Use of “feature” feature
  #6 weakest for Win/Loss
3. Visitor change
  #3 weakest for Win/Loss


Combining signals with Bayesian regression


Bayesian regression

• Think spam filters

• Builds conclusions from lots of little bits of evidence

• Deals well with missing evidence

• Intrinsically provides a confidence factor
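
The talk does not show the model itself. As a stand-in, the sketch below uses a naive Bayes classifier over bucketed lever signals, the same family of technique as classic spam filters. It is an assumption chosen to illustrate the four bullets above, not the author's actual implementation. Signal names and bucket labels are hypothetical.

```python
import math
from collections import defaultdict


class NaiveBayesSketch:
    """Combines many small pieces of evidence into a probability per outcome."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing             # Laplace smoothing constant
        self.label_counts = defaultdict(int)   # outcome -> number of accounts
        self.counts = defaultdict(int)         # (signal, bucket, outcome) -> count
        self.buckets = defaultdict(set)        # signal -> buckets seen in training

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.label_counts[label] += 1
            for signal, bucket in row.items():
                if bucket is None:
                    continue  # missing evidence is simply not counted
                self.counts[(signal, bucket, label)] += 1
                self.buckets[signal].add(bucket)
        return self

    def predict_proba(self, row):
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            score = math.log(n / total)  # prior from outcome frequencies
            for signal, bucket in row.items():
                known = self.buckets.get(signal)
                if bucket is None or not known:
                    continue  # missing or unknown signal contributes nothing
                seen = sum(self.counts.get((signal, b, label), 0) for b in known)
                hits = self.counts.get((signal, bucket, label), 0)
                score += math.log(
                    (hits + self.smoothing) / (seen + self.smoothing * len(known))
                )
            log_scores[label] = score
        # Normalize in log space so the results sum to 1.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}


rows = [
    {"new_visitors": "high", "days_active": "high"},
    {"new_visitors": "high", "days_active": None},
    {"new_visitors": "low", "days_active": "low"},
    {"new_visitors": "low", "days_active": "low"},
]
model = NaiveBayesSketch().fit(rows, ["Win", "Win", "Loss", "Loss"])
print(model.predict_proba({"new_visitors": "high", "days_active": None}))
```

The returned probability plays the role of the confidence factor. Under this sketch, accounts whose probability sits near 0.5 would be the ones to look at more closely.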


Bayesian regression


Bayesian regression results for Pendo

• Can predict Win/Loss 90%+

• Can predict Renew/Churn 93%+

• Accurately targets “on-the-bubble” accounts to nudge them one way


Takeaways


Takeaways for Pendo

1. Growing users in an account is the best predictor for both Win/Loss AND Renew/Churn

2. Visitor change (churn) is generally a bad predictor
  May be a Pendo-only phenomenon because Pendo does not charge by visitor

3. Confirms qualitative suspicion that the “guides” feature is not important for the sale, but is important for the renewal

4. No “key” low-level features found
  Should change with visitor/account personas cohort analysis

5. Event rate might be a proxy for account size and/or financial health
  Suspect trend would be valuable here but haven’t tried yet


Up next to address

• How much of this holds true when we look at Pendo’s customers’ customers?

• Some personas (visitor or account) only use some features
  Will try clustering of these entities

• Some features are only meant to be used once a month
  Will seek to add this feature meta-data to the analysis

• Trends often matter more than level. Sometimes it’s level AND trend
  Will try to add trend analysis


Next two sections are “sidebars”
Skip if there is no time


Non-parametric modeling


Are either of these normally distributed? Weibull?


Non-parametric modeling - Pros

• Deals elegantly with non-normal, non-Weibull, etc. distributions
  Handles fat tails, bi-modal, tri-modal, etc.

• No need for a human to identify the distribution
  Humans get it wrong or often just default to something (usually Normal) rather than do the analysis to pick the right one

• No need to pick histogram bucketing approach
  Constant width (percentiles) works great almost all of the time
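
Percentile bucketing can be sketched in a few lines. The version below takes the cut points from the sorted data itself, so each bucket covers the same width in percentile terms and holds roughly the same number of observations, whatever the shape of the distribution. The function names and the nearest-rank rule for cut points are choices made for this sketch.

```python
from bisect import bisect_right


def percentile_edges(values, n_buckets=4):
    """Interior cut points that split `values` into `n_buckets` equal-count buckets.

    `values` must be non-empty. No distribution is assumed.
    """
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // n_buckets] for i in range(1, n_buckets)]


def bucket_of(value, edges):
    """Index of the bucket (0 .. len(edges)) that `value` falls into."""
    return bisect_right(edges, value)


# Hypothetical lever values, for illustration only.
values = [1, 2, 3, 4, 5, 6, 7, 8]
edges = percentile_edges(values, n_buckets=4)
print(edges, [bucket_of(v, edges) for v in values])
```

An extreme outlier simply lands in the last bucket, which is how this approach copes with fat tails without any special handling.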


Non-parametric modeling - Cons

• It’s not sophisticated
  Just about every data science competition is won by a “simple” Bayesian network algorithm

• It requires a lot of computing power
  So what? We have it

• It’s not what all the stats and data science teachers teach
  Inertia of status quo is hard to overcome

• Requires a bit more data


SDPI research extensions with AgilityHealth


AgilityHealth Radar

• 75 Question Survey

• Taken in a coaching-oriented, facilitated way. Each section is explained before the section’s questions are answered.

• Facilitated by highly-trained agile experts

• Targeted at self-assessment and improvement, not comparative benchmarking


Analyzing AgilityHealth survey data

• Even more closely related to my SDPI work

• Should be even more interesting if/when we can correlate with SDPI data

• Promising research but with its own set of issues and confounds

• Those surveyed often give middle-of-the-road responses (Caspar Milquetoast effect)
  Even worse when survey results for all team members are averaged

• How to sense if there is survey fatigue and account for it? (75 questions)
  May be partially or wholly mitigated by coaching/facilitation during the survey

• Self-assessment confounds? Do late novices rate themselves higher than late intermediates because they don’t yet know what they don’t know?

• Hard to tease outcome signals from lever signals


Preliminary results

Best levers for team performance
1. Team confidence and skills
2. Short-term plan
3. Backlog management
4. Clarity of roles
5. Leadership (summarized)
  (actually 6 tied for 5th, including Tech leadership, PO leadership, PO engagement, Servant leadership, Self organization, and Vision & purpose)

Worst levers for team performance
1. Sustainable pace
  Do “harder working” teams perform better?
2. Physical environment and “tools”
  Agile manifesto correct?
3. Team allocation and stability
  Why does this contradict SDPI research? It also contradicts AgilityHealth experience
4. Roadmap
  More important for portfolio? Not team
5. Manager role in process improvement
  What about process improvement in general?


People will come

‘People will come for reasons they can't even fathom. They'll arrive at your door and you’ll say "Of course, we won't mind if you look around. It's only $20 per person". They'll pass over the money without even thinking about it. And they'll watch the game and it'll be as if they dipped themselves in magic waters.’

~ Field of Dreams


They may come but…

‘News Corp tried to guide MySpace, to add planning, and to use “professional management” to determine the business’s future. That was fatally flawed when competing with Facebook which was managed letting the marketplace decide where the business should go.’

~ Forbes


Thanks and questions