184
Sponsored by the U.S. Department of Defense © 2003 by Carnegie Mellon University Version 1.0 page 1 Pittsburgh, PA 15213-3890 Carnegie Mellon Software Engineering Institute Data Analysis Dynamics Jeannine M. Siviy William A. Florac Software Engineering Institute

Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

Embed Size (px)

Citation preview

Page 1: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

Sponsored by the U.S. Department of Defense© 2003 by Carnegie Mellon University

Version 1.0 page 1

Pittsburgh, PA 15213-3890

Carnegie MellonSoftware Engineering Institute

Data Analysis Dynamics

Jeannine M. SiviyWilliam A. FloracSoftware Engineering Institute

Page 2: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 2

Carnegie MellonSoftware Engineering Institute

Tutorial Outline

Section I – Understanding Data• How to use data• Understanding variation• Requirements for success• Common risks and pitfalls

Section II – Data Analysis Dynamics• Learning from our experiences• Useful tips for making measurement work• Thread together methods, tools, processes

Page 3: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 3

Carnegie MellonSoftware Engineering Institute

Tutorial Outline

Section III – Case Study

Summary

Addenda• Additional vignettes• Tool tips

Page 4: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 4

Carnegie MellonSoftware Engineering Institute

Tutorial Focus

Tools, tips, and techniques your organization can use foranalyzing software data and taking action

Specifically we will focus on• day-to-day practices• activities that lead to breakthroughs• why the problem, not management, should drive your

measurement program

Remember:There is no “cookie-cutter” approach to doing goodmeasurement, but there are some best practices.

Page 5: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 5

Carnegie MellonSoftware Engineering Institute

Section I: Understanding Data

Data can help you• Identify root causes of variability, off-target performance• Better predict plans and commitments• Make better decisions and take action• Monitor activities to keep projects on cost, schedule

Data is the means to an end, not the end itself.

Page 6: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 6

Carnegie MellonSoftware Engineering Institute

Process Performance Data

Data allows you to access/analyze process performance.

Process performance is behavior that can be described ormeasured using attributes of• process operation or execution• resultant products or services

Process performance measures answer the question:“How is the process performing with respect to quality, quantity, effort (cost) and time?”

All process behavior is variable.

Page 7: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 7

Carnegie MellonSoftware Engineering Institute

Getting at the Cause of VariationShewhart divides variation into two types:

• Common Cause Variation

- variation in process performance due to normal orinherent interaction among process components(people, machines, material, environment, andmethods).

• Assignable Cause (Special Cause) Variation

- variation in process performance due to eventsthat are not part of the normal process.

- represents sudden or persistent abnormal changesto one or more of the process components.

Page 8: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 8

Carnegie MellonSoftware Engineering Institute

Everything is a process.

All processes have inherent variability.

Data is used to understand variation and to drivedecisions to improve the processes.

Understanding Variation

[ASQ 00], [ASA 01]

Original Mean

New mean after change(Spread due to common causevariation will re-establish itself.)

Special Cause Variation

Data Spread due toCommon Cause Variation

Page 9: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 9

Carnegie MellonSoftware Engineering Institute

In Other Words…

Target

USLLSL

CenterProcess

ReduceSpread

Target

USLLSL

Process Off Target

Defects

Target

USLLSL

Excessive Variation

Defects

Page 10: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 10

Carnegie MellonSoftware Engineering Institute

Measurement Data RequiresAnalysis and Interpretation

Data InterpretationAnalysis

Input Transformation Output

Separating signal from noise requires rigorous analysisprocedures.

This allows quantitatively based inferences to be drawnto guide decisions and actions.

Page 11: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 11

Carnegie MellonSoftware Engineering Institute

Data Analysis Studies

Remember what you are trying to accomplish. There aretwo approaches to data analysis to consider:

• Enumerative• aim is descriptive• determines “How many?” - not - “Why so many?”

• Analytic• aim is to predict or improve product attributes and/or

the behavior of the process in the future

Page 12: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 12

Carnegie MellonSoftware Engineering Institute

Enumerative Studies

An enumerative study answers questions such as:

• How many defects were found inspecting productcode?

• How many problem reports have customers filed?• What percent of staff have been trained in object-

oriented design methods?• How large were the last five products we delivered?• What were the average sizes of our code-inspection

teams last year?• How many staff hours were spent on software rework

last month?

Page 13: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 13

Carnegie MellonSoftware Engineering Institute

Analytic Studies

Software engineering examples of analytical studiesinclude:

• measuring software product attributes for the purposeof making changes to future products

• evaluating defect discovery profiles to identify focalareas for process improvement

• predicting schedules, costs, or operational reliability

• evaluating/comparing software tools, technologies, ormethods—for the purpose of selecting among them forfuture use

• stabilizing and improving software processes or toassess process capability

Page 14: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 14

Carnegie MellonSoftware Engineering Institute

Enumerative vs. Analytic

Undertake an enumerative study if: action is to be takenon the subject based on data that is already collected

Undertake an analytic study if: action is to be taken on theprocess that produced the data

Analytic studies utilize statistical processcontrol tools to draw inferences about future

process performance.

Page 15: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 15

Carnegie MellonSoftware Engineering Institute

Basic Data Analysis ParadigmProblem and goal statement (Y):• maximum latent defects released• minimum mean time between failure in the field• time to market improvement (as function of test time, defect density)

Define ControlAnalyze ActMeasure

• Discovery: paretos, histograms, distributions, c&e• Understanding: root cause, critical factors• Improvement: adjust critical factors, redesign• Performance: on target, with desired variation

Y = f(defect profile, yield)= f(review rate, method, complexity……)

• Problem & goalstatements

• Define boundaries• Process maps• “Management by Fact”

Page 16: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 16

Carnegie MellonSoftware Engineering Institute

Tips for Good Measures

Measures used to characterize products or processes• relate closely to the issue under study• have high information content• pass a reality and validity test• permit easy and economical collection of data• permit consistently collected, well-defined data• show measurable variation as a set• have diagnostic value

Define ControlAnalyze ActMeasure

[Wheeler 92]

Page 17: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 17

Carnegie MellonSoftware Engineering Institute

Tips for Better Data AnalysisVerified: accuracy of format, type, range, completeness,and type

Valid and Reliable: clear, consistent definitions

Accurate and Precise: precise counting method

Based on operational definitions, you should know• What does this measure mean?• What are the rules for assigning values?• What is the data recording procedure?

Define ControlAnalyze ActMeasure

Page 18: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 18

Carnegie MellonSoftware Engineering Institute

Tips for Operational Definitions 1Three criteria for creating operational definitions

• Communication - will others know precisely whatwas measured, how it was measured, and what wasincluded or excluded?

• Repeatability - could others, armed with thedefinition, repeat the measurements and getessentially the same results?

• Traceability - are the origins of the data identified interms of time, source, sequence, activity, product,status, environment, tools used, and collector.

Define ControlAnalyze ActMeasure

[Deming]

Page 19: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 19

Carnegie MellonSoftware Engineering Institute

Tips for Operational Definitions 2Operational definitions also help pinpoint training needsfor data collection.

The cost of data collection also bears on• When the data will be collected• Where the data will be collected• Who will collect the data

Define ControlAnalyze ActMeasure

Page 20: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 20

Carnegie MellonSoftware Engineering Institute

Tips for Better Data Analysis

Why should we care about the data details?

Validity - apples to apples comparisons

Reliability - understand the impact of variation

Accuracy - knowing that there is a signal

Precision - level of certainty for responding to the signal

Define ControlAnalyze ActMeasure

Page 21: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 21

Carnegie MellonSoftware Engineering Institute

Tips for Analyzing DataCritical inputs• Knowledge of product or process being measured• Driven by business/ technical issues or goals• Quality of measurement data

Critical aspects of the analysis process• Acknowledgement of and accounting for variation• Appropriate use of analysis tools and techniques• Resources and references (people, books)

Define ControlActMeasure Analyze

Page 22: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 22

Carnegie MellonSoftware Engineering Institute

Take the Data Deeper

To reduce variation pursue three investigative directions:

• Identify the assignable causes of instability and takesteps to prevent the causes from recurring.

• If the process is stable but not capable (not meetingorganizational or customer needs), then identify,design, and implement necessary changes that willmake the process capable.

• Continually improve the process, so that variability isreduced and quality, cost, or cycle time are improved.

Define ControlActMeasure Analyze

Page 23: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 23

Carnegie MellonSoftware Engineering Institute

Tips for Finding and CorrectingAssignable CausesNo formula or transformation algorithm is applicable. Justlike debugging software – it requires good detective work.

• thorough knowledge of the process• sufficient contextual data• re-check assumptions, interpretations, and data

accuracy• pick up on clues provided by behavior patterns• suspect everything• relate chart signals to process events and activities• check process compliance

Define ControlActMeasure Analyze

Page 24: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 24

Carnegie MellonSoftware Engineering Institute

Methods to Change the ProcessImproving a stable process requires making changes tocommon cause entities and variables. Selecting the rightchange involves examination of:

• process decomposition and evaluation• technology change• cause and effect relationships• products and by-products from other processes• business strategies and management policies

These factors may well be the drivers for changing theprocess!

Define ControlActMeasure Analyze

Page 25: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 25

Carnegie MellonSoftware Engineering Institute

Tips for Changing the ProcessAgree on process performance issues.• What needs improvement, why, and how much?

Select process performance variables, target means, andvariability.

Determine required changes to common cause entitiesand variables.

Select pilot process.

Execute and measure the changed process.

Compare new process performance data to historicalbaseline.

Make conclusions and recommendations.

Define ControlAnalyze ActMeasure

Page 26: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 26

Carnegie MellonSoftware Engineering Institute

Common Data Risks and Pitfalls

Analysis misses the “big picture”

Charts are colorful, but meaningless

Data set lacks robustness

No baseline for comparing current performance

Infrequent comprehensive rechecks of the data

Comparing or predicting process results without ensuringstability of processes

Page 27: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 27

Carnegie MellonSoftware Engineering Institute

A Process Improvement Toolkit

StatisticalControls:• Control Charts• Time Seriesmethods

Non-StatisticalControls:• Proceduraladherence• PerformanceMgmt• Preventiveactivities

Design ofExperimentsModeling

TolerancingRobust Design

SystemsThinkingDecision &Risk Analysis

7 Basic ToolsCause & EffectDiagrams, MatrixFailure Modes &Effects AnalysisStatisticalInferenceReliability AnalysisRoot CauseAnalysis4 Whats5 WhysHypothesis TestANOVA

Defect MetricsData CollectionMethodsSamplingTechniquesMeasurementSys. EvaluationQuality of Data

DefineBenchmarkBaselineContract/CharterKano ModelVoice of theCustomerVoice of theBusinessQuality FunctionDeploymentProcess FlowMapProjectManagement“Management byFact”

Measure Analyze Improve Control

Page 28: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 28

Carnegie MellonSoftware Engineering Institute

Section II: From Data to DecisionsThis concludes our introduction to understanding dataand getting the most use out of your analyses.

In Section II: Data Analysis Dynamics we will• share our experiences• provide useful tips for how to make measurement work• thread together methods, tools, processes• provide a path for action

Page 29: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 29

Carnegie MellonSoftware Engineering Institute

Analysis Dynamics 1

Getting Started

• Identify the goals

• Black box process view

• Is the data right?

• Do I have the right data?

Decision point:• If the data is not perfect,

do I move forward orobtain better data?

typicalstumblingpoint

typicalstumblingpoint

Page 30: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 30

Carnegie MellonSoftware Engineering Institute

Analysis Dynamics 2

Initial Evaluation• What should the data

look like?• What does the data

look like?• Can I characterize the

process and problem?

Decision point:• Can I address my goals

right now?• Or is additional analysis

necessary? at the same ordeeper level of detail?

• Can I move forward?

typicalstumblingpoint

typicalstumblingpoint

Page 31: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 31

Carnegie MellonSoftware Engineering Institute

Analysis Dynamics 3

Moving Forward• Further evaluation• Decompose the data• Decompose the process

Decision point:• Do I take action?• What action do I take?

Repeat until• root cause found• at target with desired

variation

typicalstumblingpoint

Page 32: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 32

Carnegie MellonSoftware Engineering Institute

Identify the Goals 1Goals should be continuously generated.

Without data, goals are stated at a conceptual level.

By quantifying performance• problems are characterized• true customer specifications are understood• quantitative goals statements can be made

Typical problems• goals do not exist or have not been explicitly stated• goals at different levels are disconnected

typicalstumblingpoint

Page 33: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 33

Carnegie MellonSoftware Engineering Institute

Identify the Goals 2Relevant tools and methods• Voice of the Client• Quality Function Deployment• Management by Fact• 4 Whats• SMART goals• FAST diagrams (Function Analysis Systems

Technique)

Page 34: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 34

Carnegie MellonSoftware Engineering Institute

Identify the Goals: ExampleCustomer Satisfaction

Track/chart fielddefects

Track/chart cost & scheduledeviation

Deliver high quality product

• other factors• survey or interview data

Plot, plot, plot:• trends• distributions• control charts (c-charts)• scatter plots

Plot, plot, plot:• trends• distributions• control charts (x-bar, r; x, mr)• scatter plots

otherfactors

Success Indicators,Management Indicators

Analysis Indicators,Progress Indicators

Analysis Indicators,Progress Indicators

SPI TaskPlans

Why?

How?

Page 35: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 35

Carnegie MellonSoftware Engineering Institute

What if there are no “BusinessGoals”?Without high-level business goals, data-drivenimprovement efforts quickly become fragmented.

Articulate business goals by• Brainstorming with leadership• Organizing results into strategic, operational goals

- add in any tactics that emerged during brainstorming• Performing hierarchical structure check

- “How?” answered top to bottom- “Why?” answered bottom to top

Verify that tactics drive impact and success.

Page 36: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 36

Carnegie MellonSoftware Engineering Institute

Black Box Process View

What are the key inputs and outputs to your process?What are key in-process variables over which you havecontrol?

Typical problems• Omitting this step - avoids examination of your

assumptions and understanding of the process• Selecting a view that matches the issue or study level• Constructing a view that does not match reality

Relevant tools & methods• Process Mapping• Mental Model

Page 37: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 37

Carnegie MellonSoftware Engineering Institute

What is a Process?• Any set of conditions or causes that work together to

produce a given result• A system of causes which includes people, materials,

energy, equipment, and procedures necessary to producea product or service

Products &Services

RequirementsIdeasTime

People Material Energy Equipment Procedures

Work activities

Page 38: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 38

Carnegie MellonSoftware Engineering Institute

Problem Management Process

OpenPTR’s

Test Lab

Developer

Screen/Resolve

Returned PTR’s

ClosedCanceled PTR’s

More Info

Review

Working PTR’s

Configuration Mgmt

IntegrateBuild/Drop

Developer’sTest Bed

VerifyPTR’s

Build RegressionTest

Distribute

Test Lab

ClosedFixedPTR’s

TestFixes

DevelopmentResolver

Closed

Invalid

PTRs

Repair rate

Test rate

Open rate

Valid rate

Queue size

ID, Desc

Page 39: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 39

Carnegie MellonSoftware Engineering Institute

Development Process Map

Code Compile UnitTestDesign

!!!! Requirements " Estimate !!!! Concept design

• Code• Data: Defects,

Fix time, DefectInjection Phase,Phase duration

• Detailed Design• Test cases• Complexity• Data: Design Review

defects, Fix time,Phase duration

• ExecutableCode

• Data: Defects,Fix time, DefectInjection Phase,Phase duration

• FunctionalCode

• Data: Defects,Fix time, DefectInjection Phase,Phase duration

!!!! Executable Code # Test Plan, Technique # Operational Profiles

!Resources" Code Stds" LOC counter! Interruptions

!!!! Code

Inspection

Rework

!!!! Critical Inputs! Noise

" Standard Procedure# Control Knobs

Page 40: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 40

Carnegie MellonSoftware Engineering Institute

Is the Data Right ?Understand the data source and the reliability of theprocess that created it.

Typical problems• wrong data• missing data• accuracy• veracity• credibility• skewed

Data transformations• ratios of bad data still equal bad data• increasing the number of decimal places does not

improve the data

typicalstumblingpoint

Page 41: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 41

Carnegie MellonSoftware Engineering Institute

Is the data right? - Example

Which set ofdata appearsto be morecredible?

Why?

# of People

Prepara

tion Effort

Size (SLOC)

5 3.7 20706 21.0 5556 5.1 1028 18.0 2606 12.0 1018 22.1 1656 11.8 17648 9.2 3485 7.3 768 16.5 15755 12.5 24416 18.3 1265 6.5 886 7.1 3838 10.2 1118 11.5 1926 5.2 2127 9.3 4017 8.8 8155 31.0 5515 4.9 4298 12.7 8839 30.3 10178 26.4 2116

# of People

Prepara

tion Effort

Size (SLOC)

4 2.0 3503 1.5 2103 2.0 3333 2.0 4303 2.0 4001 2.5 4004 3.0 4403 2.5 4503 3.5 4403 3.0 2553 2.8 4704 2.8 5003 1.5 2532 0.7 784 7.0 9003 3.5 4003 4.8 10143 1.5 1205 15.0 14954 4.0 2004 4.0 2004 4.0 2003 4.5 2004 4.0 200

Inspection Data Set 1 Inspection Data Set 2

Page 42: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 42

Carnegie MellonSoftware Engineering Institute

Do I Have the Right Data? 1Analyses can get off on the wrong track if the data ismisunderstood, or implicit assumptions are made about it.

Analyst must ask questions:• “Do I have measurements of all the significant and

relevant factors?• “Does this data represent what I think it does?”

Typical examples• total SLOC count in place of new/changed SLOC count• date recorded is often not the same as date observed• use of averages based on unstable processes (as in

normalization)

typicalstumblingpoint

Page 43: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 43

Carnegie MellonSoftware Engineering Institute

Do I Have the Right Data? 2Frequently the answers to these questions can not beanswered by a simple “eyeball” test, then an initialevaluation must be made using various tools andmethods.

Relevant tools & methods• Process Mapping• Goal-Driven Measurement templates• Operational definitions• Initial evaluation/exploration assessment using

statistical tools

Page 44: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 44

Carnegie MellonSoftware Engineering Institute

Initial Evaluation / Exploration 1What should the data look like?• first principles or relationships• mental model of process (refer to that black box)• what do we expect

What does the data look like?• Magnitude, range, and frequency• look at absolute and percentages• the shape of the curve

typicalstumblingpoint

Page 45: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 45

Carnegie MellonSoftware Engineering Institute

Initial Evaluation / Exploration 2Relevant tools & methods• descriptive statistics• run charts or SPC charts• time series• boxplots• correlation plots – first scan of relationships

typicalstumblingpoint

Page 46: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 46

Carnegie MellonSoftware Engineering Institute

Is this the right data?• unexpected high

inspection rate• unusually large

SLOC perinspection

• How manyinspectorscontributed to theprep-hr effort?

Review ID Defects SLOC

SLOC/RevHr

Rev PREP

Defect/KSLOC

Defects/hour

30 9 9,800 933.3 10.5 0.918 0.85732 5 16,091 804.6 20 0.311 0.25034 45 73,344 2,716.4 27 0.613 1.66736 45 32,352 808.8 40 1.390 1.12537 12 51,525 5,725.0 9 0.233 1.33341 13 98,207 4,214.9 23.3 0.132 0.55843 19 16,091 707.3 22.75 1.180 0.83544 13 204,216 8,168.6 25 0.064 0.52045 14 80,775 4,895.5 16.5 0.173 0.84847 2 72,747 5,914.4 12.3 0.027 0.16348 14 10,901 681.3 16 1.284 0.87550 11 11,468 1,146.8 10 0.959 1.10052 31 16,909 573.2 29.5 1.833 1.05153 17 28,538 1,902.5 15 0.596 1.13357 22 18,136 824.4 22 1.213 1.000

?Exploration Example1

Page 47: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 47

Carnegie MellonSoftware Engineering Institute

Exploration Example2

0 20000 40000 60000 80000 1000005

10

15

20

25

30

35

40

Y=19.06652+1.28868E-5 x

B DATA1.B.LR

SLOC

Scatterplot of Review hours vs SLOC Reviewed

Rev

iew

Hou

rs

R = 0.04549R2 = 0.00207

Little to nocorrelationbetweenSLOC sizeandinspectioneffort

Page 48: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 48

Carnegie MellonSoftware Engineering Institute

Evaluation Example3

Given there is no correlation between review time and theamount of SLOC reviewed,

What questions can be raised about the• SLOC count?• review time?• number of defect?• defect density?• defects discovered per review hour?

Page 49: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 49

Carnegie MellonSoftware Engineering Institute

Can I Move Forward?

Does the initial evaluation/exploration of data support thecritical assumptions?

What are your assumptions?• are they explicitly articulated?• for process, for data?

What are the risks you are taking if you move forward withthe assumptions you have made?

Is the variability or presence of process issues sosignificant that they overshadow data issues?

typicalstumblingpoint

Page 50: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 50

Carnegie MellonSoftware Engineering Institute

Moving Forward 1Moving forward is often a judgment call• can proceed with further data and process analysis in

parallel with improving data- it’s a tradeoff and a matter of balancing risks

• else get the “right” data before proceeding

Types of actions• removing assignable causes• reducing common cause variation• testing hypotheses• further decomposing data and process

typicalstumblingpoint

Page 51: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 51

Carnegie MellonSoftware Engineering Institute

Moving Forward 2This is the “heart” of the analysis• Explore, establish/confirm cause-effect relationships• Plot trends over time• Look for and identify the “drivers” or dominant factors• Gauge the variation of the variables• Find assignable causes• Determine stability and capability of processes• Decompose to find root cause

Relevant tools & methods• The “Basic Tools”

Page 52: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 52

Carnegie MellonSoftware Engineering Institute

Moving Forward- Basic Tools

Fundamental data plotting and diagramming tools• Cause & Effect Diagram• Histogram• Scatter Plot• Run Chart• Box and Whisker Plots• Pareto Chart• Control chart

The list varies with source. Alternatives include• Bar charts• Flow Charts• Descriptive Statistics (mean, median and so on)• Check Sheets

Page 53: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 53

Carnegie MellonSoftware Engineering Institute

Moving Forward-Establish Relationships

0 200 400 600 800

0

20

40

60

80

100

120

C DATA1.C.LR

New/changed SLOC Reviewed

Review Hours vs New/Changed SLOC

Rev

iew

Hou

rs

R = 0.83569R2 = 0.69838

0 1000 2000 3000 4000 5000

0

20

40

60

80

100

120

C DATA1.C.LR

Total SLOC

Review Hours vs Total SLOC

Rev

iew

hou

rs

R = 0.19244R2 = 0.03703

Page 54: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 54

Carnegie MellonSoftware Engineering Institute

Moving Forward-Identify Dominant Factors

Profile of Defects Found in Product XYZ

0

15

30

45

Syntax Error AmbiguousRequirements

Interface Error Missing Function Memory

Defect Type

Num

ber

of D

efec

ts

0%

25%

50%

75%

100%

Cum

ulat

ive

Per

cent

age

Number of Defects

Cumulative Percentage

Page 55: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 55

Carnegie MellonSoftware Engineering Institute

Moving Forward-Determine Extent of Variability 1

34 36 38 40 42 44 46 48 50 52 54 5602468

101214161820

Num

ber

of D

ays

Product-Service Staff Hou Time to fix a defect found after development

Numberof defects

Look for multimodal distributions.They point to multiple processes.

Basic Histogram showsdistribution, spread.

Page 56: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 56

Carnegie MellonSoftware Engineering Institute

Moving Forward-Determine Extent of Variability 2

36 38 40 42 44 46 48 50 52 540

2

4

6

Product Service Staff-Hours

Fre

quen

cy C

ount

LCL= 36.08 UCL=54.04CL= 45.06

Voice of the Process

Add controllimits to reflectprocesscapability

Page 57: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 57

Carnegie MellonSoftware Engineering Institute

Moving Forward-Determine Extent of Variability 3

25 30 35 40 45 50 550

2

4

6

Product Service Staff-Hours

Fre

quen

cy C

ount

LCL= 36.08 UCL=54.04CL= 45.06

LSL= 30 USL= 50Target = 40

Voice of the customer

Voice of the Process

Addspecificationlimits:

ProcessCapability

vs.

CapableProcess

Page 58: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 58

Carnegie MellonSoftware Engineering Institute

Moving Forward-Find Assignable Causes

.

0 20 40 60 80 100

0

100

200

300

400

LCL=0

CL=104

UCL=292

Wait time-days

2nd Qtr and 3rd Qtr Problem Closures

Pro

ble

m W

ait

tim

e-d

ays

Problem Closure Sequence

Problem Repair-Wait time• Issue: Delays inrepairing softwaretest sets

• Control chartindicates processunpredictable

• Pattern suggestsmixture of causesystems

Page 59: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 59

Carnegie MellonSoftware Engineering Institute

2nd Qtr and 3rd Qtr ProblemClosures Wait time

0 100 200 300 4000

5

10

Wait time-Days

Fre

qu

ency

Co

un

t

Problem Repair- Wait time

Histogram indicates dataincludes possible mixture ofcause systems

• One process for problemsup to 150 wait days

• A second process involvingmore than 150 wait days

Moving Forward-Finding Assignable Causes

Page 60: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 60

Carnegie MellonSoftware Engineering Institute

Moving Forward-Find Assignable Causes

Problem repair Wait time < 150 days

0 10 20 30 40 50 60 70 80

0

20

40

60

80

100

120

140

160

LCL= 0

CL=67.29

UCL=161.3

Pro

ble

m w

ait

tim

e-d

ays

Problem Closure Sequence

One process with 67-day average wait time• Near stable•Investigate causesystem for drivingfactors§ nature of defect§ staffing§ equipment§ test set type

Page 61: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 61

Carnegie MellonSoftware Engineering Institute

Moving Forward-Find Assignable Causes

Another processwith average waittime of 246 days

0 5 10 15 20

100

150

200

250

300

350

400

LCL=86.82

CL=246

UCL=405.2

Problem Repair Wait time > 150 days

Pro

ble

m w

ait

tim

e-d

ays

Problem Closure Sequence.

Page 62: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 62

Carnegie MellonSoftware Engineering Institute

Moving Forward-Find Assignable CausesProblem Repair-Wait time• Determined that there were two processes in operation

• Since both were (near) stable, necessary to examinecause systems for components that may be the drivingcontributors to wide variation and make appropriatechanges to each process

• Activities undertaken:- Classification of problems (defects) reported and found- Classification of test sets- Evaluation of test equipment availability- Availability of necessary skills

Page 63: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 63

Carnegie MellonSoftware Engineering Institute

Decomposition

Decomposition is separating the process into itscomponent parts or data by one or more of its attributes

• Makes sources of variation visible

• Provides opportunity for process improvement

This approach is useful• when process is stable and process change is needed

to reduce variation• for highlighting unusual data attributes that may be the

source of variation

Page 64: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 64

Carnegie MellonSoftware Engineering Institute

Decompose Data

Def

ects

0

5

10

15

2001 2002

yr

All Pairs

Tukey-Kramer

0.05

•Defect datadecomposed byyear

•May alsodecompose byproject type,organizationalslices, and so on

•Means comparison test determines if data groupingsare statistically different. These groups are not different.

•Values and sample size are accounted for in the test.

Page 65: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 65

Carnegie MellonSoftware Engineering Institute

Decompose Process Data 1Twenty one components from same product, same team• approximately same size• approximately same complexity

Defects found in design inspection are:

Component 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 TotalsDefects 12 16 18 32 22 16 23 35 15 27 16 25 20 26 20 23 23 36 22 27 17 471

Defect Type Number of Defects per Type per ComponentFunction 3 5 4 4 4 3 3 20 4 11 2 3 3 5 3 7 4 5 5 15 2 115Interface 2 2 4 4 3 4 2 3 3 4 2 3 5 3 3 3 2 16 6 2 4 80Timing 1 1 0 1 1 0 2 1 0 0 2 0 1 1 1 1 1 0 1 0 0 15

Algorithm 0 0 1 14 2 0 0 0 0 0 0 1 5 2 7 6 5 1 2 0 1 47Checking 1 1 5 1 7 1 1 2 0 1 6 3 1 12 1 0 2 4 3 5 2 59

Assignment 0 2 0 4 1 2 1 3 2 3 2 8 1 0 2 1 2 1 0 1 1 37Build/Pkg. 3 1 1 2 1 0 0 4 3 6 1 0 2 1 1 1 3 2 2 2 1 37Document 2 4 3 2 3 6 14 2 3 2 1 7 2 2 2 4 4 7 3 2 6 81

Page 66: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 66

Carnegie MellonSoftware Engineering Institute

Decompose Process Data 2

0 5 10 15 20

25

05101520

30354045

LCL = -0.0448

CL = 22.4

UCL = 44.9

Mov

ing

Ran

geT

otal

Def

ects

Component Number

0 5 10 15 200

5

10

15

20

25

30

CL = 8.45

UCL = 27.6

Apparent stable processbehavior

•But, defect rate too highand too much variation

•Explore examination ofdefects by type

Page 67: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 67

Carnegie MellonSoftware Engineering Institute

DecomposeProcess Data 3

Establish process stabilityby defect type

X’s mark assignablecauses by defect type

Elimination of assignablecauses will reduce variation

0

1

2

3

4

5

6

0 5 10 15 20

Defect Type = BLD/PKG X

0 5 10 15 2002468

10121416

Defect Type = DocumentationX

0 5 10 15 20

Defect Type = Algorithm

X

XXX

X

0246810121416

0 5 10 15 20-1.0-0.50.00.51.01.52.02.5

Defect Type = Timing

0 5 10 15 200

2

4

6

8Defect Type = Assignment

0 5 10 15 200

2

4

6

8

10

12Defect Type = Checking

0 5 10 15 20024681012141618 Defect Type = Interface

X

0 5 10 15 200

5

10

15

20

.

Defect Type = Function

Page 68: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 68

Carnegie MellonSoftware Engineering Institute

Potential Process Improvement

0 5 10 15 20 0 5 10 15 200

5

10

15

20

25

30

35

40

LCL=6.63

CL=22.43

UCL=44.9

Before Improvement

Indi

vidu

als

CL=15.81

UCL=24.9

LCL= 0

After Improvement

Control chart on right reflects potential improvement if all assignable causes removed

Page 69: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 69

Carnegie MellonSoftware Engineering Institute

Repeat until…..

Root cause(s) found

The process is at target, with desired variability

Other process performance data has not suffered• I.e. the process has not been suboptimized

Relevant tools & methods• Management by Fact• 5 Whys• Dashboard

Page 70: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 70

Carnegie MellonSoftware Engineering Institute

Number-Crunching Tools

• Higher learning curve than others• Best for those doing data-driven improvementas large part of their workload

Statistical Package

• May be better suited for charts which anorganization is routinely monitoring than forexploration

Standalone SPCPackage

• Many new add-ins available• Enables a wider variety of charts

Excel Addin

• Most people have a copy• OK for some basic charts• Nice for presentations• Otherwise quite limited

Spreadsheet (Excel) Comment Analysis done in….

Page 71: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 71

Carnegie MellonSoftware Engineering Institute

Section III: Case Study

This concludes our introduction to analysis dynamics.

In Section III we will showcase these dynamics through acase study.

Context:• organization project portfolio includes both new

development and maintenance• project size and complexity varies significantlly• project schedules vary from <1 month to >18 months

Page 72: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 72

Carnegie MellonSoftware Engineering Institute

Case Study Overview

This case study features the following:• pursuit of customer satisfaction

- via proxies of defects and effort & schedule variance• initial data evaluation and exploration• initial data and process decomposition• separation of goals into “monitor” and “improvement”• first iteration of root cause analysis for improvement

goals

Along the way, we will use this stop sign• to pause and generalize,• to ask probing questions,• to extend the topic

Page 73: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 73

Carnegie MellonSoftware Engineering Institute

Analysis Dynamics 1

Getting Started

• Identify the goals

• Black box process view

• Is the data right?

• Do I have the right data?

Decision point:• If the data is not perfect,

do I move forward orobtain better data?

“We didn’t stumble here–there weregoals from the beginning–but it tooktime to clarify them, to make themquantitative, and to separatemonitoring from improvement.”

“We had a lot of missing data. Weconducted “data archaeology” asmuch as possible to backfill the dataset. Learnings were used to improvethe automation of data collection.”

“Our data wasn’t perfect, but nomatter how we sliced it, there wereclear improvements to pursue.”

Sound bytes

Page 74: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 74

Carnegie MellonSoftware Engineering Institute

Analysis Dynamics 2

Initial Evaluation• What should the data

look like?• What does the data

look like?• Can I characterize the

process and problem?

Decision point:• Can I address my goals

right now?• Or is additional analysis

necessary? at the same ordeeper level of detail?

• Can I move forward

“For earned value data, we foundthe process to be consistently “out ofspec,” yet the external customersseemed satisfied. Reconciling the‘voices’ of the process, externalcustomers and internal managementis part of the process.”

Sound bytes

“We were able to identify many ofthe “data rightness” issues withoutexploring the data. But, in somecases, it was necessary to dive intothe data to identify the issues.”

Page 75: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 75

Carnegie MellonSoftware Engineering Institute

Analysis Dynamics 3

Moving Forward• Further evaluation• Decompose the data• Decompose the process

Decision point:• Do I take action?• What action do I take?

Repeat until• root cause found• at target with desired

variation

“Initial iterations of decomposition willbe shown. Because of risksassociated with imperfect data, eachconclusion needs to be carefullyweighed against the need foradditional verifying data.”

Sound bytes

Page 76: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 76

Carnegie MellonSoftware Engineering Institute

Business ObjectivesCustomer Satisfaction

Track/chart fielddefects

Track/chart cost & scheduledeviation

Deliver high quality product

• other factors• survey or interview data

Plot, plot, plot:• trends• distributions• control charts (c-charts)• scatter plots

Plot, plot, plot:• trends• distributions• control charts (x-bar, r; x, mr)• scatter plots

otherfactors

Success Indicators,Management Indicators

Analysis Indicators,Progress Indicators

Analysis Indicators,Progress Indicators

SPI TaskPlans

Goals from thebeginning of

effort

Project: What are leadingin-process indicators ofsuccess? Where areimprovements needed?

Page 77: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 77

Carnegie MellonSoftware Engineering Institute

Customer Data

• What data are readily available data?-post-project surveys

• Data archeology-What has been communicated via emails, phonecalls

• Is the data “perfect”? NO-few responses-qualitative responses

• New data collection needed:-updated, routine customer survey

By the way, is dataever perfect? Canyou afford to wait forperfect data?

Page 78: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 78

Carnegie MellonSoftware Engineering Institute

Customer Data - SampleQualitative comments, all positive:• Pleasure to work with!• Outstanding in all aspects!!• If this team had been on this project from the start a lot of

things may have gone smoother.• Really good to work with. Have been working with them 2-

3 years now. They do a good job and we get along well.

Quantitative comments• Finished testing without having to create any additional

builds.• We were able to save three flights.

Page 79: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 79

Carnegie MellonSoftware Engineering Institute

Defect Data• What data are readily available data?

-peer review inspection data• Data archeology

-field defects and confirmation of in-process defects• Is the data “perfect”? NO

-missing data-defect data skewed toward low priority defects-variations in operational definitions-feedback loops at group level, not org level

• New data collection needed:-confirmed operational definitions-improved automation of data collection process

Page 80: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 80

Carnegie MellonSoftware Engineering Institute

Field Defect Data BaselineOrganizational goal: 0 field defects

Field defects

In-process defect detection• # of defects vs. development life cycle

0100200

300400500600

700800900

1000

110012001300

Sum

(def

ects

)

1-P

rojP

lan

2-R

eqt

3-D

esig

n

4-C

ode

6-D

oc

7-F

inal

Tes

t

phase id

0

100

200

300

400

500

600

700

800

900

1000

Sum

(def

ects

)

1-P

rojP

lan

2-R

eqt

3-D

esig

n

4-C

ode

6-D

oc

7-F

inal

Tes

t

phase id

FY01 FY02

Field DefectsFY 01 4FY 02 4 When your “count for

the year” is 4, howuseful are controlcharts?

And, if your countsare higher?

Leading in-processindicators are whatyou should considerfor control charting.

Page 81: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 81

Carnegie MellonSoftware Engineering Institute

Earned Value DataReadily available data• monthly process effort, cost, schedule• compared to specification

- with text entry for out of spec causes

Data “archaeology”:• completed project data

- final vs. original with differences categorized

Is the data “perfect”? NO• losing track of replanning impact on performance• monthly data uses non-homogeneous sample• sparse data – some parts of organization better represented• not sure if “extreme” outliers can be excluded

Page 82: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 82

Carnegie MellonSoftware Engineering Institute

Completed Project Data Baseline

This represents (initial plan – final actual)• negative numbers are overruns• schedule is in terms of calendar days

It is the total cumulative variance• customer-requested/approved changes are included• one way or another, this is what the customer sees

% effort variance % sched varianceaverage -66.1% -15.0%standard deviation 415.9% 38.3%median 0.9% -8.1%min to max -2689.9% to 50.1% -99.8% to 128.0%n 42 42capability notes(spec = +/- 20%)

45.2%outside spec

40.4%outside spec

-30

-25

-20

-15

-10

-5

0

5

LSL

USL

Target

-1

-0.5

0

0.5

1

1.5

LSL

USLTarget

Page 83: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 83

Carnegie MellonSoftware Engineering Institute

internal/external categories

median contribution toward

total effort (cost) variance

# of projects reporting

median contribution toward

total schedule variance

# of projects reporting

internal project -30.83% 7 -34.32% 4

internal organization, outside project -1.25% 5 -73.77% 3external, reqt -22.48% 10 -20.20% 10external, sched 0.00% 15 -98.36% 17

Completed Project Data - Decomposed

Contribution to total variance, by internal/external categories

“Internal” and “external” taxonomy selected based on“sphere of influence and control”

Risk: while “internal causes” seem to be a significantopportunity, a small number of projects reported such causes

Page 84: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 84

Carnegie MellonSoftware Engineering Institute

Explore, Evaluate (Plot, Plot, Plot) 1

-1

-0.5

0

0.5

LSL

USLTarget

-0.5

0

0.5

1

1.5

LSL

USLTarget

% effort variance % sched varianceavg -2% 13%std dev 33% 36%median 2% 7%min to max -95% to 50% -128% to 71%capability notes(spec = +/- 20%)

43.8% outside spec

39% outside spec

When flyers are removed•Averages closer to target, spread narrowed•Medians minimally affected•Still nearly as many outside specs•Small “second peak” more visible

• What are guidelinesfor removing flyers?

• Average vs. median

Page 85: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 85

Carnegie MellonSoftware Engineering Institute

Explore, Evaluate (Plot, Plot, Plot) 2Schedule Variance Distribution to Time Series

same

data -150

-100

-50

0

50

100

150

200

Per

cent

Sch

edul

e V

aria

nce

0 10 20 30 40 50

Rows-100

-50

0

50

100

150

LSL

USLTarget

Time series plot shows• where in time the contributions tooverall high variability occur

• possible change in variability over time• where in time the points of the possible“second population” occur

Why not acontrol chart?

Page 86: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 86

Carnegie MellonSoftware Engineering Institute

Explore, Evaluate (Plot, Plot, Plot) 3Schedule Variance Time Series to Control Chart

same

data-150

-100

-50

0

50

100

150

200

Per

cent

Sch

edul

e V

aria

nce

0 10 20 30 40 50

Rows

-150

-100

-50

0

50

100

150

200

Per

cent

Sch

edul

e V

aria

nce

CBA

CB

A

4 8 12 16 20 24 28 32 36 40 44

Avg=-14.97

LCL=-114.28

UCL=84.34

0

50

100

150

200

Per

cent

Sch

edul

e V

aria

nce

4 8 12 16 20 24 28 32 36 40 44

Avg=37.35

LCL

UCL=122.02

UCL 84.3%

Avg –15%

LCL –114.3%

Control charts also show• possible second “population”• wide variability

But,• process may just not be stat. control

(if 2 populations, assumption violated)

• wide limits have limited practical value (use for off-line analysis only at this stage)•control charts geared for monitoring sustainment not improvement

Page 87: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 87

Carnegie MellonSoftware Engineering Institute

In-Process Cost/Schedule Data Baseline

Organizational goal (specification): +/-20%

In process effort/cost data• all life cycle phases, all projects, Oct – June (770+ pts)

In process schedule data• all life cycle phases, all projects, Oct – June (770+ pts)

mean +/- 3 standard deviations-7.2654498 +/- 19.23 or -64.96 to 50.43

capability notesspec = +/- 20% 17% of values outside spec

all data extreme values excluded

mean +/- 3 standard deviations-32 +/- 3*423 or-1301 to 1237

-2 +/- 3*25 or-77 to 73

schedule capabilityspec = +/- 20% 18% of values outside spec 17% of values outside spec

Page 88: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 88

Carnegie MellonSoftware Engineering Institute

%S

ch. V

ar.

-100

-80

-60

-40

-20

0

20

40

60

80

100

1: oct-01 2: nov-01 3: dec-01 4: jan-02 5: feb-02 6: mar-02

month label

In-Process Schedule Variance Boxplot

x

90th percentile75th percentilemedian: 50th percentile25th percentile10th percentile

mean

Data reported monthly for all projects, cycle phases

Conclusion: need to address variability

Why a boxplotand not an SPCchart?

Page 89: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 89

Carnegie MellonSoftware Engineering Institute

Are There Group Differences?

Schedule Variance, all projects, Oct 01 to Jun 02

Boxes influenced by quantity of data, and numbers themselves

Are there statistically significant group to group differences: NO

%sc

h va

r

-100

0

100

A B B F K N T

branch

All Pairs

Tukey-Kramer 0.05groups within organization

test forsignificantdifference

Page 90: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 90

Carnegie MellonSoftware Engineering Institute

Are There Project Differences?%

sch

var

-50

-40

-30

-20

-10

0

10

20

30

40

AAR-44 MIP 01-0122

AAR-44A BC3

AAR-47 Missile Warning Receiver BC1

ALE-47 BC1

ALE-47 BC1.1

ALIC FMS R 3.4.2

ALIC MDT BC 6

ALIC MDT Urgent Change

ALIC T5 Bad Az Fix

ALIC T5 Urgent Change

ALM-233 ASE Tools

ALM-233 Pod Test (ASE 2001)

ALM-233 Self Test (ASE 2000)

ALQ-131 ILSE BC3

ALQ-131 ILSE BC4

ALQ-131 MDG BC2

ALQ-131 OFP BC 0.1 Urgent

ALQ-131 POD OFP BC 0.1

ALQ-131 POD OFP BC 0.1 - Urgent

ALQ-131 POD OFP BC 1.5A

ALQ-135 CFG Update

ALQ-135 Urgent Change

ALQ-155/SI OFP BC4

ALQ-155/SI OFP BC5

ALQ-161 ERS 6.5

ALQ-161 PFS 4.06

ALQ-161 PFS 4.20

ALQ-161 PFS Block E

ALQ-161 RSST V2

ALQ-161 RSST V2.0

ALQ-161 RSST Version 2.0

ALQ-162 1553B Comm. Phase II

ALQ-172 AFSOC V1/V3 BC1

ALQ-172 ECM1 BC4

ALQ-172 O-Level TPS Urgent Change

ALQ-172(DBM)

ALQ-172(V) FFSD/ED Re-Host

ALQ-184 BC3

ALQ-184 Flight Recorder

ALQ-196 MDG

ALQ-196 OFP

ALQ-196 OFP BC3

ALQ-196 OFP BC3/MDG BC2

ALQ-196 Q196-00082-01 Urgent Change

ALQ-213 SWV010F

ALQ-213 SWV020A

ALQ-213 WinMDT

ALR-56M C-130J (2040)

ALR-56M V004Y/V004Z

ALR-69 ACVR BC1201

ALR-69 BC14/15

ALR-69 SWV 1004

ALR-69 SWV 1004/10

ALR-69 SWV 1305

EJTAT Urgent Cha

HARM PNU MPT

IEWS BC2

MERITS

MH-53M AI

MH-53M

TEWS

USM-4

USM-

USMProject Name

Each box represents the timeline of an individual project

Are there statistically significant project to project differences: YES,in some cases (Tukey-Kramer test not shown)

Conclusion: Non-homogeneous sample (data from all points along“project timeline”) was a major contributor to the “significantdifferences” and to the overall variability

Schedule Variance, all projects Oct 01 to Jun 02

Page 91: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 91

Carnegie MellonSoftware Engineering Institute

Improving Sampling & Analysis

Overall rollup:• group the data by project milestones

Within project:• identify different control limits for each development

phase• compare each project’s phase against the history of

similar projects in that same phase• robust sample for limit calculations is critical

A L Q 1 8 4 p r o je c t c o s t in d

-1 0

-5

0

5

1 0

1 3 5 7 9

11 13 15

M o n t h

project cost index

wider limitsfor projectsin planningphase

narrower limitsfor projects inexecution phase

Page 92: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 92

Carnegie MellonSoftware Engineering Institute

Our Improvement Focus

Two performance improvement priorities, for two differentportions of the organization• effort variation reduction• schedule variation reduction

Additionally, a specific improvement effort to efficientlygather more complete, more consistent data• needed to more fully understand the magnitude of

variability• needed to set exact (SMART) improvement goals

(Specific, Measurable, Agreed upon, Realistic, Timely)

Page 93: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 93

Carnegie MellonSoftware Engineering Institute

Can We Address the Goals?

This is a decision point in the analysis dynamics.

Do we have enough understanding of our data andprocess? NO

Key questions at this stage• What are the root sources of the variability?• How does the in-process variability provide an early

view of the end-of-project result?

Page 94: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 94

Carnegie MellonSoftware Engineering Institute

Data and Process Decomposition

Brainstormed root causes of variance

Decomposed process into 4 main subprocesses• mapped cause codes to process• identified cause codes that are resolved in-process

Data archaeology• evaluated cause codes using historical data

risks of dataarchaeology vs.starting anew

Page 95: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 95

Carnegie MellonSoftware Engineering Institute

Transformed original brainstorm list• initial experiential assessment of frequency, impact of each

cause code• refined “operational definitions” and regrouped brainstorm list• tagged causes to historical data• refined again

Final list included such things as• Missed requirements• Underestimated task• Over commitment of personnel• Skills mismatch• Tools unavailable• EV Method problem• Planned work not performed• External• Other

Cause Codes

Direct Cause vs.Root Cause

Causes resolved in-process vs. causesthat affect finalperformance

Page 96: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 96

Carnegie MellonSoftware Engineering Institute

Four High-Level Processes thatInfluence Final Performance

Technical Processes-Design-Implement-Formal Test-Release

Project Monitoring and Control-Measurement-Quality Assurance-Peer Review

Organizational Management-Workload Agreements-Resource Allocation-Funding-Training

Project Management-Workload Proposal-Planning-Requirements Management-Configuration Management-Decision Analysis and Resolution-Training

Cause Codes were mapped to these processes

Page 97: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 97

Carnegie MellonSoftware Engineering Institute

Prioritizing the CausesExternal

Underestimated TasksTools

EV Method

Algorithms and Assumptions• frequency & impact ofoccurrences – and whichoccurrences?

Cause Codes• Which are resolved inprocess?

Sphere of Influence• internal vs. external• degree of “processunderstanding”• degree of “process control”

• Pie Chart vs. Pareto?• Does everyone understand

where the data came from?• Are the algorithms and

assumptions valid?• What are the risks?

Page 98: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 98

Carnegie MellonSoftware Engineering Institute

Data TreatmentsProject Month Cause Code Variance Repeat?

A 1 4 4A 2 4 3 YA 3 5 7B 1 2 2B 2B 3 4 4C 1 5 8C 2C 3

Cause Code frequencyimpact

(average)frequency x

impact (or sum)2 1 2 24 2 4 85 2 7.5 15

Cause Code data maybe summarized byfrequency (f), impact (i),or f x i.

Usage of the latterresembles methodsused to evaluate,mitigate risk

frequency impactH HM ML L

Risk Mitigation Analogy

Might also use median

Page 99: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 99

Carnegie MellonSoftware Engineering Institute

Co-Optimizing Across theOrganization – Internal Causes

EV ProblemsUnexpecteddeparture

Planned worknotperformed

Assetavailability

Underestimatedtask

SkillsMismatch

5

Unexpecteddeparture ofpersonnel

MissedRequirements

EV ProblemsUnderplannedrework

Plannedwork notperformed

Missedrequirements

4

Missedrequirements

UnderestimatedTask

Missedrequirements

Missedrequirements

Underplannedrework

EV Problems3

UnderestimatedTask

Skillsmismatch

Underplannedrework

EV ProblemsAssets notavailable

Tools2

ToolsToolsUnderestimatedTask

UnderestimatedTask

ToolsUnderestimatedTask

1

OrganizationSlice 2 Effort

OrganizationSlice 2

Schedule

OrganizationSlice 1 Effort

OrganizationSlice 1

Schedule

EffortScheduleImpact# (fromPareto)

Page 100: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 100

Carnegie MellonSoftware Engineering Institute

In-process data as leading indicator

In-process data

freq impact f x i

Join the views of completed project performanceand in-process performance.

Since “cause categories” differ between the datasets, the first iteration is not trivial

internal/external categories

median contribution toward total effort (cost)

variance

# of projects reporting

median contribution toward

total schedule variance

# of projects reporting

internal project -30.83% 7 -34.32% 4

internal organization, outside project -1.25% 5 -73.77% 3external, reqt -22.48% 10 -20.20% 10external, sched 0.00% 15 -98.36% 17

Page 101: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 101

Carnegie MellonSoftware Engineering Institute

SMART Schedule Variance Goal

Reduce the total variance by decreasing the variance ofthe top 3 internal causes by 50% in 1 year

Reduce the impact of external causes by 50%

Indicators:• Trend for each cause independently• Trend for total variance

Will focus on these causesgive us bottom line results?

Page 102: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 102

Carnegie MellonSoftware Engineering Institute

Schedule Variance Root Cause 1Cause Code: Underestimated tasks

Process: Project Management

Subprocesses: Planning• Establish requirements• Define project process• Perform detailed planning

Requirements Management

As subprocesses are explored, process mapping techniquesmay be used with (or based on) ETVX diagrams

Page 103: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 103

Carnegie MellonSoftware Engineering Institute

Schedule Variance Root Cause 2Root Causes of Common Cause Variation• Inexperience in Estimation process• Flawed resource allocation.• Inexperience in product (system) for

estimator• Requirements not understood

Root causes of Special Cause Variation• Too much multitasking• Budget issues

A list of possible countermeasures wasdeveloped

Pros/Cons of doingthis retrospectivelyvs. real time

What is neededbefore executing thecountermeasures?

Could the “specialcauses” also be“common causes”?

Page 104: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 104

Carnegie MellonSoftware Engineering Institute

Putting it all Together

Dashboard to monitor “the whole picture”• customer satisfaction• defects• effort and schedule variance

Management by Fact* to monitor improvement efforts• effort variance reduction• schedule variance reduction• measurement quality improvement

Reference process documentationand project management principles in use.

*Tooltip for Management by Fact (MBF) in the Addendum

Who uses thedashboardsand MBFs?

Page 105: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 105

Carnegie MellonSoftware Engineering Institute

Notional Management by Fact (MBF)Reduce the total schedule variance by decreasing thevariance of the top 3 internal causes by 50% in 1 year.

Total variance w/mean comparison

Variance for top 3 causes:• Underestimated Tasks• EV Method Problem• Missed Requirements

Prioritization &Root Cause

• Inexperience• Resource Allocation• Requirements not

understood• ….

Counter Measures

First: Gather realtime data andverify “data archaeology”Then:•….•…

Impact, Capability

In total, thesecountermeasures willremove 15% of typicalvariance.(as possible, list impact ofeach countermeasure)

0

5

10

15

20

25

30

1 2 3 4 5Month

Per

cent

Sch

edul

e V

aria

nce

0123456789

1 2 3 4 5Month

Per

cen

t S

ched

ule

Var

ian

ce

Cause 1Cause 2Cause 3

Still needed: Relate in process and completed project data

Page 106: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 106

Carnegie MellonSoftware Engineering Institute

Notional DashboardEarned Value DataIn-process data:• monthly schedule index,cost index by projectmilestones

% c

ost

var

-100

-75

-50

-25

0

25

50

75

100

125

150

1: oct-01 2: nov-01 3: dec-01 4: jan-02 5: feb-02 6: mar-027: apr-028: may-029: jun-02

month label

Defect Data: Tally of Field Defects

CustomerSatisfaction

Return onInvestment

Completed projects data:• control chart• % outside spec• contribution of internalcauses to completed projectvariance

MeasurementQuality

ROI

Effort

CompletenessAccuracyProcedural Adherence

Other possible inclusions:• Engineering process procedural adherence (as a leadingindicator for EV, defect and measurement quality performance)

0 5 1 0 1 5 2 0 2 5 3 0 3 5

-1 0 0

- 8 0

- 6 0

- 4 0

- 2 0

0

2 0

4 0

6 0

8 0

.

L CL = -9 3.86

C L = -3 .74 6

U CL = 8 6.3 7

Mo

vin

g R

ange

P er ce nt va ria n ce --E f fo rt (O ct 93 thru M a r02 )Co m plete d P rojec ts

Indi

vidu

als

S ub grou p N o .0 5 1 0 1 5 2 0 2 5 3 0 3 5

0

2 0

4 0

6 0

8 0

1 0 0

1 2 0

C L =3 3 .8 8

U C L = 1 1 0 .7

0 5 1 0 1 5 2 0 2 5 3 0 3 5

-1 0 0

- 8 0

- 6 0

- 4 0

- 2 0

0

2 0

4 0

6 0

8 0

0 5 1 0 1 5 2 0 2 5 3 0 3 5

-1 0 0

- 8 0

- 6 0

- 4 0

- 2 0

0

2 0

4 0

6 0

8 0

.

L CL = -9 3.86

C L = -3 .74 6

U CL = 8 6.3 7

Mo

vin

g R

ange

P er ce nt va ria n ce --E f fo rt (O ct 93 thru M a r02 )Co m plete d P rojec ts

Indi

vidu

als

S ub grou p N o ..

L CL = -9 3.86

C L = -3 .74 6

U CL = 8 6.3 7

Mo

vin

g R

ange

P er ce nt va ria n ce --E f fo rt (O ct 93 thru M a r02 )Co m plete d P rojec ts

Indi

vidu

als

S ub grou p N o .0 5 1 0 1 5 2 0 2 5 3 0 3 5

0

2 0

4 0

6 0

8 0

1 0 0

1 2 0

C L =3 3 .8 8

U C L = 1 1 0 .7

“return” = variancereduction translated into $$

Note: In-process profiles to be shown on “group level”dashboards

Page 107: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 107

Carnegie MellonSoftware Engineering Institute

Organization Specific Process

SelectBusiness Goal

(CustomerSatisfaction)

GatherData

Analyze DataPrioritize

Issues

IdentifyPossibleCauses

(Brainstorm)

Perform CausalAnalysis (OPP)

PrioritizeActualCauses

IdentifyPotentialSolutions

DevelopAction Plan

ImplementImprovement

IdentifiedThresholds

Business ObjectiveSpecsPerformance Thresholds

•Project Performance•Measures Quality•SPI Implementation

•Snapshot (1st Iteration Baseline)•Issues (Validity of data, Quality ofData, Variance (performance)

No “Issues”Establish capability, models, etc.

Start subprocessselection

Draft ImprovementGoal (SMART) orIdentify focus area

Improvements

Gather Data/Analyze Data

Goal Refinement1st Iteration Final Goal

Page 108: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 108

Carnegie MellonSoftware Engineering Institute

Case Study Summary 1• Goal: Customer satisfaction via effort, schedule, field defects• Black Box Process: not explicitly dealt with until root cause• Right Data:

- in-process data available- needed to “data mine” for completed data- some “new data needs” identified

• Data is Right- multiple iterations to correct some data (is this in slides?)

• Explore/Evaluate- key to determining need for “data archaeology”- put field defects into “monitor” mode- focus on improving effort, schedule variability (or change

specs)- focus on improving measurement quality- focus on improving sampling schemes

iterative,the

“dynamics”overlap

goals getSMARTer,

morequantitative

Page 109: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 109

Carnegie MellonSoftware Engineering Institute

Case Study Summary 2• Explore/Evaluate continued

- extent of variability characterized- some decomposition conducted to distinguish

overall variability vs. multiple populations• Data & Process Decomposition

- Sub processes of interest selected based on paretoanalysis of “cause codes

• Root Cause Analysis:- many direct causes identified- separating common and special causes of variability- we’re getting there….

decompositionstarts in “initialexploration”

Page 110: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 110

Carnegie MellonSoftware Engineering Institute

Case Study Summary - Tools Used

StatisticalControls:• Control Charts• Time Seriesmethods

Non-StatisticalControls:• Proceduraladherence• PerformanceMgmt• Preventiveactivities

Design ofExperimentsModeling

TolerancingRobust Design

SystemsThinkingDecision &Risk Analysis

7 Basic ToolsCause & EffectDiagrams,MatrixFailure Modes &Effects AnalysisStatisticalInferenceReliability AnalysisRoot CauseAnalysis4 Whats5 WhysHypothesis TestANOVA

DefectMetricsDataCollectionMethodsSamplingTechniquesMeasurementSys. EvaluationQuality ofData

DefineBenchmarkBaselineContract/CharterKano ModelVoice of theCustomerVoice of theBusinessQuality FunctionDeploymentProcess FlowMapProjectManagement“Managementby Fact”

Measure Analyze Improve Control

adaptedtechnique for

impactevaluation

bold = tool used

anticipate futureuse for theseimprovement

efforts

control charts for limited analysis NOT as control mech.

Page 111: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 111

Carnegie MellonSoftware Engineering Institute

Summary – Key PointsShow me the data! Follow the data!

Couple data analysis with your knowledge of the process.

If your number-crunching is not adding value, then STOP!• Have a goal: a monitoring goal, an improvement goal

This isn’t that hard.• Slow down, think about your process and proceed

methodically

But it isn’t that easy either. (If it were, we’d all be out of a job).• Don’t be afraid to explore your data, to pursue your ideas.

Use your goals and your data as your guides.

You can get yourself into a chicken-and-egg argument with data.• Sometimes, you need to just dive in with what you have.

Page 112: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 112

Carnegie MellonSoftware Engineering Institute

Contact InformationBill FloracSoftware Engineering InstituteSoftware Engineering Measurement and AnalysisEmail: [email protected]

Jeannine SiviySoftware Engineering InstituteMeasurement & Analysis InitiativeEmail: [email protected]

Contact us for a copy of the slides.Or, leave a business card with Jeannine or Bill.Also, they will be posted on the SEMA web pageshttp://www.sei.cmu.edu/sema

Page 113: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 113

Carnegie MellonSoftware Engineering Institute

ReferencesNote: URL’s subject to change without notice

[ASA 01] American Statistical Association, Quality & Productivity Section, Enabling BroadApplication of Statistical Thinking, http://web.utk.edu/~asaqp/thinking.html, 2001

[ASQ 00] ASQ Statistics Division, Improving Performance Through Statistical Thinking,Milwaukee: ASQ Quality Press, 2000. H1060

[Deming] Deming, W. Edwards, Out of the Crisis. Cambridge, Mass.: Massachusetts Institute ofTechnology, Center for Advanced Engineering, 1986

[Wheeler 92] Wheeler, Donald, and David S. Chambers, Understanding Statistical Process Control,SPC Press, 1992

Page 114: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 114

Carnegie MellonSoftware Engineering Institute

Additional ReadingReferences on statistics and analytical tools (URL’s subject to change without notice)

General Statistics and Tools

Davis, Wallace III, Using Corrective Action to Make Matters Worse, Quality Progress, October2000

Gonick, Larry and Smith, Woollcott, The Cartoon Guide to Statistics, HarperPerennial,1993

The Memory Jogger, Goal/QPC, http://www.goalqpc.com

Wheeler, Donald, J. Understanding Variation – The Key to Managing Chaos, SPC Press, 1993

Statistical Process Control

AT&T / Western Electric Co., Statistical Quality Control Handbook, Delmar Printing Company

Chrysler, Ford, General Motors Corp., Statistical Process Control – SPC, A.I.A.G. 1995

Florac, William A., and Anita D. Carleton, Measuring the Software Process, Addison-Wesley,1999

Wheeler, Donald, and David S. Chambers, Understanding Statistical Process Control, SPC Press,1992

Wheeler, Donald and Polling, Sheila, Building Continual Improvement, SPC Press, 1998

Bayesian Modeling:

Fenton, Norman and Martin Neil, Software Metrics: Roadmap, International Conference onSoftware Engineering, 2000, available at http://www.softwaresystems.org/future.html

Page 115: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 115

Carnegie MellonSoftware Engineering Institute

AddendaAdditional vignettes

Tool tips

Page 116: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 116

Carnegie MellonSoftware Engineering Institute

Example of an Aidfor OperationalDefinitions usingOrthogonalClassification

P ro b le m S ta tu s In c lu d e E x c lu d e V a lu e C o u n t A r ra y C o u n t

O p e n ✔ ✔

R e c o g n iz e d ✔

E v a lu a te d ✔

R e s o lv e d ✔

C lo s e d ✔ ✔

P ro b le m T y p e In c lu d e E x c lu d e V a lu e C o u n t A rr a y C o u n t

S o f tw a re d e fe c tR e q u ire m e n ts d e fe c t ✔ ✔

D e s ig n d e fe c t ✔ ✔

C o d e d e fe c t ✔ ✔

O p e ra tio n a l d o c u m e n t d e fe c t ✔ ✔

T e s t c a s e d e fe c t ✔

O th e r w o rk p ro d u c t d e fe c t ✔

O th e r p ro b le m sH a rd w a re p ro b le m ✔

O p e ra tin g s ys te m p ro b le m ✔

U s e r m is ta k e ✔

O p e ra tio n s m is ta k e ✔

N e w re q u ire m e n t /e n h a n c e m e n t ✔

U n d e te rm in e dN o t re p e a ta b le /C a u s e u n k n o w n ✔

V a lu e n o t id e n t if ie d ✔

U n iq u e n e s s In c lu d e E x c lu d e V a lu e C o u n t A rr a y C o u n t

O r ig in a l ✔

D u p lic a te ✔ ✔

V a lu e n o t id e n t if ie d ✔

C r it ic a l ity In c lu d e E x c lu d e V a lu e C o u n t A rr a y C o u n t

1 s t le v e l (m o s t c r it ic a l) ✔ ✔

2 n d le v e l ✔ ✔

3 rd le v e l ✔ ✔

4 th le v e l ✔ ✔

5 th le v e l ✔ ✔

V a lu e n o t id e n t if ie d ✔

U rg e n c y In c lu d e E x c lu d e V a lu e C o u n t A rr a y C o u n t

1 s t (m o s t u rg e n t) ✔

2 n d ✔

3 rd ✔

4 th ✔

V a lu e n o t id e n t if ie d ✔

ReferencePage 33

Page 117: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 117

Carnegie MellonSoftware Engineering Institute

ComplianceIssues

May be basis forassignable causes

Compliance Issues Things to Examine When SeekingReasons for Noncompliance

Adherence to the process awareness and understanding of theprocess

existence of explicit standards

adequate and effective training

appropriate and adequate tools

conflicting or excessively aggressivegoals or schedules

Fitness and use ofpeople, tools, technology,and procedures

availability of qualified people, tools,and technology

experience

education

training

assimilation

Fitness and use ofsupport systems

availability

capacity

responsiveness

reliability

Organizational factors lack of management support

personnel turnover

organizational changes

relocation

downsizing

disruptive personnel

morale problems

Page 118: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 118

Carnegie MellonSoftware Engineering Institute

Initial Control Chart of InspectionPackage Review Rate (SLOC/Prep-Hr)

Assignablecauses due to:

• Erroneous andMissing Data

• Multiple CauseSystems (sixcomponentseach with owndevelopmentteam)

0

500

1000

1500

2000

2500

0 5 10 15 20 25

.

LCL=-437.7

CL=119.2

UCL=676.1

Mov

ing

Ran

ge S

LOC

/ Pre

p-H

r

Inspection Sequence Number

0

500

1000

1500

2000

0 5 10 15 20 25

CL=209.4

UCL=684.1

Inspection Package Review RateAll Product Components

Page 119: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 119

Carnegie MellonSoftware Engineering Institute

Inspection Package Review Ratefor Component A

0 5 10 15 20 25

0

100

200

300

400

.

LCL= 0

CL=47.3

UCL=170.5

Mov

ing

Ran

geS

LOC

/Pre

p-H

our

Inspection Sequence Number.

0 5 10 15 20 250

100

200

300

400

CL=46.33

UCL=151.4

Component A

Inspection Package Review RateRe-analyzed data

• Data errorseliminated• Consider singlemajor cause systemat a time• Control chart forone component• Several assignablecauses apparent

Page 120: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 120

Carnegie MellonSoftware Engineering Institute

Component A Review Rate

SL

OC

/Insp

ecti

on

-Hr

Investigation resultedin removal of separatecause systems includedin inspection packages:

• data tables• lists• arrays• different reviewprocess

0 5 10 15 20 25

0

100

200

300

400

LCL= 0

CL=47.3

UCL=168.2

Inspection Sequence Number

X

X

X

XX

X

Inspection Package Review RateComponent A

Page 121: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 121

Carnegie MellonSoftware Engineering Institute

Component A Revision

0 5 10 15 20

0

20

40

60

80

100

LCL= 0

CL=26.31

UCL=66.69

Inspection Sequence Number.

SLO

C/ I

nspe

ctio

n-H

r

Component A

Process Instability:Apparent shift ofprocess performanceafter #15

Leads to investigationof changes in processcause systems

Page 122: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 122

Carnegie MellonSoftware Engineering Institute

Cause-and-Effect Relationships

0 200 400 600 800 1000

0

100

200

300

400

Y=14.68315+0.22702 x

CORRELATION LINEAR 0.77 EXPONENTIAL (2) 0.86 EXPONENTIAL (3) 0.86

ModSloc

Slo

c p

er P

rep

Hr(

n)

Inspection review rate with increase in SLOC

Page 123: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 123

Carnegie MellonSoftware Engineering Institute

Component A Review Rate

Inspection Package Size (New and Changed SLOC)

Insp

ectio

n R

evie

w R

ate

( SLO

C/ In

spec

t ion-

Hr)

0

10

20

30

40

50

60

70

80

90

100

3 4 6 7

14 14 15 15 17 17 20 22 24 51 58 75 86 86 90 130

185

188

Amount of SLOC in review package is key driver of review time spent

Distribution of review rates by SLOC size

Page 124: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 124

Carnegie MellonSoftware Engineering Institute

Component A Review Rate

0

20

40

60

80

100

120

140

0 1 2 3 4 5 6LCL= 0

CL=61.9

UCL=154.5

Component A: Package Size > 60 SLOC

SLO

C/In

spec

tion-

Hr

7

LCL= 0

CL=10.7

UCL=29.5

Component A: Package Size < 60 SLOC

05

10152025

SLO

C/ I

nspe

ctio

n-H

r

2 4 6 8 10 12 14 16

Inspection Sequence Number

Inspection Sequence Number

Replot data using twocharts:• Rates for Inspection<60 SLOC• Rates for Inspection>60 SLOC

Indicates two processesin operation dependingon size of Inspectionpackage

Establish Trial Limits foreach subprocess

Page 125: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 125

Carnegie MellonSoftware Engineering Institute

0

10

20

30

40

50

0 5 10 15 20 25 30LCL = 0

CL=10.71

UCL=29.48

SLO

C/ I

nspe

ctio

n-H

r

Component A: Package Size < 60 SLOC

Inspection Sequence Number.

Inspection Sequence Number.0 2 4 6 8 10 12 14 16 18

0

100

200

300

400

500

LCL= 0CL=61.95

UCL=154.5

Component A: Package Size > 60 SLOC

SLO

C/ I

nspe

ctio

n-H

rComponent A Review Rate

Additional observations identify more assignable causes

Investigation determines that assignable cause observations from re-inspection process

Page 126: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 126

Carnegie MellonSoftware Engineering Institute

X

X

XX

X XX

X

X X

X

XX

0

10

20

30

40

50

0 5 10 15 20 25 30LCL = 0

CL=10.71

UCL=29.48

SLO

C/ I

nspe

ctio

n-H

r

Component A: Package Size < 60 SLOC

Inspection Sequence Number.

Inspection Sequence Number.0 2 4 6 8 10 12 14 16 18

0

100

200

300

400

500

LCL= 0CL=61.95

UCL=154.5

Component A: Package Size > 60 SLOC

SLO

C/ I

nspe

ctio

n-H

rComponent A Review Rate

All re-inspectiondata identified andremoved fromcontrol chart sincethey represent adifferent process(different causesystem)

Page 127: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 127

Carnegie MellonSoftware Engineering Institute

Component A Review Rate

0 5 10 15 20 2505

10152025

LCL=0

CL=10.71

UCL=29.48

Component A: Package Size < 60 SLOC

SLO

C/ P

rep-

Hr

Inspection Sequence Number.

Component A: Package Size > 60 SLOC

020406080

100120140

0 1 2 3 4 5 6 7 8 9 10LCL= 0

CL=61.95

UCL=154.5

SLO

C/ P

rep-

Hr

Inspection Sequence Number.

Charts plottedwith remainingdata (single causesystem)

Additional datapoints reinforcetrial limitshypothesis

Page 128: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 128

Carnegie MellonSoftware Engineering Institute

Component A Review RateAnalysis summary:• Inspection process consists of several (3)undocumented subprocesses• Review rate appears to be stable within two categories(< and > 60 SLOC)• Inspection packages of 60 SLOC or more reviewedabout 6X faster than those with <60 SLOC

Key questions requiring more study:• Why difference in review rates?• Is there a difference in effectiveness (rate of escapeddefects)?• Do other components behave similarly?• How do rates compare from release to release?

Page 129: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 129

Carnegie MellonSoftware Engineering Institute

Tool Tips Part 1: The Basic Tools

Overview (description, procedure, tips, examples) for• run charts• spc charts• boxplots

- including pareto boxes• scatter plots• histograms, distributions and capability

- twist: rayleigh, weibull distributions• bar charts• pareto charts• cause&effect diagram

- including cause & effect matrix

Page 130: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 130

Carnegie MellonSoftware Engineering Institute

Tooltip: 7 Basic Tools

Description• Fundamental data plotting and diagramming tools

- Cause & Effect Diagram- Histogram- Scatter Plot- Run Chart- Flow Chart- Brainstorming- Pareto Chart

• The list varies with source. Alternatives include- Statistical Process Control Charts- Descriptive Statistics (mean, median and so on)- Check Sheets

Page 131: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 131

Carnegie MellonSoftware Engineering Institute

7 Basic Tools: Usage

Plot trends over time

Examine relationships among measures

Explore cause-effect relationships

Prioritize issues

Determine stability and capability of processes

Page 132: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 132

Carnegie MellonSoftware Engineering Institute

7 Basic Tools: Chart Examples 1Defects Removed By Type

0102030405060708090

20 40 80 50 10 100 30 60 70 90

Defect Type Codes

Qu

anti

ty

Mean Time To Repair

0.00

20.00

40.00

60.00

80.00

100.00

1 2 3 4 5 6 7 8 9 10

Product

Tim

e (m

inu

tes)

Removed in test

Injected in Design

Pareto Chart

Run Chart

Page 133: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 133

Carnegie MellonSoftware Engineering Institute

7 Basic Tools: Chart Examples 2

Scatter Plot

Histogram

34 36 38 40 42 44 46 48 50 52 54 5602468

101214161820

Num

ber

of D

ays

Product-Service Staff Hours

Development Effort (personmonths) vs. Size (KSLOC)

0

100

200

300

400

500

0 10 20 30 40 50 60 70

Size (thousands of executable source lines)

Eff

ort

(p

erso

n m

on

ths)

Page 134: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 134

Carnegie MellonSoftware Engineering Institute

Softwarenotrequiredreliability

Methods Environment

Management PeopleMinimum

applicationexperience

No test specialists

No formal inspectionprocess

No formal defecttracking mechanism

Test beds to not matchuser configuration

No risk management

Inadequate testresources

Unrealisticcompletion date

7 Basic Tools: Cause & Effect

[Westfall]

Traditional diagram

Variation

Problem Process

Subcause A1 Cause A

Process

Cause C

Cause

B

Cause

D

Subcause B1

Subcause C1

Subcause C2

Subcause D1

Subcause D2

Subsu

bcau

se

A1.

1

Subsu

bcau

se D

2.1

Page 135: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 135

Carnegie MellonSoftware Engineering Institute

7 Basic Tools: Chart Variations

Box & Whisker Plotfor assessment data

CMMI BENCHMARK SEI Level 3

0

10

20

30

40

50

60

70

80

90

100

Req

uire

men

ts

Dev

elop

men

t

Tech

nica

l

Sol

utio

n

Pro

duct

Inte

grat

ion

Ver

ifica

tion

Val

idat

ion

Org

aniz

atio

n

Pro

cess

Foc

us

Org

aniz

atio

nal

Pro

cess

Def

initi

on

Org

aniz

atio

nal

Tra

inin

g

Inte

grat

ed

Pro

ject

Man

agem

ent

Ris

k

Man

agem

ent

Dec

isio

n

Ana

lysi

s &

Res

olut

ion

x

90th percentile75th percentilemedian: 50th percentile25th percentile10th percentile

mean

Page 136: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 136

Carnegie MellonSoftware Engineering Institute

7 Basic Tools: Chart Variations

Boxplot variations:• cost and schedule variance over time to show

organizational average and also variability• prioritized features for a new process technology rollout:

a combination “pareto-boxplot”

1

2

3

4

5

potential (specific) process features

Nee

ds

rati

ng

Page 137: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 137

Carnegie MellonSoftware Engineering Institute

Tool Tip: Run Charts

Description• Time series plot that can be used to examine data

quickly and informally for trends or other patterns thatoccur over time.

Tips• Run charts are not control charts - don’t over-interpret

them.• If observations are not all similarly spaced in time,

there may be more than one process influencing whatappears to be a single run.

Page 138: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 138

Carnegie MellonSoftware Engineering Institute

Assumptions• ordered by time• single underlying process• consistent operational definitions

Run Charts: Example

Observation Number

0 2 4 6 8 10 12 14 16-40481216

ObservedValue

Time

Page 139: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 139

Carnegie MellonSoftware Engineering Institute

Tool Tip: Statistical ProcessControl (SPC) ChartsDescription• run chart with statistical limits

Usage• let you know what your processes can do, so that you can

set achievable goals.• provide the evidence of stability that justifies predicting

process performance.• separate signal from noise, so that you can recognize a

process change when it occurs.• distinguishes common cause variation from special cause

variation• point you to fixable problems and to potential process

improvements

Page 140: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 140

Carnegie MellonSoftware Engineering Institute

Control Chart Basics

Lower Control Limit(LCL)

Upper Control Limit(UCL)

SpecificationLimits

LimitsControl Limits Determined by Process Performance Measurements

(Voice of the process)

Specification Limits Set by customer, engineer, etc.(Voice of the customer)

Event Time or Sequence

Mean or Center Line (CL)

CL + 3σσσσ

CL - 3σσσσ

Page 141: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 141

Carnegie MellonSoftware Engineering Institute

SPC: TipsReacting to Common Cause Variation as if it were Special CauseVariation cannot improve the process and will result in increasedvariability.

Check your data distributions!• Defect counts are never negative and may not be normally

distributed.

Set specification limits based on statistics, engineeringknowledge and risk of escaping defects.

Implement charts “in the field” only when you have correctiveaction guidelines. Otherwise, work the charts offline.

Always look at the average (or individual) and range charts!

Page 142: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 142

Carnegie MellonSoftware Engineering Institute

SPC: Example

0 5 10 15 20 25 308

12

16

20

24

28

32

LCL = 8.49

CL = 20.04

UCL = 31.6

Week of System Test

0 5 10 15 20 25 3002

4681012

14

CL = 4.35

UCL = 14.2

MovingRange

Number of Unresolved

Problem Reports

Page 143: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 143

Carnegie MellonSoftware Engineering Institute

SPC: Rules for DetectingProcess Instabilities

TEST 3:4 out of 5 signals in zone B

TEST 1:Single point outside of zone C

TEST 2:2 out of three beyond zone B

TEST 4: 8 successive points on same side of centerline

0 5 10 15 20 25

-20

-10

0

10

20

30

40

50

60

.

LCL

CL

UCL

Indi

vidu

als

Subgroup No.

Zone A

Zone A

Zone B

Zone C

Zone B

Zone C

Page 144: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 144

Carnegie MellonSoftware Engineering Institute

SPC: Anomalous PatternsRapid Shift in Level

Unstable Mixture

Stratification Trends

Cycles Pattern Bunching or Grouping

Page 145: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 145

Carnegie MellonSoftware Engineering Institute

Tooltip: Scatter Plots

Description• Display empirically observed relationships between

two measures.

Usage• A pattern in the plotted points may suggest that the

two measures are associated.• When the conditions warrant, scatter diagrams are

natural precursors to regression analyses.• Scatter diagrams are rarely used as the only means of

characterizing the relationship between twomeasures.

• Does not predict cause and effect relationships

Page 146: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 146

Carnegie MellonSoftware Engineering Institute

Scatter Plot: Example with Line

Y=2.08112+0.00435 x

0 200 400 600 800 1000 1200 1400 1600

0

2

4

6

8

10

12

14

16

sloc

Inspection Effort versus Inspected Slocef

fort

R = 0.45031

R-Squared = 0.20278

R = Correlation Coefficient

R-Squared = Fraction of variability explained by the model

Page 147: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 147

Carnegie MellonSoftware Engineering Institute

Tooltip: Histograms

Description:• Display the empirically observed distribution for

values of a measure.• Show the frequency of each value and the range of

values observed.

Usage:• Inappropriate unless the measure can be treated as a

continuous scale.

Page 148: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 148

Carnegie MellonSoftware Engineering Institute

Histograms - Examples

34 36 38 40 42 44 46 48 50 52 54 5602468

101214161820

Num

ber

of D

ays

Product-Service Staff Hours

Time to fix a defect found after development

Numberof defects

Look for multimodaldistributions

May point tomultiple processes

Page 149: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 149

Carnegie MellonSoftware Engineering Institute

Tooltip: Bar Charts

Description

Usage• Similar in many ways to histograms• Do not require that the measure be treated as a

continuous variable.• Bar charts are much more frequently used than

histograms.

Page 150: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 150

Carnegie MellonSoftware Engineering Institute

Bar Charts: Example

Defect Analysis

0

5

10

15

20

25

30

35

40

45

req’tsanalysis

design code unittest

componenttest

systemtest

customeruse

Software Activity

Per

cent

Def

ects Injected

Found

Escaped

Page 151: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 151

Carnegie MellonSoftware Engineering Institute

Bar Charts: A Word of Caution

Because they are so flexible, it is easy to get carriedaway with bar charts.

Defect Counts by Project and Type

0

5

10

15

20

25

A B C D E F

Project Identifier

Num

ber o

f Def

ects

Syntax Error

Ambiguous Requirements

Interface Error

Missing Function

Memory

Page 152: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 152

Carnegie MellonSoftware Engineering Institute

Tooltip: Pareto ChartsDescription:• Special form of a bar chart that ranks categories of data in

terms of their amounts, frequency of occurrence, oreconomic consequences.

Usage:• Ranking of problems, causes, or actions, etc., must be

orthogonal• Interpretation based on the “80/20 rule”

If the 80/20 rule does not apply• Consider counting a different attribute, while maintaining the

same stratification.• Consider re-stratifying - use a different classification

scheme.• Consider a different attribute of the process under study.

Page 153: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 153

Carnegie MellonSoftware Engineering Institute

Pareto Charts: ExampleProfile of Defects Found in Product XYZ

0

15

30

45

Syntax Error AmbiguousRequirements

Interface Error Missing Function Memory

Defect Type

Num

ber

of D

efec

ts

0%

25%

50%

75%

100%

Cum

ulat

ive

Per

cent

age

Number of Defects

Cumulative Percentage

Page 154: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 154

Carnegie MellonSoftware Engineering Institute

Tooltip: Cause & Effect Diagrams

Description*Also called “Fishbone” or “Ishikawa” diagrams)

Usage• Allow you to probe for, map, and prioritize a set of

factors that are thought to affect a particular process,problem, or outcome.

• Help elicit and organize information from people whowork within a process and know what might becausing it to perform the way it does.

Page 155: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 155

Carnegie MellonSoftware Engineering Institute

Cause & Effect Diagram: Tips

You can spend a lot of time discussing what the principalcauses should be (the main branches) if you are notcareful.• May need to work on the categorization of causes in

advance• May want to just use generic cause categories like;

Materials, Equipment, Operators, Procedures andEnvironment.

Page 156: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 156

Carnegie MellonSoftware Engineering Institute

Cause & Effect Diagram: Example

It takes too long to process our software change requests

Resolution ClosureCollection Evaluation

problem reportsnot logged in properly

information missingfrom problem reports

cannot determinewhat needs to be doneto fix the problem

cannot replicateproblem

cannot isolate softwareartifact(s) containing the problem

change control board meets only once a week

changes decisions not released in atimely manner

delays in approving changes

delays in shippingchanges and releases

must reconfigurebaselines

takes time to make changes

delays enroute

Page 157: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 157

Carnegie MellonSoftware Engineering Institute

Softwarenotrequiredreliability

Methods Environment

Management PeopleMinimum

applicationexperience

No test specialists

No formal inspectionprocess

No formal defecttracking mechanism

Test beds to not matchuser configuration

No risk management

Inadequate testresources

Unrealisticcompletion date

Basic Tools: Cause & Effect

[Westfall]

Traditional diagram

Variation

Problem Process

Subcause A1 Cause A

Process

Cause C

Cause

B

Cause

D

Subcause B1

Subcause C1

Subcause C2

Subcause D1

Subcause D2

Subsu

bcau

se

A1.

1

Subsu

bcau

se D

2.1

Page 158: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 158

Carnegie MellonSoftware Engineering Institute

Tool Tip: Cause & Effect Matrix

Description• method to determine possible causes of variation in the

process and to feed future experimental designs

Purpose• to organize problem-solving efforts when there are

multiple responses involved• to prioritize the number of factors to study• to build team consensus about what is to be studied

[Hexsab 02]

Page 159: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 159

Carnegie MellonSoftware Engineering Institute

Cause & Effect Matrix: Usage

When to use:• team is overwhelmed with the number of variables affecting

process• not possible to experiment with all of the variables – need to

narrow down the list• team is struggling with which factors may have the biggest

impact• it is not clear how each factor impacts customer requirements

Feeds other tools:• Failure Mode and Effects Analysis• Data collection plans• Experimentation• Control plans

[Hexsab 02]

Page 160: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 160

Carnegie MellonSoftware Engineering Institute

Cause & Effect Matrix: TermsProcess: The combination of people, equipment, materials, methodsand environment that produce output (product or service). It is arepeatable sequence of activities with measurable inputs and outputs.

Parameter: A measurable characteristic of a product or process.

Process Parameter: A measurable characteristic of a process that mayimpact product performance but may not be measured on the product.(The “x.”)

End-Product Parameter: A parameter that characterizes the product atthe finished product stage. (The “Big Y.”)

In-Process Product Parameter: A parameter that characterizes theproduct prior to the finished product stage. It is measured on the productupstream and is the result of a process step. (The “little y.”)

Input Variable: An output from other processes. (Neither x’s or y’s.)

[Hexsab 02]

Page 161: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 161

Carnegie MellonSoftware Engineering Institute

Cause & Effect Matrix: ProcedureIdentify the y’s from process map.

Rate the y’s on a scale from 1-10.• Involve the “customers” to determine the ratings.• Ratings are relative.

List the process steps and all of the x’s from the process map.

Rate the relationship of each x to each y on a 0, 1, 3, 9 scale.0 = No relationship between x and y1 = Remote relationship between x and y3 = Moderate relationship between x and y9 = Strong relationship between x and y

For each x• Multiply each relationship rating by the corresponding y rating• Sum the products

Use the summations to rank and select x’s for future experiments orfocused efforts[Hexsab 02]

Page 162: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 162

Carnegie MellonSoftware Engineering Institute

Cause & Effect Matrix: Format

Process steps X's X relationship to Y Sum

Y's:

Y ratings:

[Hexsab 02]

Page 163: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 163

Carnegie MellonSoftware Engineering Institute

Tool Tips Part 2: Beyond Basics

Overview (description, procedure, tips, examples) for• capability• voice of the customer• management by fact• process mapping

Page 164: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 164

Carnegie MellonSoftware Engineering Institute

Tooltip: Process CapabilityDescription• When a process is in statistical control with respect to

a given set of attributes, we have a valid basis forpredicting, within limits, how the process will performin the future.

Usage• Addresses predictable performance of a process

under statistical control.• For a process to be capable, it must meet two criteria:

- The process must be brought into a state ofstatistical control for a period of time sufficient todetect any unusual behavior.

- The capability of the process must meet or exceedthe specifications that have to be satisfied to meetbusiness or customer requirements.

Page 165: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 165

Carnegie MellonSoftware Engineering Institute

Histogram Reflecting ProcessCapability

36 38 40 42 44 46 48 50 52 540

2

4

6

Product Service Staff-Hours

Fre

quen

cy C

ount

LCL= 36.08 UCL=54.04CL= 45.06

Voice of the Process

Page 166: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 166

Carnegie MellonSoftware Engineering Institute

Process Capability vs. CapableProcess

25 30 35 40 45 50 550

2

4

6

Product Service Staff-Hours

Fre

quen

cy C

ount

LCL= 36.08 UCL=54.04CL= 45.06

LSL= 30 USL= 50Target = 40

Voice of the customer

Voice of the Process

Page 167: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 167

Carnegie MellonSoftware Engineering Institute

Tool Tip: Voice of the Client (VOC)

Description• a method to describe the stated and unstated needs or

requirements of the customer• can captured in a variety of ways: direct discussion or

interviews, surveys, focus groups, customerspecifications, observation, warranty data, field reports,complaint logs, etc.

[isixsigma]

Page 168: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 168

Carnegie MellonSoftware Engineering Institute

VOC: UsageFeeds Quality Function Deployment (QFD)

Risks• anecdotal, not quantitative• difficult to get “the right answer”• humans are PERFECT FILTERS!• it is very easy to induce bias in VOC

Tips• use existing information with care – it may be biased or too

narrowly focused• always use more than one source• customer visits allow direct discussion and observation• customer visits allow immediate follow-up questions and

unexpected lines of inquiry

Page 169: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 169

Carnegie MellonSoftware Engineering Institute

VOC Interviews: Procedure 1• Define the customer

• Select customers to interview

- Always interview more than one

• Plan interview

- Develop a checklist/guideline

- Teams of 3: “Moderator,” “Scribe,” “Observer”

• Conduct interviews

- Customer statements & observations need to berecorded VERBATIM

- Keep asking “why”

Page 170: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 170

Carnegie MellonSoftware Engineering Institute

VOC Interviews: Procedure 2Create VOC table.• Interpret verbatim statements into new meanings.• Document source of VOC or re-worked VOC.

- “I” if internally changed or generated (by team)- “E” if externally generated (by customer) or not

changed by team• Classify each statement as:

- a real need Ł feeds QFD- a technical solution- a feature requirement Ł feeds QFD- not a true need (e.g., cost issue, complaint,

technology, hopes dreams, etc.)• Quantify, Analyze, Prioritize

Page 171: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 171

Carnegie MellonSoftware Engineering Institute

VOC: Example TableNew process initiative under consideration• interview statements recorded verbatim and classified• column added for keyword sorting

Further development• “interpreted” comments about the organization’s true

goals, the overlap of initiatives (and so on)• evaluation for common themes• additional data collection may be needed

Customer comment Interpreted, reworded I/E perc

eptio

n,

expe

rienc

e, c

onte

xt

barr

ier

root

issu

e?

resu

lts, s

ucce

ss

need

solu

tions

Keyword for sorting

We are already at maturity level x, so why do more? E $$$$ $$$$

competing initiatives

Classification

Page 172: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 172

Carnegie MellonSoftware Engineering Institute

VOC: Analysis

Prioritization Method

Customer Time

Preparation Complexity

Analysis Complexity

Quality of Resulting

Prioritization

Number of customers

needed

Number of Needs to Prioritize Recommend

Frequency of Response short low low low large large NO

Constant Sum medium medium medium medium medium small YesRating short low low medium medium med-large YesSimple Ranking medium low low medium medium small-med YesQ-Sort short low low medium medium large YesPaired Comparison long medium high high large small YesRegression short medium high high large small-med Yes

Page 173: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 173

Carnegie MellonSoftware Engineering Institute

Tool Tip: Management by Fact (MBF)

Description• a concise summary of quantified problem statement,

performance history, prioritized root causes andcorresponding countermeasures for the purpose ofdata-driven project and process management

Management by Fact• uses the facts• eliminates bias• tightly couples resources and effort to problem-solving

Page 174: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 174

Carnegie MellonSoftware Engineering Institute

MBF: ProcedureIdentify and select problem• use “4 Whats” to help quantify the problem statement• quantify gap between actual and desired performance

Determine root cause• separate beliefs from facts• use “7 Basic Tools”• use “5 Whys”

Generate potential solutions and select action plan• Must be measurable/sustainable• Specific/assignable ownership• Understand expected results from each action

Implement solutions and evaluate• Compare data before and after solution• Document actuals and side effects• Compare with desired benchmark

Page 175: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 175

Carnegie MellonSoftware Engineering Institute

MBF: 4 WhatsCustomer satisfaction scores are too low.• What is too low?

Compared to best-in-class benchmark of 81%, we are at 63%.• What is the impact of this gap?

It represents lost revenue and earnings potential?• What is the correlation between customer satisfaction and

revenue?Each percent of customer satisfaction translates to 0.25 percentof market share which equals $100M US revenue.• What is the lost potential?

Final problem statement:

Customer satisfaction is 18% lower than best-in-class benchmark,which corresponds to a potential lost revenue of $1.8B US.

Page 176: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 176

Carnegie MellonSoftware Engineering Institute

MBF: 5 WhysThe marble in the Jefferson Memorial was deteriorating.Why?The deterioration was due to frequent cleanings with detergent. Why?The detergent was used to clean bird droppings from localsparrows. Why?The sparrows were attracted by spiders. Why?The spiders were attracted by midges. Why?The midges were attracted by the lights.

Solution: Delay turning on the lights until later at dusk.

Page 177: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 177

Carnegie MellonSoftware Engineering Institute

MBF: FormatFACTUAL STATEMENT OF PROBLEM, PERFORMANCETRENDS & OBJECTIVES

Graph ofperformance over time

Graph of supportiveor more detailed information

Prioritization &Root Cause

List of gaps inperformance and trueroot cause

Counter Measures &Activities

List of specific actions, whohas ownership and due date

Impact, Capability

List of expectedbenefits and impact ofeach countermeasure

Page 178: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 178

Carnegie MellonSoftware Engineering Institute

MBF: ExampleProblem Statement

Prioritization & Root Cause Counter Measures & Activities Who WhenExpected

Benefit/ImpactLarge Quantity of Syntax & Similar defects Clarify type definitions jms $4/30/2001that are repaired in <10 minutes on avg Improve subcategory data collection jms $4/30/2001Goal is 50% reduction in time, relative to historical data

Build a cause & effect diagram to be used for next round of analysis, improvement planning jmsIncrease correction efficiency by seeking all occurrences of a defect upon the detection of the first occurrence jms $4/30/2001Increase and log (new) usage of off-line programs to test small pieces of functionality jmsCreate & Use a syntax checklist jms $4/30/2001

"Big Hitter" (>10 minutes) defects involve Time breaks: phase completion & every hour jms $4/30/2001a variety of errors that escape to testing. Conduct a phase check prior to moving on jms $4/30/2001Design-injected and Test-removed errors fall into this category

Increase and log (new) usage of off-line programs to test small pieces of functionality jms $4/30/2001

Goal is 25% reduction in time, relative to historical data

Improve subcategory data collection to use for developing a more directed design review jms $4/30/2001Build a cause & effect diagram to be used for next round of analysis, improvement planning jms

Productivity

0

10

20

30

1 2 3 4 5 6 7 8 9 1

Program Number

LOC

/Hou

r

Customers A, B and C, representing x% of market share, are facing budget/cost constraints and require a 10% cost reduction in our product line XYZ in order to continue doing business with us. Baseline data shows that 33% of software development time is spent detecting and correcting defects.

Goal: 21 LOC/hr

About 1/2 of goal. In normalized terms, ~1 LOC/hr increase

About 1/2 of goal. In normalized terms, ~1 LOC/hr increase.

Defect Density

050

100150200250300350400450

1 2 3 4 5 6 7 8 9 10

Program

Def

ects

/KLO

C

Change in Defect Density

0.020.040.060.080.0

100.0120.0140.0160.0180.0200.0

R- Des

Rvw

R- Cod

e

R- Cod

e Rvw

R- Com

pile

R- Tes

tInj

- des

ignInj

- Cod

eIn

j- Tes

t

Def

ects

/KL

OC

Pre-Improvement

Post-Improvement

Trend of fix time / total time has similar pattern

Page 179: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 179

Carnegie MellonSoftware Engineering Institute

Tool Tip: Process Mapping

Description• representation of major activities/tasks, subprocesses,

process boundaries, key process inputs, and outputs

INPUTS(Sources ofVariation)

OUTPUTS(Measures ofPerformance)

• Perform a service• Produce a Product• Complete a Task

PROCESS STEP

A blending ofinputs to achieve

the desiredoutputs

• People• Material• Equipment• Policies• Procedures• Methods• Environment• Information

Page 180: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 180

Carnegie MellonSoftware Engineering Institute

Mapping: UsageFeeds other tools• Cause and Effects Matrix• Failure Modes and Effects Analysis (FMEA)• Control Plan Summary• Design of Experiments (DOE) planning

Tips for mapping current processes• Go to the actual place where the process is performed.• Talk to the actual people involved in the process and get the

real facts.• Observe and chart the actual process.• Consider creating “as is” and then “to be” maps.

Reality is invariably different from perception - few processeswork the way we think they do!

Page 181: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 181

Carnegie MellonSoftware Engineering Institute

Process Mapping: Basic ProcedureList inputs and outputs

Identify all steps in the process: value-added and non-value-added

Show key outputs at each step (process and product)

List key inputs and classify process inputs

Add the operating specifications and process targetsfor the controllable and critical inputs

Page 182: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 182

Carnegie MellonSoftware Engineering Institute

Process Mapping: Example

DetectDefects

CorrectDefectsPlan Trouble-

shoot!!!! Artifacts toinspect#, ! Artifact size# Reviewers" Data repository

• Defect Log• Record of plan

• Direct Cause• Root Cause

• CorrectiveAction

!!!! Review Rate!!!! , " Checklists#, " Inspectionmethod, procedure! Proficiency! Taxonomyinterpretations

What would youlist?!!!!Defect attributes# Proficiency# Effort# Tools

What wouldyou list?!!!! Correctionaction" Config control #, Effort

Inspection

Rework

!!!! Critical Inputs! Noise

" Standard Procedure# Control Knobs

Inspection process from earlier illustration

Page 183: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 183

Carnegie MellonSoftware Engineering Institute

Process Mapping: Value MapIdentify the process to map

Identify the boundaries

Create input-process-output for the critical processes

Create the process map

Color code each step identifying value- green = value added- red = non value added- yellow = non value added but necessary

• Identify hand-off points, queues, storage, and rework loopsin the process

• Quantitatively measure the map (throughput, cycle time, andcost)

• Validate map with process owners

Page 184: Data Analysis Dynamics - Carnegie Mellon University · PDF fileShewhart divides variation into two ... machines, material, environment, and methods). • Assignable Cause ... - represents

© 2003 by Carnegie Mellon University Version 1.0 page 184

Carnegie MellonSoftware Engineering Institute

Value Mapping: Change Request

Request(Need Identified)

SelectMethod/Path

ProvideAdditionalGuidance

Gather MoreInformation

FeedbackPreliminary

RequestAccepted?

AdditionalGuidanceNeeded?

Yes

No

No

No

YesYes

InitialAssessment*

RightDecision?

Forward toBoard

Yes

No

5% Rework

*Initial Assessment will:• Determine Impact Assessment• Identify Stakeholder• Coordinate with Product/Process Owner• Perform Impact Analysis

AssessmentCoordination

Validate

non value addednon value added but requiredvalue added

RequestValidated

?