Research in Empirical Software Eng.
Reduced-Parameter Modeling (RPM) for Cost Estimation Models
Zhihao Chen
zhihaoch@cse.usc.edu
Reduced-Parameter Modeling (RPM)
What Is RPM?
How Does It Work?
Why Is It Useful?
When Should You Not Use It?
What Is RPM?
• A machine learning technique for determining a minimum-essential set of cost model parameters
• Uses an organization's own project data points
• Assumes that the organization's past project data points are representative of its future projects
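Assuming a COCOMO 81-style multiplicative model (the form used by the COCOMO81 and NASA data sets listed later), the idea can be written as replacing the product over all effort multipliers with a product over a selected subset S, with the constants recalibrated on the organization's own data. This notation is an illustration, not the deck's own formulation.

```latex
\text{Effort} = a \cdot \text{KLOC}^{\,b} \cdot \prod_{i=1}^{n} EM_i
\quad\longrightarrow\quad
\text{Effort}_S = a_S \cdot \text{KLOC}^{\,b_S} \cdot \prod_{i \in S} EM_i,
\qquad S \subseteq \{1, \dots, n\}
```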
Why Is It Useful?
• Simplifies cost model usage and data collection
• Often improves estimation accuracy
– Eliminates highly correlated, weak-dispersion, or noisy-data parameters
• Identifies an organization's most important cost drivers for productivity improvement
Organizations Have Different Data Distributions
[Figure: Correlation analysis of COCOMO81, 63 projects]
[Figure: Correlation analysis of NASA Project02, 22 projects]
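The correlation charts themselves are not recoverable, but a pairwise analysis like theirs can be sketched as below: compute the Pearson correlation for every pair of parameters and flag pairs that move together, since one of each such pair adds little information. The function names, the 0.8 threshold, and the five-project values are hypothetical, chosen only for illustration.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.8):
    """Return (a, b, r) for every parameter pair whose |r| meets the threshold."""
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical effort-multiplier values for five projects (not from the slide)
ratings = {
    "acap": [0.71, 0.86, 1.00, 1.19, 1.46],
    "pcap": [0.70, 0.86, 1.00, 1.17, 1.42],
    "cplx": [1.15, 0.85, 1.30, 1.00, 1.15],
}
for a, b, r in correlated_pairs(ratings):
    print(f"{a} and {b} are highly correlated (r = {r:.2f}); consider keeping only one")
```

Running the same check on two organizations' data would show whether the same pairs are correlated in both, which is the point of comparing COCOMO81 with NASA Project02.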
Under-sampling: A Case Study for CPLX in NASA60
If projects of even higher complexity were the most important ones to NASA, redefine the complexity ratings for highly complex NASA systems.
Is software complexity a useful cost driver in this domain?
• In the NASA60 data set, CPLX is usually High
• So this parameter carries little information
• Consider dropping the parameter
[Figure: CPLX in 60 NASA COCOMO I projects. Number of projects per rating: Low 2, Nominal 5, High 50, Very_High 2, Extra_High 1]
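The "little information" argument can be made concrete by measuring how concentrated a parameter's ratings are. The sketch below uses the CPLX counts as read from the chart (Low 2, Nominal 5, High 50, Very_High 2, Extra_High 1, which sum to 60) and reports the share of the most common rating plus a normalized Shannon entropy. The function name and the 0.8 cut-off are hypothetical choices, not part of the RPM tool.

```python
import math


def dispersion_summary(counts):
    """Return (mode_share, normalized_entropy) for a rating histogram."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    mode_share = max(counts.values()) / total
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts))  # entropy if every rating were equally common
    return mode_share, entropy / max_entropy


# CPLX rating counts as read from the NASA60 chart on this slide
cplx_nasa60 = {"Low": 2, "Nominal": 5, "High": 50, "Very_High": 2, "Extra_High": 1}
mode_share, norm_entropy = dispersion_summary(cplx_nasa60)

# Hypothetical screening rule: the 0.8 threshold is an assumption
if mode_share >= 0.8:
    print(f"CPLX: {mode_share:.0%} of projects share one rating; consider dropping it")
# prints: CPLX: 83% of projects share one rating; consider dropping it
```

A normalized entropy near 0 means almost every project has the same rating; near 1 means the ratings are spread evenly. Either measure flags CPLX in NASA60 as weak-dispersion.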
How Does It Work – Technically?
• The organization collects a critical mass of similar project data
• The RPM tool starts with Size and tests which additional parameter produces the most accurate estimates
– By calibrating many times on random data subsets and testing on held-out data points
• The RPM tool keeps adding the next-best parameter until accuracy starts to decrease
– This produces the best RPM for the data set
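A minimal sketch of the forward-selection loop described above. It assumes a COCOMO-style log-linear model (ln effort = intercept + b·ln size + one fitted exponent per selected multiplier), scores each candidate subset by PRED(30) averaged over repeated random 67/33 train/holdout splits, and runs on synthetic data. The function names, the metric, the split ratio, and the stopping test are illustrative assumptions, not the RPM tool's actual implementation.

```python
import math
import random


def solve_least_squares(X, y, ridge=1e-6):
    """Least squares via normal equations (X^T X + ridge*I) w = X^T y.

    The tiny ridge term keeps the system solvable when a column is constant
    (e.g. every training project has the same rating for a parameter).
    """
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (b[i] - sum(A[i][c] * w[c] for c in range(i + 1, n))) / A[i][i]
    return w


def features(project, params):
    # Intercept, ln(size), then ln(multiplier) for each selected parameter
    return [1.0, math.log(project["loc"])] + [math.log(project[p]) for p in params]


def mean_pred30(projects, params, repeats=30, train_frac=0.67, seed=0):
    """Mean PRED(30) of a calibrated model over repeated random train/holdout splits."""
    rng = random.Random(seed)  # same seed => every candidate subset sees the same splits
    scores = []
    for _ in range(repeats):
        shuffled = projects[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train, test = shuffled[:cut], shuffled[cut:]
        w = solve_least_squares([features(p, params) for p in train],
                                [math.log(p["effort"]) for p in train])
        hits = 0
        for p in test:
            estimate = math.exp(sum(wi * xi for wi, xi in zip(w, features(p, params))))
            if abs(estimate - p["effort"]) / p["effort"] <= 0.30:
                hits += 1
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


def forward_select(projects, candidates, **kw):
    """Start from size only; greedily add the parameter that most improves accuracy."""
    selected, remaining = [], list(candidates)
    best = mean_pred30(projects, selected, **kw)
    while remaining:
        score, param = max((mean_pred30(projects, selected + [c], **kw), c)
                           for c in remaining)
        if score <= best:  # stop once the best addition no longer helps
            break
        selected.append(param)
        remaining.remove(param)
        best = score
    return selected, best


def make_synthetic(n=40, seed=1):
    """Hypothetical projects: effort depends on loc, cplx, rely; 'noise' is irrelevant."""
    rng = random.Random(seed)
    projects = []
    for _ in range(n):
        p = {"loc": rng.uniform(5, 200),
             "cplx": rng.choice([0.85, 1.0, 1.15, 1.30, 1.65]),
             "rely": rng.choice([0.75, 1.0, 1.15, 1.40]),
             "noise": rng.choice([0.9, 1.0, 1.1])}
        p["effort"] = 3.0 * p["loc"] ** 1.05 * p["cplx"] * p["rely"] * rng.uniform(0.9, 1.1)
        projects.append(p)
    return projects


if __name__ == "__main__":
    data = make_synthetic()
    chosen, accuracy = forward_select(data, ["cplx", "rely", "noise"])
    print("selected:", chosen, "mean PRED(30):", round(accuracy, 2))
```

Reusing one seed means every candidate subset is scored on the same splits, so differences in accuracy reflect the parameters rather than the random partition. The loop stops as soon as the best candidate fails to improve accuracy, a slightly stricter reading of "until accuracy starts to decrease" that avoids keeping parameters with no measurable benefit.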
Real and Large Industry Data
• Research is supported by CSE and NASA/JPL
• Two datasets are publicly available from the PROMISE Software Engineering Repository (http://promise.site.uottawa.ca/)
– 63 projects in Cocomo81/Software cost estimation
– 60 projects in NASA/Software cost estimation
• Two datasets come from the COCOMO II database
– 161 projects in COCOMO II 2000
– 119 projects in COCOMO II 2004
• More data are coming
– 30 more projects from JPL
• The techniques can be applied, and the basic results generalized, to any model
Example Result from NASA Project02 Dataset
[Figure: Mean and SD (percentage) for the LOC, FS01, FS02, FS03, FS04, and all-parameter subsets]
Cost drivers shown on the slide, grouped as in the original:
TURN, LEXP
TIME, MODP
DATA, TOOL, SCED
RELY, VEXP, CPLX
AEXP, PCAP, VIRT, ACAP, STOR

Mean and SD (%) by parameter subset:
        LOC     FS01    FS02    FS03    FS04    All
Mean    85.24   92.86   97.14   94.76   84.76   15.71
SD      10.93   11.10    6.92    8.78   12.40   12.07
When Should You Not Use It?
• Do not drop parameters that are known to be important
– In many domains, expert business users hold more knowledge in their heads than is available in historical databases
• Do not drop parameters you may still need
– Users may need some of the dropped parameters to make business decisions
Published Results
• Chen, Menzies, Port, and Boehm. "Finding the Right Data for Software Cost Modeling", IEEE Software 11/2005.
• Menzies, Port, Chen, and Hihn. "Specialization and Extrapolation of Software Cost Models", ASE 2005, Long Beach, California, 11/2005.
• Menzies, Port, Chen, Hihn, and Stukes. "Validation Methods for Calibrating Software Effort Models", ICSE 2005, 05/2005, St. Louis, Missouri
• Yang, Chen, Valerdi, and Boehm. "Effect of Schedule Compression on Project Effort", ISPA 2005, 06/2005, Denver, Colorado
• Chen, Menzies, Port, and Boehm. "Feature Subset Selection Can Improve Software Cost Estimation Accuracy", PROMISE 2005, 05/2005, St. Louis, Missouri
• Menzies, Chen, Port, and Hihn. "Simple Software Cost Analysis: Safe or Unsafe?", PROMISE 2005, 05/2005, St. Louis, Missouri
Several recent publications apply data mining and machine learning techniques to analyze cost estimation models and data
All papers are available from http://www.ssei.org/chen/papers/papers.html
Question and Answer