
Design and Analysis of Computer Experiments


Design and Analysis of Computer Experiments

Nathan Soderborg

DFSS Master Black Belt

Ford Motor Co.

WCBF DFSS Conference Workshop

Feb 9, 2009


Outline

• Background: Six Sigma Context
• Foundation: Useful Computer Models
• Deterministic vs. Probabilistic Approaches
• Monte Carlo Simulation
• Design of Computer Experiments
• Analysis of Computer Experiments
• Case Studies


Background: Six Sigma Context


Design for Six Sigma

• A scientific PD approach that leverages Six Sigma culture
• A means to re-instill rigorous deductive and inductive reasoning in PD processes…
• Definition of objective engineering metrics with targets correlated to customer needs and desires
• Characterization of product performance using transfer functions to assess risks
• Optimization of designs through transfer function knowledge and identification of counter-measures to avoid potential failure modes
• Verification that designs perform to targets and counter-measures eliminate potential failure modes

(Ford DCOV phases: Define CTS's → Characterize System → Optimize Product/Process → Verify Results)


Definition of a Transfer Function

• A mathematical model that relates an output measure Y to input variables (x's):

  Y = F(y1, …, yn), y1 = f(x1, …, xn), etc.

• Why "transfer" function? ("function" or "equation" would suffice)
• For purposes of today's discussion, transfer functions are computer models


Where Transfer Functions Come From

• Deduction: using first principles to characterize system physics, geometry, or material properties (TF based on "first principles")
  • Physics equations that describe function, e.g., V=IR, f=ma, f=kx, k.e. = ½mv²
  • Finite element and other analytic models, e.g., computer models not expressible in closed-form equations
  • Geometric descriptions of parts and systems, e.g., equations from schematics based on reverse engineering, lumped mass models, drawings & prints; variation/tolerance stack-up
• Induction: analyzing experimental, empirical data (TF based on empirical data)
  • Directed experimentation, e.g., response surface or multivariate regression equation from DOE using analytic models or hardware
  • Analysis of existing data, e.g., regression to enhance informed observations

(Moving down this list corresponds to an increasing degree of approximation.)


What Transfer Functions are Used For

• In early phases of a project, a typical goal is to develop or improve transfer functions that
  • Correlate customer needs to objective metrics
  • Provide a formula for system output "y" based on input "x's"
• In later phases, a typical goal is to exploit those transfer functions to identify optimal robust designs, i.e., achieve performance
  • On target
  • With minimal variability
  • At affordable cost

This requires probabilistic capability & analysis, i.e., being able to represent the output of the model as a probability distribution.

(Figure: distributions of y relative to the target for the original design and the optimized design.)


Foundation: Useful Computer Models

“All models are wrong; some are useful.”

--George Box


Characteristics of a Good Model

• Fits Data
  • For a deductive, first-principles-based model: fits data collected from physical tests
  • For an inductive, statistical model: fits the data sample used to construct the model
• Predicts Well
  • Predicts responses well at points not included in the data sample or regions of space used to construct the model
  • Interpolates well
  • Extrapolates well

Did we do the modeling right?


Characteristics of a Good Model

• Parsimonious (conceptually)
  • Is the simplest of competing models that adequately predicts a phenomenon
  • Note: introducing more terms in a model may improve fit, but over-complicate the model (and impair prediction)
• Parsimonious (from a business perspective)
  • Incurs reasonable development cost compared to the knowledge and results expected
  • Incurs containable computation costs

Did we do the modeling right?


Characteristics of a Good Model

• Interpretable
  • Correctly applies & represents physics, geometry, & material properties
  • Provides engineering insight; answers the desired questions
  • Contains terms that are fundamental (e.g., dimensional analysis)
  • Has clear purpose & boundaries (domain can be small and still useful)

Did we model the right things?


P-Diagram

• In engineering, computer models should help us simulate or predict performance under real-world conditions
• We would like to account for variability in build, environment, and usage (aka: noise)
• A high-level framework for this is the Parameter Diagram (see Phadke, Davis)

(P-Diagram: the signal xS and noise factors xN enter the system along with control factors xC; the outputs are the response y and error states/failure modes. Ideal function: y = f(xS, xC, xN).)


Challenge of Representing Noise in Models

• Models based on first principles will include factors from physics, such as:
  • Loads, energy transfer
  • Properties of materials
  • Dimensions and geometries
• Often the particular noise factors we identify are not factors in our model—but are there "surrogates"?
• Try to understand and estimate the effect of variability in noise factors on factors included in the model

Typical noise factor types:
• Manufacturing variation
• Deterioration over time
• Customer usage/duty cycles
• External environment
• System interactions

Typical model factor types:
• Load/energy transfer
• Material properties
• Geometry & dimensions

Translate effects of variation in the noise factors into variation in the model factors.


Deterministic vs. Probabilistic Approaches


Levels of Design Refinement

• Trial & Error
  • Hand calculations
  • Physical tests as needed
  • Learning from experience
• Planned Physical Experimentation (DOE)
  • Empirical learning
  • Statistical analysis
• Analytic Modeling (Deterministic)
  • Computer calculations
  • "What-if" scenarios
• Analytic Robust Design (Stochastic/Probabilistic)
  • Designed experiments
  • Optimization (single and multiple objective)

(Looking for a new design concept? That requires a different set of tools.)


Deterministic Analysis

Inputs → Computer Model → Outputs

Inputs: nominal or worst-case values of dimensions, materials, loads, etc.
  (Input examples: gages, Young's modulus, cylinder pressure)
Computer model examples: finite element analysis, regression equation, numerical model
Outputs: point estimate of performance or life; safety factor or design margin
  (Output examples: deflection, life, voltage)


Deterministic Analysis

(Figure: stress/strength diagrams for Design 1 and Design 2, each showing mean stress, mean strength, and the safety margin between them.)

Which design is more reliable?


Probabilistic Analysis

Design 1: smaller safety factor, higher reliability
Design 2: larger safety factor, lower reliability

(Figure: overlapping stress and strength distributions for each design, showing mean stress, mean strength, and safety margin.)

The interference region between stress and strength defines the probability of failure. This determines reliability.

A design with a larger safety factor may have lower reliability depending upon stress and strength variability.


Probabilistic Analysis & Optimization

Inputs: for a given nominal, sample the assumed distribution around the nominal:
• Dimensions
• Material properties
• Loads
• Usage
• Manufacturing
• etc.

Computer model (with iteration over multiple nominal values)

Outputs:
• Performance variability at a nominal: dispersion, local sensitivity, reliability assessment
• Performance variability across multiple designs: global sensitivity, robust design direction, robust design optimization


Probabilistic Optimization: Example

• Objective: find a fixture design that minimizes deflection, accounting for manufacturing variation
• Design variables:
  • Locator positions (4)
  • Clamp positions (4)
  • Clamp force
• Two cases compared: (1) optimization without variability, (2) optimization including variability

(Figure: engine block manufacturing fixture; deflection (smaller is better) vs. clamp position, showing the range of response variability for design points 1 and 2.)


Challenges to Probabilistic Design

• Statistical distributions for input factors may be unknown and costly to ascertain
• Data that is available may be imprecise
• The organization may
  • Lack statistical expertise or training
  • Have difficulty dealing with results that include uncertainty

All of this is OK!
• The goal should not be to predict reliability precisely
• Rather, the goal is to make and demonstrate improvement
  • Learn by using data from similar processes when available
  • Try a variety of assumptions to convey a range of possible risks
  • Use analyses to make comparisons instead of absolute predictions


Monte Carlo Simulation


Monte Carlo Simulation

1. Assign a probability distribution to each input variable, xi
   a. Generate a "random" instance for each xi from its distribution
   b. Calculate and record the value of y obtained by substituting the generated instance of each xi into the transfer function
2. Repeat steps a & b many times (e.g., 100 to 1,000,000)
3. Calculate y statistics, e.g., mean, std. dev., histogram
4. Estimate success or failure probability based on targets/limits

Transfer function/computer model: y = f(x1, x2, …, xd)

(Figure: PDFs for x1, x2, …, xd feed the transfer function, yielding a PDF for y that is compared against the limit. A minimal code sketch of these steps follows below.)
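As a rough illustration of the steps above, the sketch below runs a small Monte Carlo simulation in Python; the transfer function, the input distributions, and the limit are made-up placeholders, not values taken from the slides.

import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed: repeats the same pseudo-random numbers
n_runs = 100_000

# Step 1: assumed distributions for the inputs (placeholders)
x1 = rng.normal(loc=10.0, scale=0.5, size=n_runs)   # e.g., a dimension
x2 = rng.uniform(low=0.9, high=1.1, size=n_runs)    # e.g., a load factor

# Steps a & b, repeated n_runs times (vectorized): evaluate the transfer function
y = x1 * x2**2                                       # placeholder transfer function y = f(x1, x2)

# Step 3: y statistics
print("mean =", y.mean(), " std dev =", y.std(ddof=1))

# Step 4: probability of exceeding an (assumed) upper limit
limit = 12.0
print("P(y > limit) =", np.mean(y > limit))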


Example: Door “Drop-off”

• Performance variable: door drop-off
• Model: finite element analysis
• Design variables
  • Number of missing welds
  • Materials: door, hinge, reinforcement, hinge pillar
  • Gages: door, hinge, reinforcement, hinge pillar
  • Center of gravity location
  • Trim weight
• Design requirement: drop-off < 1.5 mm
• Goal of the study
  • Check whether the drop-off requirement is met when variation in the design variables is considered
  • Explore opportunities for design improvement or cost reduction


Example: Door “Drop-off”

• Conclusions
  • Design meets the drop-off requirement even when variations in gages, material, trim weight, and center of gravity are present
  • Door hinge reinforcement is the most dominant factor for controlling door drop-off
  • May be able to reduce cost by downgaging the door hinge reinforcement from 2.4 mm to 2.0 mm (must demonstrate fatigue requirements can still be met)

(Figure: door drop-off distribution, roughly 0.87 to 0.99 mm; 99th percentile = 0.9560 mm.)

Contribution to variability:
• Door reinforcement gage: 37%
• Trim weight: 14%
• Center of gravity: 12%
• Hinge pillar reinforcement gage: 11%


Example: Vehicle Vibration

• Problem: irritating vibration phenomenon
• Response: seat track shake
• Model: vibration analysis tool
• Design variables:
  • Stiffness of over 30 bushings
  • Stiffness and/or damping of over 20 engine mounts
  • Over 20 others: characteristics of
    • Struts
    • Structural mounts
    • Subframe
    • Subframe mounts
    • etc.


Example: Vehicle Vibration (continued)

(Figure: main effects plot of means for Shake@58 vs. engine mount 1 stiffness, comparing the baseline design (Mount Type A) with the robust design (Mount Type B).)


Software for Monte Carlo Simulation

• Dimensional Variation Analysis tools such as VSA® employ Monte Carlo Simulation (MCS)
• Minitab® facilitates random number generation that can be used for MCS
• Several Excel-based tools are for sale—the most widespread is Crystal Ball®, which provides a custom interface for MCS in Excel
  • Allows user to identify cells as "assumptions" (x-variables) and "forecasts" (y-variables)
  • Includes automatic generation of y-histograms, real-time updating with simulation, optional optimization routines
• Excel's built-in random number generation can suffice when supplemental software is not available


Monte Carlo Simulation in Excel

Without supplemental software:
• Generate "random" numbers for x's
• Calculate y-values with Excel formula
• Use data analysis & histogram tools to characterize y-distribution

Select Tools / Data Analysis (available from the Analysis ToolPak Add-in)


Random Number Generation in Excel

• No. of xi's (columns)
• No. of instances of each xi (rows)
• Distributions (PDFs): e.g., Uniform, Normal, Bernoulli, Binomial, Poisson, Patterned, Discrete
• Seed: the same seed repeats the same set of pseudo-random numbers
• Output: worksheet & cell range where the numbers are stored


Additional Distributions

If the desired distribution is not an automatic selection in Excel (but the inverse CDF can be coded as a function):

• For each xi, generate a set of uniformly distributed random numbers between 0 and 1
• Substitute each of the numbers into the inverse CDF of xi to obtain a set distributed as xi
• Calculate the response y for each element in this new set
• Create a histogram of the set of response values y and calculate statistics

(Figure: uniform samples on [0,1] mapped through the inverse of the CDF of xi to produce the new distribution for xi, which is then propagated through y = f(x) to give the frequency distribution of y. See the code sketch below.)
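The same inverse-CDF idea is easy to sketch outside Excel; the snippet below, for instance, draws Weibull-distributed samples from uniform random numbers. The Weibull parameters and the transfer function here are illustrative assumptions, not values from the slides.

import numpy as np

rng = np.random.default_rng(seed=2)
n = 50_000

# Uniform(0,1) draws for one input variable
u = rng.uniform(size=n)

# Inverse CDF of an assumed Weibull(shape=2, scale=10) distribution:
# F(x) = 1 - exp(-(x/scale)^shape)  =>  F^-1(u) = scale * (-ln(1-u))^(1/shape)
shape, scale = 2.0, 10.0
x = scale * (-np.log(1.0 - u)) ** (1.0 / shape)

# Propagate through a placeholder transfer function and summarize y
y = 3.0 + 0.2 * x
print("y mean:", y.mean(), " y 99th percentile:", np.percentile(y, 99))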


Door Latch Example

• A door latch design & production team developed mathematical equations for key customer outputs (outside release effort, outside travel) using the part drawings and applying principles of trigonometry and elementary physics
• These equations were coded into an Excel spreadsheet
• The team had production data (capability, mean, standard deviation) available for the input variables in the equations
  • Part dimensions
  • Part edge curvature and geometry
  • Spring forces
  • Etc.


Door Latch Example—Spreadsheet Model

Factor data: nominal, spread…

Transfer function equation (example):
y = D4*(SQRT(Z4*Z4+AA4*AA4)/SQRT(AB4*AB4+AC4*AC4))


Door Latch Example—Simulation & Results

Spreadsheet columns: variables d1 to d6 and the calculated output YT (τO/S, outside travel)
Nominal values: d1 = 15.8000, d2 = 25.3000, d3 = 19.0400, d4 = 16.2000, d5 = 17.9500, d6 = 1.6800

• Generate 1,000 rows, each column drawn from its own distribution (e.g., the first simulated row is 15.94288, 25.36693, 19.10459, 16.17876, 17.98629, 1.626887)
• Calculate y for each row using the transfer function (calculated values in the example fall roughly between 5.88 and 6.37)
• Draw a histogram of the calculated outside travel (roughly 5.78 to 6.56) with example* LSL/USL limits
• Estimate the % of product outside specs based on the variation assumptions for the x's

*Limits are examples only.


Crystal Ball Example


Design of Computer Experiments


Motivation

If you have a computer model already, why do designed experiments to create a model of the model?

• Make design decisions faster and cheaper
  • Some models are computationally intensive, time consuming, and expensive to set up and run
  • Robust design analysis needs a probabilistic approach that requires many runs
  • You can replace expensive models with approximations (metamodels) for carrying out Monte Carlo simulation, robust design, multi-objective optimization, etc.
• Gain insight into the original model
  • Often the model cannot be expressed explicitly (it is a "black box"), e.g., Finite Element Analysis
  • A metamodel can be used to efficiently understand effects, interactions, and sensitivities


A Flow for Analytic Robust Design

1. Develop & document system understanding: understand functions/failures, P-Diagram
2. Design a computer experiment: sample for uniformity, orthogonality
3. Run the experiment: evaluate the model at each sample point
4. Develop a response surface model: apply advanced regression, other methods
5. Analyze sensitivities: find important factors
6. Assess reliability: quantify risk
7. Optimize for robustness: select a robust design

(Accompanying graphics: a P-Diagram with signal, control factors, noise factors, and response; an n-run design matrix over factors x1, …, x40.)


Computer-based Experimentation

• The move toward Analytic Robust Design, along with ever-increasing computing power, has fueled the development of a new field of study over the past few decades: Design and Analysis of Computer Experiments (DACE)
• Early computer experimenters realized that traditional experimental designs were sometimes inadequate or inefficient compared to alternatives
• In addition, certain non-parametric techniques for fitting the data may offer more useful models than polynomial regression


Physical vs. Computer Experiments

Physical:
• Responses are stochastic (involve random error)
• Replication helps improve precision of results
• Some inputs are unknown
• Randomization is recommended
• Blocking nuisance factors may help

Computer:
• Responses are deterministic (no random error)
• Replication has no value
• All inputs are known
• Randomization has no value
• Blocking is irrelevant


Physical vs. Computer Experiments

Physical:
• Experiment logistics can be resource-intensive
• Minimizing the number of runs is generally desirable
• Parameter adjustment requires physical work
• The set up is usually only available for a short time period (e.g., interruption of production)

Computer:
• Experiment logistics often require fewer resources
• A relatively large number of runs may be feasible
• Parameter adjustments take place in software
• The set up can be "saved" and returned to; a sequential approach is more feasible


Physical vs. Computer Experiments

Physical:
• Logistical requirements limit sampling to 2 or 3 levels per variable
• Thus, models are typically limited to linear or quadratic
• Typical design is a standard orthogonal array, e.g., full or fractional factorial, or response surface methods

Computer:
• Relative logistical ease allows variable sampling over many levels
• Multiple-level sampling allows high-order, nonlinear models
• Flexible alternatives to standard arrays are available, e.g., Latin Hypercube, Uniform designs, etc. (close to orthogonal)


Desirable Computer Experiment Properties

• Is Balanced
  • Each factor has an equal number of runs at each level
  • This weights levels equally in estimating effects
  • (The number of runs will be a multiple of the number of levels of each factor)
• Captures Response Non-linearity if Present
  • Two levels for a factor allow modeling of linear effects
  • Modeling higher-order non-linearity requires a higher number of levels per factor
• Exhibits Good Projective Properties
  • Projections onto significant factor subspaces…
    • Include no "pseudo-replicates"
    • Avoid significant point-clustering
  • Maximizes information related to significant factor behavior


Desirable Computer Experiment Properties

• Is Orthogonal or Close to Orthogonal
  • Correlation between factors is zero or close to zero (column orthogonality)
  • Allows effects of factors to be distinguished and estimated cleanly
• Fills the Design Space
  • Sample points are spread throughout the design space as evenly or uniformly as possible
  • Helps model the full range of design behavior without any assumptions on factor importance
  • Improves interpolation capability for building a good metamodel

How "space filling" a design is can be measured by various criteria; in practice, seek designs that have relatively good orthogonality and good space-filling properties.


Computer Experiment Design

Example Strategies

Traditional approaches:
• Orthogonal Array
• Response Surface Methods

Space-filling designs, sampling based:
• Random Sample
• Latin Hypercube Sample

Space-filling designs based on optimizing various criteria:
• Management of minimum and maximum distances between points
• Minimum "discrepancy" or departure from uniformity
• Maximum "entropy" or unpredictability

Low Discrepancy (Quasi Monte Carlo) Sequences


Latin Hypercube Sampling

• Latin Squares
  • Latin Hypercubes are extensions of Latin Squares to higher dimensions
  • An N×N Latin Square has the property that each of N symbols appears exactly once in each row and exactly once in each column, e.g.:

    B C D E A
    C D E A B
    D E A B C
    E A B C D
    A B C D E

• Latin Hypercubes
  • Latin Hypercube Sampling divides each dimension of the design space into N intervals
  • A set of N points is selected so that when the set is projected onto any dimension, exactly one point is in each of the intervals for that dimension
  • (Kind of like Sudoku!)

(Figure: illustration of a Latin Hypercube sample in two dimensions. A code sketch of the construction follows below.)
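For reference, a basic Latin Hypercube sample is simple to generate; the sketch below (plain NumPy, with an illustrative point count and dimension) places exactly one point in each of N intervals per dimension by permuting the interval indices independently for each factor.

import numpy as np

def latin_hypercube(n_points, n_dims, rng):
    """Return an n_points x n_dims Latin Hypercube sample on [0, 1)^n_dims."""
    sample = np.empty((n_points, n_dims))
    for j in range(n_dims):
        # One point per interval: a random permutation of the N intervals,
        # jittered uniformly within each interval.
        perm = rng.permutation(n_points)
        sample[:, j] = (perm + rng.uniform(size=n_points)) / n_points
    return sample

rng = np.random.default_rng(seed=3)
design = latin_hypercube(n_points=9, n_dims=2, rng=rng)
print(design)  # each column has exactly one value in each of the 9 intervals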


Latin Hypercube/Factorial Comparison

9-Run Factorial (typical physical experiment):
• 3 replicates for each variable projection
• 3 levels for each variable: at most quadratic effects can be captured
• If x2 is not significant, there are essentially 3 repeat points for each level of x1 ("pseudo-replicates"), and large regions are unsampled

9-Run Latin Hypercube (feasible in computer experimentation):
• No replicates for variable projections
• 9 levels for each variable: higher-order nonlinear effects can be captured

(Figure: the two 9-run designs plotted in the unit square of x1 and x2.)


Latin Hypercube Example

• 4-factor, 11-level LH Design

Design Matrix:
x1    x2    x3    x4
 0.8  -0.8  -0.2  -0.6
 0.2  -0.4   0.2   1
-1    -0.6  -0.6   0
-0.2  -1     0.6   0.4
 0.4   1     0.4  -0.4
-0.4   0.6  -1    -0.2
 0     0.2  -0.4  -1
-0.6  -0.2   1    -0.8
 0.6   0    -0.8   0.6
-0.8   0.8   0     0.8
 1     0.4   0.8   0.2

Pearson correlations (p-values in parentheses):
x1-x2: 0.027 (p = 0.937), x1-x3: 0.145 (p = 0.670), x1-x4: -0.009 (p = 0.979)
x2-x3: -0.073 (p = 0.832), x2-x4: -0.036 (p = 0.915)
x3-x4: -0.027 (p = 0.937)

(Matrix plots: 2-dimensional projections of the design onto each pair of factors.)


Uniform Designs

• A uniform design is a sample of points that minimizes some measure of discrepancy
• Discrepancy is a metric quantifying "how far" the points are from being uniformly distributed
• Uniform designs allow different numbers of levels for each factor
• An existing design can be "optimized" for uniformity, e.g.
  • Subset of a full factorial
  • Latin Hypercube

(Figure: an initial Latin Hypercube and the same design optimized for uniformity. A sketch of this kind of refinement follows below.)
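As a rough sketch of this kind of design refinement (using a maximin-distance criterion, one of the distance-based space-filling criteria mentioned on the strategies slide, rather than a formal discrepancy measure), the code below repeatedly swaps two values within a randomly chosen column of a Latin Hypercube and keeps a swap only if it increases the minimum pairwise distance between points; column-wise swaps preserve the one-point-per-interval Latin Hypercube property.

import numpy as np

def min_pairwise_distance(design):
    """Smallest Euclidean distance between any two rows of the design."""
    diff = design[:, None, :] - design[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    n = design.shape[0]
    return dist[np.triu_indices(n, k=1)].min()

def improve_spread(design, n_iter, rng):
    """Column-wise swaps that keep the LH structure but improve the spread."""
    best = design.copy()
    best_score = min_pairwise_distance(best)
    for _ in range(n_iter):
        trial = best.copy()
        j = rng.integers(trial.shape[1])                        # pick a column
        r1, r2 = rng.choice(trial.shape[0], size=2, replace=False)
        trial[[r1, r2], j] = trial[[r2, r1], j]                 # swap two entries in that column
        score = min_pairwise_distance(trial)
        if score > best_score:                                  # keep only improving swaps
            best, best_score = trial, score
    return best

rng = np.random.default_rng(seed=4)
# Starting point: an 11-run, 2-factor Latin Hypercube with levels 0, 0.1, ..., 1
lhs = np.column_stack([rng.permutation(11) / 10.0, rng.permutation(11) / 10.0])
print("min distance before:", round(min_pairwise_distance(lhs), 3))
better = improve_spread(lhs, n_iter=2000, rng=rng)
print("min distance after: ", round(min_pairwise_distance(better), 3))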


Uniform Mixed-Level Design Example: 4-factor, 12-run, mixed-level design (subset of a full factorial design)

Number of levels for each factor: x1: 3, x2: 4, x3: 4, x4: 6

Design Matrix:
x1    x2     x3     x4
 0     1      1     -1
 0     0.33  -0.3    0.6
-1     0.33  -1     -0.6
 0    -1      0.33  -0.6
 1     1     -1     -0.2
-1    -0.3    1     -0.2
 1    -0.3   -0.3   -1
 1     0.33   0.33   0.2
 1    -1      1      0.6
-1     1      0.33   1
-1    -1     -0.3    0.2
 0    -0.3   -1      1

Pearson correlations (p-values in parentheses):
x1-x2: 0 (p = 1), x1-x3: 0 (p = 1), x1-x4: -0.1195 (p = 0.711)
x2-x3: -0.1333 (p = 0.680), x2-x4: -0.0436 (p = 0.893)
x3-x4: -0.0873 (p = 0.787)

(Matrix plots: 2-dimensional projections of the design onto each pair of factors.)


Low Discrepancy Sequences

• A sequential approach to identifying experimental points
• Useful when experiments can proceed sequentially, especially if the computer model is slow
  • While waiting for the model to generate the next output, the analyst can do preliminary work to decide if the results are accurate enough
• Sequences are based on Monte Carlo approaches to space-filling sequences used for integration
  • Such sequences may be used as a substitute for sampling from a uniform probability distribution (Quasi-Monte Carlo)
  • Some sequences are specifically designed to have low discrepancy
• Roughly speaking, the discrepancy of a sequence is low if the number of points in the sequence falling into an arbitrary set B is close to proportional to the measure of B, as would happen on average in the case of a uniform distribution (Wikipedia)


Low Discrepancy Sequence Example

• Examples of sequences in the literature
  • Sobol Sequence
  • Hammersley Sequence
  • Halton Sequence

(Figure: 100 Monte Carlo samples, showing "open" spaces and point clustering, vs. 100 Halton samples, which spread more evenly. A Halton generator is sketched below.)
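A Halton sequence is short enough to sketch directly; the generator below uses the standard radical-inverse construction with one prime base per dimension (the bases 2 and 3 and the 100-point count are simply the common choices for a 2-D illustration like the one above).

def radical_inverse(i, base):
    """Reflect the base-`base` digits of the integer i about the radix point."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += f * (i % base)
        i //= base
        f /= base
    return inv

def halton(n_points, bases=(2, 3)):
    """First n_points of the Halton sequence in len(bases) dimensions."""
    return [[radical_inverse(i, b) for b in bases] for i in range(1, n_points + 1)]

points = halton(100)   # 100 low-discrepancy points in the unit square
print(points[:5])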


Design of Computer Experiment Summary

• Latin Hypercube
  • Computationally inexpensive to generate
  • Allows large number of runs and factors sampled at many levels
  • Good projective properties on low-dimensional subspaces
  • Available in many software sources
  • Number of levels = number of runs—this can be a big constraint
• Uniform Designs
  • Design matrices with good orthogonality and projective properties can be refined to improve uniformity
  • Algorithms apply to any number of levels and factors per level
  • Not as common in software as Latin Hypercube (JMP?)
  • Computation required to optimize designs grows with the number of runs, factors, and levels—can consume some time for big designs


Design of Computer Experiment Summary

• Low Discrepancy Sequences
  • Provide a sequence of points that fills space close to uniformly
  • Allow sequential experimentation
  • Typically, the number of levels is the same as the number of runs
  • Can be used in place of "random" sequences when more uniform sampling is desired
  • Slowly becoming available in commercial software; code can be downloaded from various websites


Analysis of Computer Experiments


Analysis of Computer Experiments

(This repeats the analytic robust design flow shown earlier: develop & document system understanding; design a computer experiment; run the experiment; develop a response surface model; analyze sensitivities; assess reliability; optimize for robustness. The remaining slides focus on the analysis steps.)


Generating Response Surfaces

• A traditional approach is to treat the computer model as an unknown transfer function, f
• Assume that the transfer function has a particular form, e.g.,
  • Polynomial
  • Trigonometric function, etc.
• Find coefficients β that provide the "best fit" of a function of the assumed form to the response data, i.e., y = f(β, x)
• Responses from physical experiments will not match the output of the generated function exactly due to
  • Experimental measurement error
  • Differences in the assumed form of the function vs. the true form
  • Absence of some influential factors in the experiment
  • Etc.


Interpolation

• However, with computer experiments it would be desirable for experimental responses to match the output of the generated function exactly
  • Computer experiments are not subject to experimental error—responses reflect the true output of the analytical model
  • All input factors are known
• The challenge is to generate a metamodel that
  • Matches response data AND
  • Predicts well the response values at points not used to construct the metamodel
• This is an interpolation problem: a specific case of curve fitting, in which the function must go exactly through the data points


Interpolation Examples

• In the plane, 2 points determine a unique line, 3 points determine a unique 2nd-order polynomial, etc.
• However, if data subject to experimental error is interpolated assuming a polynomial functional form, the result can be severe "over-fitting"

(Figures: 2 points fit by a 1st-order polynomial; 3 points fit by a 2nd-order polynomial; 6 points, subject to experimental error, fit by a 6th-order polynomial, for which a better predictor is a best-fit line.)


Metamodel Building

• For any given true model there are many metamodels
• To find a metamodel with good prediction capability, often the best approach is to
  • Try both "best fit" and interpolation methods and combinations
  • Choose a final model based on validation studies
• Generally, the metamodel is constructed as a linear combination of elements from a set of building-block functions called basis functions, i.e.,

  f̂(x) = Σ_{j=0}^{M} βj·Bj(x) = β0·B0(x) + β1·B1(x) + … + βM·BM(x)

  where the βj are the coefficients and the Bj are the basis functions


Types of Basis Functions

• Polynomials
• Splines
• Fourier Functions
These are most powerful for low-dimension input variables (terms grow exponentially with dimension); results are interpretable in terms of familiar functions.

• Wavelets
• Radial Basis Functions
• Kriging Functions
• Neural Networks
• Etc.
These may be more natural for high-dimension input variables; results may be difficult to interpret in terms of familiar functions.


Basis Function Examples

• Polynomials (up to 2nd order), for x = (x1, …, xd):
  constant:          B0(x) = 1
  1st-order terms:   B1(x) = x1, …, Bd(x) = xd
  2nd-order terms:   B(d+1)(x) = x1², …, B(2d)(x) = xd²
  interaction terms: B(2d+1)(x) = x1·x2, …, B(2d+d(d-1)/2)(x) = x(d-1)·xd

• Fourier basis (1 dimension, over [0,1]):
  B0(x) = 1
  B1(x) = cos(2πx), B2(x) = sin(2πx)
  …
  B(2k-1)(x) = cos(2πkx), B(2k)(x) = sin(2πkx)


Metamodel Building

• For interpolation, find and select sufficiently many (M) basis functions so that y = Bβ can be solved for β, i.e.,

  [ y1 ]   [ B0(x1)  B1(x1)  ...  BM(x1) ] [ β0 ]
  [ y2 ] = [ B0(x2)  B1(x2)  ...  BM(x2) ] [ β1 ]
  [ ...]   [  ...     ...    ...   ...   ] [ ...]
  [ yn ]   [ B0(xn)  B1(xn)  ...  BM(xn) ] [ βM ]

  (response vector = matrix of the M basis functions, each evaluated at the n sample points, times the coefficient vector)

• For "best fit," find basis functions and coefficients β̂ that minimize

  Σ_{i=1}^{n} ( yi - Σ_{j=0}^{M} βj·Bj(xi) )²,  i.e., the least-squares estimator β̂ = (BᵀB)⁻¹Bᵀy

(A small worked sketch of the "best fit" case follows below.)
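To make the "best fit" case concrete, the sketch below builds the basis matrix B for a small quadratic polynomial basis in one variable and solves for the least-squares coefficients; the sample data are invented purely for illustration.

import numpy as np

# Invented sample points and responses (n = 6)
x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
y = np.array([1.1, 1.4, 2.1, 3.2, 4.6, 6.4])

# Basis functions B0(x)=1, B1(x)=x, B2(x)=x^2, evaluated at each sample point
B = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares estimator beta_hat = (B^T B)^-1 B^T y
beta_hat, *_ = np.linalg.lstsq(B, y, rcond=None)
print("coefficients:", beta_hat)

# Metamodel prediction at a new point
x_new = 0.5
print("prediction:", beta_hat @ np.array([1.0, x_new, x_new**2]))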


Example: Splines, MARS

Multivariate Adaptive Regression Splines (MARS)
• An automated, adaptive regression method
• Developed by Prof. Jerome Friedman of Stanford University in the early 1990s; available in commercial software from Salford Systems
• Basis functions are built from piece-wise linear "hockey stick" functions of the form

  (x - κ)+ = x - κ if x > κ, 0 otherwise
  (κ - x)+ = κ - x if x < κ, 0 otherwise

  where κ is the "knot"

(Figures: plots of the two hockey-stick functions, each hinged at the knot κ.)


Example: MARS

MARS model components (combined result: y = y1 + y2 + y3 + y4):
  y1 = 5.02196
  y2 = 0.238230*[x1 - 6.035000]
  y3 = -0.977209*[5.100000 - x1]
  y4 = -1.227350*[x1 - 5.100000]

Notation: [.] is also used to denote (.)+

(Figures: plots of each component and of the combined result vs. x1. The sketch below shows how such a model is evaluated.)
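As a quick check on how a MARS model of this form is evaluated, the sketch below codes the hinge function and the four components listed above and sums them; it simply reuses the coefficients printed on the slide, so nothing new is being fit here.

import numpy as np

def hinge(t):
    """(t)+ : t where t > 0, otherwise 0 (the MARS 'hockey stick')."""
    return np.maximum(t, 0.0)

def mars_model(x1):
    y1 = 5.02196
    y2 = 0.238230 * hinge(x1 - 6.035000)
    y3 = -0.977209 * hinge(5.100000 - x1)
    y4 = -1.227350 * hinge(x1 - 5.100000)
    return y1 + y2 + y3 + y4

x1 = np.linspace(0.0, 10.0, 5)
print(list(zip(x1, mars_model(x1))))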


Example: Gaussian Stochastic Kriging

• Proposed by Matheron for modeling spatial data in geostatistics (1963)
• Systematically introduced to computer experiments by Mitchell (1989)
• Uses continuous basis functions of the form

  exp( -Σ_{j=1}^{d} θj·(xj - xij)² )

  where xij is the jth dimension of the ith sample point, and the θj are estimated in the process of generating the response surface


Example: Gaussian Stochastic Kriging

GSK one variable example; 4 data points
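In the same spirit as this one-variable example (though not the Ford tool itself), a bare-bones interpolator built from these Gaussian basis functions can be sketched as follows: with four invented data points and an assumed θ, the coefficients are found by solving the interpolation system y = Bβ exactly, so the fitted curve passes through every data point.

import numpy as np

# Invented 1-D data: 4 sample points and their responses
x_data = np.array([0.0, 1.0, 2.5, 4.0])
y_data = np.array([1.0, 2.2, 0.7, 1.8])
theta = 1.5   # assumed value; a real GSK code estimates theta from the data

def basis_matrix(x_eval, x_centers):
    """B[i, j] = exp(-theta * (x_eval[i] - x_centers[j])^2)."""
    diff = x_eval[:, None] - x_centers[None, :]
    return np.exp(-theta * diff**2)

# Interpolation: solve y = B beta exactly at the sample points
beta = np.linalg.solve(basis_matrix(x_data, x_data), y_data)

# Predict on a grid; the curve reproduces the 4 data points exactly
x_grid = np.linspace(0.0, 4.0, 9)
y_pred = basis_matrix(x_grid, x_data) @ beta
print(np.round(y_pred, 3))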


Comparison of Different Methods

• The true function contains a reasonable amount of nonlinearity
• Using the same sampling strategy (30 points, LHS), compare the fits to the true function of different modeling methods:
  • Polynomial regression, RSM
  • Polynomial regression, stepwise
  • MARS
  • Kriging

Actual function: y = (30 + x1*SIN(x1))*(4 + EXP(-x2))

(Figures: contour and surface plots of the actual function over x1, x2 in [0,5], with contour bands from y < 110 to y > 150.)


Polynomial Regression: RSM Fit

Function y sampled with LH DOE, 30 runs: i.e., x1, x2 have 30 levels in [0,5]

Actual function: y = (30 + x1*SIN(x1))*(4 + EXP(-x2))

2D Response Surface Method fit (Minitab):
y = 142.93 + 9.56x1 - 13.51x2 - 2.87x1² + 1.81x2²

(Figures: contour plots of the actual function and of the RSM fit over x1, x2 in [0,5]. The sketch below shows how to compare the two numerically.)
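For anyone who wants to reproduce this comparison numerically, the sketch below evaluates the actual function and the quoted RSM polynomial on a grid over [0,5]×[0,5] and reports the largest and average absolute differences (the grid resolution is an arbitrary choice; the fit was built from the 30-run LH sample, so its behavior off those points is exactly what this probes).

import numpy as np

def actual(x1, x2):
    return (30 + x1 * np.sin(x1)) * (4 + np.exp(-x2))

def rsm_fit(x1, x2):
    # Quoted 2nd-order RSM fit from the slide
    return 142.93 + 9.56*x1 - 13.51*x2 - 2.87*x1**2 + 1.81*x2**2

g = np.linspace(0, 5, 51)
X1, X2 = np.meshgrid(g, g)
err = np.abs(actual(X1, X2) - rsm_fit(X1, X2))
print("max |error| on grid: ", err.max())
print("mean |error| on grid:", err.mean())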


Polynomial Regression: Stepwise Fit

Function y sampled with LH DOE, 30 runs: i.e., x1, x2 have 30 levels in [0,5]

Actual function: y = (30 + x1*SIN(x1))*(4 + EXP(-x2))

2D Stepwise Regression fit (Minitab):
y = 144.9 + 14.35x1 - 5.3x1² + 0.326x1³ - 23.2x2 + 6.7x2² - 0.65x2³

(Figures: contour plots of the actual function and of the stepwise fit over x1, x2 in [0,5].)


MARS Fit

Function y sampled with LH DOE, 30 runs: i.e., x1, x2 have 30 levels in [0,5]

Actual function: y = (30 + x1*SIN(x1))*(4 + EXP(-x2))

2D MARS prediction (Ford Encore software):
y = 134.05 - 11.88(x1-2.24)+ - 4.76(2.24-x1)+ - 1.50(x2-1.55)+ + 13.74(1.55-x2)+

(Figures: contour and surface plots of the actual function and of the MARS fit over x1, x2 in [0,5].)


Gaussian Stochastic Kriging Fit

Function y sampled with LH DOE, 30 runs: i.e., x1, x2 have 30 levels in [0,5]

Actual function: y = (30 + x1*SIN(x1))*(4 + EXP(-x2))

2D GSK prediction (Ford Encore software)

(Figures: contour plots of the actual function and of the Kriging fit over x1, x2 in [0,5].)


MARS/Kriging Comparison

• MARS Strengths
  • Non-parametric: no assumption of underlying model required
  • Well suited for high-dimensional problems; good for data mining
  • Reasonably low computational demand
• MARS Limitations
  • While often useful for understanding general trends, models sometimes do not accurately capture local behavior
• GSK Strengths
  • Interpolates data
• GSK Limitations
  • Relatively computationally demanding
  • Can over-fit data


Case Studies


Case Study: Piston Slap

REFERENCE
SAE Paper 2003-01-0148, "Robust Piston Design and Optimization Using Piston Secondary Motion Analysis"

Problem/Opportunity
• Piston slap is an unwanted engine noise that results from piston secondary motion
• A combination of transient forces and certain piston clearances can result in
  • Lateral movement of the piston within the cylinder
  • Rotation of the piston about the piston pin
• This can cause the piston to impact the cylinder wall at regular intervals
• The design team has developed a CAE model that predicts piston secondary motion, so this phenomenon can be explored analytically

Goals
• Achieve minimal piston friction and minimal piston noise simultaneously
• Reduce customer complaints

"Deep Dive"


Case Study: Steering Wheel Nibble

REFERENCE
SAE Paper 2005-01-1399, "Using Computer Aided Engineering to Find and Avoid the Steering Wheel 'Nibble' Failure Mode"

Problem/Opportunity
• The steering system is highly coupled:
  • Desire efficiency from steering wheel to road wheel
  • Desire inefficiency from road wheel to steering wheel
• Steering wheel nibble (undesired tangential oscillation between 10 and 20 Hz) is a potential failure mode, typically in autos with rack and pinion steering
  • A result of a chassis system response to wheel-end excitations
  • Excitations can result in up to 0.05 mm of steering rack displacement, which gets amplified into undesired steering wheel oscillations of up to 0.2°

Goals
• Use any and all approaches to avoid the nibble failure mode
• Focus on making the designs less sensitive to noise

"Deep Dive"


Case Study: Side Impact Design Criteria

REFERENCE
SAE Paper 2005-01-0291, "Model of IIHS Side Impact Torso Response Measures using Transfer Function Equations"

Problem/Opportunity
• IIHS Side Impact Evaluation
  • New test mode – side impact
• Develop guidelines to specify minimum targets
• Improve program efficiency by providing vehicle content guidelines
  • Currently a variety of vehicle-specific solutions are being developed
• Measures to be balanced with existing regulatory & company requirements

Goals
• Develop transfer functions
• Develop design guidelines

"Deep Dive"


Case Study: Hybrid Electric Vehicle Motor

Problem/Opportunity
• Ford's Hybrid Electric Escape is the company's first production Hybrid Electric Vehicle (HEV)
• The Power Split Transmission incorporates new technologies, including high-power permanent magnet motors & torque control

Goals
• Ensure that the Escape's electric motor meets targets, based on comparable gas engine performance, for
  • Torque Accuracy
  • Power Loss
  • Traction Motor Noise, Vibration, Harshness

"Deep Dive"


Bibliography (1 of 2)

• T. Davis, "Science, engineering, and statistics," Applied Stochastic Models in Business and Industry, Vol. 22, Issue 5-6, pp. 401-430, 2006.
• K.-T. Fang, R. Li, and A. Sudjianto, Design and Modeling for Computer Experiments, Chapman & Hall/CRC, New York, 2006.
• I. Farooq, J. Pinkerton, N. Soderborg, et al., "Model of IIHS Side Impact Torso Response Measures using Transfer Function Equations," SAE World Congress, April 11-14, 2005, SAE-2005-01-0291.
• R. Hoffman, A. Sudjianto, X. Du, and J. Stout, "Robust Piston Design and Optimization Using Piston Secondary Motion Analysis," SAE World Congress, March 3-6, 2003, SAE-2003-01-0148.
• J. Lee, et al., "An Approach to Robust Design Employing Computer Experiments," Proceedings of DETC '01, ASME Design Automation Conference, Sept 9-12, 2001, Pittsburgh, PA, DETC2001/DAC-21095.
• T. Santner, B. Williams, and W. Notz, The Design and Analysis of Computer Experiments, Springer Verlag, New York, 2003.
• T. Simpson, "Comparison of Response Surface and Kriging Models in the Multidisciplinary Design of an Aerospike Nozzle," NASA/CR-1998-206935.
• N. Soderborg, "Challenges and Approaches to Design for Six Sigma in the Automotive Industry," SAE World Congress, April 11-14, 2005, SAE-2005-01-1211.


Bibliography (2 of 2)

• N. Soderborg, "Design for Six Sigma at Ford," Six Sigma Forum Magazine, November 2004, pp. 15-22.
• N. Soderborg, "Applications and Challenges in Probabilistic and Robust Design Based on Computer Modeling," Invited Talk, Proceedings of the American Statistical Association Section on Physical and Engineering Sciences, 1999 Spring Research Conference on Statistics in Industry and Technology, June 2, 1999, Minneapolis, MN, pp. 207-212.
• R. Thomas, N. Soderborg, and S. Borders, "Using CAE to Find and Avoid Failure Modes: A Steering Wheel 'Nibble' Case Study," SAE World Congress, April 11-14, 2005, SAE-2005-01-1399.
• G. Wang and S. Shan, "Review of Metamodeling Techniques in Support of Engineering Design Optimization," Transactions of the ASME, Vol. 129, Apr. 2007, pp. 370-380.
• S. Wang, Reliability & Robustness Engineering Using Computer Experiments (AR&R), 2000 Spring Research Conference on Statistics in Industry and Technology, Seattle, WA, June 26, 2000.
• G. Wiggs, "Design for Six Sigma (DFSS): The First 10 Years at GE," SAE 2008 Application of Lean and Six Sigma for the Automotive Industry conference, Dec. 2-3, 2008.