Modeling, Filtering, and Control Semiconductor Manufacturingcden.ucsd.edu/.../Seminar/Poolla_Tutorial_09.13.2004.pdf · 2004. 9. 13. · − Cho + Kailath, IEEE TSM and TAC, RTP modeling

Modeling, Filtering, and Control

for

Semiconductor Manufacturing

Kameshwar Poolla

University of California

Berkeley

September 13, 2004

kp/ 0

Outline

1 Modeling

− role of modeling

− types of models, model properties

− modeling methods

− example: PEB plates

2 Model based Filtering

− role of filtering

− kalman filtering

− example: reflectometry

3 Model based Control

− role of control

− example: PEB Control

− general control architecture

− example: resist thickness control

kp/ 1

Role of Modeling in IC Manufacturing

• Fault Detection and Isolation (ex: plasma etch)

during main etch step for a particular process, we monitor

inputs u such as pressures, powers, etc.

outputs y such as optical emission at specific frequencies, etc.

given input-output data (u, y) and input-output models

$P_o$ (normal behavior) and $P_k$ (for the $k$th fault, $k = 1, \dots, N$)

solve $\min_k \| y - P_k u \|$ (with $P_o$ included among the candidates) to detect whether a fault has occurred and determine which one
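A minimal sketch of this residual test, assuming each candidate model is a static gain matrix (the slides do not specify the model form). The matrices, noise level, and the name `isolate_fault` are made up for illustration.

```python
import numpy as np

def isolate_fault(u, y, models):
    """Pick the candidate model with the smallest residual norm ||y - P u||.

    u: (n_inputs, n_samples), y: (n_outputs, n_samples).
    models: list of gain matrices; index 0 is taken to be normal behavior.
    """
    residuals = np.array([np.linalg.norm(y - P @ u) for P in models])
    return int(np.argmin(residuals)), residuals

rng = np.random.default_rng(0)
P_normal = np.array([[1.0, 0.2], [0.5, 1.0], [0.3, 0.3]])
P_fault1 = P_normal * np.array([0.5, 1.0])                    # first input loses half its gain
P_fault2 = P_normal + np.array([[0, 0], [0, 0], [0.4, 0.4]])  # third output channel shifts

# Hypothetical data generated by the first fault model plus small sensor noise.
u = rng.normal(size=(2, 50))
y = P_fault1 @ u + 0.01 * rng.normal(size=(3, 50))

best, residuals = isolate_fault(u, y, [P_normal, P_fault1, P_fault2])
print(best, residuals)
```

Index 0 winning means "no fault"; any other index names the fault whose model best explains the data.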

• Equipment Design (ex: optimal heating lamp design in RTP)

typically requires very detailed models based on first principles.

• Process Optimization and Model-based Control (ex: PEB plate control, details later)

• Improving Metrology via Model-based Filtering (ex: ellipsometry, details later)

kp/ 2

Types of Models

• Different objectives call for different kinds of models

− phenomenological models

very accurate, detailed

mass and energy transport, FEM based

can be very complex (> 10^4 voxels)

do not run in real-time

needed for equipment design, process simulation

− control-oriented models

• Control-oriented models

− based on experiments

− simplified, low order

− gives first-order trend information

− may run in real-time

− used for control, diagnostics, metrology enhancement

kp/ 3

Model Properties – I

• Linear: excellent because they are simple.

• Nonlinear: the problem is guessing the correct form of the nonlinearity.

Ex: Arrhenius dependence of reaction rates on temperature.

• Important point: linear models are useful even for nonlinear processes.

nonlinear model: $y = N(u)$

linearize about the operating point $(u_o, y_o)$ $\Longrightarrow$ $\delta y = L(\delta u)$

• Dealing with Nonlinearities

gain-scheduled linear models

ad hoc nonlinear models (Hammerstein, Wiener, etc.)
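To illustrate linearizing about an operating point, here is a small sketch using an Arrhenius rate law. The pre-exponential factor, activation energy, and operating temperature are made-up values, and the gain is obtained by a central finite difference rather than analytically.

```python
import numpy as np

R_GAS = 8.314  # gas constant, J/(mol K)

def arrhenius(T, A=1.0e7, Ea=8.0e4):
    """Reaction rate k(T) = A exp(-Ea / (R T)); A and Ea are illustrative only."""
    return A * np.exp(-Ea / (R_GAS * T))

def linearize(f, u0, h=1e-3):
    """Return (y0, gain) such that f(u0 + du) is approximately y0 + gain * du."""
    y0 = f(u0)
    gain = (f(u0 + h) - f(u0 - h)) / (2.0 * h)
    return y0, gain

T0 = 398.15                      # operating point: 125 C expressed in kelvin
k0, gain = linearize(arrhenius, T0)

dT = 1.0                         # a 1 K excursion from the operating point
linear_prediction = k0 + gain * dT
exact = arrhenius(T0 + dT)
relative_error = abs(linear_prediction - exact) / exact
print(relative_error)
```

The linear model tracks the nonlinear one closely for small excursions; repeating the linearization at several operating points gives the gain-scheduled family mentioned above.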

kp/ 4

Model Properties – II

• Static (ex: PEB zone model). PEB plates have 7 or more heating elements.

individually controlled, but effects overlap spatially.

inputs are zone (offset) settings u1 to u7.

outputs are steady state temperatures at wafer surface at 32 locations

yk : k = 1, · · · , 32

model is a matrix relating u to y

• Dynamic (ex: RTP). 3 concentric heating lamps.

inputs are powers u1(t), u2(t), u3(t).

outputs are temperature trajectories at wafer surface at 32 locations

yk(t) : k = 1, · · · , 32

model is a differential equation relating u to y

kp/ 5

Model Classes

• Parametric

transfer functions

ARX, ARMAX

general parametric

required for any computation

• Non-parametric

frequency response

impulse / step response

wiener kernels

need model reduction to get parametric model before control is done

kp/ 6

General Modeling Principles – I

• Averaging reduces effects of noise

− averaging to reduce effects of noise

variance in estimated parameters $\approx \dfrac{\text{noise variance}}{\text{reduction factor}} = \dfrac{\sigma_e^2 \, n_p}{L \, n_e}$

$L$ : number of data points

$n_e$ : number of noise channels

$n_p$ : number of parameters

$\sigma_e^2$ : noise variance

• Model verification is essential

− use fresh input-output data set

− check residuals : white ? uncorrelated with input ?
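A rough sketch of the residual checks, assuming scalar input and output records. The function name `residual_checks`, the lag count, and the 3/sqrt(n) band are choices made here, not taken from the slides.

```python
import numpy as np

def residual_checks(residuals, u, max_lag=10):
    """Normalized residual autocorrelation (lags 1..max_lag) and
    residual/input cross-correlation (lags 0..max_lag)."""
    e = residuals - residuals.mean()
    v = u - u.mean()
    n = len(e)
    acf = np.array([np.dot(e[lag:], e[:n - lag]) for lag in range(1, max_lag + 1)])
    acf = acf / np.dot(e, e)
    ccf = np.array([np.dot(e[lag:], v[:n - lag]) for lag in range(0, max_lag + 1)])
    ccf = ccf / np.sqrt(np.dot(e, e) * np.dot(v, v))
    bound = 3.0 / np.sqrt(n)   # rough band expected for white, uncorrelated signals
    return acf, ccf, bound

# Fresh validation data from a hypothetical static process y = 2u + noise.
rng = np.random.default_rng(4)
n = 2000
u = rng.normal(size=n)
y = 2.0 * u + 0.1 * rng.normal(size=n)

acf_good, ccf_good, bound = residual_checks(y - 2.0 * u, u)   # correct gain
acf_bad, ccf_bad, _ = residual_checks(y - 1.5 * u, u)         # under-estimated gain
print(abs(ccf_good[0]), abs(ccf_bad[0]), bound)
```

Residuals from the wrong gain stay strongly correlated with the input, which is the signature of an inadequate model.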

kp/ 7

General Modeling Principles – II

• Bias – Variance tradeoff

− too many parameters to estimate =⇒ estimates will have high variance

− too few parameters =⇒ estimates will be biased

• Predictive capability of model

− will have good prediction on inputs with spectral content

similar to those used to build model in the first place

kp/ 8

Modeling Methods

• Parametric

− Maximum Likelihood

− Least squares (a very important special case!)

• Non-parametric

− Empirical Transfer Function Estimate

− State Space Methods

− Neural Network Methods

kp/ 9

Empirical Transfer Function Estimate (ETFE)

• Basic idea

take DFT of input u and output y

$H(j\omega) = \dfrac{Y(j\omega)}{U(j\omega)}$

computed at N frequencies where N = length of data record

bias = 0 (asymptotically)

variance $= \dfrac{\sigma_e^2(\omega)}{\sigma_u^2(\omega)}$

estimates at different frequencies are uncorrelated

• Issues

smoothing the estimate, choice of smoothing window

bias vs. variance

converting to a parametric model

nonlinearities, time-variations (under-modeling)
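A minimal ETFE sketch on simulated data. The first-order plant and its coefficients are hypothetical, and the input is made periodic so that one period of steady-state data can be used without leakage; real records would need the smoothing discussed above.

```python
import numpy as np

def etfe(u, y):
    """Empirical transfer function estimate: ratio of output DFT to input DFT."""
    U = np.fft.rfft(u)
    Y = np.fft.rfft(y)
    omega = 2.0 * np.pi * np.arange(len(U)) / len(u)   # rad/sample
    return omega, Y / U

def simulate(u, a=0.8, b=0.2):
    """Hypothetical first-order plant y[k] = a*y[k-1] + b*u[k-1]."""
    y = np.zeros_like(u)
    for k in range(1, len(u)):
        y[k] = a * y[k - 1] + b * u[k - 1]
    return y

rng = np.random.default_rng(1)
N = 256
period = rng.normal(size=N)
u = np.tile(period, 2)               # two periods; the second is in steady state
y = simulate(u)

omega, H_hat = etfe(u[N:], y[N:])    # noise-free data, so the estimate should be near exact
H_true = 0.2 * np.exp(-1j * omega) / (1.0 - 0.8 * np.exp(-1j * omega))
max_error = np.max(np.abs(H_hat - H_true))
print(max_error)
```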

kp/ 10

State-space ID

• Basic idea

state-space model

$x_{k+1} = A x_k + B u_k$

$y_k = C x_k + D u_k$

model parameters A, B, C, D obtained by factoring a certain matrix computed from data

works very well for multi-variable systems

no proven optimality properties

can get order estimates also

• Issues

loss of physically meaningful parameters

computationally intense: cannot be done in real-time

nonlinearities, time-variations (under-modeling)
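One concrete instance of "factoring a matrix computed from data" is the Ho-Kalman construction, sketched here under the assumption that impulse-response samples (Markov parameters $C A^i B$) are available for a single-input, single-output system. The slides do not say which state-space method is meant; the test system is made up.

```python
import numpy as np

def ho_kalman(markov, order, rows=10):
    """Recover (A, B, C) from Markov parameters markov[i] = C A^i B (SISO).

    Needs at least 2*rows Markov parameters. D is the direct feedthrough term
    and is read off separately from the impulse response.
    """
    H0 = np.array([[markov[i + j] for j in range(rows)] for i in range(rows)])
    H1 = np.array([[markov[i + j + 1] for j in range(rows)] for i in range(rows)])
    Uv, s, Vt = np.linalg.svd(H0)
    Ur, Vr = Uv[:, :order], Vt[:order, :]
    sqrt_s = np.sqrt(s[:order])
    obs = Ur * sqrt_s                    # observability factor of the Hankel matrix
    ctr = sqrt_s[:, None] * Vr           # controllability factor
    A = (Ur / sqrt_s).T @ H1 @ (Vr / sqrt_s[:, None]).T
    B = ctr[:, :1]
    C = obs[:1, :]
    return A, B, C, s

# Hypothetical second-order system used only to generate test data.
A_true = np.diag([0.9, 0.5])
B_true = np.ones((2, 1))
C_true = np.ones((1, 2))
markov = [(C_true @ np.linalg.matrix_power(A_true, i) @ B_true).item() for i in range(20)]

A_hat, B_hat, C_hat, sing_vals = ho_kalman(markov, order=2)
poles = np.sort(np.linalg.eigvals(A_hat).real)
print(poles, sing_vals[:4])
```

The singular values drop sharply after the true order, which is how order estimates are obtained. The identified A is in a different coordinate system from the original, illustrating the loss of physically meaningful parameters.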

kp/ 11

ML parameter estimation

• General problem

− given: data $u_d$, $y_d$, model $y = f(u, \theta, e)$

where e is unit-variance Gaussian white noise

− find: “best” estimate of parameters θ

• Basic idea: (maximum likelihood estimation)

− note that for any fixed θ, y is a random vector.

− compute the density function $p_Y(y; \theta)$. Note that

$p_Y(y_d; \theta) \propto$ the likelihood that $y = y_d$, given parameter values $\theta$

− good choice of parameters

$\theta_{ML} = \arg\max_\theta \, p_Y(y_d; \theta) = \arg\min_\theta \, J(\theta)$

where $J(\theta) = -\log p_Y(y_d; \theta)$ is the negative log-likelihood function.
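A short worked case showing why least squares is a special case of maximum likelihood. It assumes the noise enters additively, $y = g(u, \theta) + e$ with $e \sim \mathcal{N}(0, I_n)$; the function $g$ and the dimension $n$ are notation introduced here for illustration.

```latex
\begin{aligned}
p_Y(y_d;\theta) &= (2\pi)^{-n/2}
   \exp\!\left(-\tfrac{1}{2}\,\| y_d - g(u_d,\theta) \|^2\right),\\
J(\theta) = -\log p_Y(y_d;\theta)
   &= \tfrac{1}{2}\,\| y_d - g(u_d,\theta) \|^2 + \tfrac{n}{2}\log 2\pi,\\
\theta_{ML} &= \arg\min_\theta \, \| y_d - g(u_d,\theta) \|^2 .
\end{aligned}
```

The constant term does not depend on $\theta$, so minimizing $J$ is the same as minimizing the sum of squared residuals.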

kp/ 12

ML Estimation (contd.)

• Computing the estimate: nonlinear programming

• Can easily

incorporate prior information (statistics on θ)

compute variance of parameter estimates

• Issues

can only realistically treat gaussian noise case

nonlinear models are linearized, and then

resulting parameter estimates can be biased

choice of model structure f is key : basic physics/chemistry

too many parameters is bad

sensitive to noise model

kp/ 13

Example: PEB Modeling

• Details

− 7 zone plate

− inputs are zone offsets u1, · · · , u7

− outputs are changes in steady-state temperature from nominal at 32 locations

− model: y = Mu where M is a 32 × 7 matrix

• Modeling procedure

− pick an operating point (say 125 °C)

− get baseline temperature map

− conduct N experiments: apply offsets u[i], measure temps y[i]

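− use least squares to find the model $M$:

$Y = \begin{bmatrix} y[1] & \cdots & y[N] \end{bmatrix} = M \begin{bmatrix} u[1] & \cdots & u[N] \end{bmatrix} + \text{noise} = MU + E$

$M = Y U^T \left( U U^T \right)^{-1}$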

− interpretation of jth column of M :

expected changes in steady state temps due to changes in jth zone offset
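The modeling procedure can be sketched on synthetic data as follows. The "true" plate matrix, offset range, number of experiments, and noise level are all made up; the estimate is computed with `numpy.linalg.lstsq`, which solves the same least-squares problem as the closed-form expression.

```python
import numpy as np

rng = np.random.default_rng(2)
n_zones, n_sites, n_expts = 7, 32, 20

# Hypothetical "true" plate: each zone heats nearby sites most.
zone_centers = np.linspace(0.0, 1.0, n_zones)
site_positions = np.linspace(0.0, 1.0, n_sites)
M_true = np.exp(-((site_positions[:, None] - zone_centers[None, :]) / 0.2) ** 2)

U = rng.uniform(-1.0, 1.0, size=(n_zones, n_expts))   # one column of offsets per experiment
E = 0.01 * rng.normal(size=(n_sites, n_expts))        # sensor noise
Y = M_true @ U + E                                    # measured temperature changes

# Least squares: solve U^T M^T = Y^T, equivalent to M = Y U^T (U U^T)^{-1}.
M_hat = np.linalg.lstsq(U.T, Y.T, rcond=None)[0].T
max_error = np.max(np.abs(M_hat - M_true))
print(M_hat.shape, max_error)
```

Using more experiments than zones (here 20 versus 7) is what lets averaging reduce the effect of sensor noise on the estimate.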

kp/ 14

Example: PEB Modeling (contd.)

kp/ 15

Final Remarks on Modeling

• References

− Ljung

− Söderström + Stoica

− Åström + Wittenmark

• Software

− MATLAB System ID Toolbox

• Examples in IC Manufacturing

− Cho + Kailath, IEEE TSM and TAC, RTP modeling case study.

− Khargonekar et al., IEEE TSM, RIE modeling case study.

kp/ 16

Role of Filtering

• Improving metrology

• Two ways to get Y

− use sensor (but it could be noisy)

− use a model (but it could be inaccurate)

U process input

YM measured sensor output

YP predicted output

YF filtered output

[Figure: block diagrams. Recoverable labels: Process, Sensor, Model, Correction, Filter; signals U, YM, YP, YF]

kp/ 17

Kalman Filtering

• Central to many ID problems

• Consider a linear system

$\dot{x}(t) = A x(t) + B_u u(t) + B_v v(t)$

$y(t) = C x(t) + D_u u(t) + w(t)$

x = process state

v = process noise : used to account for under-modeling

w = measurement noise

u = process input

• Assumptions: w(t), v(t), x(0) uncorrelated, known mean and covariance

• Problem: build a box (in software) that processes y(t) and u(t) and produces an estimate x̂(t) of the state

kp/ 18

Kalman Filtering: solution

• Optimal filter realization

− simple structure

$\dot{\hat{x}}(t) = A\hat{x}(t) + B_u u(t) + K(t)\nu(t)$

− copy of model driven by input u and innovations

$\nu(t) = y(t) - C\hat{x}(t) - D_u u(t)$

− kalman gain K(t) computed by solving a Riccati differential equation

• Optimality in what sense ?

− for gaussian noises, optimal over all filters

− for general noises, optimal over all linear filters

kp/ 19

Kalman Filtering: remarks

• Kalman gain K(t)

− captures tradeoff between model accuracy (process noise) and sensor noise

− results in white innovations, uncorrelated with past outputs y(s), s < t

• Easy recursive implementation

− requires one linear system simulation

− requires solving one riccati equation

• Nonlinear systems ?

− Extended Kalman Filtering

− basic idea: kalman filter for the linearized system

− not optimal in general

− optimal nonlinear filtering: extremely difficult
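A minimal sketch of the recursive implementation, written for a discrete-time analogue of the continuous-time filter above (so the gain comes from a Riccati difference recursion rather than a differential equation). The scalar plant and the noise levels are made-up values.

```python
import numpy as np

def kalman_step(x_hat, P, u, y, A, B, C, Q, R):
    """One predict/update cycle of a discrete-time Kalman filter."""
    x_pred = A @ x_hat + B @ u               # copy of the model driven by the input
    P_pred = A @ P @ A.T + Q                 # Riccati recursion, prediction part
    innovation = y - C @ x_pred
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_hat)) - K @ C) @ P_pred
    return x_new, P_new, innovation

# Hypothetical scalar process: accurate model (small Q), noisy sensor (large R).
rng = np.random.default_rng(5)
A = np.array([[0.95]]); B = np.array([[0.05]]); C = np.array([[1.0]])
Q = np.array([[0.05 ** 2]]); R = np.array([[0.5 ** 2]])
x_true = np.zeros(1); x_hat = np.zeros(1); P = np.eye(1)
u = np.array([1.0])

err_sensor, err_filter = [], []
for _ in range(500):
    x_true = A @ x_true + B @ u + 0.05 * rng.normal(size=1)
    y = C @ x_true + 0.5 * rng.normal(size=1)
    x_hat, P, _ = kalman_step(x_hat, P, u, y, A, B, C, Q, R)
    err_sensor.append((y - x_true).item())
    err_filter.append((x_hat - x_true).item())

rms_sensor = np.sqrt(np.mean(np.square(err_sensor)))
rms_filter = np.sqrt(np.mean(np.square(err_filter)))
print(rms_sensor, rms_filter)
```

The ratio of Q to R sets the gain: a trusted model and a noisy sensor give a small gain, so the filtered output is much smoother than the raw measurement.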

kp/ 20

Example: reflectometry

• Basic idea

− significant improvement in reflectometry sensor for etch-rate estimation

by incorporating a model and using EKF

− conventional etch-rate metrology

• Setup

− fix a wavelength λ (analysis can be done for multiple λ)

− reflected light intensity $I_r$ is

$I_r = I_o \, r(d, \lambda, \phi_1, \dots, \phi_n)$

$r$ : reflection coefficient

$d$ : film thickness

$\phi_1, \dots, \phi_n$ : index and thickness parameters for underlying stack

− measurements

$y = \alpha I_r + v = \alpha f(d) + v$

where $v$ is measurement noise, and $\alpha$ captures the effect of the optics

kp/ 21

Example (contd.)

• Conventional methods for etch rate estimation

− peak counting

insensitive to noise, easy implementation

gives average, not instantaneous, etch rate

difficult to merge information from multiple wavelengths

− Nonlinear least squares

solve at each time step

$\min_{d} \| y_k - f(d) \|^2$

computationally expensive

etch rate is estimated by numerical differentiation

• EKF based method

gives direct etch rate estimate

computationally reasonable

can merge multiple wavelength data easily

kp/ 22

Example (contd.)

• state-space model used

$d_{k+1} = d_k - T\, er_k$

$er_{k+1} = er_k + w^{(1)}_k$

$\alpha_{k+1} = \alpha_k + w^{(2)}_k$

$y_k = \alpha_k f(d_k) + v_k$

here, er is etch rate, T is sampling time

drift of etch rate and optical gain is modeled as a random walk

• More details

− Vincent + Khargonekar, IEEE TSM
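A sketch of an EKF built on a state-space model of this form, with state [d, er, α]. The reflectance function `f` is a toy single-cosine interference model assumed here, not the one used in the cited paper, and the wavelength, index, noise levels, and initial guesses are made up.

```python
import numpy as np

WAVELENGTH = 600.0    # nm, hypothetical
INDEX = 1.5           # film refractive index, hypothetical
KAPPA = 4.0 * np.pi * INDEX / WAVELENGTH

def f(d):
    """Toy reflectance model: one fringe per WAVELENGTH / (2 * INDEX) of film."""
    return 1.0 + 0.5 * np.cos(KAPPA * d)

def df(d):
    """Derivative of f with respect to film thickness d."""
    return -0.5 * KAPPA * np.sin(KAPPA * d)

def ekf_step(x, P, y, T, Q, R):
    """One EKF step for x = [d, er, alpha] with measurement y = alpha * f(d) + v."""
    F = np.array([[1.0, -T, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
    x_pred = F @ x                       # state equations are linear
    P_pred = F @ P @ F.T + Q
    d, _, alpha = x_pred
    H = np.array([[alpha * df(d), 0.0, f(d)]])   # Jacobian of the output map
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T / S
    x_new = x_pred + (K * (y - alpha * f(d))).ravel()
    P_new = (np.eye(3) - K @ H) @ P_pred
    return x_new, P_new

rng = np.random.default_rng(3)
T, n_steps = 1.0, 300
d_true, er_true, alpha_true = 950.0, 2.0, 1.0    # nm, nm/s, optical gain
x = np.array([950.0, 1.5, 1.0])                  # initial etch-rate guess is 25% low
P = np.diag([1.0, 1.0, 1e-4])
Q = np.diag([1e-6, 1e-4, 1e-8])
R = 0.01 ** 2

for _ in range(n_steps):
    d_true -= T * er_true
    y = alpha_true * f(d_true) + 0.01 * rng.normal()
    x, P = ekf_step(x, P, y, T, Q, R)

d_hat, er_hat, alpha_hat = x
print(d_hat, er_hat, alpha_hat)
```

The etch rate is a state, so the filter reports it directly instead of differentiating thickness estimates. Additional wavelengths would enter as extra rows of the measurement Jacobian.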

kp/ 23

Final Remarks on Filtering

• References

− Anderson + Moore

− Kwakernaak + Sivan

• Software

− MATLAB Signal Processing Toolbox


• Examples in IC Manufacturing

− Vincent et al., IEEE TSM, ellipsometry modeling case study.

− Modeling actually enables some sensors, especially in MEMS

kp/ 24

Role of Model-based Control

• Why do control ?

− control is an enabling technology for the next generation of integrated electronics

− control is a cost-effective means of retrofitting existing fab lines

− importance is gaining acceptance in Industry

• How does control help ?

− can reduce variability in product parameters (ex: CD, sidewall, etc.)

by regulating process parameters (ex: etch rate, bake temp, etc.)

− can reduce time to re-calibrate a process

(ex: etchers taken off line for routine cleaning)

− required for effective sensor development

− can provide early diagnostic warnings (ex: control signals )

− can accelerate time-to-yield

− result: higher yields & equip utilization, lower product variance

with modest capital cost

kp/ 25

Example: PEB control/optimization

• PEB plate model: y = Mu

• Control problem

− keep mean temp of plate at 125 °C

− minimize across wafer temperature spread

• Solution

− get baseline temp profile $T^b$

− problem is then

$\min_u \max_i \, | y_i + T^b_i - 125 |$

subject to $\text{mean}(y + T^b) = 125$, $y = Mu$

− this is linear programming!

• Results on 4 plates

− before 124.21 ± 0.34 °C

− after 124.98 ± 0.02 °C

[Figure: plate temperatures Before and After, temperature axis from 123 to 125 °C]
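A sketch of the PEB optimization as a linear program, using `scipy.optimize.linprog`. It treats y as the change from the baseline map, so the mean constraint is applied to the resulting temperature Mu + T_base. The plate model, baseline map, and the name `flatten_plate` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def flatten_plate(M, T_base, target=125.0):
    """Zone offsets u minimizing max_i |(M u + T_base)_i - target|
    subject to mean(M u + T_base) = target."""
    n_sites, n_zones = M.shape
    c = np.zeros(n_zones + 1)
    c[-1] = 1.0                                  # decision vector is [u, s]; minimize bound s
    ones = np.ones((n_sites, 1))
    A_ub = np.block([[M, -ones], [-M, -ones]])   # +/-(M u + T_base - target) <= s
    b_ub = np.concatenate([target - T_base, T_base - target])
    A_eq = np.append(M.mean(axis=0), 0.0)[None, :]
    b_eq = np.array([target - T_base.mean()])
    bounds = [(None, None)] * n_zones + [(0.0, None)]   # offsets are free, s >= 0
    result = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return result.x[:n_zones], result.x[-1]

sites = np.linspace(0.0, 1.0, 32)
zones = np.linspace(0.0, 1.0, 7)
M = np.exp(-((sites[:, None] - zones[None, :]) / 0.2) ** 2)   # hypothetical plate model
T_base = 124.2 + 0.3 * np.sin(2.0 * np.pi * sites)            # hypothetical baseline map

u_opt, spread = flatten_plate(M, T_base)
T_after = M @ u_opt + T_base
print(spread, T_after.mean())
```

The absolute value is handled by the standard trick of bounding each deviation above and below by one auxiliary variable s, which keeps the problem linear.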

kp/ 26

Model based control

• Ingredients of control

− model : control-oriented, need not be detailed or accurate

− sensors : to measure process parameters

− actuators : to change process operating point

− reference command : desired value (or trajectory) of

process parameters

• Hierarchical control levels

− supervisory : oversees commands for wafers lot-by-lot,

uses SPC and response surface models

− run-to-run : commands for a wafer are generated based

on metrology of the previous wafer

− real-time : requires modern control and ID methods,

accurate in situ metrology

kp/ 27

General control architecture

[Figure: general control architecture block diagram. Recoverable labels: Command Generator, Controller, Process, Wafer, Model (twice), Setpoints, Actuated Inputs, Noises & Disturbances, Process State, Wafer State]

• the controller keeps process parameters constant even with disturbances and using only a coarse model

• this results in product parameters being regulated to constant values

• we can later map the process parameters to the product parameters to generate appropriate reference commands

kp/ 28

Example: Resist Thickness RTR Control

• equipment used

− SVG 8626 wafer track

− 8363 HPO bake plate

− OCG-820 resist

• processing steps

− pre-bake and HMDS

− resist deposition

− post-bake, cool, measure

[Figure: wafer track layout. Labels: Wafer Cassette, Spin Chuck, Sensor, Cold Plate, Soft-Baking Hot Plate, Wafer Cassette]

kp/ 29

Sensors, Actuators, Disturbances

• metrology details

− indirect measurement of resist thickness t

− use Xe light source and photospectrometer to get reflectance vs. λ curves

− software to curve fit in one band to get t

• thickness most significantly affected by spin speed ω

− increased ω results in decreased t

− increments limited to ±100 rpm

• in addition to actuated inputs, process is affected by

− resist viscosity

− environmental effects

− wear and age of wafer track

• we regard these as unmodeled disturbances.

• could have used an environmentally controlled chamber – expensive.

kp/ 30

Process Modeling

• to regulate resist thickness t, we need a model

− simple log-linear model in literature

log t = a log(ω) + b + w

− w is measurement noise, a, b are constants

• could have used a complex model

− good for simulation

− bad for control as controller complexity ≈ model complexity

• simple model has poor predictive capability across a lot

− does not include process disturbances

− to do this, we let model parameters drift slowly

$\log t_k = a_k \log(\omega_k) + b_k + w_k$

$\begin{bmatrix} a_{k+1} \\ b_{k+1} \end{bmatrix} = \begin{bmatrix} a_k \\ b_k \end{bmatrix} + v_k$

− vk is process noise to account for disturbance effects
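A sketch of how a drifting-parameter model of this kind can drive run-to-run control: a Kalman filter tracks [a, b] from each wafer's measurement, and the next spin speed is chosen by inverting the model, with the change limited to ±100 rpm. The target thickness, true parameters, drift rate, and covariances are made-up values, and the sensor noise is expressed in log units as an approximation.

```python
import numpy as np

def update_parameters(theta, P, log_omega, log_t, Q, R):
    """Kalman update of theta = [a, b] for log t = a*log(omega) + b + w."""
    P_pred = P + Q                         # parameters drift as a random walk
    H = np.array([log_omega, 1.0])
    S = H @ P_pred @ H + R
    K = P_pred @ H / S
    theta_new = theta + K * (log_t - H @ theta)
    P_new = P_pred - np.outer(K, H) @ P_pred
    return theta_new, P_new

def next_spin_speed(theta, omega_prev, target, max_step=100.0):
    """Invert the model for the target thickness; limit the change to +/- max_step rpm."""
    a, b = theta
    omega = np.exp((np.log(target) - b) / a)
    return omega_prev + np.clip(omega - omega_prev, -max_step, max_step)

rng = np.random.default_rng(6)
target = 12800.0                                   # angstroms, hypothetical target
b_nominal = np.log(target) + 0.5 * np.log(4000.0)  # nominal model hits target at 4000 rpm
a_true, b_true = -0.5, b_nominal + 0.02            # actual process is about 2% thick
theta = np.array([-0.5, b_nominal])
P = np.diag([1e-4, 1e-2])
Q = np.diag([1e-8, 1e-6])
sigma = 27.0 / target                              # sensor noise in log units
omega = 4000.0

thickness = []
for k in range(15):
    b_true += 0.0005                               # slow unmodeled drift
    t = np.exp(a_true * np.log(omega) + b_true + sigma * rng.normal())
    thickness.append(t)
    theta, P = update_parameters(theta, P, np.log(omega), np.log(t), Q, sigma ** 2)
    omega = next_spin_speed(theta, omega, target)

first_error = abs(thickness[0] - target)
final_error = np.mean(np.abs(np.array(thickness[-5:]) - target))
print(first_error, final_error)
```

With a single spin speed the two parameters are not separately identifiable, but the controller only needs the model to predict well near the current operating point.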

kp/ 31

Results: thickness control

• experimental details

− bake temperature held constant to suppress its effects

− unpatterned wafers: SiO2 on Si substrate

− SiO2 layer was 1188 ± 6Å

− prebake and HMDS omitted

• sensor variance R obtained by repeated measurements on a coated wafer at the same location: R ≈ 27 Å

• process variance Q obtained by minimizing prediction error over a wafer run of 15 wafers

• initial parameter values θ̂0 and variance P0 obtained by optimal fitting for each run and sample-path averaging over a 15-wafer run

kp/ 32

Results: thickness control

[Figure: resist thickness in angstroms (axis from 1.265 × 10^4 to 1.305 × 10^4) versus wafer # (axis from 2 to 16)]

• Open Loop

− offset from target (lot-to-lot variance) ≈ ±300Å

− wafer-to-wafer variance ≈ ±60Å

• Closed Loop

− offset from target (after two wafers) ≈ ±30Å

− wafer-to-wafer variance ≈ ±30Å

theoretical limit = sensor variance (27Å)

kp/ 33

Discussion

• need to run more wafers to conclusively show benefits of control

• run-to-run control results in performance improvement with minimal capital cost

• no need to do real-time control here

kp/ 34

Final Remarks on Model-based Control

• References

− PID and classical: Dorf, Åström

− State-space: Chen

− Adaptive: Bodson + Sastry

• Software

− MATLAB Control System Toolbox


• Examples in IC Manufacturing

− Cho + Kailath, IEEE TSM, RTP case study.

− Many, many other examples

kp/ 35
