Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Modeling, Filtering, and Control
for
Semiconductor Manufacturing
Kameshwar Poolla
University of California
Berkeley
September 13, 2004
kp/ 0
Outline
1 Modeling
− role of modeling
− types of models, model properties
− modeling methods
− example: PEB plates
2 Model based Filtering
− role of filtering
− kalman filtering
− example: reflectometry
3 Model based Control
− role of control
− example: PEB Control
− general control architecture
− example: resist thickness control
kp/ 1
Role of Modeling in IC Manufacturing
• Fault Detection and Isolationex: plasma etch
during main etch step for a particular process, we monitor
inputs u such as pressures, powers, etc.
outputs y such as optical emission at specific frequencies, etc.
given input-output data (u, y) and input-output models
Po (normal behavior) and Pk (for kth fault, k = 1, · · · , N)
solve mink ‖y − Pku‖ to detect if fault has occurred and determine which one
• Equipment Designex: optimal heating lamp design in RTP.
typically requires very detailed models based on first principles.
• Process Optimization and Model-based Controlex: PEB plate control (details later)
• Improving Metrology via Model-based Filteringex: ellipsometry (details later)
kp/ 2
Types of Models
• Different objectives call for different kinds of models
− phenomenological models
very accurate, detailed
mass and energy transport, FEM based
can be very complex (> 104 voxels)
do not run in real-time
needed for equipment design, process simulation
− control-oriented models
• Control-oriented models
− based on expt
− simplified, low order
− gives first-order trend information
− may run in real-time
− used for control, diagnostics, metrology enhancement
kp/ 3
Model Properties – I
• Linearexcellent because they are simple.
• Nonlinearproblem is guessing the correct form of the nonlinearity.
Ex: arhenius dependence of reaction rates on temperature.
• Important pointlinear models are useful even for nonlinear processes.
nonlinear model: y = N(u)
linearize about the operating point (uo, yo) =⇒ δy = L(δu)
• Dealing with Nonlinearities
gain-scheduled linear models
ad hoc nonlinear models (Hammerstein, Weiner, etc.)
kp/ 4
Model Properties – II
• Staticex: PEB zone model. PEB plates have 7 or more heating elements.
individually controlled, but effects overlap spatially.
inputs are zone (offset) settings u1 to u7.
outputs are steady state temperatures at wafer surface at 32 locations
yk : k = 1, · · · , 32
model is a matrix relating u to y
• Dynamicex: RTP. 3 concentric heating lamps.
inputs are powers u1(t), u2(t), u3(t).
outputs are temperature trajectories at wafer surface at 32 locations
yk(t) : k = 1, · · · , 32
model is a differential equation relating u to y
kp/ 5
Model Classes
• Parametric
transfer functions
ARX, ARMAX
general parametric
required for any computation
• Non-parametric
frequency response
impulse / step response
wiener kernels
need model reduction to get parametric model before control is done
kp/ 6
General Modeling Principles – I
• Averaging reduces effects of noise
− averaging to reduce effects of noise
variance in estimated parameters ≈noise variance
reduction factor=
σ2e ∗ npL ∗ ne
L number of data points
ne number of noise channels
np number of parameters
σ2e noise variance
• Model verification is essential
− use fresh input-output data set
− check residuals : white ? uncorrelated with input ?
kp/ 7
General Modeling Principles – II
• Bias – Variance tradeoff
− too many parameters to estimate =⇒ estimates will have high variance
− too few parameters =⇒ estimates will be biased
• Predictive capability of model
− will have good prediction on inputs with spectral content
similar to those used to build model in the first place
kp/ 8
Modeling Methods
• Parametric
− Maximum Likelihood
− Least squares (a very important special case!)
• Non-parametric
− Empirical Transfer Function Estimate
− State Space Methods
− Neural Network Methods
kp/ 9
Empirical Transfer Function Estimate (ETFE)
• Basic idea
take DFT of input u and output y
H(jω) =Y (jω)
U(jω)
computed at N frequencies where N = length of data record
bias = 0 (asymptotically)
variance =σ2e(ω)
σ2u(ω)
estimates at different frequencies are uncorrelated
• Issues
smoothing the estimate, choice of smoothing window
bias vs. variance
converting to a parametric model
nonlinearities, time-variations (under-modeling)
kp/ 10
State-space ID
• Basic idea
state-space model
xk+1 = Axk + Buk
yk = Cxk + Duk
model parameters A, B, C, D obtained by factoring a certain matrix computed from
data
works very well for multi-variable systems
no proven optimality properties
can get order estimates also
• Issues
loss of physically meaningful parameters
computationally intense: cannot be done in real-time
nonlinearities, time-variations (under-modeling)
kp/ 11
ML parameter estimation
• General problem
− given: data ud, yd, model y = f(u, θ, e)
where e is unit-variance Gaussian white noise
− find: “best” estimate of parameters θ
• Basic idea: (maximum likelihood estimation)
− note that for any fixed θ, y is a random vector.
− compute the density function pY(y; θ). Note that
pY(yd; θ) ∝ the likelihood that y = yd, given parameter values θ
− good choice of parameters
θML
= arg maxθ
pY(yd : θ)
= arg minθ
J(θ)
where J(θ) = − log pY(yd : θ) = log-likelihood function.
kp/ 12
ML Estimation (contd.)
• Computing the estimate: nonlinear programming
• Can easily
incorporate prior information (statistics on θ)
compute variance of parameter estimates
• Issues
can only realistically treat gaussian noise case
nonlinear models are linearized, and then
resulting parameter estimates can be biased
choice of model structure f is key : basic physics/chemistry
too many parameters is bad
sensitive to noise model
kp/ 13
Example: PEB Modeling
• Details
− 7 zone plate
− inputs are zone offsets u1, · · · , u7
− outputs are changes in steady-state temperature from nominal at 32 locations
− model: y = Mu where M is a 32 × 7 matrix
• Modeling procedure
− pick at operating point (say 125o C)
− get baseline temperature map
− conduct N experiments: apply offsets u[i], measure temps y[i]
− use least square to find model M :
Y =[
y[1] · · · y[N ]]
= M[
u[1] · · ·u[N ]]
+ noise = MU + E
M = Y UT(
UUT)−1
− interpretation of jth column of M :
expected changes in steady state temps due to changes in jth zone offset
kp/ 14
Example: PEB Modeling (contd.)
kp/ 15
Final Remarks on Modeling
• References
− Ljung
− Soderstrom + Stoica
− Astrom + Wittenmark
• Software
− MATLAB System ID Toolbox
• Examples in IC Manufacturing
− Cho + Kailath, IEEE TSM and TAC, RTP modeling case study.
− Khargonekar et al., IEEE TSM, RIE modeling case study.
kp/ 16
Role of Filtering
• Improving metrology
• Two ways to get Y
− use sensor (but it could be noisy)
− use a model (but it could be inaccurate)
U process input
YM measured sensor output
YP predicted output
YF filtered output
YP
Model
CorrectionProcess Sensor
ModelProcess Sensor
Filter
YFYM
U
kp/ 17
Kalman Filtering
• Central to many ID problems
• Consider a linear system
ẋ(t) = Ax(t) + Buu(t) + Bvv(t)
y(t) = Cx(t) + Duu(t) + w(t)
x = process state
v = process noise : used to account for under-modeling
w = measurement noise
u = process input
• Assumptionsw(t), v(t), x(0) uncorrelated, known mean and covariance
• Problembuild a box (in software) that processes y(t) and u(t) and
produces an estimate x̂(t) of the state
kp/ 18
Kalman Filtering: solution
• Optimal filter realization
− simple structure
ẋ(t) = Ax̂(t) + Buu(t) + K(t)ν(t)
− copy of model driven by input u and innovations
ν(t) = (y(t) − Cx̂(t) − Duu(t))
− kalman gain K(t) computed by solving a Riccati differential equation
• Optimality in what sense ?
− for gaussian noises, optimal over all filters
− for general noises, optimal over all linear filters
kp/ 19
Kalman Filtering: remarks
• Kalman gain K(t)
− captures tradeoff between model accuracy (process noise) and sensor noise
− results in white innovations, uncorrelated with output y(t)
• Easy recursive implementation
− requires one linear system simulation
− requires solving one riccati equation
• Nonlinear systems ?
− Extended Kalman Filtering
− basic idea: kalman filter for the linearized system
− not optimal in general
− optimal nonlinear filtering: extremely difficult
kp/ 20
Example: reflectometry
• Basic idea
− significant improvement in reflectometry sensor for etch-rate estimation
by incorporating a model and using EKF
− conventional etch-rate metrology
• Setup
− fix a wavelength λ (analysis can be done for multiple λ)
− reflected light intensity Ir is
Ir = Ior(d, λ, φ1, · · · , φn)
r : reflection coefficient
d : film thickness
φ1, · · · , φn : index and thickness parameters for underlying stack
− measurements
y = αIr + v = αf(d) + v
where v is meas. noise, and α captures effect of optics
kp/ 21
Example (contd.)
• Conventional methods for etch rate estimation
− peak counting
insensitive to noise, easy implementation
gives average, not instantaneous, etch rate
difficult to merge information from multiple wavelengths
− Nonlinear least squares
solve at each time
mind
‖yk − f(dk)‖2
computationally expensive
etch rate is estimated by numerical differentiation
• EKF based method
gives direct etch rate estimate
computationally reasonable
can merge multiple wavelength data easily
kp/ 22
Example (contd.)
• state-space model used
dk+1 = dk − Terk
erk+1 = erk + w(1)k
αk+1 = αk + w(2)k
yk = αkf(dk) + vk
here, er is etch rate, T is sampling time
drift of etch rate and optical gain is modeled as a random walk
• More details
− Vincent + Khargonekar, IEEE TSM
kp/ 23
Final Remarks on Filtering
• References
− Anderson + Moore
− Kwaakernaak + Sivan
• Software
− MATLAB Signal Processing Toolbox
• Examples in IC Manufacturing
− Vincent et al., IEEE TSM, ellipsometry modeling case study.
− Modeling actually enables some sensors, especially in MEMS
kp/ 24
Role of Model-based Control
• Why do control ?
− control is an enabling technology for the next generation of integrated electronics
− control is a cost-effective means of retrofitting existing fab lines
− importance is gaining acceptance in Industry
• How does control help ?
− can reduce variability in product parameters (ex: CD, sidewall, etc.)
by regulating process parameters (ex: etch rate, bake temp, etc.)
− can reduce time to re-calibrate a process
(ex: etchers taken off line for routine cleaning)
− required for effective sensor development
− can provide early diagnostic warnings (ex: control signals )
− can accelerate time-to-yield
− result: higher yields & equip utilization, lower product variance
with modest capital cost
kp/ 25
Example: PEB control/optimization
• PEB plate model: y = Mu
• Control problem
− keep mean temp of plate at 125o C
− minimize across wafer temperature spread
• Solution
− get baseline temp profile T b
− problem is then
minu
maxi
|yi + Tbi − 125|
subject to mean (y) = 125, y = Mu
− this is linear programming!
• Results on 4 plates
− before 124.21 ± 0.34o C
− after 124.98 ± 0.02o C
123
Before After
125
124
kp/ 26
Model based control
• Ingredients of control
− model : control-oriented, need not be detailed or accurate
− sensors : to measure process parameters
− actuators : to change process operating point
− reference command : desired value (or trajectory) of
process parameters
• Hierarchical control levels
− supervisory : oversees commands for wafers lot-by-lot,
uses SPC and response surface models,
− run-to-run : command for a wafer are generated based
on metrology of previous wafer
− real-time : requires modern control and ID methods,
accurate in situ metrology
kp/ 27
General control architecture
State
Noises &
Process WaferState
CommandController
Actuated
Setpoints
Process Wafer
Generator
ModelModel
Disturbances
Inputs
• controller keep process parameters constant even with disturbances andusing a coarse model
• this results in product parameters being regulated to constant values
• we can later map the process parameters to the product parameters togenerate appropriate reference commands
kp/ 28
Example: Resist Thickness RTR Control
• equipment used
− SVG 8626 wafer track
− 8363 HPO bake plate
− OCG-820 resist
• processing steps
− pre-bake and HMDS
− resist deposition
− post-bake, cool, measure
Wafer Cassette
Spin Chuck
Sensor
Cold Plate
Soft-Baking
Hot Plate
Wafer Cassette
kp/ 29
Sensors, Actuators, Disturbances
• metrology details
− indirect measurement of resist thickness t
− use Xe light source and photospectrometer to get reflectance vs. λ curves
− software to curve fit in one band to get t
• thickness most significantly affected by spin speed ω
− increased ω results in decreased t
− increments limited to ±100 rpm
• in addition to actuated inputs, process is affected by
− resist viscosity
− environmental effects
− wear and age of wafer track
• we regard these as unmodeled disturbances.
• could have used an environmentally controlled chamber – expensive.
kp/ 30
Process Modeling
• to regulate resist thickness t, we need a model
− simple log-linear model in literature
log t = a log(ω) + b + w
− w is measurement noise, a, b are constants
• could have used a complex model
− good for simulation
− bad for control as controller complexity ≈ model complexity
• simple model has poor predictive capability across a lot
− does not include process disturbances
− to do this, we let model parameters drift slowly
log tk = ak log(ωk) + bk + wk[
ak+1
bk+1
]
=
[
ak
bk
]
+ vk
− vk is process noise to account for disturbance effects
kp/ 31
Results: thickness control
• experimental details
− bake temperature held constant to suppress its effects
− unpatterned wafers SiO2 on Si substrate
− SiO2 layer was 1188 ± 6Å
− prebake and HMDS omitted
• sensor variance R obtained by repeated measurements on a coated waferat same location R ≈ 27Å
• process variance Q obtained by minimizing prediction error over a waferrun of 15 wafers
• initial parameter values θ̂0 and variance P0 obtained by optimal fittingfor each run and sample-path averaging over a 15 wafer run
kp/ 32
Results: thickness control
2 4 6 8 10 12 14 161.265
1.27
1.275
1.28
1.285
1.29
1.295
1.3
1.305x 10
4
wafer#
angs
troms
• Open Loop
− offset from target (lot-to-lot variance) ≈ ±300Å
− wafer-to-wafer variance ≈ ±60Å
• Closed Loop
− offset from target (after two wafers) ≈ ±30Å
− wafer-to-wafer variance ≈ ±30Å
theoretical limit = sensor variance (27Å)
kp/ 33
Discussion
• need to run more wafers to conclusively show benefits of control
• run-to-run control results in performance improvementwith minimal capital cost
• no need to do real-time control here
kp/ 34
Final Remarks on Model-based Control
• References
− PID and classical: Dorf, Astrom
− State-space: Chen
− Adaptive: Bodson + Sastry
• Software
− MATLAB Control Systems Toolbox
• Examples in IC Manufacturing
− Cho + Kailath, IEEE TSM, RTP case study.
− Many, many other examples
kp/ 35