36
Modeling, Filtering, and Control for Semiconductor Manufacturing Kameshwar Poolla University of California Berkeley September 13, 2004 kp/ 0

Modeling, Filtering, and Control Semiconductor Manufacturingcden.ucsd.edu/.../Seminar/Poolla_Tutorial_09.13.2004.pdf · 2004. 9. 13. · − Cho + Kailath, IEEE TSM and TAC, RTP modeling

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

  • Modeling, Filtering, and Control

    for

    Semiconductor Manufacturing

    Kameshwar Poolla

    University of California

    Berkeley

    September 13, 2004

    kp/ 0

  • Outline

    1 Modeling

    − role of modeling

    − types of models, model properties

    − modeling methods

    − example: PEB plates

    2 Model based Filtering

    − role of filtering

    − kalman filtering

    − example: reflectometry

    3 Model based Control

    − role of control

    − example: PEB Control

    − general control architecture

    − example: resist thickness control

    kp/ 1

  • Role of Modeling in IC Manufacturing

    • Fault Detection and Isolationex: plasma etch

    during main etch step for a particular process, we monitor

    inputs u such as pressures, powers, etc.

    outputs y such as optical emission at specific frequencies, etc.

    given input-output data (u, y) and input-output models

    Po (normal behavior) and Pk (for kth fault, k = 1, · · · , N)

    solve mink ‖y − Pku‖ to detect if fault has occurred and determine which one

    • Equipment Designex: optimal heating lamp design in RTP.

    typically requires very detailed models based on first principles.

    • Process Optimization and Model-based Controlex: PEB plate control (details later)

    • Improving Metrology via Model-based Filteringex: ellipsometry (details later)

    kp/ 2

  • Types of Models

    • Different objectives call for different kinds of models

    − phenomenological models

    very accurate, detailed

    mass and energy transport, FEM based

    can be very complex (> 104 voxels)

    do not run in real-time

    needed for equipment design, process simulation

    − control-oriented models

    • Control-oriented models

    − based on expt

    − simplified, low order

    − gives first-order trend information

    − may run in real-time

    − used for control, diagnostics, metrology enhancement

    kp/ 3

  • Model Properties – I

    • Linearexcellent because they are simple.

    • Nonlinearproblem is guessing the correct form of the nonlinearity.

    Ex: arhenius dependence of reaction rates on temperature.

    • Important pointlinear models are useful even for nonlinear processes.

    nonlinear model: y = N(u)

    linearize about the operating point (uo, yo) =⇒ δy = L(δu)

    • Dealing with Nonlinearities

    gain-scheduled linear models

    ad hoc nonlinear models (Hammerstein, Weiner, etc.)

    kp/ 4

  • Model Properties – II

    • Staticex: PEB zone model. PEB plates have 7 or more heating elements.

    individually controlled, but effects overlap spatially.

    inputs are zone (offset) settings u1 to u7.

    outputs are steady state temperatures at wafer surface at 32 locations

    yk : k = 1, · · · , 32

    model is a matrix relating u to y

    • Dynamicex: RTP. 3 concentric heating lamps.

    inputs are powers u1(t), u2(t), u3(t).

    outputs are temperature trajectories at wafer surface at 32 locations

    yk(t) : k = 1, · · · , 32

    model is a differential equation relating u to y

    kp/ 5

  • Model Classes

    • Parametric

    transfer functions

    ARX, ARMAX

    general parametric

    required for any computation

    • Non-parametric

    frequency response

    impulse / step response

    wiener kernels

    need model reduction to get parametric model before control is done

    kp/ 6

  • General Modeling Principles – I

    • Averaging reduces effects of noise

    − averaging to reduce effects of noise

    variance in estimated parameters ≈noise variance

    reduction factor=

    σ2e ∗ npL ∗ ne

    L number of data points

    ne number of noise channels

    np number of parameters

    σ2e noise variance

    • Model verification is essential

    − use fresh input-output data set

    − check residuals : white ? uncorrelated with input ?

    kp/ 7

  • General Modeling Principles – II

    • Bias – Variance tradeoff

    − too many parameters to estimate =⇒ estimates will have high variance

    − too few parameters =⇒ estimates will be biased

    • Predictive capability of model

    − will have good prediction on inputs with spectral content

    similar to those used to build model in the first place

    kp/ 8

  • Modeling Methods

    • Parametric

    − Maximum Likelihood

    − Least squares (a very important special case!)

    • Non-parametric

    − Empirical Transfer Function Estimate

    − State Space Methods

    − Neural Network Methods

    kp/ 9

  • Empirical Transfer Function Estimate (ETFE)

    • Basic idea

    take DFT of input u and output y

    H(jω) =Y (jω)

    U(jω)

    computed at N frequencies where N = length of data record

    bias = 0 (asymptotically)

    variance =σ2e(ω)

    σ2u(ω)

    estimates at different frequencies are uncorrelated

    • Issues

    smoothing the estimate, choice of smoothing window

    bias vs. variance

    converting to a parametric model

    nonlinearities, time-variations (under-modeling)

    kp/ 10

  • State-space ID

    • Basic idea

    state-space model

    xk+1 = Axk + Buk

    yk = Cxk + Duk

    model parameters A, B, C, D obtained by factoring a certain matrix computed from

    data

    works very well for multi-variable systems

    no proven optimality properties

    can get order estimates also

    • Issues

    loss of physically meaningful parameters

    computationally intense: cannot be done in real-time

    nonlinearities, time-variations (under-modeling)

    kp/ 11

  • ML parameter estimation

    • General problem

    − given: data ud, yd, model y = f(u, θ, e)

    where e is unit-variance Gaussian white noise

    − find: “best” estimate of parameters θ

    • Basic idea: (maximum likelihood estimation)

    − note that for any fixed θ, y is a random vector.

    − compute the density function pY(y; θ). Note that

    pY(yd; θ) ∝ the likelihood that y = yd, given parameter values θ

    − good choice of parameters

    θML

    = arg maxθ

    pY(yd : θ)

    = arg minθ

    J(θ)

    where J(θ) = − log pY(yd : θ) = log-likelihood function.

    kp/ 12

  • ML Estimation (contd.)

    • Computing the estimate: nonlinear programming

    • Can easily

    incorporate prior information (statistics on θ)

    compute variance of parameter estimates

    • Issues

    can only realistically treat gaussian noise case

    nonlinear models are linearized, and then

    resulting parameter estimates can be biased

    choice of model structure f is key : basic physics/chemistry

    too many parameters is bad

    sensitive to noise model

    kp/ 13

  • Example: PEB Modeling

    • Details

    − 7 zone plate

    − inputs are zone offsets u1, · · · , u7

    − outputs are changes in steady-state temperature from nominal at 32 locations

    − model: y = Mu where M is a 32 × 7 matrix

    • Modeling procedure

    − pick at operating point (say 125o C)

    − get baseline temperature map

    − conduct N experiments: apply offsets u[i], measure temps y[i]

    − use least square to find model M :

    Y =[

    y[1] · · · y[N ]]

    = M[

    u[1] · · ·u[N ]]

    + noise = MU + E

    M = Y UT(

    UUT)−1

    − interpretation of jth column of M :

    expected changes in steady state temps due to changes in jth zone offset

    kp/ 14

  • Example: PEB Modeling (contd.)

    kp/ 15

  • Final Remarks on Modeling

    • References

    − Ljung

    − Soderstrom + Stoica

    − Astrom + Wittenmark

    • Software

    − MATLAB System ID Toolbox

    • Examples in IC Manufacturing

    − Cho + Kailath, IEEE TSM and TAC, RTP modeling case study.

    − Khargonekar et al., IEEE TSM, RIE modeling case study.

    kp/ 16

  • Role of Filtering

    • Improving metrology

    • Two ways to get Y

    − use sensor (but it could be noisy)

    − use a model (but it could be inaccurate)

    U process input

    YM measured sensor output

    YP predicted output

    YF filtered output

    YP

    Model

    CorrectionProcess Sensor

    ModelProcess Sensor

    Filter

    YFYM

    U

    kp/ 17

  • Kalman Filtering

    • Central to many ID problems

    • Consider a linear system

    ẋ(t) = Ax(t) + Buu(t) + Bvv(t)

    y(t) = Cx(t) + Duu(t) + w(t)

    x = process state

    v = process noise : used to account for under-modeling

    w = measurement noise

    u = process input

    • Assumptionsw(t), v(t), x(0) uncorrelated, known mean and covariance

    • Problembuild a box (in software) that processes y(t) and u(t) and

    produces an estimate x̂(t) of the state

    kp/ 18

  • Kalman Filtering: solution

    • Optimal filter realization

    − simple structure

    ẋ(t) = Ax̂(t) + Buu(t) + K(t)ν(t)

    − copy of model driven by input u and innovations

    ν(t) = (y(t) − Cx̂(t) − Duu(t))

    − kalman gain K(t) computed by solving a Riccati differential equation

    • Optimality in what sense ?

    − for gaussian noises, optimal over all filters

    − for general noises, optimal over all linear filters

    kp/ 19

  • Kalman Filtering: remarks

    • Kalman gain K(t)

    − captures tradeoff between model accuracy (process noise) and sensor noise

    − results in white innovations, uncorrelated with output y(t)

    • Easy recursive implementation

    − requires one linear system simulation

    − requires solving one riccati equation

    • Nonlinear systems ?

    − Extended Kalman Filtering

    − basic idea: kalman filter for the linearized system

    − not optimal in general

    − optimal nonlinear filtering: extremely difficult

    kp/ 20

  • Example: reflectometry

    • Basic idea

    − significant improvement in reflectometry sensor for etch-rate estimation

    by incorporating a model and using EKF

    − conventional etch-rate metrology

    • Setup

    − fix a wavelength λ (analysis can be done for multiple λ)

    − reflected light intensity Ir is

    Ir = Ior(d, λ, φ1, · · · , φn)

    r : reflection coefficient

    d : film thickness

    φ1, · · · , φn : index and thickness parameters for underlying stack

    − measurements

    y = αIr + v = αf(d) + v

    where v is meas. noise, and α captures effect of optics

    kp/ 21

  • Example (contd.)

    • Conventional methods for etch rate estimation

    − peak counting

    insensitive to noise, easy implementation

    gives average, not instantaneous, etch rate

    difficult to merge information from multiple wavelengths

    − Nonlinear least squares

    solve at each time

    mind

    ‖yk − f(dk)‖2

    computationally expensive

    etch rate is estimated by numerical differentiation

    • EKF based method

    gives direct etch rate estimate

    computationally reasonable

    can merge multiple wavelength data easily

    kp/ 22

  • Example (contd.)

    • state-space model used

    dk+1 = dk − Terk

    erk+1 = erk + w(1)k

    αk+1 = αk + w(2)k

    yk = αkf(dk) + vk

    here, er is etch rate, T is sampling time

    drift of etch rate and optical gain is modeled as a random walk

    • More details

    − Vincent + Khargonekar, IEEE TSM

    kp/ 23

  • Final Remarks on Filtering

    • References

    − Anderson + Moore

    − Kwaakernaak + Sivan

    • Software

    − MATLAB Signal Processing Toolbox

    • Examples in IC Manufacturing

    − Vincent et al., IEEE TSM, ellipsometry modeling case study.

    − Modeling actually enables some sensors, especially in MEMS

    kp/ 24

  • Role of Model-based Control

    • Why do control ?

    − control is an enabling technology for the next generation of integrated electronics

    − control is a cost-effective means of retrofitting existing fab lines

    − importance is gaining acceptance in Industry

    • How does control help ?

    − can reduce variability in product parameters (ex: CD, sidewall, etc.)

    by regulating process parameters (ex: etch rate, bake temp, etc.)

    − can reduce time to re-calibrate a process

    (ex: etchers taken off line for routine cleaning)

    − required for effective sensor development

    − can provide early diagnostic warnings (ex: control signals )

    − can accelerate time-to-yield

    − result: higher yields & equip utilization, lower product variance

    with modest capital cost

    kp/ 25

  • Example: PEB control/optimization

    • PEB plate model: y = Mu

    • Control problem

    − keep mean temp of plate at 125o C

    − minimize across wafer temperature spread

    • Solution

    − get baseline temp profile T b

    − problem is then

    minu

    maxi

    |yi + Tbi − 125|

    subject to mean (y) = 125, y = Mu

    − this is linear programming!

    • Results on 4 plates

    − before 124.21 ± 0.34o C

    − after 124.98 ± 0.02o C

    123

    Before After

    125

    124

    kp/ 26

  • Model based control

    • Ingredients of control

    − model : control-oriented, need not be detailed or accurate

    − sensors : to measure process parameters

    − actuators : to change process operating point

    − reference command : desired value (or trajectory) of

    process parameters

    • Hierarchical control levels

    − supervisory : oversees commands for wafers lot-by-lot,

    uses SPC and response surface models,

    − run-to-run : command for a wafer are generated based

    on metrology of previous wafer

    − real-time : requires modern control and ID methods,

    accurate in situ metrology

    kp/ 27

  • General control architecture

    State

    Noises &

    Process WaferState

    CommandController

    Actuated

    Setpoints

    Process Wafer

    Generator

    ModelModel

    Disturbances

    Inputs

    • controller keep process parameters constant even with disturbances andusing a coarse model

    • this results in product parameters being regulated to constant values

    • we can later map the process parameters to the product parameters togenerate appropriate reference commands

    kp/ 28

  • Example: Resist Thickness RTR Control

    • equipment used

    − SVG 8626 wafer track

    − 8363 HPO bake plate

    − OCG-820 resist

    • processing steps

    − pre-bake and HMDS

    − resist deposition

    − post-bake, cool, measure

    Wafer Cassette

    Spin Chuck

    Sensor

    Cold Plate

    Soft-Baking

    Hot Plate

    Wafer Cassette

    kp/ 29

  • Sensors, Actuators, Disturbances

    • metrology details

    − indirect measurement of resist thickness t

    − use Xe light source and photospectrometer to get reflectance vs. λ curves

    − software to curve fit in one band to get t

    • thickness most significantly affected by spin speed ω

    − increased ω results in decreased t

    − increments limited to ±100 rpm

    • in addition to actuated inputs, process is affected by

    − resist viscosity

    − environmental effects

    − wear and age of wafer track

    • we regard these as unmodeled disturbances.

    • could have used an environmentally controlled chamber – expensive.

    kp/ 30

  • Process Modeling

    • to regulate resist thickness t, we need a model

    − simple log-linear model in literature

    log t = a log(ω) + b + w

    − w is measurement noise, a, b are constants

    • could have used a complex model

    − good for simulation

    − bad for control as controller complexity ≈ model complexity

    • simple model has poor predictive capability across a lot

    − does not include process disturbances

    − to do this, we let model parameters drift slowly

    log tk = ak log(ωk) + bk + wk[

    ak+1

    bk+1

    ]

    =

    [

    ak

    bk

    ]

    + vk

    − vk is process noise to account for disturbance effects

    kp/ 31

  • Results: thickness control

    • experimental details

    − bake temperature held constant to suppress its effects

    − unpatterned wafers SiO2 on Si substrate

    − SiO2 layer was 1188 ± 6Å

    − prebake and HMDS omitted

    • sensor variance R obtained by repeated measurements on a coated waferat same location R ≈ 27Å

    • process variance Q obtained by minimizing prediction error over a waferrun of 15 wafers

    • initial parameter values θ̂0 and variance P0 obtained by optimal fittingfor each run and sample-path averaging over a 15 wafer run

    kp/ 32

  • Results: thickness control

    2 4 6 8 10 12 14 161.265

    1.27

    1.275

    1.28

    1.285

    1.29

    1.295

    1.3

    1.305x 10

    4

    wafer#

    angs

    troms

    • Open Loop

    − offset from target (lot-to-lot variance) ≈ ±300Å

    − wafer-to-wafer variance ≈ ±60Å

    • Closed Loop

    − offset from target (after two wafers) ≈ ±30Å

    − wafer-to-wafer variance ≈ ±30Å

    theoretical limit = sensor variance (27Å)

    kp/ 33

  • Discussion

    • need to run more wafers to conclusively show benefits of control

    • run-to-run control results in performance improvementwith minimal capital cost

    • no need to do real-time control here

    kp/ 34

  • Final Remarks on Model-based Control

    • References

    − PID and classical: Dorf, Astrom

    − State-space: Chen

    − Adaptive: Bodson + Sastry

    • Software

    − MATLAB Control Systems Toolbox

    • Examples in IC Manufacturing

    − Cho + Kailath, IEEE TSM, RTP case study.

    − Many, many other examples

    kp/ 35