
506 IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, VOL. 26, NO. 4, NOVEMBER 2013

Performance Evaluation of Blended Metrology Schemes Incorporating Virtual Metrology

Jae Yeon Baek, Student Member, IEEE, and Costas J. Spanos, Fellow, IEEE

Abstract—This paper formulates and explores the tradeoff between re-calibration and off-line metrology to find the optimal number of samples that maximizes profit. A sequence of metrology samples using a regression model with linearly drifting coefficients is simulated, a model realistically applying to a manufacturing process with linearly drifting hidden variables. Three different types of statistical models, linear regression, exponentially-weighted linear regression (EWLR), and the Kalman filter, are used as VM prediction tools. We simulate two blended metrology sampling scenarios, one that automatically discards flagged wafers and another that allows re-inspection and process re-tuning. We alternate between training sets and testing sets, and compare the resulting net profit, Type I errors, and Type II errors as a function of varying VM prediction sample sizes. Results show that each VM prediction model has a different tradeoff between the Type I and Type II errors that determines the optimal sampling scheme. The ultimate goal is to create a general framework that quickly leads to the optimal design of such schemes given the characteristics of the process in question.

Index Terms—Manufacturing processes, metrology, plasma applications, predictive models, process control, semiconductor device measurement, virtual manufacturing.

I. Introduction

An increase in complex manufacturing processes and reduced device dimensions has demanded even tighter control in semiconductor manufacturing. Introduced in the early 1930s by W. Shewhart, statistical process control (SPC) has been used in the microelectronics fabrication industry to monitor deviations from statistical control [1]. Using the control chart as its main tool, SPC aims to detect the occurrence of shifts in process performance so that investigation and corrective action may be undertaken to bring incorrectly behaving manufacturing processes back under control [2], [3]. Although effective in monitoring process variability, SPC does not prescribe automatic control actions [4]. Due to this need for continuous process tuning, advanced process control (APC) evolved rapidly in the semiconductor industry in the 1980s and 1990s, with Run-to-Run (R2R) control emerging as the first technologically viable product of that evolution [1].

Manuscript received December 30, 2012; revised May 1, 2013; accepted June 4, 2013. Date of publication July 9, 2013; date of current version October 28, 2013. This work was supported by Samsung Scholarship and IMPACT+ (Applied Materials, ARM, ASML, Global Foundries, IBM, Intel, KLA-Tencor, Mentor Graphics, Panoramic Tech, Qualcomm, Samsung, SanDisk, and Tokyo Electron).

The authors are with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSM.2013.2271999

High measurement quality is absolutely critical in order for SPC and APC to be successful. SPC is based on plotting measurement values on a control chart, and APC uses measurement values to construct a model that will fine-tune the process in subsequent steps. Most R2R control systems currently employed are lot-to-lot, and use lot-level metrology data [5], [6]. This means that a few wafers within a lot are measured at the metrology station and the control values are kept the same for all of the wafers in a lot [6], [7]. However, with shrinking device dimensions and increasing wafer diameters, wafer-to-wafer (W2W) control becomes crucial to maintaining high quality and productivity [6], [7], [8], [9], [10], and wafer-level metrology will be needed in the future.

Although all process parameters are important for effective product manufacturing, some so-called critical dimensions (CDs) are determined to be the most important. CDs typically encompass those that define the electrical performance of transistors or interconnect. Currently, the main workhorse for measuring the length of transistor gates has been the scanning electron microscope (SEM). SEM measurements are collected off-line using dedicated, throughput-limited equipment. Although this kind of metrology produces relatively accurate measurements, it is also costly and time-consuming because of the time gap between sending the test wafer to the metrology equipment and getting the results [9]. For example, even if there is a sudden drift or shift in the process, it cannot be detected in real time, leaving all wafers processed during the delay period defective [8]. The fab has to incur a significant cost due to these various disadvantages of conventional SEM metrology [6].

The metrology delay from physical metrology stations makes wafer-to-wafer monitoring and control impossible for today's manufacturing lines. A potential solution to this is virtual metrology (VM), which takes the processing data produced by the processing tool in real time (e.g., plasma etching data during isolation trench formation) and predicts an outcome of the wafer (e.g., the critical dimension of the trench) using an empirical model. The tool data used in fault detection is usually used as the input to the VM model. Implementation of a good virtual metrology model enables W2W real-time quality control and reduces the cycle time, in addition to decreasing the number of test wafers [11].

Various authors have developed VM models to predict wafer etch rate using high-dimensional sensor data [12], [13], [14].

0894-6507 © 2013 IEEE


Fig. 1. A blended metrology scenario: the VM model is constructed through the initial training step, then applied to the process, and recalibrated later to incorporate process drift.

Su et al. and Hung et al. developed VM models for CVD film thicknesses using neural networks [15], [16]. Moreover, a few authors have implemented an APC model using VM measurements as inputs [6], [7]. The performance of the VM model is usually measured against conventional metrology data by criteria such as the error variance or the mean squared error (MSE).

Although VM has its advantages, VM model prediction quality is not as good as that of conventional metrology, and the models need frequent recalibration in order to maintain acceptable predictive capability. In real life, we envision that practical metrology schemes will involve VM in combination with conventional, or actual, metrology, the latter being used for the needed periodic recalibration of the VM empirical model. Such a scenario is shown in Fig. 1. The initial training step is used to construct the VM model; the model is then applied to incoming wafers and later recalibrated to incorporate process drift and other faults. We also envision an algorithm that would respond to a fault being predicted by the VM model by possibly requiring additional actual measurements.

Generating more accurate empirical models is a critical aspect of implementing virtual metrology. However, not many have focused on the problem of analyzing the cost of using virtual metrology vs. actual metrology. What really matters is whether the introduction of VM actually improves the overall performance of the fabrication sequence, and if so, under which conditions such a blended metrology scheme would be advantageous and optimized. Yung-Cheng et al. have proposed a model for calculating the fab's profitability of using VM [9]. However, this does not include a term for the quality of the VM model. As mentioned above, we think that virtual metrology alone will not suffice as a stand-alone metrology tool due to its limited predictive capability, but could be profitable for the fab when conventional metrology is used as a recalibration tool.

Any metrology scheme, whether conventional, virtual, or blended, may produce false alarms (Type I errors) or may fail to detect a fault (Type II errors). These two types of errors are associated with operational costs that depend on the metrology scheme and the algorithm used in response to an alarm. In this work, we formulate the costs associated with Type I and Type II errors that result from a blended metrology scheme, and we propose a general framework that can be used to quickly reach the optimal design of such schemes given the characteristics of the process in question. This is done by exploring the effects of variables such as the frequency of samples that go through actual metrology, the prediction quality of the VM model, the cost of missed or false alarms, processing and actual metrology costs, etc.

In our earlier work [17], we demonstrated that it is more beneficial to consider VM in the context of a faulty process than a well-controlled process. In fact, a well-controlled process would need minimal metrology. Thus, we simulate a drifting process with low Cpk and calculate the total profit for three different but realistic VM models: linear regression, exponentially-weighted linear regression (EWLR), and the Kalman filter. This is done for two possible blended metrology sampling scenarios, one where flagged wafers are automatically discarded, and another where they are re-inspected and the process is re-tuned. We do not extend our analysis to a W2W control system with VM inputs, but this analysis can be done similarly in the future. From here on, our analysis assumes conventional metrology to be the SEM and the unknown quantity to be the critical dimension (CD).

II. Design of Cost Analysis

A. Defining Total Profit and Relevant Parameters

In any metrology operation, the value obtained is an estimate of the true value that is being measured. For either conventional or virtual metrology, let us assume that the true value of the quantity in question (unknown to the process engineer) is y, and the value estimated by the metrology operation is ŷ. We assume there are three types of cost associated with each sample wafer going through a metrology scheme:

1) Revenue per wafer: If the true value of a wafer sample is in-spec and the wafer goes through the processing line, it ultimately becomes an integrated circuit product (e.g., a microprocessor). The fab then generates revenue from selling this product. It is calculated by

revenue per die × # die per wafer × yield    (1)

2) Processing cost per wafer: The cost required to further process one wafer until it becomes a final product. It is calculated by

process cost per die × # die per wafer    (2)

3) SEM cost per N wafers: The cost of using the physical metrology tool for processing N wafers. This includes the depreciation cost and the employment of the operation engineer. We calculate the depreciation cost per year as

depreciation cost / year = SEM tool cost / years used    (3)

Then the SEM cost for N wafers is

(depreciation cost / year) × N wafers / (wafers started per month × 12)    (4)


TABLE I
Values Used for Parameters in (1)-(5)

Fig. 2. Plot of true and estimated metrology values for an R² of 0.94 for a VM model. The dotted lines are the ±3σ limits. The Type I and Type II error regions are highlighted and labeled.

Here we assume, with no loss of generality, that the entire wafer is accepted or rejected by the metrology operation. In addition, we assume for simplicity that there is no cost of operating a virtual metrology software tool. Thus, we can define a total profit for each wafer as:

total profit = revenue − process cost − SEM cost    (5)

The financial parameters mentioned above vary widely depending on the type of chip (e.g., microprocessor or memory). As we are proposing a methodology for blended metrology optimization, we chose to demonstrate it using approximate but realistic values for current microprocessor manufacturing processes. We list the numbers used in equations (1)-(5) in Table I.
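The bookkeeping in (1)-(5) can be sketched as follows. This is a minimal illustration; all numeric parameter values below are hypothetical placeholders, not the values listed in Table I.

```python
# Sketch of the per-wafer profit bookkeeping in (1)-(5).
# All parameter values used in total_profit() are hypothetical
# placeholders, not the values from the paper's Table I.

def revenue_per_wafer(revenue_per_die, die_per_wafer, yield_frac):
    # Eq. (1): revenue per die x # die per wafer x yield
    return revenue_per_die * die_per_wafer * yield_frac

def process_cost_per_wafer(cost_per_die, die_per_wafer):
    # Eq. (2): process cost per die x # die per wafer
    return cost_per_die * die_per_wafer

def sem_cost(tool_cost, years_used, n_wafers, wafers_per_month):
    # Eqs. (3)-(4): yearly depreciation, prorated over yearly wafer starts
    depreciation_per_year = tool_cost / years_used
    return depreciation_per_year * n_wafers / (wafers_per_month * 12)

def total_profit(n_wafers, n_measured):
    # Eq. (5): revenue - process cost - SEM cost, summed over n_wafers,
    # with only n_measured wafers going through the SEM
    rev = revenue_per_wafer(20.0, 500, 0.9)     # hypothetical values
    proc = process_cost_per_wafer(8.0, 500)     # hypothetical values
    sem = sem_cost(4e6, 5, n_measured, 30000)   # hypothetical values
    return n_wafers * (rev - proc) - sem
```

In this form, the tradeoff explored in the paper shows up directly: increasing `n_measured` raises the SEM cost term, while too little measurement (via a poorly recalibrated VM model) reduces the effective revenue.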

B. Costs Associated with Imperfect VM Predictions

We determine that a sample is "faulty" or "bad" if a metrology value falls below or above the respective specification limits. Thus, if the estimated metrology value is bad, the process engineer will have to throw away (or re-work) the sample, and if the estimated value is good, the wafer continues on to subsequent manufacturing processes. Overall, there are four cases that can happen when a process engineer uses either of the metrology schemes. Depending on which category a wafer is in, it incurs a different set of costs.

1) Correctly classified as good: The metrology tool (either conventional or virtual) classifies the sample as good when the true value is good. The wafer goes through subsequent processes and is turned into a final product, generating revenue for the company. Given some upper specification limit (USL) and lower specification limit (LSL), this directly translates to

P(ŷ_vm = good | y = good) = P(USL ≥ ŷ_vm ≥ LSL | USL ≥ y ≥ LSL)    (6)

2) Correctly classified as bad: The metrology tool classifies the sample as bad when the true metrology value is bad, and the wafer is discarded. The correct metrology estimate saves the company the cost that would have been incurred if the wafer had been processed further, as the final product would have been defective due to the faulty true value.

P(ŷ_vm = bad | y = bad) = P(ŷ_vm > USL or ŷ_vm < LSL | y > USL or y < LSL)    (7)

3) Type I error: Also known as a "false alarm," the metrology tool classifies the sample as bad when the true value is actually good (in-spec). The proportion of samples with Type I error is given by:

P(ŷ_vm = bad | y = good) = P(ŷ_vm > USL or ŷ_vm < LSL | USL ≥ y ≥ LSL)    (8)

In this case, the wafer is discarded even though it could have been processed to become a final product.

4) Type II error: Also known as a "missed alarm," the metrology tool classifies the sample as good when the true value is actually bad. Similarly, the proportion of samples with Type II error is given by:

P(ŷ_vm = good | y = bad) = P(USL ≥ ŷ_vm ≥ LSL | y > USL or y < LSL)    (9)

A missed alarm incurs the greatest cost because even though the wafer goes through the whole manufacturing process, it is not made into a final product at the end due to its defect. The different costs that occur for each classification category are summarized in Table II.

As a visual example, we plot simulated pairs of true and estimated VM values with a correlation coefficient ρ of 0.96, as seen in Fig. 2. Notice that this is a very high correlation for a statistical model. The corresponding Type I and Type II error regions discussed above are highlighted and labeled.
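A classification like the one visualized in Fig. 2 can be reproduced numerically by drawing correlated (true, estimated) pairs and counting the four categories of (6)-(9). This is a sketch: the spec limits, the marginal distributions, and the sample size are illustrative assumptions, not the paper's simulation settings.

```python
import numpy as np

# Sketch: classify simulated (true, estimated) CD pairs into the four
# categories of (6)-(9). The spec limits and marginals are hypothetical;
# only the correlation (rho = 0.96) matches the example in the text.
rng = np.random.default_rng(0)
rho = 0.96
cov = [[1.0, rho], [rho, 1.0]]
y_true, y_vm = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

LSL, USL = -2.0, 2.0                       # hypothetical spec limits
good_true = (y_true >= LSL) & (y_true <= USL)
good_vm = (y_vm >= LSL) & (y_vm <= USL)

type1 = np.mean(~good_vm & good_true)      # false alarm, joint probability
type2 = np.mean(good_vm & ~good_true)      # missed alarm, joint probability
print(f"Type I rate: {type1:.4f}, Type II rate: {type2:.4f}")
```

Note that these are joint probabilities rather than the conditional probabilities of (8)-(9); this matches the convention adopted for the plots in Section V.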

III. Sampling and Process Scenarios for Blended Metrology

A. Metrology Sampling Scenarios

TABLE II
Different Kinds of Costs that Occur for a Metrology Scheme

One question that remains from Section II is how to devise a metrology sampling scenario. Traditionally, 2-3 wafers are sampled out of a lot and are observed through the SEM. We devise two sampling scenarios, both applied to a process with low Cpk. The first is a short, non-interrupted process of 3,000 wafers (or 120 lots) in which wafers flagged by VM or SEM are automatically discarded. The second is a long-term process of 10,000 wafers (or 400 lots) that allows for a SEM re-inspection whenever VM flags a wafer as faulty. Moreover, this scenario allows for process re-tuning. Since the objective of this work is to determine when virtual metrology is beneficial, we compare the total profit for the blended case and the conventional case (no VM used) for both scenarios. We define some terms before we delve into the details.

1) Initial Training Length (Rit): Number of wafers used to initially train the VM model.

2) Length of VM Run (Rvm): Number of wafers "measured" only through VM before additional SEM measurements are collected.

3) Recalibration Length (Rrc): Number of wafers that need to be measured by the SEM for each recalibration step.

4) Metrology Delay Length (Rmd): Number of wafers processed while waiting for SEM measurements to become available. These wafers are considered to be at risk of being processed under faulty process conditions.

5) Retune Delay Length (Rrd): Number of wafers that could have been processed but are missed because the process is being re-tuned back to target.

1) Scenario without Re-inspection and Re-tuning: The non-interrupted process of 120 lots begins by initially training the VM model with real-time processing data and SEM metrology data (given by Rit). We assume that whenever a wafer is sent to the metrology station, there is a metrology delay of 2 lots (given by Rmd), or 50 wafers, that are unconditionally processed while waiting for the metrology results. When the new metrology data is available, the VM model is updated using this new data and applied to the next Rvm wafers. When the SEM or VM flags a wafer as being faulty, it is discarded. Finally, a number of wafers equal to Rrc are sent to the SEM station to update the model again. This whole process is repeated throughout the simulation, as seen in Fig. 3. Xi represents an n × p matrix of process conditions, and yi is an n × 1 vector that represents actual metrology values or VM predictions, depending on the step. The conventional metrology scheme for this scenario follows the same SEM metrology sampling frequency as that of blended metrology, but no VM is applied and all other wafers unconditionally go through subsequent processing.
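The phase structure of this non-interrupted scenario can be sketched as a schedule generator. The default phase lengths below are the ones quoted in Section V (Rit = 200, Rrc = 20, Rmd = 50, with Rvm = 145 as one example value); the labeling scheme itself is an illustrative reconstruction, not the paper's code.

```python
# Sketch of the phase schedule for the non-interrupted blended scenario.
# Default lengths follow Section V; the function itself is illustrative.

def phase_schedule(n_wafers, r_it=200, r_vm=145, r_rc=20, r_md=50):
    """Label each wafer: 'train' (initial SEM training), 'vm' (VM only),
    'recal' (SEM recalibration), or 'at_risk' (metrology delay)."""
    phases = ["train"] * min(r_it, n_wafers)
    while len(phases) < n_wafers:
        phases += ["vm"] * r_vm          # VM-only run of length R_vm
        phases += ["recal"] * r_rc       # R_rc wafers sent to the SEM
        phases += ["at_risk"] * r_md     # processed during the SEM delay
    return phases[:n_wafers]

sched = phase_schedule(3000)
print({p: sched.count(p) for p in ("train", "vm", "recal", "at_risk")})
```

Counting the `at_risk` label makes the cost of short VM runs explicit: the shorter Rvm is, the more recalibration cycles fit into 3,000 wafers, and the more wafers sit in the metrology-delay window.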

2) Scenario with Re-inspection and Re-tuning: The extended process of 400 lots similarly trains on the first Rit samples to construct the first VM model and applies it to the next Rvm wafers. However, we account for the limited prediction quality of VM by inspecting all flagged wafers. That is, when VM estimates a wafer to be faulty, the sample is sent to the SEM and inspected one more time, at the risk of delaying Rmd wafers. If the wafer is determined to be in-spec after the SEM inspection (i.e., VM has generated a Type I error), the process goes back to being monitored by VM; if the SEM confirms the wafer is bad, the process is halted and re-tuned back to target, delaying the processing of Rrd wafers. After either finishing one VM Run or re-tuning the process, Rrc wafers are collected from the SEM to update the VM model. If a wafer is flagged during this phase, the process is re-tuned again (there is no need to double-check, as we assume the SEM is 100% accurate). An overview of this scenario is shown in Fig. 4. For the conventional case, process re-tuning only occurs when the SEM flags a wafer, and all other wafers are processed unconditionally.

Fig. 3. Sampling scenario showing process condition matrices and CD values/estimates.

Fig. 4. Extended sampling scenario with interrupted flag inspection.

Fig. 5 shows a cartoon of what would happen to the total fab profit vs. Rvm in a blended metrology scheme. With short VM Runs, there would be frequent updates of the VM model but also more frequent SEM metrology delays, leading to many at-risk wafers. At large Rvm, the delay would not have as much impact, but since the VM model is not updated often enough, an increase in Type I and Type II errors would bring down the profit. In the middle would be a maximum profit optimized to the fab's characteristics.


Fig. 5. Optimization of blended metrology schemes as a function of Rvm.

B. Drifting Regression Model

As mentioned in Section I, we propose that a well-controlled process setting is not a compelling application for virtual metrology. In addition, we want a dynamic yet realistic process that will allow us to explore the dependence between VM model accuracy and Rvm. Such a process is a regression model with drifting coefficients. Consider a general linear process

y^t = β0^t + β1^t x1^t + β2^t x2^t + ... + βhid^t xhid^t + ε    (10)

where ε is the observation error and the superscript t denotes time. It is possible that the VM model fails to include an independent variable xhid^t and its coefficient βhid^t, and suppose this hidden variable drifts over time by the amount Δxhid. At the next time step, the process is then explained by

y^{t+1} = β0^t + β1^t x1^t + β2^t x2^t + ... + βhid^t (xhid^t + Δxhid) + ε
        = β0^t + (β1^t + Δβ) x1^t + β2^t x2^t + ... + βhid^t xhid^t + ε    (11)

Now the drift Δxhid is captured in β1^t, with Δβ x1^t = βhid^t Δxhid acting as a linearly drifting coefficient. Thus, the final model of the true process is

y^t = β^t X^t + ε^t,    ε^t ~ N(0, σε²)
β^{t+1} = β^t + u + η^t,    η^t ~ N(0, Q)    (12)

where y^t is the true CD, X^t are the regressor variables, and ε^t, η^t are white noise variables. Finally, u controls how much the coefficients drift over time.

IV. Virtual Metrology Models

In this section, we provide a brief theoretical background on the three VM models that are used in the analysis: linear regression, exponentially-weighted linear regression (EWLR), and the Kalman filter (KF). To clarify how they are used in our work, we explain the background in the context of the sampling scenario shown in Fig. 3.

A. Linear Regression

Assuming the process is of the form

y = Xβ + ε,    ε ~ N(0, σε²)    (13)

the well-known least-squares estimate of the coefficients is

β̂_SEM = (X_SEM^T X_SEM)^{-1} X_SEM^T y_SEM    (14)

In the context of this work, X_SEM is the vertical stack of the matrices X1, X4, X7, ..., and y_SEM is the corresponding stack of the vectors y1, y4, y7, ..., which refer to Fig. 3 and denote the training data from the SEM. That is, we accumulate all data from the SEM, not knowing the true process is faulty. Given this as our VM model, the CD prediction for subsequent VM samples is:

ŷ_VM = X_VM β̂_SEM = X_VM (X_SEM^T X_SEM)^{-1} X_SEM^T y_SEM    (15)

B. Exponentially-weighted Linear Regression

For this model, we assume that the engineer knows there is some type of drift going on and wants to weight the recent observations more heavily than the earlier ones. The weights are given by

w_t = α(1 − α)^(Ntrain − t)    (16)

where α is a tunable parameter. The best value for α was found by training the model on different α values and selecting the one that minimized the MSE. With W = diag(w), the weighted least-squares estimate of β is given by

β̂_SEM = (X_SEM^T W X_SEM)^{-1} X_SEM^T W y_SEM    (17)

and, similarly, the estimated y values are given by

ŷ_VM = X_VM β̂_SEM = X_VM (X_SEM^T W X_SEM)^{-1} X_SEM^T W y_SEM    (18)

C. Kalman Filter

This section largely references and follows the notation of [18]; more details about the Kalman filter can be found in that text. For simplicity, we suppress matrix and vector accents. In our case, y_t is a scalar, α_t, ε_t, η_t are vectors, and Z_t, T_t, Q_t, H_t, R_t are matrices. Consider the linear Gaussian state space model

y_t = Z_t α_t + ε_t,    ε_t ~ N(0, H_t)
α_{t+1} = T_t α_t + R_t η_t,    η_t ~ N(0, Q_t)
α_1 ~ N(a_1, P_1),    t = 1, ..., n    (19)

where y_t are the observations (e.g., CDs), α_t are hidden state variables controlling the process (e.g., process conditions and the drift of the process), and ε_t, η_t are independent Gaussian noise sequences with corresponding covariance matrices H_t and Q_t.

The objective of the filter is to obtain the conditional distribution of α_{t+1} given Y_t for t = 1, ..., n, where Y_t = y_1, ..., y_t. That is, given SEM measurements up to time t, we estimate the process conditions α_{t+1}. Since all distributions are normal, conditional distributions of subsets of variables given other subsets are also normal. Thus all we need is α_{t+1}|Y_t ~ N(a_{t+1}, P_{t+1}) with

a_{t+1} = E(α_{t+1}|Y_t)
P_{t+1} = Cov(α_{t+1}|Y_t)    (20)


Since α_{t+1} = T_t α_t + R_t η_t, we have

a_{t+1} = E(T_t α_t + R_t η_t | Y_t) = T_t E(α_t | Y_t)
P_{t+1} = Cov(T_t α_t + R_t η_t | Y_t) = T_t Cov(α_t | Y_t) T_t^T + R_t Q_t R_t^T    (21)

for t = 1, ..., n. We now define v_t, the one-step forecast error of y_t given Y_{t−1}:

v_t = y_t − E(y_t | Y_{t−1}) = y_t − E(Z_t α_t + ε_t | Y_{t−1}) = y_t − Z_t a_t    (22)

Observing Y_t is the same as observing (Y_{t−1}, v_t). It is easy to see that E(v_t) = 0 and Cov(v_t, Y_{t−1}) = 0. Through the standard lemma for multivariate normal regression, and since Y_{t−1} and v_t are uncorrelated,

E(α_t | Y_t) = E(α_t | Y_{t−1}, v_t)
            = E(α_t | Y_{t−1}) + Cov(α_t, v_t) [Var(v_t)]^{-1} v_t
            = a_t + M_t F_t^{-1} v_t    (23)

where M_t = Cov(α_t, v_t) = P_t Z_t^T and F_t = Var(v_t) = Z_t P_t Z_t^T + H_t. Substituting everything into the expressions for a_{t+1} and P_{t+1}, we get

a_{t+1} = T_t a_t + T_t M_t F_t^{-1} v_t = T_t a_t + K_t v_t,    t = 1, ..., n    (24)

where K_t = T_t M_t F_t^{-1} = T_t P_t Z_t^T F_t^{-1}. Through a similar analysis,

P_{t+1} = T_t P_t L_t^T + R_t Q_t R_t^T,    t = 1, ..., n    (25)

where L_t = T_t − K_t Z_t.    (26)

All the filtering recursion equations are therefore

v_t = y_t − Z_t a_t,    F_t = Z_t P_t Z_t^T + H_t,
K_t = T_t P_t Z_t^T F_t^{-1},    L_t = T_t − K_t Z_t,    t = 1, ..., n,
a_{t+1} = T_t a_t + K_t v_t,    P_{t+1} = T_t P_t L_t^T + R_t Q_t R_t^T    (27)

For missing observations and observations to be forecasted, v_t and K_t of the filter are set to zero [18], and the updates become

a_{t+1} = T_t a_t,    P_{t+1} = T_t P_t T_t^T + R_t Q_t R_t^T.

Given these update equations, we use the following matrices for the Kalman filter to model the true process in (12):

α_t = (u; β_t),    Z_t = (0 ⋯ 0  X_t),
T_t = [I 0; I I],    R_t = (0, ..., 0, 1, ..., 1)^T.

TABLE III
Maximum Mean Profit and Corresponding Rvm for Each VM Model (Without Re-inspection)

For this analysis, we assume that T_t, R_t, and the variances of the error terms are known. Realistically, when implementing the Kalman filter, these parameters should be estimated via some parameter estimation algorithm (e.g., the Expectation-Maximization algorithm), since the engineer does not know the true form of the state space model.
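The recursions in (27), including the missing-observation update, can be sketched as a generic filter. This is a toy illustration with a scalar observation and time-invariant system matrices; the local-level example below is an assumption for demonstration, not the paper's CD state space model.

```python
import numpy as np

# Sketch of the filtering recursions (27) for the state space model (19),
# with time-invariant Z, T, R, H, Q and a scalar observation y_t.
def kalman_filter(y, Z, T, R, H, Q, a1, P1):
    # Z: (m,), T: (m, m), R: (m, r), H: scalar, Q: (r, r)
    a, P = a1.astype(float), P1.astype(float)
    means = []
    for y_t in y:
        if np.isnan(y_t):                  # missing obs: v_t = 0, K_t = 0
            a, P = T @ a, T @ P @ T.T + R @ Q @ R.T
        else:
            v = y_t - Z @ a                # one-step forecast error, (22)
            F = Z @ P @ Z + H              # forecast error variance
            K = (T @ P @ Z) / F            # Kalman gain (F is scalar here)
            L = T - np.outer(K, Z)
            a = T @ a + K * v              # state mean update, (24)
            P = T @ P @ L.T + R @ Q @ R.T  # state covariance update, (25)
        means.append(a.copy())
    return np.array(means)

# Toy usage: track a drifting (random-walk) level from noisy observations.
rng = np.random.default_rng(4)
level = np.cumsum(rng.normal(0, 0.1, 300))        # hidden state
y = level + rng.normal(0, 0.3, 300)               # noisy measurements
est = kalman_filter(y, np.array([1.0]), np.eye(1), np.eye(1),
                    0.09, np.array([[0.01]]), np.zeros(1), np.eye(1))
```

The missing-observation branch is what lets the same filter run through the VM-only wafers of Fig. 3: during a VM run the state is propagated without measurement updates, and each SEM sample re-anchors the estimate.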

V. Results and Discussion

A. Scenario without Re-inspection and Re-tuning

We generate data based on (12) with the following parameters:

β_0 = (1, 1, 1)^T,    u = (0.0004, 0, 0.0004)^T,    H_t = 0.2,    Q_t = 0_{2p×2p}.

The R_it is set to 200 wafers, the R_rc to 20 wafers, and the R_md to 50 wafers (2 lots). 3,000 wafers are simulated and the corresponding total profit, Type I / Type II error rates, and the MSE are calculated for each VM model (Figs. 6, 7, and 8). For repeatability, 100 simulations are run and the average is plotted in bold to distinguish it from the first 20 simulations. For the Kalman Filter, we start with a randomized hidden state vector as our initial guess. One subtle difference is that in Section II-B, we defined the errors in terms of conditional probabilities. As it is hard to calculate P(y = good or bad) explicitly, we plot the joint probabilities instead. Estimates of the beta coefficients for each model are also shown in Fig. 9(a), (b), and (c) along with the true coefficient values in the background.
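A minimal sketch of such a drifting-coefficient data generator is shown below. The function name, the input distribution for X_t, and the seed are illustrative assumptions; only β_0, u, and H_t follow the settings above:

```python
import numpy as np

def simulate_drifting_process(n_wafers, beta0, u, h_var, rng=None):
    """Simulate y_t = X_t beta_t + eps_t with beta_{t+1} = beta_t + u.

    beta0, u, and the observation noise variance h_var follow the
    Section V-A settings; X_t is drawn at random here because the
    paper does not specify the input values.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    beta = np.array(beta0, dtype=float)
    p = len(beta)
    X, y = [], []
    for _ in range(n_wafers):
        x_t = rng.uniform(0.5, 1.5, size=p)   # assumed input range
        X.append(x_t)
        y.append(x_t @ beta + rng.normal(0.0, np.sqrt(h_var)))
        beta = beta + np.asarray(u)           # linear drift in coefficients
    return np.array(X), np.array(y)
```

With the drift u above, the mean output rises steadily over the 3,000 simulated wafers, which is the behavior the VM models must track.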

1) Linear Regression: One noticeable feature of Fig. 6 is the complete absence of Type I errors and the very large number of Type II errors. This is due to the under-estimation of the regression coefficients seen in Fig. 9(a). Since all past and present observations are weighted equally in the recalibration phase, the coefficient estimates have significantly lower values than the true ones. This constraint in the gain leads to VM estimates that are mostly inside the LSL/USL limits and, as a result, the model fails to catch most of the alarms that occur, leading to significant Type II error probabilities. Relatively constant Type I / II patterns result in a flat total profit curve. The maximum mean profit of $5.874 × 10^7 occurs when the R_vm is 145.

2) Exponentially-weighted Linear Regression: EWLR shows a significantly reduced VM MSE compared with Linear Regression (Fig. 7), as only the recent observations are used in the recalibration stage. This is also seen in Fig. 9(b), where the EWLR estimates accurately track the drift in the true


Fig. 6. Total profit, MSE, and Type I / II error probabilities for the Linear Regression VM model. The increase in estimation error is significant as the R_vm increases. Relatively constant Type I / II patterns result in a flat total profit curve.

Fig. 7. Total profit, MSE, and Type I / II error probabilities for the Exponentially-weighted Linear Regression VM model. MSE and error probabilities have significantly improved compared to Linear Regression. The tradeoff between Type I / II errors results in an optimal point on the total profit curve.

Fig. 8. Total profit, MSE, and Type I / II error probabilities for the Kalman Filter VM model. This model shows the highest accuracy and lowest error probabilities as it is a filter constructed for a state-space model. Again, the tradeoff between Type I / II errors results in an optimal point on the total profit curve.

coefficients. However, in contrast to Fig. 9(a), the estimates have more variance. This translates to an increased probability of Type I error and, as seen in the third column of Fig. 7, the Type I probability is no longer zero. We see an increase in Type II errors as the R_vm increases because the coefficient estimates for a given VM stage stay constant while the process keeps drifting. The tradeoff between Type I / II errors results in an optimal point on the total profit curve. The maximum mean profit of $5.921 × 10^7 occurs when the R_vm is 665.
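The exponential weighting described here can be sketched as a weighted least-squares recalibration. This is an illustrative implementation; the decay factor lam is an assumed parameter, not a value taken from the paper:

```python
import numpy as np

def ewlr_fit(X, y, lam=0.98):
    """Exponentially-weighted linear regression.

    Solves the weighted normal equations with weights lam**(n-1-i),
    so the most recent observation gets weight 1 and older ones
    decay geometrically. This lets the fit track drifting
    coefficients at the cost of higher estimator variance.
    """
    n = len(y)
    w = lam ** np.arange(n - 1, -1, -1)   # newest sample weighted most
    Xw = X * w[:, None]
    beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta
```

On noiseless data with fixed coefficients the weighted fit recovers them exactly; the benefit of the down-weighting appears only when the coefficients drift.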


Fig. 9. Regression coefficient estimates for each VM model for an R_vm of 100. (a) Estimates for LR. Notice that the coefficients are underestimated due to retained memory from the start of the process, leading to an increase in Type II error probabilities. (b) Estimates for EWLR. The variance of the estimates has increased due to the smaller training size, but they track the drifting coefficients. (c) Estimates for KF. The most accurate performance of the three models.

Fig. 10. Averaged total profit for the three VM models and the no-VM case. This is for the scenario without re-inspection/re-tuning. "Length of VM Run" would actually be the number of unmonitored wafers between SEM recalibration measurements for the no virtual metrology case.

3) Kalman Filter: The Kalman Filter gives the lowest MSE of all three VM models. Note that this is because the filter is constructed on a state-space model matching the true process in (12). Moreover, we have assumed that the exact values of some of the parameters are known, including the noise covariance matrices. As seen in Fig. 9(c), the Kalman Filter estimates accurately track the true values. Although the Type I and Type II error probabilities are quite low, we see that the variances of these quantities increase as R_vm increases. Note the increase in total profit variance as the R_vm increases. This is due to the increased variance of the hidden state variable estimate. Again, the tradeoff between Type I / II errors results in an optimal point on the total profit curve. The maximum mean profit of $5.909 × 10^7 occurs when the R_vm is 340. Although the KF has the highest accuracy, this profit is lower than that of the weighted linear regression model due to the time the KF needs to converge from its initial guess.

The average total profit for all VM models is plotted along with that of the conventional metrology scenario in Fig. 10. For this scenario, blended metrology schemes are more profitable than the conventional scheme in most regions. Although Fig. 10 shows that the weighted linear regression VM model has the highest average total profit of the three VM models, we want to emphasize that the example given in this paper is one of many cases and the results depend on the particular cost and revenue values. Consider an extreme example of a process with a very high Type I cost; in this case, a linear regression-like model would be the most beneficial to the fab, even though this model has the worst MSE of the three. Moreover, for some processes, using VM may not result in higher profit for the fab at all.

Given an LSL/USL, each model generates a different Type I and Type II error pattern that directly translates to total profit. It is the combination of model accuracy and missed / false alarm patterns that determines the optimal total metrology profit.
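The mapping from alarm patterns to profit can be sketched as a simple cost accounting. The dollar values below are hypothetical placeholders, not the paper's revenue and cost parameters:

```python
def total_profit(n_true_good, n_type1, n_type2, n_true_bad,
                 revenue=10.0, type1_cost=8.0, type2_cost=15.0):
    """Combine classification counts into a single profit metric.

    Type I (false alarm) errors forfeit revenue on good wafers;
    Type II (missed alarm) errors ship bad wafers and incur a
    downstream cost. All dollar values here are illustrative.
    """
    profit = n_true_good * revenue     # good wafers passed correctly
    profit -= n_type1 * type1_cost     # good wafers falsely flagged
    profit -= n_type2 * type2_cost     # bad wafers missed by VM
    # Correctly flagged bad wafers (n_true_bad) are discarded and
    # contribute neither revenue nor an error penalty here.
    return profit
```

Because the two error types carry different costs, two models with equal overall accuracy can yield very different profits, which is the tradeoff the total profit curves capture.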

B. Scenario with Re-inspection and Re-tuning

We generate data based on (12) with the following parameters:

β_0 = (1, 1, 1)^T,    u = (0.0002, 0, 0.0002)^T,    H_t = 0.2,    Q_t = 0_{2p×2p}.

The R_it is set to 200 wafers, the R_rc to 20 wafers, and the R_md to 50 wafers (2 lots), as is the R_rd. 10,000 wafers are simulated and the corresponding total profit is plotted in Fig. 11. We do not plot the MSE, Type I, and Type II error rates of the VM models because the VM phase is no longer continuous as in the first scenario.

As discussed in Section III-A2, a wafer is re-inspected every time it is flagged by VM. If there were no virtual metrology, the total profit would again look like Fig. 5. Too short a VM run decreases profit due to more frequent SEM recalibration and more delays. Too long a VM run decreases profit because it takes longer for the SEM to catch faults and the process will have drifted farther from target. This is what we see in the fourth column of Fig. 11.

The LR VM model underestimates the CD and hardly flags a wafer. Thus, the graph is similar to that of the no-VM case, as seen in the first column of Fig. 11. For Exponentially-weighted Linear Regression, the total profit steadily decreases and plateaus as the R_vm increases because the VM model is not updated and, again, the CDs are underestimated. Total profit


Fig. 11. Total profit for each VM model and the conventional metrology scheme. This is for the scenario with re-inspection/re-tuning.

TABLE IV. Maximum Mean Profit and Corresponding R_vm for Each VM Model (With Re-Inspection).

for the Kalman Filter steadily increases and plateaus because the filter continuously tracks the process drift accurately. The total profit curves for both EWLR and the KF converge as R_vm increases because even though the VM model is not recalibrated often enough, subsequent SEM measurements detect the drift and the process is re-tuned. We again emphasize that even though the Kalman Filter seems to generate the most profit, this does not mean it is the best VM model in general. The maximum total profit and corresponding R_vm are shown in Table IV.

VI. Conclusion

This work takes a step back from empirical VM model construction and analyzes when VM is actually useful for the fab and how it can be optimized as a function of VM recalibration frequency. This is done by identifying four categories of VM estimate classification and defining critical parameters such as the revenue, processing cost, and actual metrology cost. These parameters were combined into a metric we call the total profit. We concentrated our analysis on a largely deviating process, the regression model with drifting coefficients, to link VM recalibration frequency, VM model accuracy, and total profit.

Two blended metrology scenarios were carried out on a simulated drifting process for three VM models: Linear Regression, Exponentially-weighted Linear Regression, and the Kalman Filter. The first scenario was for a relatively short process with less than optimal Cpk. The second scenario was a long-term process with better but still low Cpk, which also allowed re-inspection to compensate for limited VM prediction quality, and process re-tuning. Although the second scenario is more realistic, the continuity of the first scenario provided us with some valuable insights about blended metrology. Results indicated that each VM model had different false (Type I) and missed (Type II) alarm patterns that translated to different total profit patterns as a function of R_vm. Moreover, an optimal value of R_vm that gave the maximal total profit was identified for each model. The analysis indicated the need for missed and false alarm pattern analysis, rather than a sole focus on increasing a particular accuracy metric.

In practice, this means that a high VM proportion may be beneficial early in the process lifecycle, while this proportion may be gradually reduced as the process becomes more mature, and therefore more stable. This finding matches the intuition that the overall role of metrology changes with process maturity, starting with detecting and diagnosing significant deviations, and progressing to using metrology to drive more subtle run-to-run control adjustments.

In this work, we use plasma etching as the example because etching has historically been a step well suited to VM deployment. One could easily extend our analysis to physical metrology tools such as scatterometry, or even to "virtual" tools that blend various physical tools (e.g., CD-SEM and scatterometry).

Although this work gives us a view into how the cost, error probabilities, and recalibration sizes are related, many assumptions were incorporated into the process model. For instance, we assumed the observation errors were independent from one time to the next, whereas in reality some autocorrelation would need to be accounted for. In addition, we assumed exact parameter values for the Kalman Filter. It would be interesting to see how the results change for a real processing dataset, where the parameters of the Kalman Filter would have to be estimated via some parameter estimation algorithm. Moreover, we believe the next step is to provide a theoretical framework beyond Monte Carlo simulations that would provide robust and accurate decision rules for the metrology industry.

Acknowledgment

The authors want to thank R. S. of ASML for helpful discussions on APC and metrology, and the reviewers for their constructive feedback.

References

[1] J. Moyne, E. del Castillo, and A. M. Hurwitz, Run-to-Run Control in Semiconductor Manufacturing, 1st ed. Boca Raton, FL, USA: CRC, 2001.


[2] C. J. Spanos and G. J. May, Fundamentals of Semiconductor Manufacturing and Process Control, 1st ed. Hoboken, NJ, USA: Wiley-IEEE Press, 2006.

[3] C. J. Spanos, "Statistical process control in semiconductor manufacturing," Microelectron. Eng., vol. 10, pp. 271–276, Feb. 1991.

[4] E. Sachs, A. Hu, and A. Ingolfsson, "Run by run process control: Combining SPC and feedback control," IEEE Trans. Semicond. Manuf., vol. 8, no. 1, pp. 26–43, Feb. 1995.

[5] F. T. Cheng, H. C. Huang, and C. A. Kao, "Dual-phase virtual metrology scheme," IEEE Trans. Semicond. Manuf., vol. 20, no. 4, pp. 566–571, Nov. 2007.

[6] A. A. Khan and J. R. Moyne, "An approach for factory-wide control utilizing virtual metrology," IEEE Trans. Semicond. Manuf., vol. 20, no. 4, pp. 364–375, Nov. 2007.

[7] A. A. Khan, J. R. Moyne, and D. M. Tilbury, "Virtual metrology and feedback control for semiconductor manufacturing processes using recursive partial least squares," J. Process Control, vol. 18, no. 10, pp. 961–974, Dec. 2008.

[8] Y. C. Su, T. H. Lin, F. T. Cheng, and W. M. Wu, "Accuracy and real-time considerations for implementing various virtual metrology algorithms," IEEE Trans. Semicond. Manuf., vol. 21, no. 3, pp. 426–434, Aug. 2008.

[9] J. C. Y. Cheng and F. T. Cheng, "Application development of virtual metrology in semiconductor industry," in Proc. 31st IEEE Annu. Conf. Ind. Electron. Soc., 2005, pp. 124–129.

[10] P. H. Chen, S. Wu, J. Lin, F. Ko, H. Lo, J. Wang, C. H. Yu, and M. S. Liang, "Virtual metrology: A solution for wafer to wafer advanced process control," in Proc. IEEE Int. Symp. Semicond. Manuf. Conf., Sep. 2005, pp. 155–157.

[11] A.-J. Su, J.-C. Jeng, H.-P. Huang, C.-C. Yu, S.-Y. Hung, and C.-K. Chao, "Control relevant issues in semiconductor manufacturing: Overview with some new results," Control Eng. Pract., vol. 15, no. 10, pp. 1268–1279, 2007.

[12] D. Zeng and C. J. Spanos, "Virtual metrology modeling for plasma etch operations," IEEE Trans. Semicond. Manuf., vol. 22, no. 4, pp. 419–431, Nov. 2009.

[13] P. Kang, H. J. Lee, S. Cho, D. Kim, J. Park, and C. K. Park, "A virtual metrology system for semiconductor manufacturing," Expert Syst. Appl., vol. 36, no. 10, pp. 12554–12561, 2009.

[14] S. Lynn, J. Ringwood, E. Ragnoli, S. McLoone, and N. MacGearailt, "Virtual metrology for plasma etch using tool variables," in Proc. IEEE/SEMI Advanced Semicond. Manuf. Conf., 2009, pp. 143–148.

[15] M. H. Hung, T. H. Lin, F. T. Cheng, and R. C. Lin, "A novel virtual metrology scheme for predicting CVD thickness in semiconductor manufacturing," IEEE/ASME Trans. Mechatron., vol. 12, no. 3, pp. 308–312, Jun. 2007.

[16] Y. C. Su, T. H. Lin, F. T. Cheng, and W. M. Wu, "Implementation considerations of various virtual metrology algorithms," IEEE Trans. Autom. Sci. Eng., pp. 276–281, Sep. 2007.

[17] J. Y. Baek and C. J. Spanos, "Optimization of blended virtual and actual metrology schemes," Proc. SPIE, vol. 8324, pp. 83241K-1–83241K-9, Apr. 2012.

[18] J. Durbin and S. J. Koopman, Time Series Analysis by State Space Methods, 1st ed. Oxford, U.K.: Oxford Univ. Press, 2001.

Jae Yeon Baek (S'13) received the B.S. degree in chemical engineering from Stanford University, Stanford, CA, USA, in 2010, and the M.S. degree in electrical engineering and computer sciences from the University of California, Berkeley, CA, USA, in 2012. She is currently pursuing the Ph.D. degree at the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley.

Her current research interests include metrology, inspection, and advanced process control in semiconductor manufacturing.

Costas J. Spanos (S'77–M'85–SM'95–F'00) received the Electrical Engineering Diploma from the National Technical University of Athens, Athens, Greece, in 1980, and the M.S. and Ph.D. degrees in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, USA, in 1981 and 1985, respectively.

In 1988, he joined the faculty of the University of California, Berkeley, CA, USA. He was the Director of the Berkeley Microfabrication Laboratory from 1994 to 2000, the Director of the Electronics Research Laboratory from 2004 to 2005, and the Associate Dean for Research at the College of Engineering from 2004 to 2008. From 2008 to 2010, he served as the Associate Chair of the Department of Electrical Engineering and Computer Sciences, and from 2010 to 2012, he was the Chair of the department. He has published more than 200 refereed articles. From 1985 to 1988, he was with the Advanced Computer-Aided Design Group, Digital Equipment Corporation, where he focused on the statistical characterization, simulation, and diagnosis of very large scale integration processes. He has co-authored a textbook on semiconductor manufacturing. His current research interests include the application of statistical analysis in the design and fabrication of integrated circuits, and the development and deployment of novel sensors and computer-aided techniques in semiconductor manufacturing.

Dr. Spanos has served on technical committees of numerous conferences and was an editor of the IEEE Transactions on Semiconductor Manufacturing from 1991 to 1994. He has received several Best Paper Awards. In 2000, he was elected a fellow of the Institute of Electrical and Electronics Engineers for contributions and leadership in semiconductor manufacturing.