
Intel Technology Journal Q4, 1997

Preface

Lin Chao, Editor, Intel Technology Journal

This Q4'97 issue of the Intel Technology Journal focuses on innovations in memory chips and on better-yielding manufacturing techniques. The first two papers are written by the co-inventors of Intel StrataFlash™ memory. Intel's engineers produced the first flash memory product that could store multiple bits of digital information in one memory cell. Specifically, they gave us "2x the bits in 1x the space," and at a much lower cost. The road to success for StrataFlash memory was not without its trials and tribulations. Staffing and development resources were minimal, and at one time, the project was even shut down for a year. In the end, however, the engineers triumphed and proved that StrataFlash memory worked and could be produced reliably. It can be used in a variety of different applications: memory cards, resident code and file storage, digital imaging, and audio storage media. It is one of Intel's strategic assets. The third and fourth papers in this issue describe redundancy, the technique of using spare array elements in SRAM memory to replace elements that have tested defective. A mathematical model for redundancy is described in the third paper. The second-level cache packaged in the Pentium® Pro processor is one of the first devices at Intel to use redundancy, and its usage is outlined in the fourth paper in this issue. The fifth paper introduces a new PROM element compatible with CMOS logic processes.

Copyright © Intel Corporation 1997. This publication was downloaded from http://www.intel.com/. Legal notices at http://www.intel.com/sites/corporate/tradmarx.htm

Page 2: Intel Technology Journal Q4, 1997 · Intel Technology Journal Q4, 1997 Preface Lin Chao ... Greg Atwood, Flash Technology and Manufacturing, Santa Clara, ... Q4. Intel Technology

1

Intel StrataFlash™ Memory Technology Overview

Greg Atwood, Flash Technology and Manufacturing, Santa Clara, CA, Intel Corp.
Al Fazio, Flash Technology and Manufacturing, Santa Clara, CA, Intel Corp.
Duane Mills, Memory Components Division, Folsom, CA, Intel Corp.
Bill Reaves, Memory Components Division, Folsom, CA, Intel Corp.

Index words: StrataFlash, MLC, flash, memory

Abstract

The Intel StrataFlash™ memory technology represents a cost breakthrough for flash memory devices by enabling the storage of two bits of data in a single flash memory transistor. This paper will discuss the evolution of the two bit/cell technology from conception to production.

The flash memory business has grown from about $50M in 1987 to roughly $2.5B in 1997 due to its unique mix of functionality and cost. Flash memory devices are now found in virtually every PC and cellular phone and are one of the key components of the emerging digital imaging and audio markets.

Cost per bit reduction of flash memory devices has traditionally been achieved by aggressive scaling of the memory cell transistor using silicon process-scaling techniques such as photolithography line width reduction. In an attempt to accelerate the rate of cost reduction beyond that achieved by process scaling, a research program was started in 1992 to develop methods for the reliable storage of multiple bits of data in a single flash memory cell. The Intel StrataFlash two bit/cell memory technology is the first output of the multi-bit per cell storage effort. By storing two bits in a single memory transistor, the memory cell area is effectively cut in half, allowing the storage of twice as much data in the same area as the standard single bit per cell technology.

This paper provides insight into the Intel StrataFlash memory technology development effort. It discusses the evolution of the two bit/cell capability from conception to production and the challenges that were successfully overcome to produce a high-quality product compatible with the standard single bit per cell devices. This paper also presents examples that showcase the benefits of the current Intel StrataFlash memory devices and discusses some of the driving forces for high density flash memory.

Introduction

History has shown that as the price of memory drops and the density increases, the application usage of and demand for that memory will increase. The cost of semiconductor memories (i.e., DRAM, SRAM, ROM, and flash) is largely determined by the amount of silicon area it takes to store a bit of data. As with other semiconductor memories, flash memory, which retains its data even when the power is removed, achieves higher density and lower cost through traditional silicon process scaling techniques, such as feature size reduction. To build on process scaling, a concept called Multi-Level-Cell (M.L.C.) technology was introduced. This technology lowers the cost by enabling the storage of multiple bits of data per memory cell, thereby reducing the consumption of silicon area. The two bit/cell Intel StrataFlash memory technology provides a cost structure equivalent to the next generation of process technology while using the current generation of process technology equipment. Figure 1 illustrates the substantial acceleration of the rate of cost reduction possible with M.L.C.
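The cost argument above can be put in rough arithmetic form: cost per bit tracks silicon area per bit, so two bits per cell halves the effective area per bit at a given process node. The sketch below is illustrative only; the halving of cell area per process generation is an assumed simplification, not a figure from the paper.

```python
# Illustrative cost model (assumed numbers, not from the paper): cost per bit
# is proportional to silicon area per bit, and each process generation is
# assumed to roughly halve the memory cell area.
def relative_cost_per_bit(generation: int, bits_per_cell: int = 1) -> float:
    """Relative $/bit, normalized to generation 0 at one bit per cell."""
    cell_area = 0.5 ** generation        # assumed: each generation halves cell area
    return cell_area / bits_per_cell     # M.L.C. divides area per bit further

# Two bits per cell on the current process matches the cost structure of the
# next process generation at one bit per cell:
assert relative_cost_per_bit(0, bits_per_cell=2) == relative_cost_per_bit(1, 1)
```

Under this toy model, M.L.C. delivers one process generation's worth of cost reduction without any change to the process equipment, which is the claim the text makes for the two bit/cell technology.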

[Figure 1 plots relative cost ($/MByte, log scale from 0.01 to 1) against year (1996-2002), comparing process technology scaling alone with M.L.C. plus process technology scaling.]

Figure 1: Accelerated cost reduction using M.L.C.


Introduction to Flash Memory

A discussion of the Intel StrataFlash memory and technology first requires a brief overview of the standard ETOX™ flash memory technology and its use. Flash memory is a member of the non-volatile class of memory devices: storage devices that maintain their data in the absence of applied power. The ETOX technology is the predominant flash technology, representing over 70% of flash memory shipments. Data is entered into the flash memory on a bit, byte, word, or page boundary through an operation called programming. Once data is entered into the device, it will remain, regardless of the presence or absence of power. Data is cleared from the flash memory with an erase operation. The contents of the flash memory are erased on a block boundary, where a block size can be anywhere from 8Kbits to 1Mbit depending on the product design.

The ETOX flash memory storage element, or memory cell, shown in Figure 2, is a single transistor with the addition of an electrically isolated polysilicon floating gate capable of storing charge (electrons). The amount of stored charge modifies the behavior of the memory cell transistor. This change in transistor behavior is translated into stored data: the presence of charge is interpreted as data "0"; the absence of charge is interpreted as data "1." The single transistor memory cell results in a small cell size, and thus a small amount of silicon area is consumed for the storage of one bit of data, resulting in low cost.

[Figure 2 diagrams the cell: source, drain, and gate, with a floating gate that stores electrons.]

Figure 2: Single transistor flash memory cell

Flash Memory Market

The combination of non-volatility, electrical alterability, and low cost is attractive to small systems that do not have access to a continuous power source, such as battery-powered devices. For example, almost every cellular phone sold today contains a flash memory device. This device stores the program that the cellular phone uses to communicate over the wireless network and interact with the phone user. In some cases, the memory device also stores incoming messages, much like a cellular digital answering machine. The non-volatility of flash ensures that, when you remove the battery from the phone, it will not forget how to communicate, nor will it lose any of your messages.

The unique combination of features provided by flash memory has enabled the market for these devices to grow from less than $50M in 1987 to over $2.5B in 1997. Flash memory devices are found in over 90% of PCs, over 90% of cellular phones, and over 50% of modems. Applications are as diverse as airline flight recorders, medical recording equipment, digital answering machines, arcade games, printers, and network routers. Flash memory is a key component of the emerging digital imaging and audio markets, where it serves as the digital "film" or digital "tape."

The Pursuit of Lower Cost Memory

The growth of the flash memory market has been driven by a continual increase in density and reduction in cost, enabling new applications to emerge and further fuel the demand for more flash. Figure 3 illustrates the rapid increase in the flash market size driven by the reduction in memory price. As the price of memory was reduced, new applications for flash memory emerged (some examples are shown in Figure 3), fueling further market growth.

[Figure 3 plots memory price ($/MByte, log scale from 1 to 1000) against year (1988-1996) alongside market size ($M, 0 to 3000), with emerging applications annotated: Industrial, Notebook PC BIOS, Cellular, Networking. Source: Semico Research, May '97.]

Figure 3: Flash memory price and market size

Traditionally, cost reduction and density increase for flash memory have been driven by process scaling in the same way as for other semiconductor memory devices such as DRAMs and SRAMs. As the ability of the semiconductor manufacturing process equipment improves, smaller features can be resolved on the silicon wafer, resulting in a smaller memory cell and thus more bits in a given amount of silicon area. More bits in a given silicon area result in higher density memories and lower cost per bit. Using the technique of process technology scaling, the flash memory cell size has been reduced by 18 times in the past 10 years, as shown in Figure 4. The reduced cell area combined with increases in the size of the memory product (brought about by improved manufacturing techniques and yields) has resulted in a product density increase of over 100 times in the same 10-year period.
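The 10-year figures quoted above imply steady annual rates, which can be checked with a quick back-of-the-envelope calculation. The 18x and 100x inputs come from the text; the derived per-year rates are computed here, not stated in the paper.

```python
# Back-of-the-envelope check of the scaling figures quoted above (18x cell
# shrink and >100x density gain over 10 years, per the text); the per-year
# rates are derived here, not stated in the paper.
cell_shrink_10yr = 18      # cell size reduced 18 times over 10 years
density_gain_10yr = 100    # product density up over 100 times in the same period

annual_cell_shrink = cell_shrink_10yr ** (1 / 10)      # ~1.34x smaller per year
annual_density_gain = density_gain_10yr ** (1 / 10)    # ~1.58x denser per year

print(f"cell area shrinks ~{annual_cell_shrink:.2f}x per year")
print(f"density grows ~{annual_density_gain:.2f}x per year")
```

The gap between the two rates reflects the text's point that density grew faster than the cell shrank alone, thanks to larger die enabled by improved manufacturing yields.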

[Figure 4 plots cell size (µm², log scale from 1 to 100) against year (1985-1997) alongside product density (Kbit, log scale from 100 to 100000).]

Figure 4: Cell area and product density in time

The flash memory cell is a single transistor; one bit of data is stored in one transistor. By comparison, an SRAM memory cell requires six transistors (or four transistors and two resistors), a DRAM memory cell requires one transistor and one capacitor, and an E2PROM cell requires two transistors.
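The comparison above can be tabulated as storage elements consumed per stored bit; the two bit/cell row added at the end is the StrataFlash case discussed next, which drops below one transistor per bit.

```python
# Storage-element comparison from the text, expressed as transistors (or
# transistor-equivalent elements) consumed per stored bit.
elements_per_cell = {
    "SRAM": 6,                 # six transistors (or 4T + 2R)
    "E2PROM": 2,               # two transistors
    "DRAM": 1,                 # one transistor (plus one capacitor)
    "flash (1 bit/cell)": 1,   # a single transistor
    "flash (2 bit/cell)": 1,   # still a single transistor...
}
bits_per_cell = {
    "SRAM": 1, "E2PROM": 1, "DRAM": 1,
    "flash (1 bit/cell)": 1,
    "flash (2 bit/cell)": 2,   # ...but holding two bits
}

for name, elements in elements_per_cell.items():
    per_bit = elements / bits_per_cell[name]
    print(f"{name}: {per_bit} transistor(s) per bit")
```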

A single transistor has generally been considered the smallest practical unit for the storage of a bit of data. In 1992, the Intel flash development team began a research effort to reduce the amount of silicon required to store a bit of data to a fraction of a transistor through the storage of more than one bit in a single memory cell transistor. The recently introduced Intel StrataFlash two bit/cell memory technology is the first achievement of this multi-bit per cell storage effort. It provides the cost structure of the next-generation process technology while using the current generation of process technology equipment (see Figure 5).

[Figure 5 plots cell size (µm²/bit, log scale from 0.1 to 100) against lithography (µm, 0 to 1.2) for one bit/cell and two bit/cell technologies; two bit/cell yields the 0.25µ-lithography cell size with 0.40µ-lithography equipment.]

Figure 5: Cell area as a function of lithography

The Multi-Bit Storage Breakthrough: Intel StrataFlash Technology

As discussed earlier, the flash memory device is a single transistor that includes an isolated floating gate. The floating gate is capable of storing electrons. The behavior of the transistor is altered depending on the amount of charge stored on the floating gate. Charge is placed on the floating gate through a technique called programming. The programming operation generates hot electrons in the channel region of the memory cell transistor. A fraction of these hot electrons gain enough energy to surmount the 3.2eV barrier of the Si-SiO2 interface and become trapped on the floating gate. For single bit per cell devices, the transistor either has little charge (<5,000 electrons) on the floating gate and thus stores a "1," or it has a lot of charge (>30,000 electrons) on the floating gate and thus stores a "0." When the memory cell is read, the presence or absence of charge is determined by sensing the change in the behavior of the memory transistor due to the stored charge. The stored charge is manifested as a change in the threshold voltage of the memory cell transistor. Figure 6 illustrates the threshold voltage distributions for a half million cell (1/2Mc) array block. After erasure or programming, the threshold voltage of every memory cell transistor in the 1/2Mc block is measured, and a histogram of the results is presented. Erased cells (data 1) have threshold voltages less than 3.1v, while programmed cells (data 0) have threshold voltages greater than 5v.
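The single bit/cell read decision described above amounts to one threshold comparison. The sketch below illustrates it; the 4.0v sense reference is an assumed midpoint between the erased (<3.1v) and programmed (>5v) distributions, not a value from the paper.

```python
# Sketch of the single bit/cell read decision described above. Stored charge
# raises the cell's threshold voltage; sensing compares the threshold against
# a reference between the two distributions (erased < 3.1v, programmed > 5v
# per the text). The 4.0v reference is an assumed midpoint.
V_REF = 4.0  # volts, assumed sense reference between the two states

def read_single_bit(threshold_voltage: float) -> int:
    """Charge present (high threshold) reads as 0; charge absent reads as 1."""
    return 0 if threshold_voltage > V_REF else 1

assert read_single_bit(2.5) == 1   # erased cell, little charge
assert read_single_bit(5.5) == 0   # programmed cell, >30,000 electrons
```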

Page 5: Intel Technology Journal Q4, 1997 · Intel Technology Journal Q4, 1997 Preface Lin Chao ... Greg Atwood, Flash Technology and Manufacturing, Santa Clara, ... Q4. Intel Technology

Intel Technology Journal Q4’97

4

[Figure 6 is a histogram of the number of cells (log scale from 1 to 100,000) against cell threshold voltage (1.5v to 6v), showing the erased distribution (data = 1) at low threshold voltages and the programmed distribution (data = 0) at high threshold voltages.]

Figure 6: Single bit/cell array threshold voltage histogram

The charge storage ability of the flash memory cell is a key to the storage of multiple bits in a single cell. The flash cell is an analog storage device, not a digital storage device. It stores charge (quantized at a single electron), not bits. By using a controlled programming technique, it is possible to place a precise amount of charge on the floating gate. If charge can be accurately placed into one of four charge states (or ranges), then the cell can be said to store two bits. Each of the four charge states is associated with a two-bit data pattern. Figure 7 illustrates the threshold voltage distributions for a 1/2Mc block for two bit per cell storage. After erasure or precise programming to one of three program states, the threshold voltage of each of the 1/2Mc cells is measured and plotted as a histogram. Note the precise control of the center two states, each of which is approximately 0.3v (or about 3,000 electrons) in width.

[Figure 7 is a histogram of the number of cells (log scale from 1 to 100,000) against cell threshold voltage (1v to 7v), showing four distributions corresponding, from lowest to highest threshold, to data values 11, 10, 01, and 00.]

Figure 7: Two bit/cell array threshold voltage histogram

Higher bit per cell densities are possible through even more precise charge placement control. Three bits per cell requires eight distinct charge states; four bits per cell requires sixteen distinct charge states. In general, the number of states required is equal to 2^N, where N is the desired number of bits.
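The multi-level read described in this section can be sketched as a small analog-to-digital conversion. The reference voltages below are assumptions chosen between the four distributions of Figure 7; the state-to-data mapping (11, 10, 01, 00 from lowest to highest threshold) follows that figure.

```python
# Sketch of the two bit/cell read: an A/D conversion of the cell's threshold
# voltage into one of 2^N charge states. The three sense references are
# assumed values placed between the four distributions of Figure 7.
REFERENCES = [3.0, 4.2, 5.4]           # assumed sense levels, in volts
DATA_PATTERNS = ["11", "10", "01", "00"]  # lowest to highest threshold, per Figure 7

def read_two_bits(threshold_voltage: float) -> str:
    """Count how many references the threshold exceeds; map the state to data."""
    state = sum(threshold_voltage > ref for ref in REFERENCES)
    return DATA_PATTERNS[state]

def states_required(bits_per_cell: int) -> int:
    """Number of distinct charge states needed for N bits per cell (2^N)."""
    return 2 ** bits_per_cell

assert read_two_bits(2.0) == "11"      # erased cell
assert read_two_bits(6.0) == "00"      # most heavily programmed state
assert states_required(3) == 8         # three bits/cell needs eight states
```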

The ability to precisely place charge on the floating gate and at some later time sense the amount of charge that was stored has required substantial innovations and extensive characterization and understanding of cell device physics, memory design, and memory test. These innovations are discussed in detail in the paper entitled "Intel StrataFlash Memory Technology Development and Implementation," also published in this issue of the Intel Technology Journal.

Evolution of the Intel StrataFlash Memory Technology Development

This section will outline the development of the Intel StrataFlash memory technology from conception in 1992 to productization in 1997, highlighting the key innovations along the way. The 64Mbit product recently introduced differs markedly from the 1992 view of what a two bit/cell product might look like. The learning that has occurred over the past four years has enabled the development of a two bit/cell memory device that functionally looks almost identical to a one bit/cell device, far exceeding the capability that was considered possible when the development program started. Figure 8 shows the timeline of the major Intel StrataFlash memory technology development milestones.

[Figure 8 timeline, 1992-1998: M.L.C. R&D started (1992), M.L.C. technology announced (1994), 2b/c 32M ISSCC paper (1995), 2b/c Miniature Card demo, and 2b/c product introduced (1997).]

Figure 8: Intel StrataFlash development program

The Multi-Level-Cell (M.L.C.) Concept


Storage of analog data in a floating gate memory device is not a new concept. It was suggested as early as 1971 for EPROM devices [1] and was implemented on E2PROM devices for use in neural networks, voice recorders, and toys as early as 1982. These analog storage applications can tolerate a high error rate and thus do not place stringent requirements on the memory reliability or accuracy. Neural networks are, by their nature, fault tolerant. Voice storage and simple talking toys can tolerate a few lost bits without any audible impact. These high error rate, lossy memories are generally not usable for mainstream digital storage and thus have had limited acceptance. The goal of the M.L.C. program was to produce a two bit/cell digital storage technology capable of penetrating the larger non-volatile memory market, enabling the growth of new digital flash memory applications.

The 1992 View of M.L.C.

In the early 1990s, flash memory was considered a potential replacement for hard disks at lower densities, for applications that require small, rugged, and low-power storage. One of the main issues for the use of flash in this application was the high cost of flash memory compared to that of magnetic storage. A lower cost flash memory was required. The hard disk requirements are much more relaxed than those for silicon memory due to the inclusion of error correction in the hard disk subsystem, the block transfer of data (no byte access), and the relatively low read performance. Multi-level cell technology appeared to be an ideal solution for the solid state disk, addressing the lower cost through two bit/cell (and later three or four bit/cell) technology. The use of error correction and the large block transfer of data in the solid state disk would address any reliability issues with multi-level storage. The Intel M.L.C. program was thus started with the goal of a high-density, low-cost, solid-state disk.

The basic techniques for accurate charge placement and sensing were developed in the lab and implemented in a 32Mbit silicon test chip. During this time frame, the three major challenges for multi-bit storage were identified:

• Precise Charge Placement: The flash memory cell programming must be very accurately controlled, requiring a detailed understanding of the physics of programming as well as the control and timing of the voltages applied to the cell.

• Precise Charge Sensing: The read operation of an M.L.C. memory is basically an analog to digital conversion of the analog charge stored in the memory cell to digital data, a concept new to memory devices.

• Stable Charge Storage: Meeting the data retention goals would require the stored charge to be stable, with a leakage rate of less than one electron per day.
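The one-electron-per-day budget in the last item follows from simple arithmetic: divide the charge separating adjacent states by a ten-year retention period. The ~3,000-electron separation used below is taken from the ~0.3v state width quoted earlier in this section.

```python
# Check of the retention budget stated above, using figures from the text:
# adjacent M.L.C. charge states are separated by a few thousand electrons
# (~3,000, per the ~0.3v state width quoted earlier), and data must be
# retained for ten years.
DAYS_PER_YEAR = 365.25
state_separation_electrons = 3_000   # approximate separation between states
retention_years = 10

max_loss_per_day = state_separation_electrons / (retention_years * DAYS_PER_YEAR)
print(f"allowed leakage: ~{max_loss_per_day:.1f} electrons/day")
# A sustained loss of about one electron per day (~3,650 electrons over ten
# years) is therefore enough to push a cell across a state boundary.
```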

The 32Mbit test chip clearly demonstrated the ability to store multiple bits in a single cell. Based on the functionality of this device, the M.L.C. technology was announced in 1994.

The ‘First’ M.L.C. ‘Product’

With the knowledge gained from the 32Mbit test chip, the first attempt at a two bit/cell storage product was started. This device was aimed at the solid-state disk goal. The solid-state disk system would include error correction and would generate non-standard voltages to interface to the two bit/cell memory device. A special DC to DC voltage converter was commissioned that would generate 12v±1% and 5.5v±1% from a 3v±10% external supply. The M.L.C. part required these precise supply voltages to perform the accurate program and read operations. An error corrector was designed to be integrated with the other control logic of the solid state disk. A paper based on this 32Mbit M.L.C. memory was presented at the prestigious International Solid State Circuits Conference (ISSCC) in 1995 [2], winning the best paper of the conference award. The 32Mbit device became the workhorse for the M.L.C. technology development effort, demonstrating the ability of M.L.C. to meet Intel's stringent reliability requirements and to produce yields equivalent to single bit/cell flash memories. It was also used to develop the M.L.C. testing and to debug the manufacturing process for test and packaging.

The Question of Reliability

The primary concern for M.L.C. was the reliability of the storage of the multiple charge states. Charge states would be separated by a few thousand electrons in an M.L.C. device, and a loss of one electron per day from the floating gate could result in a bit error after ten years of storage. To understand the detailed physics of charge storage, a large experiment was started to monitor the charge storage behavior of 200 billion cells (2x10^11 cells). This massive experiment could resolve changes in the stored charge as small as 100 electrons on all of the cells under evaluation. The rate of charge loss was accelerated through the use of elevated temperatures. This experiment, which was started in early 1994, is still running today with an accumulated high temperature stress time of over three years, representing over 50 years at normal operating temperatures. The knowledge gained and the models developed from this experiment have resulted in changes to the design of both the product and the process, allowing removal of the error correction requirement for two bit/cell devices. This data fundamentally changed the direction of the multi-bit storage program.
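Temperature acceleration of charge loss, as used in this experiment, is conventionally modeled with an Arrhenius relation. The sketch below is illustrative only: the activation energy and temperatures are assumed values, since the paper states only the resulting equivalence of about three stress-years to over fifty operating-years.

```python
import math

# Arrhenius acceleration sketch. All numeric inputs here (0.6 eV activation
# energy, 25 C use and 125 C stress temperatures) are assumptions for
# illustration; the paper does not state them.
K_BOLTZMANN_EV = 8.617e-5   # Boltzmann constant, eV/K

def acceleration_factor(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """How much faster a thermally activated loss mechanism runs under stress."""
    t_use = t_use_c + 273.15      # convert Celsius to Kelvin
    t_stress = t_stress_c + 273.15
    return math.exp(ea_ev / K_BOLTZMANN_EV * (1 / t_use - 1 / t_stress))

print(f"assumed example: {acceleration_factor(0.6, 25.0, 125.0):.0f}x acceleration")
```

Each year of elevated-temperature bake thus stands in for many years of normal operation, which is how three years of accumulated stress can represent over fifty years of field life.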

Removing the Constraints

Toward the end of 1995, the M.L.C. project had grown from a small research effort to a full-blown program. Almost two years' worth of reliability data was showing excellent performance, indicating that the error corrector was not required. The 32Mbit device had demonstrated the viability of the circuit techniques and the device physics used for the precision program and read operations. Moreover, the yield was looking excellent, and the manufacturing issues were understood. Test circuits had demonstrated the ability to provide the required voltages and voltage regulation on the memory chip, eliminating the need for the external DC to DC converter. It became clear that the project could accomplish much more than the initial vision of a solid state disk. The team believed that it was possible to remove the two major requirements initially envisioned for M.L.C.: error correction and precision external power supplies. The solid state disk market, while developing, had not reached the desired volume levels. The decision was made not to take the 32Mbit device to production but to focus on the design of an M.L.C. two bit/cell part with functionality substantially equivalent to the standard one bit/cell products.

The 1997 View of M.L.C.

The first two bit/cell Intel StrataFlash memory device, a 64Mbit device, was introduced in September of 1997. This device has functionality that is largely equivalent to the standard one bit/cell flash products. A highlight comparison of the Intel StrataFlash memory features to an Intel 16Mbit single bit/cell product is shown in Figure 9.

Read performance is in line with expectations for memories of 32Mbit and 64Mbit densities, with about a 20% increase in read access time for a doubling of memory density. Two bits/cell doubles the erase block size as compared to one bit/cell since each cell now stores twice as much data. The power supply is maintained at the 5v industry standard. The two bit/cell write performance is maintained equivalent to one bit/cell, even with the more complex (and slower) precision write algorithm, through the use of an eight-byte write buffer and a higher write bandwidth into the array. The 10,000-cycle erase/write endurance specification is more than acceptable for virtually all flash applications and easily justified by the reduced cost.

The 64Mbit device integrates all of the knowledge gained from the two previous test vehicles and advances beyond them with the introduction of precision internal voltage regulation and internal test capability. The first silicon wafer out of the manufacturing line was fully functional, and the program is on track for volume shipments. The 64Mbit two bit/cell Intel StrataFlash memory is just 5% larger than the 32Mbit one bit/cell device on the 0.4µ ETOX flash memory process, delivering on the promise of 2x the bits in 1x the space and setting a new cost paradigm for flash memory devices. A photomicrograph of the 64Mbit Intel StrataFlash memory is shown in Figure 10.

Figure 10: The Intel StrataFlash 64Mbit memory

Examples of Intel StrataFlash Memory Usage

Intel StrataFlash memory is finding acceptance in a wide variety of applications that all share a common need for high density and low cost. Applications evaluating the use of Intel StrataFlash memory include small office voice-mail PBX systems to store incoming messages, network routers to store operating programs, digital cameras for digital image "film," digital voice recorders for digital audio "tape," Windows* CE hand-held computers for storage of programs and data, and set-top boxes for storage of programs and data. As for the goal of replacing hard disks in small, low-power, or rugged systems, Intel StrataFlash memory will enable solid state disks to be cheaper than hard disks for capacities of 32MByte and lower.

                              1b/c Flash Memory       Intel StrataFlash 2b/c Memory
Density                       16Mbit      32Mbit      64Mbit
Read Speed                    100 ns      120 ns      150 ns
Block Size                    64KByte                 128KByte
Architecture                  x8                      x8 / x16
Vcc Power Supply (+/-10%)     5V                      5V
Vpp (Program/Erase Voltage)   5V or 12V               5V
Effective Write Speed         6 µs/Byte               6 µs/Byte
Iccr (Read Current)           35 mA                   55 mA
Ippw + Iccw (Write Current)   75 mA                   90 mA
Endurance                     100,000 Cycles          10,000 Cycles
Operating Temperature         Extended                Commercial

Figure 9: Comparison of 1b/c and 2b/c product features

The Future of M.L.C.

With two bits per cell well along the way to mainstream production, attention turns to the prospects of even higher bit per cell densities. It is now recognized that the Intel ETOX flash memory cell and array structure is ideally suited to the storage of multiple bits per cell due to its simple direct access array architecture and proven high volume manufacturability. The direct access memory array allows the precise control of the cell voltages required to reliably and reproducibly place precise amounts of charge on the floating gate. The well-controlled, high-yielding ETOX process technology ensures that sufficient process margins exist for the more stringent M.L.C. requirements. Three bits per cell storage has been demonstrated in the laboratory, achieving state widths of less than 0.15v on a 48Mbit test chip.

Conclusion

The concept of multi-level storage using an ETOX flash memory cell transistor has been demonstrated. This concept builds on traditional semiconductor process scaling, providing the cost structure of the next-generation silicon process technology with the current generation of silicon process equipment. The evolution of the multi-level-cell development from concept in 1992 to production in 1997 has required many innovations in the areas of device physics, circuit design, and product test. The combination of precisely controlled on-chip voltages and timing, the direct access ETOX flash memory array architecture, and a highly manufacturable silicon process technology has resulted in a two bit/cell memory device largely identical to the industry standard one bit/cell flash memory. The Intel StrataFlash memory sets a new cost paradigm for the flash industry.

*All other brands and names are the property of their respective owners.

Acknowledgments

A program of this scope requires many dedicated hours by many creative individuals. The authors would like to thank the members of the Intel StrataFlash memory development team for the effort leading to the world's first two bit/cell ETOX flash memory product.

References

[1] Frohman-Bentchowsky, "Floating Gate Solid State Storage Device and Methodology for Charging and Discharging Same," U.S. Patent #3,755,721, Aug. 28, 1973.

[2] Bauer, M., et al., "A Multilevel-Cell 32Mb Flash Memory," Technical Digest, IEEE International Solid State Circuits Conference, 1995, pp. 132-133.

Authors' Biographies

Greg Atwood is an Intel Fellow and the director of Flash Memory Architecture. He received an M.Sc. degree in Physics from Purdue University in 1979 and joined Intel the same year. He has worked on numerous technology development programs including Logic, SRAM, EPROM, E2PROM, Flash, and Multi-Level Flash. He was the program manager for the Intel StrataFlash memory development program, responsible for all aspects of the project. He holds 20 patents covering a wide range of technical topics. Greg is presently responsible for next-generation Flash and Intel StrataFlash memory technologies. His e-mail address is [email protected].

Al Fazio is a principal engineer in Flash Technology development. He received a B.Sc. in Physics from the State University of New York at Stony Brook in 1982 and joined Intel the same year. He has been involved in development programs such as SRAM, EPROM, E2PROM, NVRAM, and Flash Memories. He was responsible for the technology development of the Intel StrataFlash memory. He holds more than a dozen patents and has authored or co-authored several technical papers. He is presently responsible for Intel's Multi-Level-Cell and Advanced Flash Memory Cell development and currently serves as general chairman of the IEEE Non-Volatile Semiconductor Memory Workshop. His e-mail address is [email protected].

Duane Mills is the Intel StrataFlash Memory Design Engineering Manager. He received a B.Sc. in Electrical Engineering from Purdue University in 1984 and joined Intel the same year. He has worked on numerous EPROM and Flash memory designs during the past 13 years. Mr. Mills was the design manager for the first Intel StrataFlash memory product and is currently responsible for the definition and design of the next-generation Intel StrataFlash product. His e-mail address is [email protected].

Bill Reaves is the Intel StrataFlash Memory Product Engineering Manager. He received a B.Sc. in Electrical and Computer Engineering from U.C. Santa Barbara in 1984 and joined Intel the same year. He has been involved in numerous non-volatile memory projects, including the test development and manufacturing start-up of Intel's first flash memories. He is currently the Product Engineering manager for the Intel StrataFlash Memory program and the project manager for the initial product line. His e-mail address is [email protected].


Intel StrataFlash™ Memory Technology Development and Implementation

Al Fazio, Flash Technology Development and Manufacturing, Santa Clara, CA, Intel Corp.
Mark Bauer, Memory Components Division, Folsom, CA, Intel Corp.

Index words: StrataFlash, MLC, flash, memory.

Abstract

This paper will review the device physics governing the operation of the industry-standard ETOX™ flash memory cell and show how it is ideally suited for multiple-bit-per-cell storage, through its storage of electrons on an electrically isolated floating gate and through its direct access to the memory cell. The device and reliability physics aspects of the three key technology features of multiple-levels-per-cell (M.L.C.) are discussed: precise charge placement, precise charge sensing, and precise charge retention. The mixed-signal design implementation of these features is reviewed, along with the challenges of low periphery circuit overhead and standard flash memory product performance. Lastly, process manufacturing aspects are reviewed, and it is shown how Intel StrataFlash™ memory is manufactured on the same process flow, and at the same high yields, as standard flash memory.

Introduction

The concept of M.L.C. is ideally suited to the flash memory cell. The cell's operation is governed by electron charge storage on an electrically isolated floating gate. The amount of charge stored modulates the flash cell's transistor characteristic. M.L.C. requires three basic elements: (1) accurate control of the amount of charge stored, or placed, on the floating gate, such that multiple charge levels, or multiple bits, can be stored within each cell, an operation called placement; (2) accurate measurement of the transistor characteristics to determine which charge level, or data bit, is stored, an operation called sensing; and (3) accurate charge storage, such that the charge level, or data bit, remains intact over time, an operation called retention. These elements are achieved by exploiting stable device operation regions and by the direct cell access of the ETOX flash memory array.

Flash Cell Structure and Operation

An explanation of M.L.C. first requires a review of the flash memory cell. The ETOX flash memory cell and products[1] have a long manufacturing history, having evolved in the late 1980's from EPROMs, which had been an industry standard from the early 1970's.

Cell Structure

[Figure 1 labels: Control Gate, Floating Gate, Tunnel Oxide, ONO, N+ Source, N+ Drain, P- Substrate]

Figure 1: ETOX flash memory cell cross section

Figure 1 shows a cross-sectional view of a flash cell. It consists of an N-channel transistor with the addition of an electrically isolated poly-silicon floating gate. Electrical access to the floating gate is only through a capacitor network of surrounding SiO2 layers and the source, drain, transistor channel, and poly-silicon control gate terminals. Any charge present on the floating gate is retained due to the inherent Si-SiO2 energy barrier height, leading to the non-volatile nature of the memory cell. Characteristic of the structure are a thin tunneling oxide (~100Å), an abrupt drain junction, a graded source junction, an ONO (oxide-nitride-oxide) inter-poly oxide, and a short electrical channel length (~0.3µ). Because the only electrical connection to the floating gate is through capacitors, the flash cell can be thought of as a linear capacitor network with an N-channel transistor attached. The total capacitance of the cell (CTOT) is equal to the additive capacitance of the network. For convenience, coupling ratio terms, defined as the ratio of terminal voltage coupled to the floating gate, can be written as follows:

GCR = control gate coupling ratio,

DCR = drain coupling ratio, and

SCR = source coupling ratio.

Therefore, a change in control gate voltage will result in a change in the floating gate voltage, ∆VFG = ∆VCG*GCR. The basic equation for the capacitor network is

VFG = QFG/CTOT + GCR*VCG + SCR*VSRC + DCR*VDRN

(1)

where QFG = the charge stored on the floating gate.

A simple first-order transistor equation of drain current says

ID = GM*(VFG - VT - VDRN/2)*VDRN (2)

where GM = µe*COX*(ZE/LE).

This equation is very inexact for the small geometry of the flash cell, but the conclusions nevertheless hold. Substituting VFG from the basic coupling ratio Equation (1) into the basic transistor I-V Equation (2) leads to the conclusions that the transconductance of the transistor (and also the pre-threshold slope) degrades by GCR, while the threshold voltage, VT, depends upon QFG, the charge stored on the floating gate. Therefore, the VT depends upon QFG, while the I-V shape does not. Very simply, the flash cell can be thought of as a capacitor that is charged and discharged, the charge value being read out through the amplification of the transistor I-V. To give an idea of the amount of charge: every volt of cell threshold corresponds to approximately 10,000 electrons of floating gate charge.
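To make these numbers concrete, the capacitor-network relationship of Equation (1) and the 10,000-electrons-per-volt figure can be sketched in a few lines of Python. The coupling ratios and the CTOT value below are illustrative assumptions, not the actual device parameters:

```python
Q_E = 1.602e-19  # electron charge, coulombs

def floating_gate_voltage(qfg, ctot, gcr, scr, dcr, vcg, vsrc, vdrn):
    """Equation (1): VFG = QFG/CTOT + GCR*VCG + SCR*VSRC + DCR*VDRN."""
    return qfg / ctot + gcr * vcg + scr * vsrc + dcr * vdrn

# Assumed values for illustration only. CTOT is chosen so that
# 10,000 electrons shift the floating gate by about one volt.
ctot = 1.6e-15                 # ~1.6 fF total cell capacitance
qfg = -10_000 * Q_E            # 10,000 stored electrons (negative charge)

vfg = floating_gate_voltage(qfg, ctot, gcr=0.6, scr=0.1, dcr=0.1,
                            vcg=5.0, vsrc=0.0, vdrn=1.0)
# The stored electrons pull VFG down by about 1 V relative to the
# voltage coupled in from the terminals.
```

With these assumed ratios, the terminals couple about 3.1 V onto the floating gate and the stored charge subtracts roughly 1 V, illustrating how QFG shifts the apparent threshold.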

Cell Operation: Programming

[Figure 2 labels: Gate ~12V; Drain ~6V; Source]

Figure 2: Cell bias conditions during programming

Programming a flash cell means that charge, or electrons, are added to the floating gate. Figure 2 shows the cell bias conditions during the program operation. A high drain-to-source bias voltage is applied, along with a high control gate voltage. The gate voltage inverts the channel, while the drain bias accelerates electrons towards the drain. Programming a flash cell by channel hot electrons can be understood by use of the lucky electron model[2], as illustrated by the energy band diagram in Figure 3. In the lucky electron model, an electron crosses the channel without collision, thereby gaining 5.5-6.0 eV of kinetic energy, more than sufficient to surmount the 3.2 eV Si-SiO2 energy barrier. However, the electron is traveling in the wrong direction: its momentum is directed towards the drain. Prior to entering the drain and being swept away, this lucky electron experiences a collision with the silicon lattice and is re-directed towards the Si-SiO2 interface with the aid of the gate field. It has sufficient energy to surmount the barrier. However, an electron does not have to be completely lucky. It can be "somewhat lucky" or "barely lucky," making the process of programming efficient. We can observe from this model that the lateral field, determined by bias voltage, junction profiles, electrical channel length, and channel doping, is important to the effectiveness of generating energetic electrons and is therefore key to the M.L.C. placement operation. Hence the abrupt drain junction and short channel length of the cell structure. After programming is completed, electrons have been added to the floating gate, increasing the cell's threshold voltage. Programming is a selective operation, occurring uniquely on each individual cell.

Figure 3: Energy band diagram of programming

Cell Operation: Erase

The distinguishing feature between EPROM and flash memory is the erase operation. EPROM removes electrons from the floating gate by exposure to ultra-violet light. A photon of this light source has high enough energy that, if transferred to an electron on the floating gate, that electron will have enough energy to surmount the Si-SiO2 energy barrier and be removed from the floating gate. This is a rather cumbersome operation requiring a UV-transmissive package and a light source. It is also rather slow and costly, often requiring the removal of the memory from the system. In flash, the contents of the memory, or charge, are removed by applying electrical voltages, hence erased "in a flash," with the memory remaining in the system. The electrical erase of flash is achieved by the quantum-mechanical effect of Fowler-Nordheim tunneling[3], for which the bias conditions are shown in Figure 4. Under these conditions, a high field (8-10 MV/cm) is present between the floating gate and the source. The source junction experiences a gated-diode condition during erase, hence the graded source junction of the cell structure. As evidenced by the energy band diagram of Figure 5, electrons tunneling through the first ~30Å of the SiO2 are then swept into the source. After erase has been completed, electrons have been removed from the floating gate, reducing the cell threshold. While programming is selective to each individual cell, erase is not, with many cells (typically 64 KBytes) being erased simultaneously.

[Figure 4 labels: Gate ~ -10V; Source 5-6V; Drain float]

Figure 4: Cell bias conditions during erase

Figure 5: Cell energy band diagram during erase

Cell Operation: Read

The read operation of the cell should now be apparent. Storing electrons (programming) on the floating gate (QFG < 0) increases the cell Vt. By applying a control gate voltage and monitoring the drain current, the difference between a cell with charge and a cell without charge on their floating gates can be determined (Figure 6). A sense amplifier compares the cell drain current with that of a reference cell (typically a flash cell that is programmed to the reference level during manufacturing test). An erased cell has more cell current than the reference cell and therefore is a logical "1," while a programmed cell draws less current than the reference cell and is a logical "0." The floating-gate charge difference between these two states is roughly 30,000 electrons.
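The comparison the sense amplifier performs can be modeled in a couple of lines. This is a minimal sketch; the current values used below are arbitrary illustrations, not device measurements:

```python
E_CHARGE = 1.602e-19  # electron charge, coulombs

def sense_1bpc(i_cell, i_ref):
    """One-bit-per-cell sense: an erased cell conducts more current than
    the reference cell (logical "1"); a programmed cell conducts less ("0")."""
    return "1" if i_cell > i_ref else "0"

# ~30,000 electrons of floating gate charge separate the two states.
state_charge_delta = 30_000 * E_CHARGE  # charge difference in coulombs
```

A call such as `sense_1bpc(50e-6, 30e-6)` models an erased cell read against a 30 µA reference.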


[Figure 6 plots cell drain current versus control gate voltage at the read gate bias, showing the erased cell ("1", erase current), the reference cell (reference current), and the programmed cell ("0", program current)]

Figure 6: Erase, program and reference cell I-V

Array Configuration

Figure 7 shows a schematic drawing of the flash memory cells in a NOR array configuration. In this configuration, cells on the same wordline, or row, share common control gates. Cells on common bitlines, or columns, share common drains, which are connected via low-resistance metalization, providing direct access to each cell's drain junction. The sources for cells in the array are common. They are connected locally via common degenerately doped silicon and globally via low-resistance metalization. Decoders are linked to the control gate wordlines and drain bitlines to uniquely select cells at the cross-point location. The direct access to the cell in this configuration, versus alternative array architectures that have parasitic resistances or devices, ensures that accurate voltages can be applied to the cell and that IR drops are minimized. This is a key aspect of achieving M.L.C.

[Figure 7 labels: Wordlines (WL), Bitlines (BL), Source Lines (SL)]

Figure 7: Array configuration

M.L.C. Key Features

We have reviewed thus far how a one-bit-per-cell (1B/C) flash memory operates. As can be inferred from the previous discussion, M.L.C. is simply a means by which charge on the floating gate is modulated and detected to levels lower than the 30,000 electrons described above, such that intermediate charge levels, or states, can be extracted from the cell. These states can now represent not just the simple 1B/C "1" and "0," but rather an M.L.C. representation with four distinct charge states: "11," "10," "01," and "00," or 2 bits in one cell. These four distinct levels are illustrated in the I-V curve of Figure 8. The key aspects of achieving these intermediate states, or levels, are precise charge placement, precise charge sensing, and precise charge retention.
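The four-level encoding amounts to classifying the cell's threshold voltage against three reference thresholds. A sketch in Python (the reference Vt values below are assumed for illustration; on the real device they are set during manufacturing test):

```python
def classify_state(vt, vt_refs=(2.5, 4.0, 5.5)):
    """Map a cell threshold voltage to one of the four 2-bit states.
    The number of reference thresholds the cell Vt sits below gives
    the state: below all three -> "11" (erased, most current),
    below none -> "00" (most programmed, least current)."""
    ones = sum(vt < ref for ref in vt_refs)
    return format(ones, "02b")
```

Each unit increase in stored charge pushes Vt past the next reference, stepping the state from "11" through "10" and "01" to "00."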

[Figure 8 plots cell drain current versus control gate voltage at the read gate bias, showing the erase current ("11"), the current ranges for levels "10," "01," and "00," and the three reference currents (Reference 1, 2, and 3) that separate them]

Figure 8: Cell and reference I-V curves of 4-level 2B/C

Precise Charge Placement

A comparison of Figures 6 and 8 shows that M.L.C. requires a means to control how much programming occurs within a cell. For a 1B/C product, all that is necessary is to have enough programming to change a "1" into a "0." Over-programming a cell to a much higher Vt (adding more floating gate charge) would be fine. This is not the case for M.L.C., where too much programming would cause an intermediate level to overshoot onto the next level. For instance, if a "10" was desired but the cell was over-programmed, a "01" might result, leading to erroneous data. Therefore, a method of controlling precisely how much charge is transferred to the floating gate is required: enough charge to reach a state level without overshooting the desired level.


[Figure 9 plots programming Vt versus programming time (1.00E-07 to 1.00E-03 seconds, log scale) for two control gate voltages, N and N+1, with the linear region at short times and the saturation region at long times]

Figure 9: Programming threshold vs. time curve

To gain insight into how such precise control can be obtained, let's take a deeper look into the flash cell's programming characteristics. Figure 9 shows how the flash cell's Vt changes as a function of log-time under two different bias conditions. Two regions of operation are shown: linear and saturation, so called because the linear region is linear when plotted in linear time, and the saturation region is where the cell Vt changes little with time, analogous to a MOS transistor I-V curve. Note also that in the linear region, the control gate voltage has little influence on the rate of programming, while in the saturation region, the saturated Vt has a strong dependence upon the control gate voltage. A characteristic of Figure 9 is that flash cell programming slows as more charge is added to the floating gate. The reason for this behavior is that in the linear region, energetic electrons near the drain are attracted to the floating gate. As programming progresses, the floating gate (which is coupled to the control gate and drain biases as governed by Equation 1) becomes charged more negatively, until it eventually reaches the same potential as the drain voltage. At this point, the energetic electrons are repelled by the floating gate charge. Programming slows, as near-drain electrons must tunnel through the SiO2 barrier, or less energetic mid-channel electrons "jump" over the barrier. The strong gate dependence results from the vertical field limitation in this region. One can also see from Figure 9 that the saturated Vt increases one-for-one with an increase in the programming control gate voltage. This is a simple result of the coupling Equation (1).

Given this characteristic curve, one could devise several possible methods of controlling the charge transfer to the floating gate. These methods would have to meet the criteria of being reliable (no overshoot), controllable (simple to implement), and fast (to ensure compatibility with standard flash memory product features). Programming in the linear region, while fast, is not controllable. In this region, the programming Vt is exponentially dependent upon time and the electron energy distribution (as determined by drain bias, channel length, doping profiles, etc.). Small variations will lead to large changes in the cell threshold and therefore to overshoot of the desired state, with a high likelihood of being unreliable. Minimization of these variations would also be difficult to implement. In the saturated region, the cell Vt simply depends upon the applied control gate voltage. Control in this region is more achievable. With ease of control, design optimization practices can be employed to achieve fast programming, as will be shown later. Therefore, to achieve speed and control, a placement algorithm that employs programming in the saturated region was developed.

This leaves us with reliability. Fowler-Nordheim tunneling, used for programming in addition to erase in some versions of flash memories, is subject to erratic programming due to the presence or absence of as few as one or two holes trapped in the oxide[4]. Channel hot-electron programming, in contrast, has no erratic programming mechanism: the programming threshold in saturation is simply a linear function of the applied control gate voltage. Programming in this region can be forced into an unstable operating point, known as impact-ionization induced latch-up[5]. This is the point where an excess of holes in the silicon substrate, created by the collisions of the energetic electrons with the silicon lattice, builds up until the parasitic NPN transistor in the silicon substrate turns on. Proper architectural design of the silicon process flow (i.e., use of EPI silicon) can easily prevent this from happening.

Therefore, the three success criteria are satisfied by exploiting stable device operation regions, namely programming in saturation. A simple placement algorithm was chosen for implementation (outlined in the flowchart in Figure 10). The algorithm consists of a simple loop: program in saturation, check the cell Vt to determine whether the desired state has been reached, stop if it has, or, if not, increment the control gate voltage, provide an additional programming pulse, and continue in this fashion until the desired Vt has been achieved. In the Intel StrataFlash™ memory two-bit/cell device, each programming pulse within the placement algorithm transfers roughly 3,000 electrons of charge to the floating gate.
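The program-and-verify loop just described can be sketched as follows. All voltages, the step size, and the Vt-versus-gate-voltage offset below are invented illustration values; the real algorithm runs as FACE microcode on-chip:

```python
def place_charge(vt_target, vt_initial=0.0, vcg_start=5.0, step=0.1,
                 offset=5.0, max_pulses=200):
    """Program-and-verify loop: pulse in the saturated regime, where the
    resulting Vt tracks the control gate voltage one-for-one, then verify
    the cell Vt and step the gate voltage until the target is reached."""
    vt, vcg = vt_initial, vcg_start
    pulses = 0
    while vt < vt_target and pulses < max_pulses:
        vt = max(vt, vcg - offset)  # saturated Vt follows the gate voltage
        vcg += step                 # increment for the next pulse
        pulses += 1
    return vt, pulses
```

The key property of programming in saturation shows up directly: overshoot past the target is bounded by one gate-voltage step, which is what makes the placement controllable.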


[Figure 10 flowchart: set initial programming parameters; determine cells requiring placement and give a programming pulse in the saturated regime; if the cell has reached the desired Vt (charge) level, done; if not, increment the control gate voltage and repeat]

Figure 10: Placement algorithm flowchart

Precise Charge Sensing

As can be seen from the flowchart in Figure 10, integral to the placement algorithm is a means of detecting whether or not the desired cell Vt has been achieved. Without a precise means of sensing the floating gate charge, precise charge placement would not be possible. A look back at Equation 2, the cell drain current-voltage relationship, gives some insight into what is required to achieve precise charge sensing: control of the control gate and drain voltages, and control of the process parameters Leff, Zeff, mobility, and oxide capacitance. Drain voltage control is facilitated by direct access to the cell drain junction (bypassing any resistive IR drops) allowed by the ETOX NOR flash memory array architecture, and by applying a high enough drain voltage to operate in the saturated mode (normal MOS device saturated I-V, not programming saturation as previously discussed), where drain bias variations have minimal current impact. Process control is important, since the 33,554,432 memory cells contained within a single Intel StrataFlash memory 64-Mbit device represent a >10 sigma variation; it is achieved by proper process architecture and manufacturing process control, derived from ten years of manufacturing experience with flash memories. Control gate voltage control is achieved by an on-chip read regulation circuit, which is fully explained in a later section.

Flash memory has a unique feature associated with its non-volatility: the data write (placement) can occur under one condition of ambient temperature and system power supply, while the read out of data (sensing) can occur at a later date, at a different ambient temperature and system power supply. Being fundamentally a MOS transistor, the flash cell's drain current is a function of these ambient conditions. As such, precise charge sensing is required to span wide ranges of operation. To provide the needed precision, the reference levels that separate the charge state levels are generated by reference flash cells contained on-chip. These reference cells, whose Vt levels are precisely placed at manufacturing test under a controlled environment, have the same tracking with temperature and power supply as the array flash cells. This contrasts with reference levels generated by other transistor types (i.e., NMOS or PMOS), which have different temperature, voltage, and process tracking than the flash memory cell. This lessens the necessary constraints on the read regulation circuitry.

Precise Charge Retention

Due to the non-volatility requirement of flash memory, it is important that any charge placed on the floating gate remain intact for extended periods of time, typically ten years. This translates to a requirement of not losing more than one electron per day from the floating gate. If electron loss occurs from even one memory cell in an array of millions, the data will be corrupted. The inherent storage capability exists due to the Si-SiO2 energy barrier, which traps electrons on the floating gate. The inter-poly-silicon oxide (the ONO film mentioned in the cell structure) is processed to maximize charge storage capabilities[6]. Under normal circumstances, the energy barrier allows charge storage for hundreds of years. There are conditions of trapped oxide charge, known as intrinsic charge loss[7], which can cause one-time shifts in threshold. These shifts are rather small and are compensated for during manufacturing test. Random defects in the insulating oxides that can lead to charge loss are less of an issue with low-defect, high-yielding process technologies, but if still present, they are screened out by the manufacturing tests. These defects are driven to low enough levels on ETOX flash memories that error-correcting codes (ECC) are not needed. The remaining concern for charge retention is any degradation of the insulating oxides that occurs as a result of the stresses of device operation.
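The one-electron-per-day budget can be checked against the charge scale given earlier (~10,000 electrons per volt of cell threshold). The arithmetic, using only figures from the text:

```python
ELECTRONS_PER_VOLT = 10_000   # ~10,000 electrons per volt of cell Vt

loss_per_day = 1              # the stated retention budget: one electron per day
days = 10 * 365               # ten-year retention requirement
electrons_lost = loss_per_day * days            # total electrons lost
vt_drift = electrons_lost / ELECTRONS_PER_VOLT  # resulting threshold drift, volts
```

At this loss rate, ten years costs only a few thousand electrons, a few tenths of a volt of threshold drift, which must stay within the spacing between adjacent M.L.C. levels.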

During normal operation, high fields are applied to the flash cell. The presence of these high fields over time can degrade the charge storage capabilities of the device, by effectively lowering the energy barrier or by providing trap sites in the oxide that can act as intermediate tunneling locations. The benefit of channel hot-electron programming, compared to tunneling for programming, is that fast programming can occur at lower internal fields, thereby lessening the probability of oxide damage. Nevertheless, the occurrence of damage needed to be understood to ensure the stability of the M.L.C. charge.


Consequently, the charge retention ability of the insulating oxides under various process and bias field conditions was studied in great detail. Over the course of the four-year M.L.C. development period, in excess of 200 billion (2x10^11) flash cells were studied for charge retention, each to a floating gate charge resolution of ~100 electrons. This exhaustive study provided physical insight into the oxide damage mechanisms and enabled us to build large-scale empirical models for charge retention. The net result of this study was the ability to optimize process recipes and operating bias fields to maximize charge retention. This allows Intel StrataFlash memory to maintain high reliability performance without the use of any ECC.

Mixed Signal Design Implementation

The implementation of the described charge-placement algorithm and charge-sensing operation required a mixed-signal circuit design of both digital and precision analog voltage generation, regulation, and control circuits. The placement algorithm is executed by an on-board control engine, the Flash Algorithmic Control Engine (FACE). FACE runs the placement algorithm by sequencing through the programming and sensing loops. During a read operation (sensing of data at a later time), the user has random access to the memory array. A read operation performs a precision-sensing operation and invokes circuitry controlling the precise cell bias voltages.

Placement Algorithm Implementation

The placement algorithm executed by FACE is stored in a small on-chip programmable flash array. The programmable microcode allows for flexibility in algorithm changes. FACE, illustrated in Figure 11, consists of the microcode storage array, program counter (PC), arithmetic logic unit (ALU), instruction decoder, clock generator, register files, and input/output circuitry. FACE uses 6,000 transistors for logic and 32k bits of flash memory for algorithm storage.

To describe the implementation of the placement algorithm, let us assume that a group of cells (i.e., a double-word, or 32 logical bits in 16 physical cells) is to be placed and is initially in the erased state (the lowest floating gate charge state). Any cell not to remain in the erased state (representing logical data "11") will receive a programming pulse. FACE will look up the drain and initial control gate voltages stored in a permanent read-only register located on-chip. FACE will then set the control gate voltage through the digital-to-analog converter (DAC). The DAC circuit receives the FACE digital input and divides the on-chip generated 12-volt power supply (VP12) to achieve the desired control gate voltage for that particular programming pulse. The drain voltage used during the programming pulse is generated from a regulation circuit that sets the gate voltage on a source follower. FACE will continue to supply the programming voltages for the pre-determined amount of time sufficient to reach the saturation region. When the programming pulse is complete, FACE will reconfigure the circuits to perform the sensing portion of the algorithm, an operation called verification. The drain and control gate voltages are now set to the same values as used in a user read access to ensure common mode between verification and read. FACE will take the result of the verification and determine which cells have reached their destination charge level and which have not. Those that have not will require an additional programming pulse with an increased control gate voltage. A cell that no longer requires additional programming pulses will have its drain voltage disabled by the program pulse selector circuit. This sequence of events continues until all cells in the double-word have completed programming.
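The per-group sequencing described above can be modeled as follows. This is a simplified sketch with invented voltage values; dropping a verified cell from the active set stands in for the program pulse selector disabling its drain:

```python
def place_double_word(targets, vcg_start=5.0, step=0.1, offset=5.0,
                      max_pulses=200):
    """Program a group of cells in parallel. Each pulse programs every cell
    still requiring placement; a cell that verifies at its target Vt is
    dropped from the active set (its drain disabled), and the control
    gate voltage is then incremented for the next pulse."""
    vts = [0.0] * len(targets)                              # all cells start erased
    active = {i for i, t in enumerate(targets) if t > 0.0}  # "11" cells get no pulse
    vcg = vcg_start
    for _ in range(max_pulses):
        if not active:
            break
        for i in list(active):
            vts[i] = max(vts[i], vcg - offset)  # saturated-regime pulse
            if vts[i] >= targets[i]:            # verification step
                active.discard(i)
        vcg += step
    return vts
```

Cells with lower target levels finish early and stop receiving pulses, while the gate voltage continues to step up for the cells still being placed.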

Analog Circuit Blocks for Precise Charge Placement

Placement requires precision voltages covering a range of 4-12 volts, while the chip Vcc (user-supplied voltage) is kept at a typical value of 5 volts. The voltages applied to the memory array need to be internally generated and precisely regulated. On-chip voltage generation is achieved by the use of charge pumps, in which switched capacitors boost the user-supplied Vcc to higher values. Voltages are controlled using a precision voltage reference circuit and voltage regulation circuits (Figure 11).

During a programming pulse, two charge pumps are used. One charge pump generates the internal 12V supply (VP12). This is used to supply a precision control gate voltage to the flash cells through the DAC circuit.


[Figure 11 blocks: FACE (code store flash array, PC, IR, ALU, register file, control and I/O, input/output registers); voltage generation and regulation circuits (Vref, sample and hold, charge pumps generating VP12 and VP9, regulation circuit, write regulator, DAC); row decode, column selector, program pulse selector, and read circuits around the flash memory array]

Figure 11: FACE and placement operation block diagram


VP12 also serves to generate the precision flash drain voltage through the write regulation circuit (WRC). The WRC generates a voltage that is applied to an NMOS transistor configured as a source follower. This transistor is in the bitline (or drain) path of the flash cell. The flash cell drain current is supplied through a second pump that generates the signal VP9. This pump is required to supply the programming current for up to 32 flash cells at a time.

During the placement algorithm, voltage stability is critical to precise charge storage. Any variation in the reference circuit voltages will be seen as a variation in the flash control gate voltage, to which the programming saturated Vt is directly related. To achieve this absolute stability in the voltage reference circuit, a sample and hold circuit is employed. At the start of the placement algorithm, the sample and hold circuit samples the reference voltage and holds the value on a capacitor for the duration of the entire algorithm. This guarantees that the control gate voltage varies from pulse to pulse by only the desired step value and not by any additional components.

Circuit Blocks for Precise Charge Sensing

When the device is in the read mode of operation, FACE is disabled and the user has control to access the memory array. A read operation consists of sensing 16 bits worth of data from a random location in the memory array. With M.L.C., 8 flash cells are used to obtain 16 bits of data. During the read operation (Figure 12), the flash cell control gate voltage is controlled through a read regulator circuit (RRC). Minimizing this voltage variation minimizes the variations in cell current (Equation 2). This allows for more precise measurement of the charge level stored on the floating gate. Drain voltage stability is also important to ensure that the flash cell being sensed has a high enough drain voltage to keep the memory transistor operating in the saturated region of the MOS I-V.

Because the user-supplied Vcc fluctuates and may be lower than the value needed during read, an internal charge pump is used during a read operation to generate the internal voltage that supplies the flash cell control gate. The RRC uses the same voltage reference circuit that is used for voltage regulation during a placement operation, as mentioned above. However, in the case of a read operation, not as much voltage stability is required, so the sample and hold circuitry is not used.

Parallel Charge Sensing

High speed random access and precise charge sensing are accomplished through a parallel charge-sensing scheme. Through direct connections to each memory cell, the data read operation determines the level of each memory cell quickly, accurately, and reliably. The data read operation senses which of the four levels the memory cell falls within based on the threshold voltages of three reference cells. This is done simultaneously with three sense amplifiers (Figure 13), where each sense amplifier compares the flash cell current being sensed to the current of the flash reference cells.

Figure 12: Read operation block diagram (a charge pump and read-mode gate voltage generation and regulation, with reference Vref, drive Vread to the row decode of the flash memory array; a column selector, drain bias circuits with a drain bias reference (Vdbref), sense amplifiers, and output buffers produce the 16 data bits)

The memory cell and the reference cells are biased in such a way that each conducts a current (Icell and Iref) proportional to their respective threshold voltages (Vt and VtRef). During a read operation, Vread is placed on the control gates of the memory and reference cells, the source terminals are grounded, and the drain voltages are set through a bias circuit that utilizes a precision voltage reference circuit.

The current for the memory cell being sensed is compared to the current of the three reference cells. The memory cell and reference cell currents are converted to voltages through an active load transistor. The resultant voltages are compared by the three sense amplifiers. A sense amplifier is associated with each of the three reference cells. Each sense amplifier also has an input from the flash cell being sensed. If the current of the cell being sensed is greater than the current of the reference cell (Icell > Iref, or Vt < Vtref), the sense amplifier output is a logic “1.” If the current of the cell being sensed is less than the current of the reference cell, the sense amplifier output is a logic “0.” The outputs of the three sense amplifiers are connected to a logic circuit that interprets the two data bits in parallel.

The sense amplifier outputs and the decoded data bits are:

Cell Vt             Sense Amp 1   Sense Amp 2   Sense Amp 3   D1   D0
Vt < VtR1                1             1             1         1    1
VtR1 < Vt < VtR2         0             1             1         1    0
VtR2 < Vt < VtR3         0             0             1         0    1
Vt > VtR3                0             0             0         0    0

Figure 13: Parallel charge sensing (Icell and the three reference cell currents Iref1 through Iref3, each through a drain bias circuit, feed sense amplifiers 1 through 3, whose outputs a logic circuit decodes into D1 and D0)
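The thermometer-to-binary decode performed by the logic circuit can be sketched as follows (a minimal illustration of the table above; the function name and Python representation are ours, not the paper's implementation):

```python
# Hypothetical sketch of the Figure 13 logic circuit: each sense amplifier
# outputs 1 when the cell current exceeds that reference's current (i.e., the
# cell Vt is below that reference Vt), giving a thermometer code that the
# logic circuit decodes into the two data bits D1, D0.

def decode_sense_amps(s1: int, s2: int, s3: int) -> tuple[int, int]:
    """Map (Sense Amp 1, 2, 3) outputs to (D1, D0) per the table above."""
    table = {
        (1, 1, 1): (1, 1),  # Vt < VtR1
        (0, 1, 1): (1, 0),  # VtR1 < Vt < VtR2
        (0, 0, 1): (0, 1),  # VtR2 < Vt < VtR3
        (0, 0, 0): (0, 0),  # Vt > VtR3
    }
    return table[(s1, s2, s3)]
```

All three comparisons happen simultaneously in hardware; the lookup here only mirrors the combinational decode.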


Low-Cost Design Implementation

Traditionally, a storage element in a memory corresponds to one bit of information. To double the amount of memory, the number of memory storage elements in the array would need to be doubled. In addition to doubling the number of memory elements in the array, certain memory interface circuits must also be doubled. In particular, the memory array needs to be decoded, requiring wordline and bitline decoders. In a typical single transistor non-volatile memory device (flash, EPROM), approximately 20% of the silicon area is devoted to these interface circuits required to access the array. These interface circuits typically do not scale with process technology at the same rate as the memory array because they have high voltage and analog requirements.

Intel StrataFlash memory doubles the storage capacity of a memory device without doubling the memory array and the associated interface decoding circuitry. Additional circuitry is required to achieve the multiple bits per cell, but it takes up a relatively small additional area. The overhead is due mostly to the additional sense amplifiers, reference circuitry, and circuitry for voltage generation or charge pumps. The additional silicon area required for this circuitry represents only an additional 5% over what is necessary for a one bit per cell device. Implementations of M.L.C. that require externally supplied components (i.e., a microcontroller, ECC, and voltage regulators) have the cost savings of M.L.C. diminished by these peripheral overheads. Intel StrataFlash memories achieve 2x the density at very close to 1x the area.

Low-Cost Process Manufacturing

ETOX flash memory has a long manufacturing history. As such, it was necessary that any implementation of M.L.C. not disrupt that history by having unique process requirements, which would cause a slow yield learning period or poor manufacturing throughput. First and foremost, for M.L.C. to be successful, it must be able to ride on a technology that produces error-free one bit per cell flash memory. This requirement throughout the history of Intel's ETOX NOR flash memory has resulted in tight manufacturing margins and the learning necessary for achieving such margins. Memories that rely on ECC for even one bit per cell operation have little margin built into the basic technology. Throughout the previous discussions, mention has been made of process manufacturing attributes for M.L.C. These attributes have been achieved by utilizing the same process flow as the standard one bit per cell flash memory. This approach has maintained shared learning and has led to lower costs. In other words, low-cost process manufacturing was achieved through an understanding of M.L.C. requirements up-front in the design of the basic process architecture at the generation where M.L.C. is introduced. The tight manufacturing margins required for M.L.C. are a natural extension of the learning from manufacturing error-free one bit per cell flash memory and are well within the manufacturing, equipment, and process module capability.

Standard Product Feature Set

One of the main challenges in implementing M.L.C. is maintaining product performance, usability, and reliability at the same levels as standard flash memories. If the implementation of M.L.C. resulted in a product that did not satisfy these goals, it would be relegated to a niche in the marketplace. Key features for a non-volatile memory are programming speed, read speed, power supply requirements, and reliability. This paper shows how our implementation of M.L.C. achieves these features. Before finishing, however, let us briefly discuss each one of them.

Programming Speed

Programming speed is achieved by choosing a placement algorithm that exploits stable device operating points, enabling circuit performance optimization with few limitations from flash device operation. Programming cells in parallel (32 cells, or 64 bits, at a time) also amortizes the placement algorithm run time. The choice of charge sensing approach also affects programming speed, as it is integral to the placement algorithm. Sensing approaches other than those described in this paper can be used. An example would be a sensing scheme that varies the control gate voltage to detect the threshold voltage directly. Such a scheme, while a more direct measure of floating gate charge, does not exploit the current drive capability of the flash cell, the drive used for sensing speed performance. To sum up, the choices of algorithms, optimizations, and architecture are what allow M.L.C. programming to be as good as or better than one bit per cell flash memories.

Read Speed

As mentioned above, the choice of fixed control gate sensing and utilization of the flash cell's current drive capability allows fast read operation. In addition, parallel charge sensing allows for fast decode of the logic level, with little circuit overhead. As such, the read speed of Intel's StrataFlash memory is consistent with that of one bit per cell flash memories of comparable bit density.


Power Supply

As also discussed, on-chip voltage generation and regulation is key to the implementation of M.L.C. One could specify an M.L.C. product that uses externally supplied precision voltages, but such a product would be more costly to the user, who would have to pay for the power supply, memory, and board space. Having the voltages generated and regulated on-chip allows the Intel StrataFlash memory to plug directly into existing flash memory applications.

Reliability

Starting with high-yielding, low-defect memory, exhaustive cell studies and process and bias optimizations allow for an implementation of M.L.C. that achieves non-volatility and high reliability without requiring on-chip or system ECC. Thus the user can interface to the device with random memory location access, without latency for correction. Additionally, ECC requires overhead bits, which would diminish the cost advantages of M.L.C.

These standard flash memory product features, coupled with low-cost circuit design and manufacturing process implementation, allow users to benefit from the low cost of M.L.C. without having to sacrifice needed features or performance.

Conclusion

It has been shown how Intel StrataFlash memory achieves multiple bits per cell, coupled with traditional process scaling, to provide an advance in memory cost reduction. The M.L.C. requirements of precise charge placement, precise charge sensing, and precise charge retention are achieved by exploiting stable device operating points and direct access to the memory cell, employing mixed signal digital and analog design. Non-cell-related costs are held low by riding on the tight manufacturing margins developed for error-free one bit per cell flash memories. A standard product feature set ensures that the cost advantages of M.L.C. are available to the mainstream flash memory market.

Acknowledgments

The authors would like to thank the members of the M.L.C. development groups, whose dedicated work helped turn a few ideas into a product reality.

References

[1] Kynett, V.N., et al., “An In-System Reprogrammable 256K CMOS Flash Memory,” Technical Digest, IEEE International Solid State Circuits Conference, 1988, pp. 132-133.

[2] Tam, S., Ko, P.K., and Hu, C., “Lucky-Electron Model of Channel Hot Electron Injection in MOSFET’s,” IEEE Transactions on Electron Devices, September 1984.

[3] Lenzlinger, M. and Snow, E.H., “Fowler-Nordheim Tunneling into Thermally Grown SiO2,” Journal of Applied Physics, vol. 40, no. 1, January 1969, pp. 278-283.

[4] Ong, T.C., et al., “Erratic Erase in ETOX™ Flash Memory Array,” IEEE VLSI Symposium, 1993, p. 145.

[5] Eitan, B. and Frohman-Bentchkowsky, D., “Surface Conduction in Short-Channel MOS Devices as a Limitation to VLSI Scaling,” IEEE Transactions on Electron Devices, vol. ED-29, no. 2, February 1982, pp. 254-266.

[6] Wu, K., et al., “A Model for EPROM Intrinsic Charge Loss Through Oxide-Nitride-Oxide (ONO) Interpoly Dielectric,” 28th Annual Proceedings, IEEE International Reliability Physics Symposium, 1990, p. 145.

[7] Mielke, N., “New EPROM Data-Loss Mechanisms,” 21st Annual Proceedings, IEEE International Reliability Physics Symposium, 1983, p. 106.

Authors’ Biographies

Al Fazio is a Principal Engineer in Flash Technology Development. He received a B.Sc. in Physics from the State University of New York at Stony Brook in 1982 and joined Intel the same year. He has been involved in development programs including SRAM, EPROM, E2PROM, NVRAM, and Flash Memories. He was responsible for the Technology Development of the Intel StrataFlash™ memory. He holds more than a dozen patents and has authored or co-authored several technical papers, two of which have received Outstanding Paper Awards at the IEEE International Reliability Physics Symposium and at the IEEE International Solid State Circuits Conference. He is presently responsible for Intel’s Multi-Level-Cell and Advanced Flash Memory Cell Development and currently serves as General Chairman of the IEEE Non-Volatile Semiconductor Memory Workshop. His e-mail address is [email protected]


Mark Bauer is a Senior Staff Engineer in Flash Circuit Design. He received his B.S.E.E. from the University of California, Davis in 1985. He joined Intel’s Memory Components Division that same year, working on EPROM design. He was responsible for Circuit Design Development of the Intel StrataFlash™ memory. He holds more than a dozen patents in the field of non-volatile memories and has authored two technical papers, one of which received an Outstanding Paper Award at the IEEE International Solid State Circuits Conference. He is presently responsible for Intel’s next generation Multi-Level-Cell Circuit Design. His e-mail address is [email protected]


Redundancy Yield Model for SRAMs

Nermine H. Ramadan, STTD Integration/Yield, Hillsboro, OR, Intel Corp.

Index words: Poisson’s formula, yield, defect density, repair rate

Abstract

This paper describes a model developed to calculate the number of redundant good die per wafer. A block redundancy scheme is used here, where the entire defective memory subarray is replaced by a redundant element. A formula is derived to calculate the amount of improvement expected after redundancy. This improvement is given in terms of the ratio of the overall good die per wafer to the original good die per wafer after considering some key factors. These factors are memory area, available redundant elements, defect density, and defect types with respect to the total reject die and defect distribution on the memory area. The model uses Poisson's equation to define the yield; the appropriate boundary conditions that account for those factors are then applied. In the case of a new product, knowing the die size, memory design, and total die per wafer, the model can be used to predict the redundancy yield for this product at different initial yield values. Optimizing the memory design by varying the number of memory blocks and/or redundant elements to enhance redundancy is also discussed. The model was applied to three products from two different process generations and showed good agreement with the measured data.

Introduction

Due to the continuing increase in the size of memory arrays, reaching a high yield from the same wafer is more challenging than ever. Redundancy is a way to improve the wafer yield and to reduce the test cost per good die by fixing potentially repairable defects. In order to forecast the volume of a certain product when redundancy is applied, it is important to estimate, as accurately as possible, the number of die gained after redundancy.

Redundancy is the process of replacing defective circuitry with spare elements. In SRAMs, rows and/or columns can be replaced, as well as an entire subarray. In a previous study [1], a redundant yield estimation methodology was developed. It is applicable to row, column, or block redundancy schemes. It distinguishes between repairable and non-repairable faults within a memory block. In order to apply this method, new CAD tools are required. This method is useful if row or column redundancy is used.

This paper will focus only on the yield estimation for block redundancy, as block redundancy was preferred over row and column redundancy for the SRAM architecture. It is usually easier to replace the entire subarray. This might seem like overkill; however, replacing the entire subarray allows for the replacement of defective peripheral circuits in addition to just the memory array elements. It also allows for the replacement of multiple bad bits, or other combinations of failing bits, rows, and columns.

A yield multiplier M is defined as the ratio of the total good die after redundancy to the original good die per wafer, or

M = total redundant good/original good (1)

so that the redundant yield, Yred, is given as

Yred = M x Y (2)

where Y is the initial yield. Forecasting of the redundancy yields is based on how accurately the factor M is calculated. A formula for M was obtained by using the correlated defect model. According to this model, an expression for the yield of die containing a number of defects, I, is given by

yI = ((n+I-1)! / (I! (n-1)!)) x (DA/n)^I / (1 + DA/n)^(n+I) x f^I (3)

where
yI = yield of a die with I defects
D = average defect density (#/cm^2)
A = die area (cm^2)
n = correlation factor between defects
f = fraction of the die area that contains the defects

The yield of die with zero defects can be obtained by setting I = 0 and f = 1 as

Y = 1 / (1 + AD/n)^n (4)

With n = 4 and using equation (4) to substitute for the defect density, equation (3) becomes

yI = Y x ((I+3)(I+2)(I+1)/6) x f^I x (1 - Y^(1/4))^I (5)

Introducing g as the fraction of repairable defects, g varies depending on the number of repaired defects. An expression for M was obtained by summing yI over the ratio of correctable defects and substituting in (2):

M = 1 + Σ(I=1 to k) ((I+3)(I+2)(I+1)/6) x (g f (1 - Y^(1/4)))^I (6)

M was calculated by entering arbitrary values of g and f in equation (6). However, there was no evidence to support the values of the repairable defect density represented by g used to calculate M.
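Equations (5) and (6) can be sketched directly in code (a minimal illustration with our own function names; g and f remain the arbitrary inputs noted above):

```python
# Sketch of equations (5) and (6) for the correlated defect model with n = 4.
# Y is the zero-defect yield, f the fraction of die area containing defects,
# g the fraction of repairable defects, and k the upper summation limit.

def y_I(Y: float, I: int, f: float) -> float:
    """Yield of die with exactly I defects, equation (5)."""
    return Y * ((I + 3) * (I + 2) * (I + 1) / 6) * (f ** I) * ((1 - Y ** 0.25) ** I)

def M_correlated(Y: float, g: float, f: float, k: int) -> float:
    """Yield multiplier, equation (6), summed over I = 1..k."""
    return 1 + sum(
        ((I + 3) * (I + 2) * (I + 1) / 6) * (g * f * (1 - Y ** 0.25)) ** I
        for I in range(1, k + 1)
    )
```

Setting g = 0 (no repairable defects) collapses M to 1, matching the definition of the multiplier.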

Another formula was used to estimate the yield multiplier M. The yield is derived from Poisson's equation [2]

Y = exp ( -AD ) (7)

Instead of using a constant defect density, D, Murphy assumed several defect density distributions [3]. The most preferred distribution was a Gaussian. Stapper used a gamma distribution, which led to the following yield formula [4]:

Y = 1 / (1 + AD/α)^α (8)

where α is the average value of the coefficient of variation for the gamma function. The yield multiplier derived from the previous yield formula is given by

M = S x (1 + 0.01 (L + I) Asb D / k)^k (9)

where
S = fuse programming success rate
I = number of redundant elements
L = number of subarrays
Asb = area of subarray (mm^2)
k = constant for MOS process
0.01 = conversion from mm^2 to cm^2
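As a quick check of the units and structure, equation (9) can be coded as follows (a sketch; the parameter names are ours and any values passed in are illustrative only):

```python
# Equation (9): yield multiplier from Stapper's gamma-distribution yield model.
# Asb is in mm^2, D in #/cm^2; the 0.01 factor converts mm^2 to cm^2.

def M_stapper(S: float, L: int, I: int, Asb_mm2: float, D_per_cm2: float, k: float) -> float:
    """M = S * (1 + 0.01*(L + I)*Asb*D / k)**k."""
    return S * (1 + 0.01 * (L + I) * Asb_mm2 * D_per_cm2 / k) ** k
```

With no subarrays or elements (L = I = 0) the multiplier reduces to the fuse programming success rate S, as expected from the formula.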

This simple formula actually overestimates the redundancy improvement, since it assumes that all the defects are repairable.

In order to get a better estimate of the yield improvement, the nature and distribution of the defects need to be understood. These are taken into account in this model. When considering defects, it is important to realize that not all reject die are repairable: a die failing for a short, for example, cannot be repaired. Also, the number of defective subarrays that can be repaired depends on the available redundant elements per memory block. This means that having more than one defect per die requires a certain distribution of those defects in order for redundancy to be successful.

Taking into account the above factors and using Poisson's equation to describe the yield, the present model was able to predict the redundant yield within the same range as shown by the real data. The following section illustrates how the key parameters affecting redundancy are used to develop the model.

SRAM Array Layout

Figure 1 shows the layout of a SRAM memory array. Before going into details, the following terms are defined as they will be used throughout the paper:
• Subarray. This is a unit array of the memory area, and is shown as subarrays 0 to 71 in Figure 1.
• Memory block or block. This is a segment of the memory area, and is one of the four rows shown in Figure 1.
• Redundant element or element. This is a spare subarray used to replace a memory subarray, and is shown as subarrays R in Figure 1.

The die consists of two areas:
• Repairable area. This includes all the circuitry in the subarray. In this model the repairable area is the sum of the areas of the memory subarrays.
• Non-repairable area. This includes the periphery area. The redundancy elements are also considered part of the non-repairable area.

Block redundancy is illustrated in Figure 1. The defective subarray “4” is replaced by the redundant element R in the same memory block. This is done by programming the right fuses and shifting the array assignments.

Figure 1: SRAM array layout used in the model (four memory blocks, each a row of 18 subarrays plus one redundant element R, flanked by I/O 0 through I/O 7; the subarrays are numbered 0 to 71. A second copy of the layout shows the repaired case, in which defective subarray 4 is replaced by the redundant element R in its block.)

Definitions

• Yield Equation: The yield model used here is Poisson's yield model [2]

Pn = λ^n exp(-λ) / n! (10)

where
Pn = the probability of n defects on a die of area A and defect density D
λ = A D

Defining the failure probability as the probability that a die has one or more defects,

Fn = Σ(n=1 to ∞) λ^n exp(-λ) / n! = 1 - exp(-λ) (11)

and defining the yield as the survival probability,

Sn = 1 - Fn (12)

the yield equation is then

Y = exp(-λ) = exp(-AD) (13)

• Improvement Factor: The improvement factor is defined as

M = (Ngd + Nrep) / Ngd (14)

where
Ngd = initial number of good die per wafer
Nrep = number of repaired die per wafer

Using Poisson's formula to derive expressions for Ngd and Nrep, Ngd can be defined as the number of die with zero defects,

Ngd = N exp(-λ) (15)

where N is the total number of tested die. An expression for Nbd, the number of die with one or more defects, is given by

Nbd = N (1 - exp(-λ)) (16)

Assuming all bad die are repaired, Nrep is then equal to Nbd, and a formula for a maximum improvement factor is given by

Mmax = (Ngd + Nbd) / Ngd = N / Ngd = exp(λ) (17)

However, this is in fact not the case; therefore, Nrep needs to be represented by a more realistic formula. The model described in this paper started with a simple assumption, and more details were gradually added to get as close as possible to the real case. The following section illustrates this development.
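The Poisson definitions above translate directly into code (a sketch with our own function names; λ = A·D as defined in equation (10)):

```python
import math

# Sketch of equations (10)-(17): Poisson defect statistics for a die of
# area A (cm^2) and defect density D (#/cm^2), with lam = A * D.

def p_n(lam: float, n: int) -> float:
    """Probability of exactly n defects on a die, equation (10)."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

def yield_poisson(lam: float) -> float:
    """Survival probability (zero defects), equation (13)."""
    return math.exp(-lam)

def m_max(lam: float) -> float:
    """Upper bound on the improvement factor if every bad die were repaired:
    (Ngd + Nbd) / Ngd = N / (N * exp(-lam)) = exp(lam)."""
    return math.exp(lam)
```

The probabilities p_n sum to one over all n, and p_n(lam, 0) equals the Poisson yield, which is the consistency the derivation relies on.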

Yield Improvement Formula

Simple Model

As a first approach, Nrep is represented by the number of die with one defect, and Poisson's formula is used again to describe Nrep(1) as

Nrep(1) = N λ exp(-λ) (18)

Substituting into the improvement factor formula, equation (14), gives

Msimp = 1 + λ (19)

As mentioned before, not all of the die area can be fixed; only the memory area is repairable. Instead of A, the repairable area, Arep, is used in the expression for λ, where

Arep = Farea x A (20)

and Farea is the fraction of the repairable area. Since only random defects can be fixed by redundancy, the random defect density is used in the expression for λ, where D is found from the yield equation, equation (13), as

D = - ln Y / A (21)

Here Y is the random yield, given by Y = Ngd/N and calculated from the data. Using Arep instead of A, the expression for λ that will be used for the rest of the analysis is then

λ = Farea x A D (22)
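The simple model can be sketched in a few lines (our own function names; note that the die area A cancels, since λ = Farea·A·D with D = −ln Y / A):

```python
import math

# Simple-model sketch, equations (18)-(22): only die with a single defect are
# counted as repairable, so Msimp = 1 + lam, where lam uses the repairable
# area fraction and the random defect density from equation (21).

def lam_repairable(Y_random: float, f_area: float) -> float:
    """lam = Farea * A * D with D = -ln(Y)/A, so lam = -Farea * ln(Y)."""
    return -f_area * math.log(Y_random)

def m_simple(Y_random: float, f_area: float) -> float:
    """Simple-model improvement factor, equation (19)."""
    return 1 + lam_repairable(Y_random, f_area)
```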

Cumulative Model

Next, a better definition of Nrep is obtained: the number of die with one, two, or n defects. Poisson's equation is used to derive a formula for the number of die with a certain number of defects. Since

Pn = λ^n exp(-λ) / n! (10)

where n is the number of defects, the following improvement factors can be defined:
M1 = improvement factor from die with one defect
M2 = improvement factor from die with two defects
Mn = improvement factor from die with n defects

and Mn is equal to

Mn = 1 + Σ Nrepn / Ngd (23)

where

Σ Nrepn = N Σ(I=1 to n) λ^I exp(-λ) / I! (24)

The improvement factors are then given by

M1 = 1 + λ (1 defect/die)
M2 = 1 + λ + λ^2/2! (2 defects/die)

and for n defects per die

Mn = 1 + λ + λ^2/2! + λ^3/3! + ... + λ^n/n! (25)

Since there is a possibility of having more than one defect per block, each term must be multiplied by a so-called repair probability Rn, where Rn is the ratio of the number of combinations of blocks and defects that can be repaired to the total number of combinations. This depends on the available number of redundant elements. Mn is then written as

Mn = 1 + R1 λ + R2 λ^2/2! + R3 λ^3/3! + ... + Rn λ^n/n! (26)

An expression for Rn is found by using a binomial series expansion. If G = X + Y, where X and Y represent the number of blocks, the resulting binomial series is

G^n = (X + Y)^n = X^n + n X^(n-1) Y + nC2 X^(n-2) Y^2 + ... + nCk X^(n-k) Y^k + ... + Y^n (27)

with

nCk = n! / (k! (n-k)!) (28)

as the coefficient of X^(n-k) Y^k. Note that this coefficient represents the number of terms with X raised to a certain power, where this power represents the number of defects on this block. If G contains more than two terms, or more than two blocks, G is written as G = X + Y + Z + ... up to b blocks, and the series becomes

G^n = (X + Y + Z + ...)^n = X^n + n X^(n-1) Y + n X^(n-1) Z + ... + nC2 X^(n-2) Y^2 + nC2 X^(n-2) Z^2 + ... + nCk X^(n-k) Y^k + nCk X^(n-k) Z^k + ... + Y^n + Z^n + ... (29)

Knowing that each redundant element, e, can fix one defect, a term raised to the power of e+1 or higher indicates that the corresponding block has more defects than elements and cannot be fixed. This means that the number of possibly repaired combinations is equal to the total number of block and defect combinations minus the sum of the coefficients of the terms raised to the power of e+1 or higher. All terms can be treated similarly, since all blocks are equal, and terms raised to the same power are collected together; their coefficients can then be added as well. Because a given power k can occur on any of the b blocks, with the remaining n-k defects distributed over the other b-1 blocks, the sum of coefficients can be written as

sum = Σ(k) b (b-1)^(n-k) n! / (k! (n-k)!) (30)

The number of repairable combinations is then

Grep = b^n - Σ(k=e+1 to n) b (b-1)^(n-k) n! / (k! (n-k)!)

with n ≥ k always.


From the definition of Rn, the total number of combinations of blocks and defects is given by b^n. The repair probability is the ratio of the possibly repaired count to the total count, or

Rn = Grep / b^n = (b^n - Σ(k=e+1 to n) b (b-1)^(n-k) n! / (k! (n-k)!)) / b^n (31)

and the formula for the cumulative improvement factor is

Mn = 1 + R1 λ + R2 λ^2/2! + ... + Rn λ^n/n! (32)

Note that this formula is applicable for up to e x b defects, the total number of elements times blocks; beyond that it is not usable. Higher order terms in the series are also negligible and can be ignored without affecting the improvement factor.

General Model

A general model is developed by including the effect of defect type in the previous improvement factor formula. The cumulative model in fact overestimates the real data, because it assumes that all die are repairable. Studying the reject die data, it was found that only certain die could be fixed, namely raster type bins, which occupy a certain fraction of the total reject die population. Adding the other restriction of obeying the previously described repair probability, only a certain fraction of those die is repairable. An efficiency factor η is introduced into the cumulative model. It is defined as the effective fraction of bad die repaired, extrapolated at the maximum yield for a certain repairable area. It is calculated from

η = γ / Farea (33)
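The repair probability of equation (31) is easy to compute directly (a sketch with our own function name; a defect pattern counts as non-repairable when any single block receives more than e defects):

```python
from math import comb

# Repair-probability sketch, equations (30)-(31): b memory blocks, e redundant
# elements per block, n defects distributed at random over the blocks.

def repair_probability(n: int, b: int, e: int) -> float:
    """R_n = (b**n - sum_{k=e+1..n} b*(b-1)**(n-k)*C(n,k)) / b**n."""
    bad = sum(b * (b - 1) ** (n - k) * comb(n, k) for k in range(e + 1, n + 1))
    return (b ** n - bad) / b ** n
```

For example, with b = 4 blocks and e = 1 element per block, two defects are repairable only if they land on different blocks, which happens in 12 of the 16 equally likely assignments, so R2 = 0.75.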

where

γ = (Nbd_cr / Nbd) x (Nrep / Nbd_cr)

Nrep = number of repaired die
Nbd = number of reject die
Nbd_cr = correctable reject die

so that Nbd_cr cancels out in the expression for γ. λ is then modified to

λ = η Farea x A D (34)

which is then used in the general model

Mn = 1 + R1 λ + R2 λ^2/2! + ... + Rn λ^n/n! (35)

This is the same as the cumulative model formula, equation (32), except for the expression of λ. γ is obtained from the empirical data, so that one value of γ can be used for products from the same process generation. For a new process, γ from a previous process can be used, since its value is close from one process to the next.
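Putting equations (31), (34), and (35) together gives the general model (a sketch under our own naming; the input values in any call are illustrative, and the repair probability follows equation (31)):

```python
import math
from math import comb

# General-model sketch, equations (33)-(35): the efficiency factor eta scales
# lam, and each Poisson term is weighted by the repair probability R_n.

def repair_probability(n: int, b: int, e: int) -> float:
    """R_n per equation (31)."""
    bad = sum(b * (b - 1) ** (n - k) * comb(n, k) for k in range(e + 1, n + 1))
    return (b ** n - bad) / b ** n

def m_general(Y_random: float, f_area: float, eta: float,
              b: int, e: int, n_max: int) -> float:
    """Improvement factor Mn, equation (35), truncated at n_max terms."""
    lam = eta * (-f_area * math.log(Y_random))  # lam = eta * Farea * A * D
    return 1 + sum(
        repair_probability(n, b, e) * lam ** n / math.factorial(n)
        for n in range(1, n_max + 1)
    )
```

Since every R_n lies between 0 and 1, the result always falls between 1 and the all-repairable bound exp(λ) of equation (17).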

Redundant Elements and Memory Blocks Optimization

In this section, how the number of redundant elements and memory blocks affects the yield improvement is studied. Increasing the number of spare elements increases the chance of repair. However, this impacts the repairable area fraction, since the total area increases while the repairable area is fixed. The dependency of the improvement factor on both the number of redundant elements and the repairable area is studied in order to check the possibility of improving redundancy by varying these two factors.

Since the improvement factor is a function of the repairable area and the defect density, and since the defect density is also a function of the area as calculated from the yield equation, equation (21), this equation is used to substitute for D in the expression for λ, equation (34), as

λ = η Farea x (A D) = - η Farea x ln Y (36)

The total die area is then written as

A = Arep + Anrep + 4 x Ael (37)

where Arep is the repairable area and is equal to the area of the subarrays, Anrep is the area of the die circuitry, and Ael is the redundant element area and is equal to the subarray area. Increasing the number of redundant elements in sets of 4, the total area is

A = Arep + Anrep + 4 x e x Ael (38)

where “e” is the number of elements per block. The fraction of the repairable area is then

Farea = Arep / (Arep + Anrep + 4 x e x Ael) (39)

and

λe = - η Arep ln Y / (Arep + Anrep + 4 x e x Ael) (40)


To study the behavior of the improvement factor with e and Arep, start with the general yield improvement factor formula, equation (35):

Mn = 1 + R1 λ + R2 λ^2/2! + R3 λ^3/3! + ... + Rn λ^n/n!

In the case of adding extra redundant elements, the die area, and hence λ, also changes, so that for each case with a certain number of elements the value of λ is different. The improvement factor formula is written as

1 element, n defects:
Mn,1 = 1 + R1 λ1 + R2 λ1^2/2! + R3 λ1^3/3! + ... + Rn λ1^n/n!

2 elements, n defects:
Mn,2 = 1 + R1 λ2 + R2 λ2^2/2! + R3 λ2^3/3! + ... + Rn λ2^n/n!

e elements, n defects:
Mn,e = 1 + R1 λe + R2 λe^2/2! + ... + Rn λe^n/n! (41)

For a number of defects less than or equal to the number of elements per block, the die is always repairable, i.e., Rn = 1 for all terms with n ≤ e.

On the other hand, if a die has n defects, where n > e x b, the die is never repairable. The improvement factor formula is then written as

1 element, n ≤ 1 defects:
Mn,1 = 1 + λ1

2 elements, n ≤ 2 defects:
Mn,2 = 1 + λ2 + λ2^2/2!

e elements, e < n ≤ e x b defects:
Mn,e = 1 + λe + λe^2/2! + ... + λe^e/e! + Re+1 λe^(e+1)/(e+1)! + ... + Rn λe^n/n! (42)

where Rn follows the expression given by equation (31).

Next, the effect of increasing the number of blocks per memory area on the redundant yield is studied. Dividing the memory area into a larger number of blocks also increases the chance for repair, since each block is accompanied by one redundant element. However, there is a certain maximum number of blocks, after which the increase in improvement is negligible, since the larger order terms in the series start to diminish. In this analysis, the total number of subarrays and Farea are kept constant, but the size of the subarray is changed depending on how the memory area is divided. The number of redundant elements per block e is still one.

The general formula, equation (35), is used here, where the number of blocks b is changed in the repair probability term Rn given by equation (31).
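The truncated series of equations (35) and (40) can be evaluated numerically. The sketch below is illustrative only: the function names are my own, and the repair probability Rn of equation (31) is taken as a caller-supplied function, since its full expression depends on the block count and defect distribution.

```python
import math

def lambda_e(eta, a_rep, a_nrep, a_el, e, initial_yield):
    # Equation (40): lambda_e = -eta * Arep * ln(Y) / (Arep + Anrep + 4*e*Ael)
    return (-eta * a_rep * math.log(initial_yield)
            / (a_rep + a_nrep + 4 * e * a_el))

def improvement_factor(lam, n_max, repair_prob):
    # Equation (35): Mn = 1 + sum over n of Rn * lambda^n / n!,
    # truncated after n_max terms.
    return 1.0 + sum(repair_prob(n) * lam ** n / math.factorial(n)
                     for n in range(1, n_max + 1))
```

With repair_prob(n) = 1 for n = 1 only, this reduces to the simple model M = 1 + λ; with repair_prob(n) = 1 for all n, the series approaches e^λ, the cumulative limit.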

Results and Discussion

This report describes a model that calculates the redundancy yield. The amount of improvement depends on some key factors: the repairable area, available redundant elements, defect density, and the types of defects and their distribution on the die. The memory area is the area that contributes to redundancy, since the rest of the die area cannot be fixed and has to be functional. Only the random defect density is considered here as the defect category that is potentially repairable. The number of available redundant elements also determines how much improvement can be gained. If there is one redundant element per memory block, only one defect per block can be fixed. The type of defects is another important factor in estimating the redundancy yield. Raster defects such as bits, columns, or rows (where bits can represent individual or clustered defects as long as they fall in the same memory block) are considered repairable. Although the number of defects that could be fixed equals the number of redundant elements available, those defects have to follow a certain distribution on the memory array according to the repair probability described in the text.

Figures 2 through 4 show the improvement factor versus the initial yield calculated by the three models: simple, cumulative, and general. Data is compared for three products from two different process generations.

[Figure 2: Three formulas compared to data measured on product 1. Improvement factor M vs. initial yield (0%–70%); series: M_meas, M_general, M_simp, M_cum.]


Comparing the formulas of the improvement factor, the closest fit to the actual data was obtained when all the factors affecting redundancy were accounted for (general model). The simple model underestimates the data, since it assumes the repair of die with one defect only, which is not the real case. The cumulative model overestimates the data. It considers all types of defects and assumes all of them are repairable if their count is equal to or less than the number of redundant elements. Thus, it ignores the restriction of allowing one defect per block.

[Figure 3: Three formulas compared to data measured on product 2. Improvement factor M vs. initial yield (0%–60%); series: M_meas, M_general, M_simp, M_cum.]

[Figure 4: Three formulas compared to data measured on product 3. Improvement factor M vs. initial yield (0%–70%); series: M_meas, M_general, M_simp, M_cum.]

The effect of varying the number of redundant elements is shown in Figure 5. The effect of adding more redundant elements is mostly seen at a lower initial yield. It was observed that the improvement in yield is significant up to two extra sets of elements for a die with originally one redundant element per block. Beyond that, the effect of decreasing the repairable area dominates, so that the two factors cancel out, and the overall improvement is unchanged.

[Figure 5: Improvement factor when extra redundant elements are added at different initial yields Y (4%–38%). Improvement factor M vs. number of extra sets of redundant elements (1–15).]

Figure 6 shows the effect of dividing the memory area into a large number of blocks. Again, the enhancement in the yield multiplier is observed at a lower yield. With an increase in the initial yield, an improvement in the redundant yield was observed up to six blocks. Beyond that, the effect of more blocks per repairable area is not noticeable, since the higher order terms in the multiplier formula are negligible and do not add extra value to the yield multiplier.

[Figure 6: Improvement factor when the number of memory blocks is increased at different initial yields Y (4%–38%). Improvement factor M vs. number of memory blocks per memory area (1–15).]

Conclusion

A model for calculating the redundancy yield is developed and described in this paper. Poisson's equation plus the effect of some redundancy-influencing factors are used to derive a general yield multiplier formula. The memory area is considered the only portion of the die area where redundancy is applied. The random defect density is used here as the only defect category that contributes to redundancy. From the defect population,


only a fraction of it can be repaired depending on the nature of the defect. According to the die design, the number of repairable defects depends on the available redundant elements per memory block. This means that the number of defects must be below a certain value, and the defects have to follow a certain distribution throughout the memory area to enable redundancy. An efficiency factor is introduced and empirically evaluated to account for the repairable defects. Combining those factors, a general formula is derived and shows good agreement with the actual data. Knowing the properties of a new product and using the efficiency factor for the process generation, the redundancy yield of a new product can be predicted. The formula can also be used to study the impact of varying the number of redundant elements and memory blocks on the final result. Thus, a better design that optimizes the number of redundant elements, memory size with respect to the total die area, and the number of blocks in the memory area might result in a more efficient redundancy scheme.

Acknowledgments

I would like to thank Dan Grumbling for initiating this project, Tim Deeter for the useful discussions and comments during the development of the model, and Mike Mayberry for his continuous support and guidance throughout this work.

References

[1] Jitendra Khare, et al. Accurate Estimation of Defect-Related Yield Loss in Reconfigurable VLSI Circuits. IEEE Journal of Solid-State Circuits, Vol. 28, No. 2, February 1993.

[2] R. M. Warner, Jr. Applying a Composite Model to the IC Yield Problem. IEEE Journal of Solid-State Circuits, Vol. SC-9, No. 3, June 1974.

[3] B. T. Murphy. Cost-Size Optima of Monolithic Integrated Circuits. Proc. IEEE, Vol. 52, December 1964.

[4] C. H. Stapper. On Murphy's Yield Integral. IEEE Trans. Semiconductor Manufacturing, Vol. 4, November 1991.

Author's Biography

Nermine Ramadan received a B.Sc. in Nuclear Engineering from the University of Alexandria, Egypt in 1982, and an M.Sc. and Ph.D. in Nuclear Engineering and Engineering Physics from the University of Wisconsin, Madison in 1986 and 1992, respectively. In 1994 she joined Intel in Oregon and is currently working as a Senior Integration Engineer in Sort/Test Technology Development. Her e-mail address is [email protected]


Redundancy and High-Volume Manufacturing Methods

Christopher W. Hampson, MD6 Cache Product Engineering, Hillsboro, OR, Intel Corp.

Index words: I5, redundancy, raster

Abstract

This paper describes practical aspects of a redundancy implementation on a high-volume cache memory product. Topics covered include various aspects of redundancy from a design and product engineering perspective, and test development methods for future product implementations.

As robust as Intel's wafer fabrication processes are, defects still occur, and wafer yields are the indicator. As die sizes increase, so does the probability of a defective die. Failure analysis has shown that a large percentage of memory array defects are attributed to single-cell defects. This implies that a single memory cell fault can cause an array of over four million cells to be deemed non-functional.

Redundancy is a method wherein “spare” array elements are incorporated into the design to replace elements that have tested defective. First, the basic redundant element strategy must be decided upon. This involves evaluating row, column, and block replacement schemes. Second, the replacement mechanism needs to be a known and reliable entity (e.g., fuses). The design challenge is to select how many redundant elements to add without increasing the die size to the point where the total number of good die is less than the overall yield without redundancy. The critical factors are die size and defect density. Yield forecasts and defect densities for a process are usually available prior to the design phase and are updated on an ongoing basis.

Introduction

The “I5” is the 512K byte second-level cache packaged with the Pentium® Pro processor. It is one of the first cache devices at Intel to use redundancy. By using this design feature, the I5 has achieved one of the highest yield levels for an Intel product. The overall yield improvement on the I5 with redundancy is a generous 35%, and the cost savings are substantial.

I5 Architecture

The I5 architecture (Figure 1) consists of seventy-two identical sub-array “blocks” that make up the data array. It is organized into four quadrants, each containing eighteen sub-arrays. One sub-array contains 64K memory cells. Each sub-array corresponds to one input/output bit on the device. “R” represents the redundant elements.

Figure 1: I5 Basic architecture

The goal for redundancy involves evaluating several parameters to make a decision on how much redundancy to incorporate. First, the non-redundant yield should be calculated. This is determined from wafer size, number of die, and defect densities for the fabrication process.

For a sample wafer with 33 testable die, and known die size and nominal defect density, the “perfect die” yield might be 20 die per wafer (D/W) without redundancy.

[Figure 1 labels: TAG array; DATA sub-arrays in quadrants Q1–Q4; R = redundant sub-arrays, one per quadrant.]


Figure 2: Non-redundant wafer yield

With the addition of redundant elements, the die size increases, so fewer die fit on the same size wafer. Hence, the “perfect die” yield decreases. We then need to be able to predict, for the same defect density, how many additional die can be made functional for a total (perfect plus redundant) die yield.

Figure 3: Total wafer yield with redundancy

Block replacement was chosen as the optimal strategy for this architecture. Given this block replacement strategy, the yield increase can be determined with defect densities, die size, and sub-array size.

A yield multiplier can be calculated from the equation:

MULT = S x (1 + 0.01 x (N + I) x Asb x D / k)^k

where:

S = programming success rate
N = number of sub-arrays
I = number of redundant sub-arrays
Asb = sub-array area (mm2)
D = defect density (#/cm2)
k = constant for MOS processes
0.01 = conversion from mm2 to cm2

k is a constant derived from a formula for yield that is based on an average value of the coefficient of variation for the defect density distribution. This yield model is discussed in detail in the paper entitled “Redundancy Yield Model for SRAMs,” also published in this issue of the Intel Technology Journal.
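The multiplier can be sketched as a one-line calculation. The parameter values used in the usage note below are hypothetical stand-ins (the paper does not give Asb or k for the I5), so only the shape of the result is meaningful.

```python
def yield_multiplier(s, n_sub, n_red, a_sb_mm2, d_per_cm2, k):
    # MULT = S * (1 + 0.01 * (N + I) * Asb * D / k)^k
    # The 0.01 factor converts Asb (mm^2) times D (#/cm^2) to a defect count.
    return s * (1.0 + 0.01 * (n_sub + n_red) * a_sb_mm2 * d_per_cm2 / k) ** k
```

For the I5's 72 sub-arrays plus 4 redundant sub-arrays, the multiplier rises with defect density and falls back toward S as D approaches zero, matching the trend across the rows of Table 1.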

Since the data array is divided into four quadrants, the logical direction for determining how much redundancy to incorporate in the design was to calculate yield with the multiplier and evaluate for one or more redundant elements (sub-arrays) per quadrant. This process revealed one element per quadrant as the optimum strategy (the sub-arrays labeled “R” in Figure 1). The tag array was evaluated for redundancy and was considered too small for an effective implementation.

Table 1 shows the predicted yield for both the non-redundant and redundant cases. The multiplier equation assumed a programming success rate of 100%. As die size increases due to redundancy, the perfect die yield decreases. For a nominal yield level, the redundant yield multiplier is 1.49 times the perfect die yield of 18, resulting in a 27 D/W final yield. This model predicts, at this defect density level, a 50% increase in yield over the perfect die with redundancy, and a 35% increase over the yield without redundancy.

This was the model used to predict yield for the I5. With a nominal defect density, the predicted increase in yield on the I5 was 50% over the perfect die yield with redundancy, and 46% over the non-redundant case.

One Redundant Element per 18 Sub-Arrays

Yield Level | Defect Density (per cm2) | Perfect D/W Non-Redundant | Perfect D/W with Redundancy | Data Array Yield Multiplier | Total Good with Redundancy | Ratio to Non-redundancy
Low     | 0.8 | 13 | 12 | 1.85 | 22 | 1.7
Nominal | 0.5 | 20 | 18 | 1.49 | 27 | 1.4
High    | 0.2 | 28 | 26 | 1.18 | 30 | 1.1

Non-redundant: Gross Die/Wafer = 36; Redundant: Gross Die/Wafer = 34

Table 1: Sample wafer redundant yield calculation

[Figure 2 legend: P = perfect die; perfect yield = 20; gross die = 36; tested die = 33.]

[Figure 3 legend: P = perfect die; R = redundancy die; perfect yield = 18; redundancy die = 9; total yield = 27; gross die = 34; tested die = 31.]


The Replacement Mechanism

The crux of any redundancy implementation is the method used to substitute defective elements with defect-free elements. On the I5, flash cells are programmed to direct muxes that replace the defective sub-arrays with defect-free sub-arrays.

Figure 4: Flash cell basic schematic

The flash cell is basically two transistors, one floating gate, and a select gate (Figure 4). To program, a high voltage is applied to the programming gate, and with the select gate turned on, current will flow to the drain of the floating gate transistor. This creates the fields conducive to hot-electron injection, causing an increase in the threshold of the floating gate. Cells should have low unprogrammed switching thresholds (Vt) out of fab, and once programmed, they should have high switching threshold levels. For more information on flash technology, refer to [1].

In Figure 1, a sub-array is replaced by its neighboring sub-array, the one closest to the redundant sub-array. The sub-arrays between the bad sub-array and the redundant sub-array are also switched to their neighboring sub-arrays. All this switching is done by muxes in the read and write paths of the device.
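That shift-style mux mapping can be sketched as a logical-to-physical map. This is a hypothetical helper, assuming a linear chain of sub-arrays with the redundant spare sitting just past the last logical position, which is a simplification of the actual layout.

```python
def remap_subarrays(n_subarrays, failed_index=None):
    # Logical-to-physical sub-array map: sub-arrays at or beyond the failed
    # one each shift by one toward the spare at position n_subarrays, so the
    # defective physical sub-array is never addressed.
    if failed_index is None:
        return {i: i for i in range(n_subarrays)}
    return {i: (i if i < failed_index else i + 1)
            for i in range(n_subarrays)}
```

For example, remap_subarrays(18, 5) leaves logical sub-arrays 0–4 in place and shifts 5–17 onto physical positions 6–18, the last being the spare.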

One redundant sub-array per quadrant allows for one and only one defective sub-array replacement per quadrant, up to four per die. The task is to determine the number of defective sub-arrays per quadrant. This process is integrated with the cache raster capability. Raster is a test process used to uniquely identify all failing cell locations on the device. The redundancy algorithm is integrated with the raster function to identify the failing sub-arrays.

The Redundancy Algorithm

The basic idea of the redundancy test flow is to find the devices that are defective, evaluate the extent of replaceability (one failing sub-array per quadrant), program the flash cells to effect the replacement of the failing sub-array, and re-test to ensure the redundant sub-array is defect free.

This is accomplished by obtaining fail information from rastering and modifying tests in the flow that, once executed, perform the programming and reading of the flash cells. The algorithm is further enhanced by ensuring the cells are programmed to a high reliability level, detecting high-Vt cells out of fab, and checking for re-sorting of wafers containing programmed cells.

The Details

The replacement process occurs early in the test flow, at the Built-In Self-Test (BIST) step. This test checks every memory cell in the data array. The flow starts when this test fails (see Figure 5). This is the only point in the entire test flow at which sub-array replacement is done.

At the raster step, all the failing cell information is collected. It is also discerned whether fails occurred in more than one sub-array per quadrant.
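That per-quadrant repairability check can be sketched as follows. The helper is hypothetical and assumes a flat cell address space; the actual raster data format is not described in the paper.

```python
def die_is_repairable(failing_cells, cells_per_subarray=64 * 1024,
                      subarrays_per_quadrant=18):
    # Map each failing cell address to its sub-array, then verify that no
    # quadrant contains more than one failing sub-array (one spare each).
    bad_subarrays = {addr // cells_per_subarray for addr in failing_cells}
    per_quadrant = {}
    for sub in bad_subarrays:
        quad = sub // subarrays_per_quadrant
        per_quadrant[quad] = per_quadrant.get(quad, 0) + 1
    return all(count <= 1 for count in per_quadrant.values())
```

The sub-array and quadrant sizes default to the I5's figures (64K cells per sub-array, eighteen sub-arrays per quadrant).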

[Figure 5: Redundancy algorithm. Flowchart: BIST test fails → collect raster and replacement data → redundancy OK? → program all flash cells → flash cells read OK? → reliability test pass? → re-run BIST test → read and send flash cell info to database → continue. If the first BIST test passes, the flash cells are read; if any flash cells are on, the die is checked as either a redundant die or one with bad cells out of fab. A “no” at any decision point bins the die as failed.]


If not, the test flow is halted, and the die is binned non-functional.

First, the array is read and tested for all cells equal to “0.” This checks for cells whose Vt's are high enough to read as a “1” out of fab. If any cells read as “1,” the die is binned non-functional.

The flash cells are then programmed and tested for the expected contents, and if a cell failed to program, the die is binned out. A reliability test is then done to ensure the cells are reliably programmed. This test gives an indication of a high Vt.

The BIST tests are then re-executed; for passing die, the flash cells are read and written to the database for possible future failure analysis, and the die continues the test flow.

If a die passes the first BIST tests, the flash cells are read to determine the die's status. If any cells read as “1,” then it must be determined whether this is a bad cell out of fab or a redundancy die. Once this determination is made, the die is either binned non-functional or continues the test flow.
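The decision logic of the flow can be paraphrased in a few lines. The function name and the return labels are my own; this is a reading of the flow described above, not production test code.

```python
def redundancy_flow(bist_fails, one_bad_subarray_per_quadrant,
                    program_ok, reliability_ok, bist_retest_passes,
                    cells_preprogrammed=False):
    # Returns the disposition of a die under the redundancy test flow.
    if not bist_fails:
        # A passing die with flash cells already on needs the bad-cell vs.
        # redundant-die determination described in the text.
        return "check-flash-state" if cells_preprogrammed else "pass"
    if not one_bad_subarray_per_quadrant:
        return "fail"                    # beyond repair capacity
    if not (program_ok and reliability_ok):
        return "fail"                    # flash cells not reliably set
    return "redundant-pass" if bist_retest_passes else "fail"
```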

The Production Results

Raster and replacement data indicated that 85% of all die that failed the BIST screen could use redundancy. After the first month in production, an average increase in yield of 35% was evident. Subsequently, after redundancy had been enabled for two quarters of production, a cost analysis was performed. It showed that all the replacement die had amounted to an equivalent of 6696 wafers. The direct unit cost savings were substantial. In addition to the direct costs, this savings enabled the manufacture and sale of many other Intel products.

Test Cost of Redundancy

An additional 1.5 seconds was needed to implement redundancy on a die in the sort test program. An analysis was performed to determine if redundancy actually lowered the test time per good die over an entire lot. Considerations included good and bad die test times with and without redundancy, and the time to align wafers and step to other die on the same wafer. It was concluded that in all cases, for different yields, a significant test time savings could be achieved. The actual test time savings at nominal yield levels amounted to 1.33 seconds per good die over the test time without redundancy. Test time savings are greater for lower yields.
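The kind of trade-off analyzed can be sketched with a simple expected-time model. All numbers and the model itself are illustrative; the paper's analysis also accounted for wafer alignment and stepping time, which are omitted here.

```python
def sort_time_per_good_die(gross_die, yield_frac, repair_frac,
                           t_pass, t_fail, t_redundancy):
    # Expected sort test time divided by the number of good die obtained.
    # With redundancy, a fraction of failing die spend t_redundancy extra
    # seconds but are converted into good die, amortizing the test cost.
    n_good = gross_die * yield_frac
    n_bad = gross_die * (1.0 - yield_frac)
    n_repaired = n_bad * repair_frac
    total_time = n_good * t_pass + n_bad * t_fail + n_repaired * t_redundancy
    return total_time / (n_good + n_repaired)
```

With hypothetical per-die times, the per-good-die cost drops as the repair fraction rises, and the drop is larger at lower yields, consistent with the conclusion above.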

Conclusions

The design yield predictions based on redundancy were somewhat inflated due to the general model used.

At nominal yield levels, the predicted increase was 50%; the actual increase was 35%.

First, the predicted yield is calculated from the gross die per wafer count rather than the tested die count, as is standard for yield calculations, so the initial predictions counted on more die being available for replacement. Furthermore, a major contributor to the degradation of the replacement rate was the 15% fallout of die whose data arrays had more than one defect per quadrant.

A new redundancy model has been formulated that takes into account the number of “tested die” and the possibility of defect types that do not warrant replacement.

The redundancy application with the I5 has shown that there are other factors that would increase the accuracy of a redundancy model. Quiescent current screening is an important factor that will change for different product types. This screen accounted for an additional 1% reduction in the replacement rate, but it could be higher for products with tighter testing. The programming success rate seen on the I5 was less than perfect at 97%. This is due to redundant die that had defects in the redundant sub-array, or die that failed to program flash cells. An additional component is the reliability test on the programming element. The position of the replacement function in the test flow, and the test used to determine if a die needed redundancy, are other considerations that can alter the replacement rate. All these factors can be incorporated into redundant yield predictions in the future.

Summary

Improvements to yield prediction and implementation aspects have been described. The I5 has shown that redundancy makes sense on large arrays, and its benefits are greater for lower yields. It can be implemented and made production worthy, and improved yields and substantial savings can be realized.

References

[1] Ohsaki, K., Asamoto, N., Takagaki, S., “A Single Poly EEPROM Cell Structure for use in Standard CMOS Processes,” IEEE J. Solid-State Circuits, Vol. 29, No. 3, March 1994, pp. 311-316.

Author's Biography

Christopher Hampson is a product engineer in the Microprocessor Products Group, Cache Products Division. He received a B.Sc. degree in Computer Science from National University, San Diego, CA. He joined Intel in 1993, was a lead product engineer on the L2 cache for the Pentium® Pro processor, and is


currently working on the next generation of Intel's cache products. His e-mail address is [email protected]


IEEE International Electron Devices Meeting (Dec. '97)

A PROM Element Based on Salicide Agglomeration of Poly Fuses in a CMOS Logic Process

Mohsen Alavi, Mark Bohr, Jeff Hicks, Martin Denham, Allen Cassens, Dave Douglas, Min-Chun Tsai
Intel Corporation, Portland Technology Development

Hillsboro, OR

Abstract

A novel programmable element has been developed and evaluated for state-of-the-art CMOS processes. This element is based on agglomeration of the Ti-silicide layer on top of poly fuses. Various aspects of these programmable devices, including characterization and optimization of physical and electrical aspects of the element, programming yield, and reliability, have been studied. Development of a novel programming and sensing circuit is also included.

Introduction

The capability of implementing a small PROM array on logic products at no additional process cost is highly desirable for a number of applications such as redundancy implementation in SRAMs, die identification, electrically programmable feature selection, etc. As CMOS technology scaled, gate oxides became thin enough that implementation of flash memory cells on standard logic CMOS processes (SPEED) became possible [1]. However, further scaling of CMOS technology resulted in inadequate charge retention in the SPEED device due to tunneling of carriers through the gate oxide. The element presented here avoids the problem with scaled gate oxide thickness. The result is a fuse element that is reliable under thermo-mechanical and bias-temperature stress while enjoying near-100% programming success when used in a specially designed circuit. Programming the fuse does not result in any collateral damage in overlying or underlying layers and may be performed nominally at 2.5V and 10 mA in 100 ms.

General Description of the Element

A. Physical Properties

The poly agglomeration fuse (PAF) is made from a polysilicon line shunted on top by a layer of Ti-silicide, which is used as the gate in CMOS processes. It is programmed via current stress, which results in temperatures high enough to cause agglomeration of the Ti-silicide [2]. The damage due to programming of the element has been found to be very subtle and confined to the Ti-silicide and its interface with the underlying poly layer and the overlying dielectric. The integrity of the entire overlying stack from the passivation to the overlying ILD is found to be intact, and no collateral damage has been observed (see Fig. 1, 2). This is in contrast with traditional poly or metal fuses, which require openings in the overlying layers to facilitate removal of fuse material, and therefore a post-program passivation step. Typically, a fuse link is drawn at minimum allowable width with a few microns of length (see Fig. 2). The effect of fuse doping and geometry on its performance has been investigated extensively and is discussed below.

[Fig. 1: Cross section of the damaged section of a programmed fuse (labeled layers: oxide, Ti-Si2, poly, oxide, substrate; 0.5 µm scale bar). Lack of collateral damage to the overlying and underlying layers is evident.]

[Fig. 2: Top view of a programmed fuse. The subtle damage due to programming is evident on the left side of the element.]

B. Electrical Properties

Prior to programming, electrical properties of the fuse are determined by the salicide layer on top, which has a sheet resistance of about 4 ohms per square in our study, resulting in a typical resistance of about 50-100 ohms depending on the dimensions of the fuse. Injection of current beyond a certain level results in a sudden increase in resistance, indicating formation of discontinuities in the silicide layer. The value of this resistance varies greatly from device to device. In our structures, post-program resistance varied from several hundred ohms to several hundred kOhms. Post-program I-V characteristics are found to be nonlinear, and therefore the value of resistance varies with applied bias.
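The pre-program value follows from a simple squares count. A one-line sketch (link-only estimate, ignoring the contact and lead resistance that push real values toward the 50-100 ohm range quoted above):

```python
def unprogrammed_fuse_resistance(sheet_res_ohm_per_sq, length_um, width_um):
    # R = Rs * (L / W): sheet resistance times the number of squares
    # in the Ti-silicide shunt over the poly link.
    return sheet_res_ohm_per_sq * length_um / width_um
```

For instance, a 2 µm x 0.22 µm link at 4 ohms per square is about 9 squares, or roughly 36 ohms for the link alone.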

Element Characterization

A. Test Structures

The element described above has been implemented in a 0.25um CMOS process with a poly thickness of about 0.2um [3]. Ti-silicide films resulting in sheet resistance ranging


from 3 to 4 ohms per square have been studied. Initial and post-program electrical characteristics of a variety of element designs have been investigated. This includes the effects of poly doping (n, p, undoped), fuse length and width, fuse shape, programming and sensing voltage and current, and programming time.

B. Programming dynamics

In order to program an element, a certain amount of current is needed. The voltage needed for injecting this current must obviously be smaller than the available power supply voltage. Under constant voltage stress, as the element gets hot enough, agglomeration starts to occur, thereby increasing the element resistance. As a result, the current through the element drops to a low value consistent with the element's final resistance, and the element cools down. This mechanism is one with negative feedback. Therefore, a given fuse may be stressed only once, and its post-program resistance will not increase with additional voltage stress. Figure 3 shows the I-V characteristics of a typical fuse element. As the voltage is increased, current increases in a nonlinear fashion due to resistance change caused by self-heating. When the dissipated power reaches a critical value, fusing occurs and the element goes to a much higher resistance.

[Figure 3: I-V characteristics of a typical element upon programming. Current (0–15 mA) vs. potential (0–2.5 V); the onset of programming is marked.]

C. Response parameters

Initial and post-program resistance of the element are the two key parameters affecting any circuit meant to sense the state of the element. A maximum value of initial resistance and a minimum value of post-program resistance are needed to guarantee proper circuit function (about 100Ω and 1kΩ, respectively, in our circuit). Initial fuse resistance depends on element geometry and silicide thickness and quality. Silicide quality in turn depends on process conditions, poly line width, and doping [2, 4]. Silicide imperfections are more likely for long narrow elements, and the best silicide lines were found to be the ones made from p-doped poly. Imperfections in the silicide layer (cracks, high-resistance Ti-Si phase) result in a resistive element. Figure 4 shows the cumulative distribution of the resistance of a typical fuse structure made with two processes with different thermal cycles and Ti thickness. A high-resistance tail corresponding to silicide imperfections is evident in the distribution of the resistance of the unoptimized process. Post-program resistance varies greatly from device to device and depends on the shape and size of the discontinuity in the link. Due to this variation, any aspect of this resistance must be studied statistically. Many factors affect the level of fusing and therefore post-program resistance. They include:

[Figure 4: Cumulative distribution (0.1%–99.9%) of the resistance (0–200 ohms) of a typical fuse. (1) Unoptimized silicide process, (2) optimized silicide process.]

Programming voltage, current, and time: Even though fusing can occur quickly and at fairly low currents and voltages (on the order of 1V, 8mA, 1ms), post-program resistance is significantly enhanced if more energy is dumped into the element. Therefore, increased voltage and current levels are needed for a longer time to guarantee a sufficiently large resistance. In this work, the minimum programming conditions which resulted in statistically adequate post-program resistance were a current of 20mA injected for 100ms with a voltage compliance of 2.5V.

Initial fuse integrity: Measured data shows that fuses that are more robust initially (by process or geometry) result in more successfully programmed elements. This is due to the fact that for a given voltage compliance and a given value of fusing current, a smaller resistance results in a larger amount of energy transferred to the device. The fortuitous result is that process conditions which result in good silicide formation and robust unprogrammed fuses also produce elements which program successfully.

Fuse shape: In addition to the relation between fuse size and its initial resistance, the shape of the fuse has a marked effect on the distribution of its post-program resistance. This has been found to be due to the fact that in addition to the high temperature necessary for agglomeration, the level of


IEEE International Electron Devices Meeting (Dec '97)

temperature gradient (and therefore stress) in the element plays a key role in the fusing event. Fusing has been found to occur near the sides of the element, close to the point which has the highest temperature gradient (see Figures 2 and 5). Additionally, line width plays a significant role in fusing success, with narrower lines having the advantage of better fusing. Figure 6 shows four different fuse shapes of the same length. Figures 7 and 8 show the distributions of initial and post-program resistance for these elements. The difference between the post-program resistance of elements a and b corresponds to the effect of element width, while the differences between structures c and d show the effect of temperature gradient.

Figure 5, Simulation results: profiles of temperature (C) and temperature gradient (C/µm) along the normalized fuse length at nominal programming bias.

Modeling and Simulation
In order to look for an optimum fuse design, numerical simulation of the temperature in the element under current stress has been performed. The simulation is based on a two-dimensional model with an added loss term to the overlying and underlying layers. The thermal conductivity of the silicide layer and the heat loss coefficient were fitting parameters. Assuming that fusing occurs when the temperature of the fuse reaches 800C (the silicide agglomeration temperature [2]), the simulation is able to predict the fusing current, using a single set of fitting parameters, for various fuse geometries with good accuracy (see Figures 5 and 9), and it provides insight into the distribution of temperature and its gradient in the element.
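The flavor of this model can be sketched in one dimension: conduction along the silicide line, uniform Joule heating, and a linear loss term to the surrounding layers, with fusing declared when the peak temperature reaches 800C. All material values below are illustrative assumptions, not the paper's fitted parameters:

```python
import numpy as np

def peak_temperature(current, length=2e-6, width=0.24e-6, thickness=50e-9,
                     rho=1.0e-6, k=20.0, h_loss=1.0e15, t_amb=25.0, n=101):
    """Peak steady-state temperature (C) along a fuse carrying `current` amps.

    Solves k*T'' - h_loss*(T - t_amb) + q = 0 on a 1-D grid with the
    contacts held at ambient. q = I^2 * rho / A^2 is the Joule heating
    density (W/m^3); h_loss (W/m^3/K) lumps the heat lost to the
    overlying and underlying layers.
    """
    area = width * thickness
    q = current ** 2 * rho / area ** 2
    dx = length / (n - 1)
    a = np.zeros((n, n))
    b = np.zeros(n)
    for i in range(1, n - 1):
        a[i, i - 1] = k / dx ** 2
        a[i, i] = -2.0 * k / dx ** 2 - h_loss
        a[i, i + 1] = k / dx ** 2
        b[i] = -q - h_loss * t_amb
    a[0, 0] = a[n - 1, n - 1] = 1.0   # Dirichlet ends: contacts at ambient
    b[0] = b[n - 1] = t_amb
    return float(np.linalg.solve(a, b).max())

def fusing_current(t_agglomeration=800.0, **kw):
    """Smallest current (A) whose peak temperature reaches the silicide
    agglomeration threshold, found by bisection."""
    lo, hi = 0.0, 0.1
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if peak_temperature(mid, **kw) < t_agglomeration:
            lo = mid
        else:
            hi = mid
    return hi
```

With these assumed parameters the predicted fusing current lands in the low-milliamp range, consistent with the measured onset currents plotted in Figure 9; the paper's actual model is two-dimensional with fitted coefficients.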

Figure 6, Various fuse shapes (a)-(d). All elements are p-type, about 2µm long. (a) width = 0.22µm, (b) width = 0.27µm, (c, d) width = 0.22µm/0.27µm.

Figure 7, Pre-programmed fuse resistance (cumulative distribution vs. Ohms) of the structures (a)-(d) in Figure 6.

Figure 8, Post-program resistance (cumulative distribution vs. kOhms) of the fuses (a)-(d) in Figure 6.

Sensing Circuit
A special circuit has been developed for programming and sensing the element. Figure 10 shows a simplified schematic of this circuit. Programming occurs when a logic LO is asserted on the gate of a large PMOS transistor. Since the fuse programs at a relatively low bias, the logic and programming circuits share a common supply voltage.

The sensing circuit is a novel and well-balanced solution to a stringent set of requirements, the foremost being that the sensing currents must be kept very low. The core of this circuit comprises a pair of matched N-channel transistors, which perform the sensing, and a pair of matched P-channel devices, which act as current-sensing output loads. The N-channel sensing transistors are connected in a current-mirror configuration such that, if the fuse-reference resistance on the left were equal to the unburned fuse resistance on the right, both circuit branches would carry equal current. In practice, the reference resistance is set to about 8 times that of the unburned fuse. This ratio of reference to fuse creates a default (unburned) output voltage that is low enough to be interpreted as a logic LO value. Additionally, for a programmed fuse, the resulting output voltage is sufficiently high to be interpreted



as a logic HI. Therefore, the gain of the circuit is sufficient for single-ended voltage outputs.

Figure 9, Measured and simulated current (mA) at the onset of fusing versus fuse length (µm), for widths of 0.24µm, 0.27µm, and 0.30µm. Solid lines show simulation results; symbols show measured data points.

In this circuit, the ratio of reference to unburned-fuse resistance represents a balanced tradeoff between the output high voltage (VOH) and output low voltage (VOL) levels. With a ratio of 8, the noise margins for the VOH and VOL signals are roughly equal. The resulting noise margin is adequate to guard-band the circuit against expected manufacturing variations in transistor Vt and channel length.
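With matched mirrors and loads, the sense decision reduces to a comparison of branch currents, which can be sketched as follows (the resistance values are assumptions for illustration; a real programmed fuse lands in the kOhm range or above, per Figure 8):

```python
R_UNBURNED = 150.0            # ohms, typical intact fuse (assumed value)
R_REFERENCE = 8 * R_UNBURNED  # the ~8x reference ratio described in the text

def sense(r_fuse, v_bias=1.8):
    """Return 'LO' for an intact fuse, 'HI' for a programmed one.

    Idealized model: with matched mirrors the comparison reduces to the
    branch currents v_bias / R. The fuse branch wins (output LO) while
    its resistance is below the reference, and loses (output HI) once
    programming has raised it above R_REFERENCE.
    """
    i_ref = v_bias / R_REFERENCE
    i_fuse = v_bias / r_fuse
    return "LO" if i_fuse > i_ref else "HI"
```

Because a programmed fuse sits orders of magnitude above the 8x reference, this threshold leaves the roughly equal VOH/VOL noise margins described above.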

Figure 10, Schematic of a simplified programming and sensing circuit.

Yield and Reliability
A PROM array based on the PAF will suffer yield loss if a programmed fuse does not have a high enough resistance to be properly sensed. Programming yield depends on the fuse design (see Fig. 8), the array size, and the circuit design. Even after optimizing the element and circuit, the resulting yield may not be as high as expected. In that case, redundant fuse elements are needed so that, if programming of a given fuse in a given memory bit is not successful, an additional fuse is available in that bit for an extra attempt. In this work, for a 64-bit array, a programming yield loss of less than 1 in 10,000 was achieved using two fuses per bit (a programmed state in either fuse resulted in a programmed bit).

The reliability of this element was characterized by placing a large number of samples (programmed and

unprogrammed) under thermo-mechanical stress (1000 cycles of condition 'C' temperature shock) and in bake (300 hours at 250C). The element was found to be quite stable under these conditions (see Fig. 11). Additional testing was done to characterize the stability of the unprogrammed fuse under bias-temperature stress. The results indicated that as long as the sensing current is significantly less than the current at the onset of programming, the device remains stable.
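The redundancy arithmetic described under Yield and Reliability is simple enough to sketch: with two fuses per bit, a bit fails only if both of its fuses fail to program. The per-fuse failure probability below is an assumed input, not a measured figure from the paper:

```python
def array_yield_loss(p_fuse, bits=64, fuses_per_bit=2):
    """Probability that at least one bit in the array fails to program.

    p_fuse: assumed probability that a single fuse fails to program.
    A bit fails only if ALL of its redundant fuses fail (programmed
    state in either fuse counts as a programmed bit).
    """
    p_bit = p_fuse ** fuses_per_bit
    return 1.0 - (1.0 - p_bit) ** bits
```

For example, a per-fuse failure rate of 0.1% gives a per-bit rate of 1e-6 and a 64-bit array loss of about 6.4e-5, inside the paper's 1-in-10,000 target, whereas the same fuses without redundancy would lose roughly 6% of arrays.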

Figure 11, Post-program fuse resistance distribution (cumulative distribution vs. kOhms) of a typical fuse before (initial) and after (stressed) a 300-hour 250C bake.

Conclusions
The poly agglomeration fuse is a reliable programmable element which may be implemented in a logic CMOS process. The element may be programmed under nominal bias and does not introduce any collateral damage. The distribution of the post-program resistance depends on silicide quality, fuse shape, doping, and programming conditions. Optimized fuse shapes and programming parameters have been presented based on empirical results and numerical simulations, and a novel sensing circuit has been presented; together they achieve a programming yield loss of 1 in 10,000 for a 64-bit PROM array with two fuses per bit. Element reliability has been verified under temperature shock and bake.

References
1) K. Ohsaki, N. Asamoto, S. Takagaki, "A Single Poly EEPROM Cell Structure for Use in Standard CMOS Processes," IEEE J. Solid-State Circuits, Vol. 29, No. 3, March 1994, pp. 311-316.
2) J.B. Lasky, J.S. Nakos, O.J. Cain, P.J. Geiss, "Comparison of Transformation to Low-Resistivity Phase and Agglomeration of TiSi2 and CoSi2," IEEE Trans. Electron Devices, Vol. 38, No. 2, Feb. 1991, pp. 262-269.
3) M. Bohr, et al., "A High Performance 0.25µm Logic Technology Optimized for 1.8V Operation," 1996 IEDM Tech. Digest, 1996, pp. 847-850.
4) J.A. Kittl, Q-Z Hong, D.A. Prinslow, G.R. Misium, "A Ti Salicide Process for 0.10 µm Gate Length CMOS Technology," 1996 VLSI Symp. Tech. Digest, 1996, pp. 14-15.