SPARP: a single pass antialiased rasterization processor

Computers & Graphics 24 (2000) 233}243

Technical Section

SPARP: a single pass antialiased rasterization processor

Jin-Aeon Lee*, Lee-Sup Kim

Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Taejon, South Korea

Abstract

We present a rasterization processor architecture named SPARP (single-pass antialiased rasterization processor),which exploits antialiased rendering in a single pass. Our architecture is basically based on the A-bu!er (Carpenter,Computer graphics 1985;19:69}78) algorithm. We have modi"ed the A-bu!er algorithm to enhance the e$ciency ofhardware implementation and quality of the image rendered, such as the data structure of pixel storage elements, themerging scheme of partial-coverage pixels, and the blending of partial-coverage or non-opaque pixels. For the scanconversion and generation of subpixel masks, we use the representation of edges that was proposed by Schilling(Computer graphics 1991;25:131}41). We represent partial-coverage pixels for a pixel location by a front-to-back sortedlist as in the A-bu!er and dynamically manage the list storage. We have devised a dynamic memory management schemethat extremely simpli"es the memory managing overheads so that we can build it by hard-wired logic circuitry. In ourarchitecture we can render an antialiased scene with the same rendering context of Z-bu!er method. Depending on thescene complexity, proposed architecture requires rasterization time 1.4}1.7 times as much as a Z-bu!er rasterizer does.The bu!er memory requirements can vary depending on the scene complexity; the average storage requirement is 2.75times that of the Z-bu!er for our example scenes. Our architecture can be used with most rendering algorithms toproduce high-quality antialiased images at the minimally increased rendering time and bu!er memory cost, but due tothe improvements in semiconductor technology we can expect that antialiased rasterization processors will be widelyadopted in the near future. ( 2000 Published by Elsevier Science Ltd. All rights reserved.

Keywords: Rendering hardware; Antialiasing; Rasterization; Hidden surface removal; Transparency blending

1. Introduction

There has been a great deal of work in graphics onantialiased rendering. To reduce aliasing artifacts of theZ-bu!er resulting from point sampling, many renderingalgorithms have been proposed, such as supersampling,stochastic sampling [1], and the A-bu!er [2]. Super-sampling reduces aliasing artifacts by increasing the fre-quency of the sampling grid. Stochastic sampling isa method that makes higher frequency components of animage noise rather than aliases by irregular sampling.These techniques sample an image at multiple points andapply "ltering to a sampled image to get a "nal renderedimage. Supersampling and stochastic sampling require

*Corresponding author. Tel.: #82-42-869-4445; fax: #82-42-869-4450.

E-mail address: [email protected] (Jin-Aeon Lee)

multiple rendering passes or massive parallel hard-ware renderers, so they are inadequate for real-timeimage generation at moderate hardware cost.

As a continuous image domain "ltering for antialias-ing, Catmull [3] proposed an algorithm that performssubpixel geometry for polygons but approximates theintensity of a polygon as a constant value. It computesthe intensity by using visible subpixel fragments asweights in an intensity sum. This is equivalent to convol-ving the image with a box "lter. Despite an approxima-tion of the intensity, it is extremely expensive to calculateprecise coverage areas over pixels and "lter extents. Toapproximate the process of calculating of coverage areas,Carpenter [2] uses bit-masks to area sample subpixelfragments, developing a technique known as the A-bu!er.Coverage and area weighting is accomplished by usingbit-wise logical operators between masks representingpolygon fragments, and the processing per pixel squarewill depend only on the number of visible fragments. In

0097-8493/00/$ - see front matter ( 2000 Published by Elsevier Science Ltd. All rights reserved.PII: S 0 0 9 7 - 8 4 9 3 ( 9 9 ) 0 0 1 5 7 - 0

Fig. 1. Rendering pipeline with the SPARP.

the A-bu!er, the contribution of surfaces that cover par-tially are arranged in a list that is sorted front-to-back.Two z-values are stored for each fragment, z

.*/and z

.!9,

and these values are used for the determination of thevisible area of the fragment. When all fragments havebeen added to the list, the intensity is calculated ina process called packing. As pointed out by Schilling [4],the method of calculating the visible area using twoz-values per fragments will fail very often. Schilling pro-posed an algorithm to resolve the visible area determina-tion of two intersecting fragments within a pixel usinga subpixel mask, which is called priority mask. The prior-ity mask indicates which part of the pixel belongs towhich object. If two fragments intersect within a pixeland subpixel masks of those fragments are overlapped,the priority mask must be calculated. For the calculationof the priority mask, z and z increment values in thex and y direction are stored for each fragment. Theequation for the intersection line can be built using thosez values of two intersecting fragments. After normalizingthe equation, the line parameters can be used to look upthe resulting priority mask. As in the case of the A-bu!er,partial fragments are arranged in a depth-sorted list. Forlist processing, a pipeline of processors was introduced.Each processor, called list processor, can only handle oneadditional object per pixel. If there is more than oneobject that are concerned with the pixel, only one objectis processed and the other objects are sent to the next listprocessor. It can resolve all objects by pipelining theseprocessors and cycle through the pipeline if necessary. Inthis algorithm and architecture the priority mask calcu-lation requires division operations, requiring relativelyhigh hardware cost.

Recently, a few rendering schemes for the hardware-assisted antialiased rendering have been proposed suchas chunking [5] and edge tracing. The chunking schemerequires some amount of overheads in the object clippingand the edge tracing scheme is not rendering orderindependent.

To generate antialiased images in real time, a hard-ware-assisted rendering technique is substantial. Present-ed in this paper is a rasterization processor architecturefor single-pass antialiased rendering, called SPARP(single-pass antialiased rasterization processor). Thegoals of the SPARP are:

f To use the same programming model for antialiasedrendering as used in the Z-bu!er.

f To provide an architecture which does the antialiasedrendering in a single pass for the interactive works.

f To render transparent objects.f To provide an e$cient partial-pixel handling method.

In polygonal objects rendering, we have been accus-tomed to rendering images using a programming modellike the Z-bu!er method. After a scene is built in virtualspace, polygons that construct the scene are fed into the

rendering pipeline to render the scene. There is no orderto render those polygons in the Z-bu!er method. In theSPARP, the major goal is to use the same programmingmodel for properly antialiased image rendering andtransparent objects rendering. We have modi"ed thedata structure of the A-bu!er, and partitioned the render-ing pipeline into two phases * the rasterization phaseand the post-processing phase. During the rasterizationphase, objects are rasterized and stored to the bu!ermemory in two categories. One is the full pixel that needsno more processing and the other is the partial pixel thatrequires a procedure to handle it. Partial pixels areproduced at the boundary of polygons during the raster-ization process. A partial-pixel handler creates a newlinked list for a new partial pixel at the given pixellocation or inserts the partial pixel into the present list.During the inserting operation, partial pixels can bemerged if they meet the merging criteria. The mergingoperation is important in reducing the storage require-ment of partial pixels during the rasterization phase. Wemust "nd the merging schemes that minimize the storagerequirement for partial pixels with minimized artifacts inthe rendered image. By our merging algorithm, we elim-inate most of the intermediate partial pixels during therasterization phase. To manage the dynamic storage allo-cation of partial pixel storage, we have developeda simple and e$cient new memory-managing schemethat recycles the storage of deallocated elements. Themain issue of the managing scheme of the partial pixelstorage is the simplicity that can be implemented e!ec-tively by hardware. In the post-processing phase, partialpixels are blended at the subpixel level and averaged toconstruct the "nal color value.

The rasterization and the post-processing phase can bepipelined using two sets of bu!er memories as in doublebu!ering. The total amount of processing time for thepost processing phase is less than that of the rasterizationphase in most of our rendering experiments.

2. Overview

Fig. 1 shows the rendering pipeline when the SPARP isused. The rendering pipeline is the same as the one in theZ-bu!er except that post processing is included. As theA-bu!er algorithm, we use subpixel masks to representpartial pixels. Partial pixels are created during the raster-ization process on the boundary of polygons. Most par-tial pixels are merged during the rasterization of adjacentpolygons, and "nally the remaining partial pixels are

234 Jin-Aeon Lee and Lee-Sup Kim / Computers & Graphics 24 (2000) 233}243

Fig. 2. Hardware block diagram of the SPARP.

located on the boundary of objects or in the intersectionregion of objects with other object or objects. We repres-ent partial pixels to a pixel location by a linked list formas in the A-bu!er. In our architecture, we represent thearea coverage of a pixel using a 4*4 subpixel mask anduse the triangle as a rendering primitive.

Fig. 2 shows the hardware block diagram of theSPARP. There are three distinct bu!er memory blocks,perfect-pixel bu!er (PB), partial-pixel fragment bu!er(PFB) and screen refresh bu!er (SRB). The PB is equiva-lent to the frame bu!er (FB) in the Z-bu!er. Data ele-ments for a pixel location are shown in Fig. 3(a). As wecan see in Fig. 3(a), data components that are added tothe PB compared to the FB are incremental values of thez-value (dxZ, dyZ). In Fig. 3(a), all depth parameters aresigned numbers. The z "eld represents a real depth valuewhen it is a positive value; otherwise, it holds the headpointer of the partial-pixel list as in the A-bu!er. Partial-pixels are stored in the PFB during the rasterization ina single linked-list form. We use the FID (fragment ID) toindex an entry of a PFB cell. Fig. 3(b) represents dataelements of a cell in the PFB. To represent the partialpixel information and support proper merging of partialpixels, two data components are added to the PFB cell* the subpixel mask (M) and the object identi"cationnumber (ObjID). One additional component in the PFBcell is the pointer to the next PFB cell that has thesubsequent partial pixel data * NextFID (next partial-pixel fragment ID).

The rasterization process is pipelined as three groupsof operations: span generation, span processing and pixelprocessing. The Pixel Processor performs per-pixel op-erations for all pixels including partial pixels that areproduced by the span processor. Depending on the resultof the pixel processing, a pixel fragment is assigned to bestored in the PB or to be processed by the partial-pixelprocessor. The partial-pixel processor manages the par-tial pixels in a front-to-back depth sorted manner for theproper blending of transparent objects. We will describe

the partial-pixel processing in some detail in a sectionbelow.

The post-processor scans the display region of the PBand copies pixel values to the SRB if the pixel fragment isa full pixel (z-value is positive). On the other hand, ifa pixel location has a partial-pixel list, i.e. the z-value isnegative, it requires the blending of partial pixels basedon their visible area coverage and transparencies. Toblend partial pixels, we expand a pixel to 4*4 subpixelsand "nd color values of individual subpixels. Aftertraversing all of partial pixels in the list, we calculate themean values of the visible subpixles and pass it on tothe SRB.

3. Rasterization

3.1. Scan conversion and span xlling

We use the formula that is described by Schilling [6] torepresent edges of a triangle. Based on that formula, wehave designed a scan converter which traverses the edgesof the triangle and produces spans so that they include allof non-zero area covered pixels in the triangle. Subpixelmasks are generated during the span "ll operation. Weuse a look-up table which has the "rst quadrant coveragemask patterns. Its size is 2K*16-bit. Using that tablewe generate subpixel masks for all directional edges* through bit manipulations (rotation or/and re#ection)for other quadrant lines.

3.2. Pixel processing

The pixel processor does the per-pixel operation forpixels that are generated from the Span Filler. One ofmajor roles of the pixel processor is to determine whethera pixel can be stored in the PB as a full pixel, should beprocessed by the partial pixel processor, or must bediscarded if it is fully occluded by the present pixel

Jin-Aeon Lee and Lee-Sup Kim / Computers & Graphics 24 (2000) 233}243 235

Fig. 3. Data structure of PB and PFB elements, (a) PBCell,(b) PFBCell.

Table 1Storing test decision table

Test condition Operation

DP.Z(0 CP W DP CP.MOFull CP.Z)DP.Z CP.AOOpaque DP.AOOpaque

@ } } } } } Insert@ } } } }

@ @ } }

@ @ Create a new list} @

@ } Replace DP with CP.} } Discard CP.

fragment. Table 1 summarizes the storing test conditionsand actions of the pixel processor. After the storing test,the Pixel Processor passes the source pixel fragment anddestination pixel fragment to the partial-pixel processorif the pixel should be processed as a partial pixel. Other-wise, the source pixel is stored in the PB or discardeddepending on the result of the storing test.

4. Partial pixel processing

Partial pixels are stored in a linked list form in thebu!er memory and are sorted front-to-back for the laterproper blending of transparencies. The partial-pixel pro-cessor determines whether to create a new list or to do

the insertion operation based on the handling informa-tion of the partial pixel from the pixel processor. Its mainjob is the managing of the list-sorting, merging, andpruning of data elements in the list.

4.1. Partial-pixel fragment buwer (PFB)

During the rasterization, partial pixels are stored in thePFB. We have shown the allocation unit of the PFB inFig. 3(b). To manage the PFB, we use an identi"cationnumber to point an entry of the PFB}FID (fragment ID).The FID has a number value between 0 and k!1, wherek"(total size of the PFB in bytes)/(the size of a PFB cellin bytes). For the simplicity in managing the PFB, weconstruct an FID allocation manager using a log

2k-bit

counter and a bu!er memory block, called the FIDrecycle bu!er (FRB), that temporally stores FIDs freed.During the rasterization, partial-pixel fragments are fre-quently created and deleted. If a PFB cell has been freed,we push its FID to the FRB. When an allocation requestfor a PFB cell exists, we check the FRB "rst and if there isat least one available FID, we use it. However, if there isno available FID in the FRB, we generate a new onefrom the new FID generation counter. The counter isreset at the beginning of a new frame rendering and itincrements its value by one when an allocation requestcomes with empty FRB. Fig. 4 shows the PFB allocationmanager schematically and Fig. 5 shows its pseudocodes. The size of the FRB is dependent on the scenecomplexity but 4K to 8K entries are su$cient for normalscenes. It extremely simpli"es the memory-managingoverhead of the PFB and can be implemented by hard-ware very easily.

4.2. Insertion operation

Pixel fragments come from the pixel processor witha two-bit tag that has the resultant information of thestoring test. It commands the partial-pixel processor to


Fig. 4. The PFB allocation manager.

Fig. 5. Pseudo-codes of PFB allocation manager (a) AllocFID,(b) DeallocFID.

Table 2Tag encoding for the partial-pixel processing

Bit value Description

00 Create a new list, CP after DP.01 Create a new list, DP after CP.1x Do the insertion operation to the existing list.

create a new list or do the insertion operation on theexisting list. Table 2 describes the bit information ofthe tag.

To create a new list, we allocate a PFB cell for thecurrent pixel fragment (CP, i.e. the source pixel-fragment)and the destination pixel fragment (DP, which comesfrom PB) each. We make a list on which nearer pixel-fragment comes "rst, and we register the FID of nearerpixel-fragment to the z "eld of the PB with a negativesign-bit. On the contrary, if the tag shows that therealready exists a list, we do the insertion operation. Basi-cally, we traverse the list to "nd out the insertion pointwhere the CP can be placed between two pixel-fragmentsbased on z values. During list traversal, we check thepossibility of the merging of the CP and the existingpartial fragment with the highest priority. Having reach-ed the insertion point without satisfying the mergingconditions, we do the insertion test. The insertion testexamines whether the CP is occluding more distant listelements, occluded by closer partial fragments, or inter-secting with an existing one. Depending on the insertion

test, the partial-pixel processor inserts the CP into thelist, replaces all of the elements behind the CP with theCP, or just discards the CP. We also examine the alphavalue of pixel fragments during the insertion test. Anypixel fragment that is occluded by a transparent pixelfragment will remain on the list for the proper blending oftransparencies if they are not fully occluded by anopaque pixel fragment.

4.3. Merging of partial-pixels

In the middle of the rasterization process, partial pixelfragments are generated at the boundary of triangles butmost of the partial pixels are not actually located at theboundary of an object. As shown in Fig. 6, partial pixelsthat are located inside an object can be merged. It isimportant to merge those partial-fragments because itsigni"cantly reduces the required storage of partial-pixelfragments during the rasterization. However, we must becareful in the merging of partial fragments because anincorrect merging of partial-pixels produces severe arti-facts in the rendered image. For the correct merging ofpartial pixels we use the object ID to distinguish betweenobjects during the merging process. However, we do notstore the object ID for pixels in the PB because we haveintroduced the object ID for the merging of partial pixelsto reduce the resultant number of partial pixel fragmentsby merging those intermediate partial pixels that aregenerated at the boundary of triangles of the same objectduring the rasterization process.

Fig. 7 shows our algorithm for the partial-pixelmerging. We have designed the subpixel mask generationmodule to generate subpixel masks along each side of anedge so that they have an exact inverted mask of eachother. Thus, Condition I in Fig. 7 means that thosesubpixels lie on the edges of triangles, and they exist inadjacent triangles. We have included Condition II forthose partial pixels that are produced at vertices of tri-angles. They lie in the same object but do not satisfyCondition I because the pixel data is contributed to bymore than two partial pixels. These strict merging condi-tions ensure minimized artifacts that can be produced byan inadequate merging condition in the rendered image.At the time of the merging of two partial pixel fragments,we select the pixel fragment data of the partial pixel that


Fig. 6. Partial pixels during the rasterization.

Fig. 7. The method for the partial pixel merging.

Fig. 8. The block diagram of the insertion operation module.

has a larger or equal coverage-area than the other one asthe resultant pixel-fragment data. It reduces the mergingoverhead signi"cantly in comparison to the blending ofthose partial-pixel fragment data. This simpli"cationdoes not a!ect the rendered image quality greatly.

4.4. Hardware implementation of the insertion operation

We have divided the insertion operation into threesubtasks. The "rst one creates a new list or invokes one ofthe other two subtasks for the insertion operation if thereis a list that has already been created. The second onedoes the insertion operation when the insertion test oc-curs in the middle of the list. Finally, the third one is forthe case where the list traversal reaches the end of the list.During the list traversal, insertion tests are accomplishedin parallel to check if one of the merging conditions issatis"ed or if the insertion point has been reached. Forthose tests, we need a register "le that stores three par-tial-pixel fragments for the previous pixel fragment (PF),the current pixel fragment (CF), and the next pixel frag-ment (NF) in the list. Fig. 8 shows the register "le anddata-processing modules for insertion operations. We dotests for the possible merging of partial pixels with the CPagainst the CF and the NF. If merging tests fail, we checkdepth values to determine whether the insertion pointhas been reached. If the CP is located between the CFand the NF or the CP intersects with the CF or the NF,that is the insertion point. We do down traversal of thelist if the above tests are not satis"ed. Finally, if we have

reached the last element in the list without any testsuccess, that is the insertion point as well. The par-tial-pixel processor always tries to #ush partial pixelfragments that are occluded by other partial pixels ora partial pixel list that is no longer needed after themerging of partial-pixels. After the merging of two partialpixel fragments, we reexamine whether the newly mergedpixel fragment occludes elements behind it or can bestored in the PB as a full pixel.

4.5. Object ID generation

An object ID is the serial number of an object and it isused for the proper merging of partial pixels in the samescene. It is, however, cumbersome to maintain and pro-cess it thoroughly by software. During the rendering ofa scene, the absolute value of an object ID is meaninglessbecause we use the number to distinguish between ob-jects in the partial pixel merging process. Therefore, inthe real implementation, object IDs can be generated bythe rendering hardware under the control of the render-ing software. When the rendering software informs thehardware renderer of a change of the object, the hard-ware renderer increments the object ID number and usesit as the object ID of subsequent incoming polygons. Thisscheme can reduce the managing overhead of object IDs.

5. Post-processing

After the rasterization of a frame of a scene, the post-processing of the rasterized scene is required in theSPARP. The Post-processor scans pixel fragments in thePB from the left top corner to the right bottom corner ofthe display region line by line. If a pixel fragment hasa positive z-value, its color value is copied to the SRBdirectly; otherwise, it has a partial pixel fragment list thatshould be blended at the subpixel level to "nd the "nalcolor value. If a pixel location has a list, the FID that


Fig. 9. An example sequence of the partial pixel blending.

Fig. 10. The block diagram of subpixel blending module.

resides in the z "eld is passed to the partial-pixel blenderfor the blending of the partial pixels in the subpixel level,which means that we expand a pixel into 4H4 subpixelsand do the blending of those subpixels in parallel.Fig. 9 shows an example sequence of the partial pixelblending. During the subpixel level blending, the testresult of the subpixel mask has the highest priority, i.e.a subpixel with turn-on mask-bit is modi"ed only. Afterthe blending of all partial-pixel fragments in the list,we "nd the average color value of subpixels and pass itto the SRB as the "nal color of that location. Fig. 10shows the block diagram of subpixel blending module.We use two di!erent types of blenders for RGB andAlpha channel.

6. Analysis

6.1. Storage complexity

In the SPARP as shown in Fig. 2, three distinct bu!ermemory blocks are required, PB, PFB and SRB. The sizeof the PFB and the SRB can be "xed by the displayresolution that we want to support. However, the re-quired size of the PFB depends on the complexity of thescene as well. Eq. (1) shows the calculation forms of therequired storage for the SPARP.

PBsize

"PBCellsize*

DISPresol

,

PFBsize

"PFBCellsize*

DISPresol*

PF¸ratio*

PF¸avg}length

,

SRBsize

"SRBCellsize*

DISPresol

,

¹O¹A¸size

"PBsize

#PFBsize

#SRBsize

, (1)

where

PF¸ratio

"NPFL

/DISPresol

,

PF¸avg}length"1

NPFL

NPFL~l+i/0

¸EN(PF¸i),

and NPFL

is the no. of pixel locations that have a partialpixel list.

For a comparison with the traditional Z-bu!er method,we have found the relative requirement size of the stor-age. In our comparison, we use numbers that are used inour prototype architecture. Table 3 shows the size ofelement cells in byte and the relative size to the size ofa Z-bu!er cell (we use 16 bits for an object ID). Fig. 11shows the storage requirement in the SPARP with thevariation of PF¸

ratioand PF¸

avg}length. In our experi-

ments, the average value of PF¸ratio

is 5.49% and that ofPF¸

avg}lengthis 3.10. In this case, the relative bu!er size of

the SPARP against the Z-bu!er is 2.75. For the render-ing performance enhancement of dynamic scenes, we canpipeline the rasterization process and the post process.This scheme, however, requires dual PBs and PFBs. Thisis the extra storage requirement in the SPARP. However,we can share some elements in the PBCell like dxZ anddyZ, so the required storage overhead for this scheme is


Table 3Cell size in the SPARP

Storage element Size in bytes Relative size

PBCell 13 1.86PFBCell 19 2.71SRBCell 3 0.43Total 35 5.0

Fig. 11. Storage requirement in the SPARP.

Fig. 12. Variation of the rasterization time in the SPARP.

actually 154% for the PB and 200% for the PFB. Fordouble bu!ering, we need one more copy of the SRB butthis is the same in the Z-bu!er method.

6.2. Time complexity

The rendering pipeline has been divided into twophases in the SPARP. As we have mentioned in theprevious section, the rasterization and the post process-ing can be fully pipelined in the rendering of dynamicscenes. Therefore, the overall rendering time of a scene isgiven by

R¹scene

"Max(RP¹scene

,PP¹scene

), (2)

where RP¹scene

is the rasterization process time of ascene and PP¹

scenethe post-processing time of a scene.

If FIFOs in Fig. 2 are so large that individual modulescan be operated with no hazard in writing their resultsthen we can "nd RP¹

sceneas

RP¹scene

"Max(SGtime

, SFtime

, PPtime

, IPtime

)#R¸time

,

(3)

where

SGtime

"[+Max(¸ENleft}edge

, ¸ENright}edge

)]*q*SGoverhead

,

SFtime

"(+¸ENratio

)*q*SFoverhead

,

PPtime

"[Max(DFC, S¹C)#=BC*=Bratio

]

*NPFscene*

q,

IPtime

"NPFscene*

INSratio*

¹Davg*

NCinsert*

q and

R¸time

is the rasterization latency time. DFC is the desti-nation pixel fetch cycles per pixel fragment, STC thestoring test cycles per pixel fragment, WBC the pixel datawrite back cycles per pixel fragment, =B

ratiothe pixel

data write back ratio, NPFscene

the no. of pixel fragmentsin a scene, INS

ratiothe no. of pixel fragments that are

processed by the partial-pixel Processor/NPFscene

, ¹Davg

the average depth of the list traversal per partial pixel,NC

insertthe average no. of cycles per insertion test and

q the cycle time.Fig. 12 shows the relative processing time requirement

in the rasterization process with the variation of INSratio

and average list length. The processing time is nor-malized by PP

timethe pixel processing time. As we can

see in the "gure, the performance limit with a low INSratio

comes from the span "lling operation in which memoryaccess is required. However, with a higher INS

ratio, this is

possible when a scene includes many transparent objects,IP

time(the insertion processing time) can limit the overall

performance.The length of the post-processing time can be found as

in Eq. (4). As we have mentioned in the previous section,the post-processor directly copies the pixel data to theSRB if a pixel is a full pixel. On the other hand, partialpixel blending is required for those pixels that consist ofmore than one partial pixel. The post-processor pushesthe head pointer of the partial pixel list to the input bu!erof the partial pixel blender and keeps fetching the nextpixel data. Therefore, the fetching operation and thepartial pixel blending operation run concurrently:

PP¹scene

"Max(PPCtime

, SBtime

), (4)

where

PPCtime

"PPfetch*

DISPresol*

q,

SBtime

"PF¸ratio*

DISPresol*

NCblend*

q,

PPfetch

the no. of PB fetch cycles per pixel and, with NCblend

the no. of average subpixel blending cycles per PF¸.


Fig. 13. Variation of the post-processing time in the SPARP.

Table 4Gate counts estimation of the SPARP

Module name Gate counts

Span generator 11,100Span "ller 5,454Pixel processor 1,495Partial-pixel processor 12,071Post processor#Partial-pixel blender 45,295Control FSMs 1,720Subpixel mask ROM 16,384FIFOs 65,940Total 159,459

Fig. 14. Sample rendered images.

Fig. 13 shows the processing time requirement inthe post-processing phase with the variation of PF¸

ratio.

The processing time is normalized by PPCtime

theamount of time required for fetching of all pixels inthe PB.

In general, the length of the time that is required forthe post-processing is less than the time required for therasterization phase. Therefore, the rasterization phaselimits the overall rendering performance with an equalnumber of processing units for the rasterization and thepost-processing. With moderate size of FIFOs, we canget a rendering performance that is degraded by a factorof 0.6}0.7. The major cause of this degradation is theconcentration of partial pixels in a small region thatcauses the stalling of the pixel processing to wait for anavailable entry in the partial pixel FIFO by the comple-tion of the insertion operation.

7. Example

We described the SPARP in HDL and fully con"rmedthrough logic simulations. Bu!er memory blocks are

described behaviorally and other parts are described inthe RTL level. We assume that the normal SDRAMbu!ers are used for the PB and the SRB. In case of thePFB, a longer access time due to page breaks in fetching


Table 5Statistics of the rendered images in Fig. 14

Fig. 14(a) Fig. 14(b) Fig. 14(c)

No. of objects 126 78 17No. of triangles 53,556 6,580 33,822No. of triangles after backface cull 25,366 3,122 21,936No. of total fragments 639,957 220,715 666,750fragments/triangle 25.21 70.69 30.39No. of fragments processed by the partial-pixel processor 311,114 111,936 587,030Total rasterization cycles 1.90]106 0.61]106 2.50]106

Rasterization cycles/fragment 2.97 2.75 3.75No. of merged fragments 172,450 28,588 198,553No. of recycled PFB cell 143,886 46,479 312,570Max. FRB pointer value 7,111 3,012 2,304No. of allocated PFB cells 65,000 32,971 85,465PF¸

ratio5.5% 3.46% 7.67%

PF¸avglength

3.41 3.09 3.62Total post-processing cycles 0.34]106 0.33]106 0.37]106

Expected performance @100 MHz (fragments/s) 33.7]106 36.4]106 26.7]106

Fig. 15. Partial pixel allocation map of the image in Fig. 14(a).

of list elements can a!ect the overall partial-pixel pro-cessing performance. Currently, we assume an SRAMbu!er for the PFB. However, owing to the linked liststructure of the storing scheme of partial-pixels a pre-fetching method can be used to minimize the perfor-mance degradation when an SDRAM bu!er is used forthe PFB. Table 4 shows the estimated gate counts of theSPARP. Fig. 14 shows rendered images that are a few ofthe images used in our experiments. Table 5 summarizesthe rendering statistics of the images; the numbers aregathered by logic simulation of SPARP HDL model.Fig. 15 is the allocation map of the PFB for the image inFig. 14(a). The darker point represents where morememory units are assigned. Fig. 16 is an enlarged viewof a small region in Fig. 14(a).

8. Discussion

There are two issues that we should mention: theover#ow of the FRB and the shortage of the PFB mem-ory. The FRB temporarily stores the freed FIDs for lateruse. In the middle of rasterization, if a big triangle oc-cludes many partial pixels behind it then the number offreed FIDs can exceed the number of the FRB entry. Thiscauses the loss of the amount storage when FIDs are notrecycled via the FRB. This is the worst case in our PFBmemory-managing scheme. However, the loss of the PFBmemory is con"ned to the rendering of the frame. For thenext frame rendering, the lost storage is recovered auto-matically. Furthermore, if there is a su$cient amount ofthe PFB memory, the lack of entries in the FRB does notcause any artifacts in the rendered image. However,a lack of FRB entry can cause a lack of the PFB memory

due to the bad recycling of PFB cells; therefore, weshould assign a su$cient amount of memory to the FRB.In our experiment, an 8K-entry bu!er is moderate anda 16K-entry bu!er is su$cient. Another di$culty in theSPARP is that the size of the PFB cannot be "xed bydisplay resolution. The required size of the PFB alsodepends on the characteristics of scenes. (A scene withunlimited transparent objects requires unlimited partialpixel storage.) Due to this algorithmic characteristic, wecan include a display region-partitioning scheme to en-sure the generality of the scene rendering at the time ofhardware implementation. Under the given size of thePFB, we can render the whole frame of most images atonce. When a scene is too complex to render it at oncedue to the shortage of the PFB storage, we can render theimage in region by region. However, this should be


Fig. 16. An enlarged view of a small region in Fig. 14(a).

avoided for real-time applications by adding more mem-ory to the PFB.

9. Conclusion

For the generation of high-quality images in real time,antialiasing is one of the key techniques to be acceleratedby hardware. In this paper we presented a rasterizationprocessor architecture for single-pass antialiased render-ing. In the SPARP we proposed an e$cient storagemanagement scheme to handle partial pixels. Correctvisible fraction "nding of partial pixels and blending ofthem are solved at the post processing stage. Withstraightforward implementation of the SPARP it re-quires 2.75 times bu!er memory in comparison to the

Z-bu!er in sequential rasterization and post-processing.Rasterization and post-processing can e!ectively be pro-cessed in parallel with 4.20 times bu!er memory in com-parison to the Z-bu!er. The rasterization time in theSPARP increased by a factor of 1.4}1.7 relative to theZ-bu!er-based rasterizer depending on the scene com-plexity. The post-processing time is less than the rasteriz-ation time for most of scenes except those transparentobjects occupy large portion of the display region ina scene. The SPARP is an e$cient architecture forsingle-pass antialiased rasterization compared to super-sampling method. With the progress of semiconductormemory technologies that signi"cantly reduce bu!ermemory cost, we expect single pass antialiased rasteriz-ation processors, like the SPARP, adopted for PCgraphics hardware accelerators in the near future.

References

[1] Dippe MAZ, Wold EH. Antialiasing through stochasticsampling. Computer Graphics 1985;19:69}78.

[2] Carpenter L. The A-bu!er, an antialiased hidden surfacemethod. Computer Graphics 1984;18(3):103}8.

[3] Catmull E. A hidden surface algorithm with anti-aliasing.Computer Graphics 1978;12(3):6}11.

[4] Schilling A, Straber W. EXACT: algorithm and hard-ware architecture for an improved A-bu!er. ComputerGraphics 1993;27:85}91.

[5] Torborg, J., Kajiya, J.T. Talisman: commodity realtime 3Dgraphics for the PC. Computer Graphics Proceedings,1996, 353}63.

[6] Schilling A. A new simple and e$cient antialiasing withsubpixel masks. Computer Graphics 1991;25:133}41.


Documents

SPARP: a single pass antialiased rasterization processor