
The InterBase and Firebird Developer Magazine, Issue 2, 2005


The second issue of "The InterBase and Firebird Developer Magazine", published in 2005. This issue is devoted to the internals of Firebird locking (an article by Ann W. Harrison), "Testing the NO SAVEPOINT feature in InterBase" by Vlad Khorsun, and "Object-Oriented Programming with Firebird" by Vladimir Kotlyarevsky. Dmitry Kuzmenko, CEO of IBSurgeon, wrote the article "Using IBAnalyst". There is also a nice article covering replication basics and a specific implementation with the CopyCat replication software.


Contents


Editor's note: Rock around the blog, by Alexey Kovyazin (p. 3)

Firebird conference, by Helen E. M. Borrie (p. 4)

Oldest Active: On Silly Questions and Idiotic Outcomes, by Helen E. M. Borrie (p. 5)

Server internals (Cover story): Locking, Firebird, and the Lock Table, by Ann W. Harrison (p. 6)

Inside BLOBs, by Dmitri Kouzmenko and Alexey Kovyazin (p. 11)

TestBed: Testing NO SAVEPOINT in InterBase 7.5.1, by Vlad Horsun and Alexey Kovyazin (p. 13)

Development area: Object-Oriented Development in RDBMS, Part 1, by Vladimir Kotlyarevsky (p. 22)

Subscribe now!

To receive notifications about future issues, send email to subscribe@ibdeveloper.com

Replicating and synchronizing InterBase/Firebird databases using CopyCat, by Jonathan Neve (p. 27)

Using IBAnalyst, by Dmitri Kouzmenko (p. 31)

Readers feedback: Comments to the "Temporary tables" article, by Volker Rehn (p. 35)

Miscellaneous (p. 36)


Credits

Alexey Kovyazin, Chief Editor
Helen Borrie, Editor
Dmitri Kouzmenko, Editor
Noel Cosgrave, Sub-editor
Lev Tashchilin, Designer
Natalya Polyanskaya, Blog editor

The new column "Oldest Active" is by Helen Borrie. No need to say more about the author of "The Firebird Book", just read it. In this issue Helen looks at the phenomenon of support lists and their value to the Firebird community. She takes a swing at some of the unproductive things that list posters do, in this topic titled "On Silly Questions and Idiotic Outcomes".

Let's blog again

Now is a good time to say a few words about our new web presentation. We've started a blog-style interface for our magazine: take a look at www.ibdeveloper.com if you haven't already discovered it. Now you can comment on any article or message. In future we'll publish all the materials related to the PDF issues, along with special bonus articles and materials. You can look out for previews of articles, drafts, and behind-the-scenes community discussions. Please feel welcome to blog with us!

Donations

The magazine is free for readers...


Editor's note
Rock around the blog
by Alexey Kovyazin

Dear readers,

I am glad to introduce to you the second issue of "The InterBase and Firebird Developer Magazine". We received a lot of emails with kind words and congratulations, which have helped us in creating this issue. Thank you very much!

In this issue

The cover story is the newest episode from Ann W. Harrison, "Locking, Firebird, and the Lock Table". The article covers the locking issue from general principles to specific advice, so everyone will find a bit that's interesting or informative.

We continue the topic of savepoints internals with the article "Testing NO SAVEPOINT in InterBase 7.5.1". Along with interesting practical test results, you will find a description of how the UNDO log works in InterBase 7.5.1.

We publish the small chapter "Inside BLOBs" from the forthcoming book "1000 InterBase & Firebird Tips & Tricks" by Dmitri Kouzmenko and Alexey Kovyazin.

Object-oriented development has been a high-interest topic for many years, and it is still a hot topic. The article "Object-Oriented Development in RDBMS" explores the practical use of OOD principles in the design of InterBase or Firebird databases.

Replication is a problem which every database developer faces sooner or later. The article "Replicating and synchronizing InterBase/Firebird databases using CopyCat" introduces the approach used in the new CopyCat component set and the CopyTiger tool.

The last article, by Dmitri Kouzmenko, "Understanding Your Database with IBAnalyst", provides a guide for better understanding of how databases work and how tuning their parameters can optimize performance and avoid bottlenecks.

The "Oldest Active" column

I am very glad to introduce the new column "Oldest Active" by Helen Borrie.

The magazine is free for readers, but it is a considerable expense for its publisher. We must pay for articles, editing and proofreading, design, hosting, etc. In future we'll try to recover costs solely from advertising, but for these initial issues we need your support.

See details here: http://ibdeveloper.com/donations

On the Radar

I'd like to introduce several projects which will be launched in the near future. We really need your feedback, so please do not hesitate to place your comments in the blog or send email to us (readers@ibdeveloper.com).

Magazine CD

We will issue a CD with all 3 issues of "The InterBase and Firebird Developer Magazine" in December 2005 (yes, a Christmas CD!).

Along with all issues in PDF and searchable HTML format, you will find exclusive articles and bonus materials. The CD will also include free versions of popular software related to InterBase and Firebird, and offer very attractive discounts for dozens of the most popular products.

The price for the CD is USD $9.99 plus shipping.

For details see www.ibdeveloper.com/magazine-cd

Paper version of the magazine

In 2006 we intend to issue the first paper version of "The InterBase and Firebird Developer Magazine". The paper version cannot be free, because of high production costs.

We intend to publish 4 issues per year, with a subscription price of around USD $49. Each issue will be approximately 64 pages.

In the paper version we plan to include the best articles from the online issues and, of course, exclusive articles and materials.

To be or not to be: this is the question that only you can answer. If the idea of subscribing to the paper version appeals to you, please place a comment in our blog (http://ibdeveloper.com/paper-version) or send email to readers@ibdeveloper.com with your pros and cons. We look forward to your feedback.

Firebird Conference

This issue of our magazine almost coincides with the third annual Firebird World Conference, starting November 13 in Prague, Czech Republic. This year's conference spans three nights, with a tight programme of back-to-back and parallel presentations over two days. With speakers in attendance from all across Europe and the Americas, presenting topics that range from specialised application development techniques to first-hand accounts of Firebird internals from the core developers, this one promises to be the best ever.

Registration has been well under way for some weeks now, although the time is now past for early-bird discounts on registration fees. At the time of publishing, bookings were still being taken for accommodation in the conference venue itself, Hotel Olsanka in downtown Prague. The hotel offers a variety of room configurations, including bed and breakfast if wanted, at modest tariffs. Registration includes lunches on both conference days, as well as mid-morning and mid-afternoon refreshments.

Presentations include one on Firebird's future development from the Firebird Project Coordinator, Dmitry Yemanov, and another from Alex Peshkov, the architect of the security changes coming in Firebird 2. Jim Starkey will be there, talking about Firebird Vulcan, and members of the SAS Institute team will talk about aspects of SAS's taking on Firebird Vulcan as the back-end to its illustrious statistical software. Most of the interface development tools are on the menu, including Oracle-mode Firebird, PHP, Delphi, Python and Java (Jaybird).

It is a don't-miss for any Firebird developer who can make it to Prague. Links to details and the programme are at http://firebird.sourceforge.net/index.php?op=konferenz and at the IBPhoenix website, http://www.ibphoenix.com


Invitation

As a footnote, I'd like to invite all people who are in touch with Firebird and InterBase to participate in the life of the community. The importance of a large, active community to any public project (and a DBMS with hundreds of thousands of users is certainly public) cannot be emphasised enough. It is the key to survival for such projects. Read our magazine, ask your questions on forums, and leave comments in the blog. Just rock and roll with us!

Sincerely,

Alexey Kovyazin
Chief Editor
editor@ibdeveloper.com


Oldest Active


On Silly Questions and Idiotic Outcomes

Firebird owes its existence to the community's support lists. Though it's all ancient history now, if it hadn't been for the esprit de corps among the regulars of the old InterBase support list hosted by mers.com in the nineties, Borland's shock-horror decision to kill off InterBase development at the end of 1999 would have been the end of it, for the English-speaking users at least.

Firebird's support lists on Yahoo and Sourceforge were born out of the same cataclysm of fatal fire and regeneration that caused the phoenix, drawn from the fire against extreme adversity, to take flight as Firebird in July 2000. Existing and new users, both of Firebird and of the rich selection of drivers and tools that spun off from and for it, depend heavily on the lists to get up to speed with the evolution of all of these software products.

Newbies often observe that there is just nothing like our lists for any other software, be it open source or not. One of the things that I think makes the Firebird lists stand out from the crowd is that, once gripped by the power of the software, our users never leave the lists. Instead, they stick around, learn hard, and stay ready and willing to impart to others what they have learnt. After nearly six years of this, across a dozen lists, our concentrated mix of expertise and fidelity is hard to beat.

Now, doesn't all this make us all feel warm and fuzzy? For those of us in the hot core of this voluntary support infrastructure, the answer has to be, unfortunately, "not all the time". There are some bugbears that can make the most patient person see red. I'm using this column to draw attention to some of the worst.

First and worst is silly questions. You have all seen them. Perhaps you even posted them. Subject: "Help!" Body: "Every time I try to connect to Firebird I get the message 'Connection refused'. What am I doing wrong?" What follows from that is a frustrating and tedious game for responders. What version of Firebird are you using? Which server model? Which platform? Which tool? What connection path? (By "connection path" we mean...)

Everyone's time is valuable. Nobody has the luxury of being able to sit around all day playing this game. If we weren't willing to help, we wouldn't be there. But you make it hard, or even impossible, when you post problems with no descriptions. You waste our time. You waste your time. We get frustrated because you don't provide the facts; you get frustrated because we can't read your mind. And the list gets filled with noise.

If there's something I've learnt in more than 12 years of on-line software support, it is that a problem well described is a problem solved. If a person applies time and thought to presenting a good description of the problem, chances are the solution will hop up and punch him on the nose. Even if it doesn't, good descriptions produce the fastest right answers from list contributors. Furthermore, those good descriptions and their solutions form a powerful resource in the archives for others coming along behind.

Off-topic questions are another source of noise and irritation in the lists. At the Firebird website we have bent over backwards to make it very clear which lists are appropriate for which areas of interest. An off-topic posting is easily forgiven if the poster politely complies with a request from a list regular to move the question to x list. When these events instead become flame threads, or when the same people persistently reoffend, they are an extreme burden on everyone.

In the highly tedious category we have the kind of question that goes like this. Subject: "Bug in Firebird SQL". Body: "This statement works in MSSQL/Access/MySQL/<insert any non-standard DBMS name here>. It doesn't work in Firebird. Where should I report this bug?" To be fair, George Bernard Shaw wasn't totally right when he said "Ignorance is a sin." However, you do at least owe it to yourself to know what you are talking about, and not to present your problem as an attack on our software. Whether intended or not, it comes over as a troll and you come across as a fool. There's a strong chance that your attitude will discourage anyone from bothering with you at all. It's no-win, sure, but it's also a reflection of human nature. You reap what you sow.

In closing this little tirade, I just want to say that I hope I've struck a chord with those of our readership who struggle to get satisfaction from their list postings. It will be a rare question indeed that has no answer. Those rarities, well presented, become excellent bug reports. If you're not getting an answer, there's a high chance that you asked a silly question.

Helen E. M. Borrie
helebor@tpg.com.au


Cover story


Locking, Firebird, and the Lock Table

by Ann W. Harrison, aharrison@ibphoenix.com

News & Events

Firebird Worldwide Conference 2005

The Firebird conference will take place at the Hotel Olsanka in Prague, Czech Republic, from the evening of Sunday the 13th of November (opening session) until the evening of Tuesday the 15th of November (closing session). Details of registration, the call for papers, and sponsorship are at http://firebird-conference.com

One of the best-known facts about Firebird is that it uses multi-version concurrency control instead of record locking. For databases that manage concurrency through record locks, understanding the mechanisms of record locking is critical to designing a high-performance, high-concurrency application. In general, record locking is irrelevant to Firebird. However, Firebird uses locks to maintain internal consistency and for interprocess coordination. Understanding how Firebird does (and does not) use locks helps with system tuning and with anticipating how Firebird will behave under load.

I'm going to start by describing locking abstractly, then Firebird locking more specifically, then the controls you have over the Firebird lock table and how they can affect your database performance. So, if you just want to learn about tuning, skip to the end of the article.

Concurrency control

Concurrency control in database systems has three basic functions: preventing concurrent transactions from overwriting each other's changes, preventing readers from seeing uncommitted changes, and giving a consistent view of the database to running transactions. Those three functions can be implemented in various ways. The simplest is to serialize transactions, allowing each transaction exclusive access to the database until it finishes. That solution is neither interesting nor desirable.

A common solution is to allow each transaction to lock the data it uses, keeping other transactions from reading data it changes and from changing data it reads. Modern databases generally lock records. Firebird provides concurrency control without record locks by keeping multiple versions of records, each marked with the identifier of the transaction that created it. Concurrent transactions don't overwrite each other's changes, because the system will not allow a transaction to change a record if the most recent version was created by a concurrent transaction. Readers never see uncommitted data, because the system will not return record versions created by concurrent transactions. Readers see a consistent version of the database, because the system returns only the data committed when they start, while allowing other transactions to create newer versions. Readers don't block writers.
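A minimal two-session isql sketch of that behaviour (the table T, its single row, and the session labels are our own illustration, not from the article):

-- Session A
SET TRANSACTION;                    -- snapshot (concurrency) transaction
UPDATE T SET VAL = 2 WHERE ID = 1;  -- creates a new record version

-- Session B, started concurrently
SET TRANSACTION;
SELECT VAL FROM T WHERE ID = 1;     -- returns the old committed value;
                                    -- B is not blocked by A's write
-- Session A
COMMIT;

-- Session B
SELECT VAL FROM T WHERE ID = 1;     -- still the old value: B sees only
COMMIT;                             -- data committed when it started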

A lock sounds like a very solid object, but in database systems a lock is anything but solid. "Locking" is a shorthand description of a protocol for reserving resources. Databases use locks to maintain consistency while allowing concurrent, independent transactions to update distinct parts of the database. Each transaction reserves the resources (tables, records, data pages) that it needs to do its work. Typically, the reservation is made in memory to control a resource on disk, so the cost of reserving the resource is not significant compared with the cost of reading or changing the resource.

Locking example

"Resource" is a very abstract term. Let's start by talking about locking tables. Firebird does lock tables, but normally it locks them only to prevent catastrophes, like having one transaction drop a table that another transaction is using.

Before a Firebird transaction can access a table, it must get a lock on the table. The lock prevents other transactions from dropping the table while it is in use. When a transaction gets a lock on a table, Firebird makes an entry in its table of locks, indicating the identity of the transaction that has the lock, the identity of the table being locked, and a function to call if another transaction wants an incompatible lock on the table. The normal table lock is shared: other transactions can lock the same table in the same way.

Before a transaction can delete a table, it must get an exclusive lock on the table. An exclusive lock is incompatible with other locks. If any transaction has a lock on the table, the request for an exclusive lock is denied and the drop table statement fails. Returning an immediate error is one way of dealing with conflicting lock requests. The other is to put the conflicting request on a list of unfulfilled requests and let it wait for the resource to become available.

In its simplest form, that is how locking works. All transactions follow the formal protocol of requesting a lock on a resource before using it. Firebird maintains a list of locked resources, a list of requests for locks on resources (satisfied or waiting), and a list of the owners of lock requests. When a transaction requests a lock that is incompatible with existing locks on a resource, Firebird either denies the new request or puts it on a list to wait until the resource is available. Internal lock requests specify whether they wait or receive an immediate error on a case-by-case basis. When a transaction starts, it specifies whether it will wait for locks that it acquires on tables, etc.
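At the SQL level, that per-transaction choice is the lock resolution mode, set with standard transaction syntax:

SET TRANSACTION WAIT;    /* wait until conflicting locks are released */
SET TRANSACTION NO WAIT; /* receive an immediate error instead        */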

Lock modes

For concurrency and read-committed transactions, Firebird locks tables for shared read or shared write. Either mode says, "I'm using this table, but you are free to use it too." Consistency-mode transactions follow different rules: they lock tables for protected read or protected write. Those modes say, "I'm using the table and no one else is allowed to change it until I'm done." Protected read is compatible with shared read and other protected read transactions. Protected write is compatible only with shared read.

The important concept about lock modes is that locks are more subtle than mutexes: locks allow resource sharing, as well as protecting resources from incompatible use.
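The rules above can be tabulated as a compatibility matrix (our own summary of the text):

                  shared read  shared write  protected read  protected write
shared read           yes          yes            yes              yes
shared write          yes          yes            no               no
protected read        yes          no             yes              no
protected write       yes          no             no               no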

Two-phase locking vs. transient locking

The table locks that we've been describing follow a protocol known as two-phase locking, which is typical of locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release. They cannot overlap.

When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency-mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.

Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.

Firebird page locks

One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.

In general database theory, a transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.

In Firebird, internally, each time a transaction changes a page it changes that page (and the physical structure of the database as a whole) from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page-level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.

The Firebird lock table

When all access to a database is done in a single process, as is the case with most database systems, locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.

The Firebird lock manager

We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and to one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.

Conflicting lock requests

When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens: either the requesting transaction gets an immediate error, or the request is put on a list of waiting requests, and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.

Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.

Locks as interprocess communication

Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be, and is, useful for a number of purposes outside the area that is normally considered database locking.

For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important, since as soon as an index becomes active, every transaction must help maintain it: making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.

Firebird locking summary

Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases, because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.

Lock table specifics

The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests; generally, lock owners are transactions, connections or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource: whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose the lock table includes a hash table for access to lock blocks, based on the name of the resource being locked.

A quick refresher on hashing

A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function called the hash function into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.

A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and another name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
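A small C sketch of this lookup, using invented names and simplified structures rather than the engine's real ones:

#include <string.h>

struct lock_block;                    /* stands in for the real lock block */

typedef struct hash_block {
    char name[32];                    /* original resource name            */
    struct hash_block *collision;     /* next block in the same hash slot  */
    struct hash_block *duplicate;     /* next block with the same name     */
    struct lock_block *lock;          /* lock block for this name          */
} hash_block;

/* Hash a name to a slot number. The slot count is part of the function,
   which is why it cannot be changed without invalidating the table. */
static unsigned hash_name(const char *name, unsigned slots)
{
    unsigned h = 0;
    while (*name != '\0')
        h = h * 31u + (unsigned char)*name++;
    return h % slots;
}

/* Find the lock block for a resource: hash, index into the array, then
   follow the collision chain, checking one more name per collision. */
struct lock_block *find_lock(hash_block *table[], unsigned slots,
                             const char *name)
{
    hash_block *hb;
    for (hb = table[hash_name(name, slots)]; hb != NULL; hb = hb->collision)
        if (strcmp(hb->name, name) == 0)
            return hb->lock;
    return NULL;                      /* resource is not locked */
}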

Adjusting the lock table to improve performance

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table (normally all databases on the machine) before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.

The tool for checking the lock table is fb_lock_print, a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:

LOCK_HEADER BLOCK
Version: 114, Active owner: 0, Length: 262144, Used: 85740
Semmask: 0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30
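The excerpt above comes from simply redirecting the utility's output, for example (the report file name is arbitrary):

fb_lock_print > lock_report.txt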

The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:

Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.

If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is a prime number.

Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading "#". Choose a value that is a prime number less than 2048:

LockHashSlots = 499

The change will not take effect until all connections to all databases on the server machine have shut down.

If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading "#", and change the value:

#LockMemSize = 262144

to this:

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine have shut down.

News & Events

PHPServer

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP.

One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in.

PHPServer shares all the qualities of Firebird: it is a capable, compact, easy-to-install and easy-to-manage solution.

PHPServer is a free download. Read more at: www.fyracle.org/phpserver.html

Server internals


Inside BLOBs

by Dmitri Kouzmenko (kdv@ib-aid.com) and Alexey Kovyazin (ak@ib-aid.com)

This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.

How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow the storage of data that cannot be placed in fields of other types: for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as using other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and, at the same time, has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.

Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB record will be one of three types.

The first type is the simplest: if the size of the BLOB-field data is less than the free space on the data page, the data is placed on the data page as a separate record of BLOB type.



The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in the quasi-record: a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used: the quasi-record stores references to BLOB pointer pages, which in turn contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type, the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.

BLOB Page

The BLOB page consists of a special header followed by the BLOB data. The special header contains the following information:

• The number of the first BLOB page in this BLOB. It is used to check that pages belong to the same BLOB.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on the page. As a page may or may not be filled to its full extent, the length of the actual data is indicated in the header.

Maximum BLOB size

As the internal structure for storing BLOB data can have only three levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.
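For illustration only, assuming a 4096-byte page, roughly 16 bytes of page header, and 4-byte page numbers (assumed figures, not the exact on-disk layout), the arithmetic goes like this:

references per pointer page  =  (4096 - 16) / 4      =  1020
data bytes per BLOB page     =   4096 - 16           =  4080
maximum BLOB size            =   1020 * 1020 * 4080  =  approx. 4.2 GB

which is the same order of magnitude as the ULONG limit described next.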

However, this is a theoretical limit (if you want, you can calculate it), and in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.

Moreover, in reality this practical limit is reduced further if a UDF is to be used for BLOB processing: the internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value; it is set to 80 bytes by default.
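For example, the two BLOB columns below are stored identically on disk; the segment size declared for the first only affects the buffer that GPRE would declare for Embedded SQL code (the table and column names are ours):

CREATE TABLE DOCS (
  BODY1 BLOB SUB_TYPE TEXT SEGMENT SIZE 80, /* explicit, same as the default */
  BODY2 BLOB SUB_TYPE TEXT                  /* segment size defaults to 80   */
);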

For those who want to know everything: the number 80 was chosen because 80 symbols could be displayed on a line of the alphanumeric terminals of the day.

TestBed


Testing the NO SAVEPOINT feature in InterBase 7.5.1

by Vlad Horsun (hvlad@users.sourceforge.net) and Alexey Kovyazin (ak@ib-aid.com)

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing amongst other things a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, with page size 4096 and character encoding NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and the generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);
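The real procedure comes with the downloadable script; purely as a sketch of what it does, it might look something like this (the generator name and the generated values are our assumptions):

SET TERM ^ ;
CREATE PROCEDURE INSERTRECS (CNT INTEGER)
RETURNS (INSERTED INTEGER)
AS
BEGIN
  INSERTED = 0;
  WHILE (INSERTED < CNT) DO
  BEGIN
    INSERT INTO TEST (ID, NAME, DESCRIPTION, CNT, QRT, TS_CHANGE, TS_CREATE)
    VALUES (GEN_ID(GEN_TEST_ID, 1), 'Test name', 'Test description',
            :INSERTED, :INSERTED / 2.0, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP);
    INSERTED = INSERTED + 1;
  END
  SUSPEND;
END ^
SET TERM ; ^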

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connecting. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database, and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well described in the InterBase documentation.

You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 MB) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 KB).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.

After that, perform a backup of this database, and you will be in the same position as if you had downloaded the "ready for use" backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a 2 GHz Pentium 4, 512 MB of RAM and an 80 GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular one-pass update without the NO SAVEPOINT option. For convenience, the important commands are annotated with comments.

connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey"; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option.


connect "C:\TEST\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs we added several UPDATE statements; we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "C:\TEST\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

News & Events

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.

Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open-source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble.

Read more at: www.janus-software.com

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. It can now better analyze InterBase or Firebird database statistics by using metadata information (in this case a connection to the database is required).

IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.

www.ibsurgeon.com/news.html

You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command. Suppose you have the scripts located in C:\TEST\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input C:\TEST\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed. This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3

News & Events

Fast Report 3.19

The new version of the famous FastReport is now out. FastReport 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport provides all the tools you need to develop reports. All variants of FastReport 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf output; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation from 0 to 360 degrees; a memo object that supports simple html tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, Stored Procedure Generator for Firebird and InterBase, creates a standard set of stored procs for any table. Full details can be found at http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

FBMail, Firebird EMail, is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found at http://www.tectsoft.net/Products/Firebird/FBMail.aspx

This is an excerpt from the one-pass script with NO SAVEPOINT enabled:

News & Events

CopyTiger 1.0.0

CopyTiger, a new product from Microtec, is an advanced database replication and synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger, and to download an evaluation version, see www.microtec.fr/copycat/ct


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs) the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs

News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com


For convenience, the results are tabulated in Table 1.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT

In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time, 47 seconds, compared with only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.

Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage we observe that time and memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are shown in Figures 1 and 2.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind the magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is written as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and the chain of delta versions. It is only if the same record is updated more than once within the same transaction that a full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if implicit savepoint management causes problems, then let's kill the savepoints. No savepoints, no problem. :-)

Besides ISQL, the option has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
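In C, a sketch of starting such a transaction through the API might look like this (isc_tpb_no_savepoint is the constant named in the release notes; the rest is the standard InterBase API, with error handling omitted):

#include <ibase.h>

/* Start a read-write snapshot transaction with savepoint management
   disabled. Returns 0 on failure; assumes the database is attached. */
static isc_tr_handle start_no_savepoint_transaction(isc_db_handle *db)
{
    ISC_STATUS status[20];
    isc_tr_handle trn = 0;
    static char tpb[] = {
        isc_tpb_version3,
        isc_tpb_write,        /* read-write                       */
        isc_tpb_concurrency,  /* snapshot isolation               */
        isc_tpb_no_savepoint  /* no implicit undo log (IB 7.5.1+) */
    };

    if (isc_start_transaction(status, &trn, 1, db,
                              (unsigned short) sizeof(tpb), tpb))
        return 0;
    return trn;
}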

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see the Release Notes for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.
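After rolling back a heavy NO SAVEPOINT transaction, then, a manual sweep is worth scheduling, for example with the standard gfix utility (the database path is ours):

gfix -sweep -user SYSDBA -password masterkey C:\TEST\testsp2.ib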

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: The initial situation before any UPDATE. Only one record version exists, and the Undo log is empty.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE or DELETE statement inside the transaction: the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the versions of each and restore them from the updated version and the backversion stored on disk.

Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the Undo log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records we have adifferent situation

Here is a good place to note that the example we are considering is the most simple situa-tion where only the one global (transaction level) savepoint exists We will also look at thedifference in the Undo log when an explicit (or enclosed BEGINhellip END) savepoint is used

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see – there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.

Figure 5

The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.


Figure 6

The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Figure 7

If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

The original design of InterBase implemented the second UPDATE another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. :)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no record version is created. From here on, the update just replaces record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The following figure (figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN … END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in figure 7.
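In SQL terms, the scenario looks something like this (a sketch; the TEST table and values are illustrative):

UPDATE TEST SET NAME = 'A';        /* UPDATE1 - undo data goes to the transaction-level log */
SAVEPOINT Savepoint1;
UPDATE TEST SET NAME = 'B';        /* UPDATE2 - backversions go to Savepoint1's own undo log */
ROLLBACK TO SAVEPOINT Savepoint1;  /* backs out UPDATE2 only, using that undo log */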

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.

Development area


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I, on my own, had reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography, and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience, and promise to add the necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out the less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.

Mixing of object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS. They are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…



As a result, everyone continues using RDBMS, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs), in [2] – UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex. Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway.)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hum. But let's not talk about such gloomy things. :) Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain some additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to the [1], [2] and [4] articles.

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases it probably makes sense to use "int64" or a combination of several integers.
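In InterBase/Firebird the natural implementation of such a function is a generator. A minimal sketch (the generator and trigger names are illustrative; the OBJECTS table is defined below):

create generator G_OID;

/* fetch a new OID value explicitly */
select gen_id(G_OID, 1) from RDB$DATABASE;

/* or assign OIDs automatically on insert */
set term ^ ;
create trigger OBJECTS_BI for OBJECTS
active before insert as
begin
  if (new.OID is null) then
    new.OID = gen_id(G_OID, 1);
end ^
set term ; ^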

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness.


Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information – from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
    OID TOID primary key,
    ClassId TOID,
    Name varchar(32),
    Description varchar(128),
    Deleted smallint,
    Creation_date timestamp default CURRENT_TIMESTAMP,
    Change_date timestamp default CURRENT_TIMESTAMP,
    Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table: that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact about all objects enables us to find out easily what that object is - using a standard method.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId


if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing of more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
    OID TOID primary key,
    customer TOID,
    sum_total NUMERIC(15,2))

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
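Such a view might be defined like this (a sketch mirroring the query above; only the view name is taken from the text):

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID;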

If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
    id integer not null primary key,
    object_id TOID,  /* reference to order object - 1:M relation */
    item_id TOID,    /* reference to ordered item */
    amount numeric(15,4),
    cost numeric(15,4),
    sum numeric(15,4) computed by (amount*cost))

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
    OID TOID,  /* link to the master-object of this attribute */
    attribute_id integer not null,
    value varchar(256),
    constraint attributes_pk primary key (OID, attribute_id))

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
    OID TOID,  /* here is a link to a description-object of the type */
    attribute_id integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id))

which describes the type metadata – an attribute set (names and types) for each object type.

All attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure – a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in a database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are used for every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know – everyone already knows them :). All three methods described are undoubtedly useful, and each has a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each one. If your application involves massive relational data processing using certain object attributes, like searching or grouping, or is likely to be extended that way in the future, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued…

FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; support for most popular DB-engines.

Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).

Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com

Development area


Replicating and synchronizingInterBaseFirebirddatabases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr

PART 1. BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection, as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
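To illustrate the idea only (CopyCat generates its own metadata; the RPL$LOG table, the G_RPL_LOG generator, the CUSTOMER table and the trigger below are all hypothetical), such logging could look like:

create generator G_RPL_LOG;

create table RPL$LOG (
    id integer not null primary key,
    login varchar(31),      /* the sub-node this log line is destined for */
    table_name varchar(31),
    pk_value varchar(64)
);

set term ^ ;
create trigger CUSTOMER_AU for CUSTOMER
active after update as
begin
  /* in a multi-node setup, one line would be inserted per sub-node */
  insert into RPL$LOG (id, login, table_name, pk_value)
  values (gen_id(G_RPL_LOG, 1), 'NODE1', 'CUSTOMER', new.ID);
end ^
set term ; ^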

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux, or any other OS supported by InterBase/FireBird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.



Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
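A typical generator-based synchronization method would be an SQL clause of this kind (G_CUSTOMER_ID is a hypothetical generator existing on the server):

select gen_id(G_CUSTOMER_ID, 1) from RDB$DATABASE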

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product, and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an


unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
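A sketch of such a procedure (all names are illustrative; the point is that the call and its parameters are what gets replicated, not the resulting STOCK values):

set term ^ ;
create procedure ADD_TO_STOCK (product_id integer, delta integer) as
begin
  /* each node applies the change relative to its own current value */
  update STOCK set value = value + :delta
  where product_id = :product_id;
end ^
set term ; ^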

PART 2. GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables), and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.



6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy to use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database.

Figure 1: Summary of database statistics



The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4k, there is no warning displayed, as this page size is okay.

Next we can see that the Forced Writes parameter is set to OFF, and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when it is ON, changed data is written immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
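The sweep interval can be changed (and sweeping run by hand) with gfix; for example (user/password switches omitted; the database name is illustrative):

gfix -housekeeping 0 mydb.gdb    (set the sweep interval to 0, disabling automatic sweep)
gfix -sweep mydb.gdb             (run a sweep manually)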

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed from the creation of the database to the point when the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] The "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".
[3] Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves "Oldest active" forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications produce less database load than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables, with fragmentation caused by update/delete or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. Both indications together tell us that this really is bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red, because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query - searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in a table, then garbage collection does not work or records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments to the "Temporary tables" article

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphswere about local temp tables Thereare also global temporary tables(GTTs) A point in favour of GTTs isthat one can give users their ownworkspace within a database with-out having to setup some clumsyadministration for it No user idscattered around in tables wherethey dont belong no explicitcleanup no demanding role man-agement Just setup tables as temp

and from then on it is transparent tousersapplications that the datainside is usersession-specific Webapplications would be a goodexample
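For readers who have not seen one, the declaration below is a minimal sketch of the kind of global temporary table being discussed, using the syntax of the InterBase 7.5 feature covered in issue 1 (the table and column names are invented for illustration):

CREATE GLOBAL TEMPORARY TABLE SESSION_WORK
(
    USER_ID  INTEGER,
    PAYLOAD  VARCHAR(250)
) ON COMMIT PRESERVE ROWS;

Each session sees only its own rows in SESSION_WORK, so no explicit cleanup or user-id bookkeeping is needed.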

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation or regarding data whose structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (available only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volker.rehn@bigpond.com


Miscellaneous


This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.


We publish the small chapter "Inside BLOBs" from the forthcoming book "1000 InterBase & Firebird Tips & Tricks" by Dmitri Kouzmenko and Alexey Kovyazin.

Object-oriented development has been a high-interest topic for many years, and it is still a hot topic. The article "Object-Oriented Development in RDBMS" explores the practical use of OOD principles in the design of InterBase or Firebird databases.

Replication is a problem which faces every database developer sooner or later. The article "Replicating and synchronizing InterBase/Firebird databases using CopyCat" introduces the approach used in the new CopyCat component set and the CopyTiger tool.

The last article, by Dmitri Kouzmenko, "Understanding Your Database with IBAnalyst", provides a guide for better understanding of how databases work and how tuning their parameters can optimize performance and avoid bottlenecks.

"Oldest Active" column

I am very glad to introduce the new column.

Donations

Though the magazine is free for its readers, it is a considerable expense for its publisher. We must pay for articles, editing and proofreading, design, hosting, etc. In future we'll try to recover costs solely from advertising, but for these initial issues we need your support.

See details here: http://ibdeveloper.com/donations

On the Radar

I'd like to introduce several projects which will be launched in the near future. We really need your feedback, so please do not hesitate to place your comments in the blog or send email to us (readers@ibdeveloper.com).

Magazine CD

We will issue a CD with all three issues of "The InterBase and Firebird Developer Magazine" in December 2005 (yes, a Christmas CD!).

Along with all the issues in PDF and searchable HTML format, you will find exclusive articles and bonus materials. The CD will also include free versions of popular software


related to InterBase and Firebird, and offer very attractive discounts for dozens of the most popular products.

The price for the CD is USD $9.99 plus shipping. For details see www.ibdeveloper.com/magazine-cd.

Paper version of the magazine

In 2006 we intend to issue the first paper version of "The InterBase and Firebird Developer Magazine". The paper version cannot be free because of high production costs.

We intend to publish four issues per year, with a subscription price of around USD $49. The issue volume will be approximately 64 pages.

In the paper version we plan to include the best articles from the online issues and, of course, exclusive articles and materials.

To be or not to be – this is the question that only you can answer. If the idea of subscribing to the paper version appeals to you, please place a comment in our blog (http://ibdeveloper.com/paper-version) or send email to readers@ibdeveloper.com with your pros and cons. We look forward to your feedback.

Firebird Conference

This issue of our magazine almost coincides with the third annual Firebird World Conference, starting November 13 in Prague, Czech Republic. This year's conference spans three nights, with a tight programme of back-to-back and parallel presentations over two days. With speakers in attendance from all across Europe and the Americas, presenting topics that range from specialised application development techniques to first-hand accounts of Firebird internals from the core developers, this one promises to be the best ever.

Registration has been well under way for some weeks now, although the time is now past for early bird discounts on registration fees. At the time of publishing, bookings were still being taken for accommodation in the conference venue itself, Hotel Olsanka in downtown Prague. The hotel offers a variety of room configurations, including bed and breakfast if wanted, at modest tariffs. Registration includes lunches on both conference days, as well as mid-morning and mid-afternoon refreshments.

Presentations include one on Firebird's future development from the Firebird Project Coordinator, Dmitry Yemanov, and another from Alex Peshkov, the architect of the security changes coming in Firebird 2. Jim Starkey will be there, talking about Firebird Vulcan, and members of the SAS Institute team will talk about aspects of SAS's taking on Firebird Vulcan as the back-end to its illustrious statistical software. Most of the interface development tools are on the menu, including Oracle-mode Firebird, PHP, Delphi, Python and Java (Jaybird).

It is a "don't miss" for any Firebird developer who can make it to Prague. Links to details and programme are at http://firebird.sourceforge.net/index.php?op=konferenz and at the IBPhoenix website, http://www.ibphoenix.com.

Invitation

As a footnote, I'd like to invite all people who are in touch with Firebird and InterBase to participate in the life of the community. The importance of a large, active community to any public project – and a DBMS with hundreds of thousands of users is certainly public – cannot be emphasised enough. It is the key to survival for such projects. Read our magazine, ask your questions on forums, and leave comments in the blog – just rock and roll with us!

Sincerely,

Alexey Kovyazin,
Chief Editor
editor@ibdeveloper.com

Oldest Active


On Silly Questions and Idiotic Outcomes

Firebird owes its existence to the community's support lists. Though it's all ancient history now, if it hadn't been for the esprit de corps among the regulars of the old InterBase support list hosted by mers.com in the nineties, Borland's shock-horror decision to kill off InterBase development at the end of 1999 would have been the end of it, for the English-speaking users at least.

Firebird's support lists on Yahoo and Sourceforge were born out of the same cataclysm of fatal fire and regeneration that caused the phoenix, drawn from the fire against extreme adversity, to take flight as Firebird in July 2000. Existing and new users, both of Firebird and of the rich selection of drivers and tools that spun off from and for it, depend heavily on the lists to get up to speed with the evolution of all of these software products.

Newbies often observe that there is just nothing like our lists for any other software, be it open source or not. One of the things that I think makes the Firebird lists stand out from the crowd is that, once gripped by the power of the software, our users never leave the lists. Instead, they stick around, learn hard, and stay ready and willing to impart to others what they have learnt. After nearly six years of this, across a dozen lists, our concentrated mix of expertise and fidelity is hard to beat.

Now, doesn't all this make us all feel warm and fuzzy? For those of us in the hot core of this voluntary support infrastructure, the answer has to be, unfortunately, "not all the time". There are some bugbears that can make the most patient person see red. I'm using this column to draw attention to some of the worst.

First and worst is silly questions. You have all seen them. Perhaps you even posted them. Subject: "Help!" Body: "Every time I try to connect to Firebird I get the message 'Connection refused'. What am I doing wrong?" What follows from that is a frustrating and tedious game for responders: What version of Firebird are you using? Which server model? Which platform? Which tool? What connection path? (By "connection path" we mean ...)

Everyone's time is valuable. Nobody has the luxury of being able to sit around all day playing this game. If we weren't willing to help, we wouldn't be there. But you make it hard, or even impossible, when you post problems with no descriptions. You waste our time. You waste your time. We get frustrated because you don't provide the facts; you get frustrated because we can't read your mind. And the list gets filled with noise.

If there's something I've learnt in more than 12 years of on-line software support, it is that a problem well described is a problem solved. If a person applies time and thought to presenting a good description of the problem, chances are the solution will hop up and punch him on the nose. Even if it doesn't, good descriptions produce the fastest right answers from list contributors. Furthermore, those good descriptions and their solutions form a powerful resource in the archives for others coming along behind.

Off-topic questions are another source of noise and irritation in the lists. At the Firebird website we have bent over backwards to make it very clear which lists are appropriate for which areas of interest. An off-topic posting is easily forgiven if the poster politely complies with a request from a list regular to move the question to x list. When these events instead become flame threads, or when the same people persistently reoffend, they are an extreme burden on everyone.

In the highly tedious category we have the kind of question that goes like this. Subject: "Bug in Firebird SQL". Body: "This statement works in MSSQL/Access/MySQL/<insert any non-standard DBMS name here>. It doesn't work in Firebird. Where should I report this bug?" To be fair, George Bernard Shaw wasn't totally right when he said "Ignorance is a sin." However, you do at least owe it to yourself to know what you are talking about, and not to present your problem as an attack on our software. Whether intended or not, it comes over as a troll and you come across as a fool. There's a strong chance that your attitude will discourage anyone from bothering with you at all. It's no-win, sure, but it's also a reflection of human nature. You reap what you sow.

In closing this little tirade, I just want to say that I hope I've struck a chord with those of our readership who struggle to get satisfaction from their list postings. It will be a rare question indeed that has no answer. Those rarities, well presented, become excellent bug reports. If you're not getting an answer, there's a high chance that you asked a silly question.

Helen E. M. Borrie
helebor@tpg.com.au


Cover story

Locking, Firebird, and the Lock Table

Author: Ann W. Harrison, aharrison@ibphoenix.com

News & Events

Firebird Worldwide Conference 2005

The Firebird conference will take place at the Hotel Olsanka in Prague, Czech Republic, from the evening of Sunday the 13th of November (opening session) until the evening of Tuesday the 15th of November (closing session).

One of the best-known facts about Firebird is that it uses multi-version concurrency control instead of record locking. For databases that manage concurrency through record locks, understanding the mechanisms of record locking is critical to designing a high-performance, high-concurrency application. In general, record locking is irrelevant to Firebird. However, Firebird uses locks to maintain internal consistency and for interprocess coordination. Understanding how Firebird does (and does not) use locks helps with system tuning and with anticipating how Firebird will behave under load.

I'm going to start by describing locking abstractly, then Firebird locking more specifically, then the controls you have over the Firebird lock table and how they can affect your database performance. So, if you just want to learn about tuning, skip to the end of the article.

Concurrency control

Concurrency control in database systems has three basic functions: preventing concurrent transactions from overwriting each other's changes, preventing readers from seeing uncommitted changes, and giving a consistent view of the database to running transactions. Those three functions can be implemented in various ways. The simplest is to serialize transactions – allowing each transaction exclusive access to the database until it finishes. That solution is neither interesting nor desirable.

A common solution is to allow each transaction to lock the data it uses, keeping other transactions from reading data it changes and from changing data it reads. Modern databases generally lock records. Firebird provides concurrency control without record locks by keeping multiple versions of records, each marked with the identifier of the transaction that created it. Concurrent transactions don't overwrite each other's changes, because the system will not allow a transaction to change a record if the most recent version was created by a concurrent transaction. Readers never see uncommitted data, because the system will not return record versions created by concurrent transactions. Readers see a consistent version of the database, because the system returns only the data committed when they start – allowing other transactions to create newer versions. Readers don't block writers.
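A simple way to watch these rules in action is to run two ISQL sessions side by side. This sketch assumes a table ACCOUNTS that is not from the article:

/* Session 1 */
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
UPDATE ACCOUNTS SET BALANCE = BALANCE - 100 WHERE ID = 1;
/* no commit yet */

/* Session 2, started after session 1's update */
SET TRANSACTION NO WAIT ISOLATION LEVEL SNAPSHOT;
SELECT BALANCE FROM ACCOUNTS WHERE ID = 1;
/* returns the last committed value - readers never see uncommitted data */
UPDATE ACCOUNTS SET BALANCE = BALANCE + 100 WHERE ID = 1;
/* fails at once with a lock conflict: the most recent version of the
   record was created by a concurrent transaction */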

A lock sounds like a very solid object, but in database systems a lock is anything but solid. "Locking" is a shorthand description of a protocol for reserving resources. Databases use locks to maintain consistency while allowing concurrent, independent transactions to update distinct parts of the database. Each transaction reserves the resources – tables, records, data pages – that it needs to do its work. Typically, the reservation is made in memory to control a resource on disk, so the cost of reserving the resource is not significant compared with the cost of reading or changing the resource.

Locking example

"Resource" is a very abstract term. Let's start by talking about locking tables. Firebird does lock tables, but normally it locks them only to prevent catastrophes, like having one transaction drop a table that another transaction is using.

Before a Firebird transaction can access a table, it must get a lock on the table. The lock prevents other transactions from dropping the table while it is in use. When a transaction gets a lock on a table, Firebird makes an entry in its table of locks, indicating the identity of the transaction that has the lock, the identity of the table being locked, and a function to call if another transaction wants an incompatible lock on the table. The normal table lock is shared – other transactions can lock the same table in the same way.

Before a transaction can delete a table, it must get an exclusive lock on the table. An exclusive lock is incompatible with other locks. If any transaction has a lock on the table, the request for an exclusive lock is denied and the drop table statement fails. Returning an immediate error is one way of dealing with conflicting lock requests. The other is to put the conflicting request on a list of unfulfilled requests and let it wait for the resource to become available.
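You can observe exactly this conflict from two ISQL sessions; the table name here is just an example:

/* Session 1 */
SET TRANSACTION;
SELECT COUNT(*) FROM EMPLOYEE;
/* the transaction now holds a shared lock on EMPLOYEE until it ends */

/* Session 2 */
DROP TABLE EMPLOYEE;
/* the exclusive lock cannot be granted while session 1's shared
   lock exists, so the drop fails with an error */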

In its simplest form, that is how locking works. All transactions follow the formal protocol of requesting a lock on a resource before using it. Firebird maintains a list of locked resources, a list of requests for locks on resources – satisfied or waiting – and a list of the owners of lock requests. When a transaction requests a lock that is incompatible with existing locks on a resource, Firebird either denies the new request or puts it on a list to wait until the resource is available. Internal lock requests specify whether they wait or receive an immediate error on a case-by-case basis. When a transaction starts, it specifies whether it will wait for locks that it acquires on tables, etc.

Lock modes

For concurrency and read committed transactions, Firebird locks tables for shared read or shared write. Either mode says "I'm using this table, but you are free to use it too." Consistency-mode transactions follow different rules. They lock tables for protected read or protected write. Those modes say "I'm using the table and no one else is allowed to change it until I'm done." Protected read is compatible with shared read and other protected read transactions. Protected write is only compatible with shared read.

The important concept about lock modes is that locks are more subtle than mutexes – locks allow resource sharing, as well as protecting resources from incompatible use.
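These table-level modes are visible at the SQL level in the RESERVING clause of SET TRANSACTION. A minimal sketch (the table name is just an example):

SET TRANSACTION WAIT ISOLATION LEVEL SNAPSHOT
    RESERVING EMPLOYEE FOR PROTECTED WRITE;
/* other transactions may still read EMPLOYEE in shared read mode,
   but none may update it until this transaction ends */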

Two-phase locking vs. transient locking

The table locks that we've been describing follow a protocol known as two-phase locking, which is typical of locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release; they cannot overlap.

When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency-mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.

Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.

Firebird page locks

One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.

In general database theory, a transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.

In Firebird, internally, each time a transaction changes a page, it changes that page – and the physical structure of the database as a whole – from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page-level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must release all the page locks it holds before acquiring a lock on a new page.

The Firebird lock table

When all access to a database is done in a single process – as is the case with most database systems – locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.

The Firebird lock manager

We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and to one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.

Conflicting lock requests

When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens: either the requesting transaction gets an immediate error, or the request is put on a list of waiting requests and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.

Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.

Locks as interprocess communication

Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be – and is – useful for a number of purposes outside the area that's normally considered database locking.

For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important, since as soon as an index becomes active, every transaction must help maintain it – making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.

Firebird locking summary

Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases, because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.

Lock table specifics

The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests – generally, lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource – whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose, the lock table includes a hash table for access to lock blocks, based on the name of the resource being locked.

A quick refresher on hashing

A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function, called the hash function, into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.

A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and another name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
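To put rough numbers on that ratio (an illustrative calculation, not from the article): with the default 101 hash slots, a system holding about 1,500 locks averages 1500 / 101 ≈ 15 entries per slot – the avg value of 15 shown in the sample output below – so a typical lookup follows about fifteen pointers instead of one. With 499 slots, the same load would average only about 3 entries per slot.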

Adjusting the lock table to improve performance

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally, all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.

The tool for checking the lock table is fb_lock_print, which is a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file, and open the file with an editor.
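For example (the paths are illustrative and depend on your installation):

cd C:\Program Files\Firebird\bin
fb_lock_print > lockprint.txt

You'll see output that starts something like this: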

LOCK_HEADER BLOCK
Version: 114, Active owner: 0, Length: 262144, Used: 85740
Semmask: 0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30
...

The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:

Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.

If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime.

Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499

The change will not take effect until all connections to all databases on the server machine have shut down.

If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table; Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading #, and increase the value from this:

#LockMemSize = 262144

to this:

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine have shut down.

News & Events

PHP Server

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP.

One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in.

PHPServer shares all the qualities of Firebird: it is a capable, compact, easy-to-install and easy-to-manage solution.

PHPServer is a free download. Read more at: www.fyracle.org/phpserver.html

Server internals

Inside BLOBs

How the server works with BLOBs

This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin (ak@ib-aid.com) and Dmitri Kouzmenko (kdv@ib-aid.com), which will be published in 2006.

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for the storage of data that cannot be placed in fields of other types – for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and, at the same time, has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.

Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.

The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.

BLOB Page

The BLOB page consists of the following parts. The special header contains the following information:

• The number of the first BLOB page in this BLOB. It is used to check that pages belong to one BLOB.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on the page. As a page may or may not be filled to its full extent, the length of the actual data is indicated in the header.

Maximum BLOB size

As the internal structure for storing BLOB data can have only three levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), and in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
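For completeness, this is where the parameter appears in DDL – a sketch with an invented table name; as explained above, the SEGMENT SIZE clause can be omitted or set to any value without affecting storage or performance:

CREATE TABLE DOCS_ARCHIVE
(
    ID   INTEGER NOT NULL,
    BODY BLOB SUB_TYPE TEXT SEGMENT SIZE 80
);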

TestBed

Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Vlad Horsun, hvlad@users.sourceforge.net, and Alexey Kovyazin, ak@ib-aid.com

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
    ID          NUMERIC(18,2),
    NAME        VARCHAR(120),
    DESCRIPTION VARCHAR(250),
    CNT         INTEGER,
    QRT         DOUBLE PRECISION,
    TS_CHANGE   TIMESTAMP,
    TS_CREATE   TIMESTAMP,
    NOTES       BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connecting. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well described in the InterBase documentation.

You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 MB) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 KB).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.

After that, perform a backup of this database and you will be in the same position as if you had downloaded the "ready to use" backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer: a Pentium-4 2 GHz with 512 MB of RAM and an 80 GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular one-pass update, without the NO SAVEPOINT option. For convenience, the important commands are clarified with comments:

connect "C:\TEST\testsp1.ib" USER "SYSDBA" PASSWORD "masterkey";
drop table TEST2DROP; /* drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* walk down all records in TEST to place them into cache */
commit;
set time on;  /* enable time statistics for performed statements */
set stat on;  /* enable writes/fetches/memory statistics */
commit;
/* perform a bulk update of all records in the TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE, with the NO SAVEPOINT option:


connect "C:\TEST\testsp2.ib" USER "SYSDBA" PASSWORD "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine for a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements – we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "C:\TEST\testsp3.ib" USER "SYSDBA" PASSWORD "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

News & Events

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.

Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open-source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble.

Read more at: www.janus-software.com

News & Events

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. It can now better analyze InterBase or Firebird database statistics by using metadata information (in this case a connection to the database is required).

IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.

www.ibsurgeon.com/news.html

You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command. Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed. This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3

News & Events

Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport provides all the tools you need to develop reports. All variants of FastReport 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters to html, tiff, bmp, jpg, xls and pdf outputs; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation from 0 to 360 degrees; a memo object supporting simple html tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.

News & Events

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table. Full details can be found here: http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail, Firebird EMail (FBMail), a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here: http://www.tectsoft.net/Products/Firebird/FBMail.aspx

This is an excerpt from the one-pass script with NO SAVEPOINT enabled:

News & Events

CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication and synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger see www.microtec.fr/copycat/ct.

Download an evaluation version.


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs), the raw test results are rather large, so they are presented in tabulated form.

[Table 1: Test results for 5 sequential UPDATEs]

News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com


For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

[Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT]

In the second UPDATE we start to see the difference. With default transaction settings, this UPDATE takes a very long time – 47 seconds – compared with only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.

[Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT]

With NO SAVEPOINT usage we observe that time and memory values, and the growth in writes, are all small and virtually equal for each pass.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT performs? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is written as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and a chain of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if implicit savepoint management causes problems, then let's kill the savepoints. No savepoints – no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors, and for the new TPB option, can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see the Release Notes for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 release notes.

Initial situation

Let's consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly restored database, so it is guaranteed that only one version exists for any record.

[Figure 3: Initial situation before any UPDATE – only one record version exists, and the undo log is empty]

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE or DELETE statement inside the transaction – the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.

[Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the undo log]

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN … END) savepoint is used.

To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5

The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE 1 into the Undo log.

All this makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.


Figure 6

The third UPDATE overwrites the UPDATE 1 version in the Undo log with the UPDATE 2 version, and its own version is written to disk as the latest one.

Figure 7

If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. :-)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected the third and following UPDATEs to do essentially equal numbers of reads and writes – to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN … END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE 2 we would have the situation illustrated in figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
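To make the scenario concrete, here is a minimal ISQL sketch; the table and savepoint names are illustrative:

UPDATE TEST SET NAME = NAME;      /* UPDATE 1: its undo data goes to the transaction-level savepoint */
SAVEPOINT Savepoint1;
UPDATE TEST SET NAME = NAME;      /* UPDATE 2: its undo data goes to the undo log of Savepoint1 */
ROLLBACK TO SAVEPOINT Savepoint1; /* backs out UPDATE 2 only; UPDATE 1 is untouched */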

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should, of course, be used with caution. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.


Object-Oriented Development in RDBMS. Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for possible inconvenience and promise to add any necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true of OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development – if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines that is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS: they are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database design: access speed, flexibility and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2] – UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex).

(Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway.)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hmm. But let's not talk about such gloomy things.) Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
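In InterBase/Firebird the natural implementation of such a function is a generator; a minimal sketch (the generator name is illustrative):

create generator g_oid;

/* each call returns the next unique OID value for the whole database */
select gen_id(g_oid, 1) from rdb$database;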

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list – basically a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus each object has to have the mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);
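Two small housekeeping triggers fit naturally here; a sketch, assuming OIDs come from a g_oid generator like the one shown earlier:

set term ^ ;

create trigger objects_bi for OBJECTS
active before insert position 0 as
begin
  /* assign a database-wide unique OID if the client did not supply one */
  if (new.OID is null) then
    new.OID = gen_id(g_oid, 1);
end ^

create trigger objects_bu for OBJECTS
active before update position 0 as
begin
  /* keep the change timestamp current */
  new.Change_date = CURRENT_TIMESTAMP;
end ^

set term ; ^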

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100 (you need to add a description of the known OIDs). Then the list of all persistent object types known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to some unidentified object, this enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types – simple lookup dictionaries, for example – which do not have any attributes other than those already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? Likely not less than a half. Thus you may consider that you have implemented half of these dictionaries – that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
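For illustration, registering a simple "Currency" dictionary and one of its elements could look like this (the OID values are hypothetical):

/* the dictionary type itself is an object of the well-known "type" class */
insert into OBJECTS (OID, ClassId, Name, Description)
values (101, -100, 'Currency', 'Simple currency dictionary');

/* an element of the new dictionary */
insert into OBJECTS (OID, ClassId, Name, Description)
values (102, 101, 'USD', 'US Dollar');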


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId


if you know the ClassId for this dictionary. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. Objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
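The article does not spell such a view out, but it could be as simple as this sketch:

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from OBJECTS o join Orders ord on o.OID = ord.OID;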

If an order has a "lines" section – and a real-world order should definitely have such a section – we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to the order object - 1:M relation */
  item_id TOID,   /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  line_sum numeric(15,4) computed by (amount * cost)
  /* "sum" is a reserved word, hence the column is named line_sum here */
);

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  attr_value varchar(256), /* "value" is a reserved word, hence attr_value */
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* link to the description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata – the attribute set (names and types) for each object type.

All attributes for a particular object, whose OID is known, are retrieved by the query:

select attribute_id, attr_value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.attr_value
from attributes a, class_attributes ca, objects o
where a.OID = :oid
  and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.attr_value as customer,
       a2.attr_value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure – a significant benefit, since with this approach the number of tables does not increase no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database – in stored procedures, for example. There is also a certain level of data redundancy, because three fields (the OID, the attribute id and the value) are applicable to every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field – and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to be extended in the future using certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.

To be continued…

FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base-type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot-matrix reports support; supports most popular DB engines.

Full WYSIWYG; text rotation 0–360 degrees; memo object supports simple html tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).

Access to DB fields, styles, text flow, URLs, anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
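CopyCat's actual metadata is not shown in this article, but the principle can be sketched roughly as follows; all names here are hypothetical, the real log structure is richer, and only the UPDATE event is shown for brevity:

create generator g_log;

create table rpl$log (
  id integer not null primary key,
  table_name varchar(32),
  pk_value integer,
  login varchar(32)
);

set term ^ ;
create trigger customers_au for CUSTOMERS
active after update position 0 as
begin
  /* remember which record changed and who changed it */
  insert into rpl$log (id, table_name, pk_value, login)
  values (gen_id(g_log, 1), 'CUSTOMERS', new.id, user);
end ^
set term ; ^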

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database; therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
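Such a synchronization clause can be a one-liner; both variants below are illustrative (the generator and procedure are assumptions, not part of CopyCat):

/* a generator-based clause */
select gen_id(g_customer_id, 1) from rdb$database

/* or a call to a selectable stored procedure */
select next_id from p_get_next_id('CUSTOMERS')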

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, for a certain product with a current stock value of 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
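A sketch of such a stored procedure; the table, column and procedure names are assumptions for the example:

set term ^ ;
create procedure add_stock (p_product_id integer, p_delta integer)
as
begin
  /* replicate the change, not the absolute value */
  update STOCK
  set current_value = current_value + :p_delta
  where product_id = :p_product_id;
end ^
set term ; ^

/* node A: execute procedure add_stock(1, 1);
   node B: execute procedure add_stock(1, 2);
   after both calls are replicated, every node ends up with 45 + 1 + 2 = 48 */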

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++Builder Component Set

CopyCat can be obtained as a set of Delphi / C++Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator"


columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available during the browsing of statistics; it also includes hints, comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Figure 1. Summary of database statistics

Notes:
1 - InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 - The Oldest transaction is the same Oldest interesting transaction mentioned everywhere. Gstat output does not show this transaction as interesting.
3 - Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions, it can be considered as the currently oldest active transaction.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off (see note 1).

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest (see note 2) and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active (see note 3) – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications produce less database load than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2. Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query – searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run a sweep manually (gfix -sweep) to decrease the TIP size.

Here is another quote, from the table/indices part of the report:

Versioned tables count: 8. A large amount of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:

Table        Records   Versions   RecVers size
CLIENTS_PR     3388      10944      92
DICT_PRICE       30       1992      45
DOCS              9       2225      64
N_PART        13835      72594      83
REGISTR_NC      241       4085      56
SKL_NC         1640       7736     170
STAT_QUICK    17649      85062     110
UO_LOCK         283       8490     144
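For instance, to force garbage collection on one of the tables listed above, one could run a query such as (as the report warns, this can take a long time):

select count(*) from N_PART;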

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments to the "Temporary tables" article

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them particularly impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it: no user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago, and you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation or regarding data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com


Miscellaneous

Invitations · Subscriptions · Donations · Sponsorships

The Firebird Book
This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000. Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.


Editor's note: Rock around the blog (continued)

Magazine CD
... related to InterBase and Firebird, and offering very attractive discounts for dozens of the most popular products. The price for the CD is USD $9.99 plus shipping. For details, see www.ibdeveloper.com/magazine-cd

Paper version of the magazine

In 2006 we intend to issue the first paper version of "The InterBase and Firebird Developer Magazine". The paper version cannot be free because of high production costs.

We intend to publish 4 issues per year, with a subscription price of around USD $49. The issue volume will be approximately 64 pages.

In the paper version we plan to include the best articles from the online issues and, of course, exclusive articles and materials.

To be or not to be – this is the question that only you can answer. If the idea of subscribing to the paper version appeals to you, please place a comment in our blog (http://ibdeveloper.com/paper-version) or send email to readers@ibdeveloper.com with your pros and cons. We look forward to your feedback.

Firebird Conference

This issue of our magazine almost coincides with the third annual Firebird World Conference, starting November 13 in Prague, Czech Republic. This year's conference spans three nights, with a tight programme of back-to-back and parallel presentations over two days. With speakers in attendance from all across Europe and the Americas, presenting topics that range from specialised application development techniques to first-hand accounts of Firebird internals from the core developers, this one promises to be the best ever.

Registration has been well under way for some weeks now, although the time is now past for early-bird discounts on registration fees. At the time of publishing, bookings were still being taken for accommodation in the conference venue itself, Hotel Olsanka in downtown Prague. The hotel offers a variety of room configurations, including bed and breakfast if wanted, at modest tariffs. Registration includes lunches on both conference days, as well as mid-morning and mid-afternoon refreshments.

Presentations include one on Firebird's future development from the Firebird Project Coordinator, Dmitry Yemanov, and another from Alex Peshkov, the architect of the security changes coming in Firebird 2. Jim Starkey will be there, talking about Firebird Vulcan, and members of the SAS Institute team will talk about aspects of SAS's taking on Firebird Vulcan as the back-end to its illustrious statistical software. Most of the interface development tools are on the menu, including Oracle-mode Firebird, PHP, Delphi, Python and Java (Jaybird).

It is a don't-miss for any Firebird developer who can make it to Prague. Link to details and programme either at http://firebird.sourceforge.net/index.php?op=konferenz or at the IBPhoenix website, http://www.ibphoenix.com.


Invitation

As a footnote, I'd like to invite all people who are in touch with Firebird and InterBase to participate in the life of the community. The importance of a large, active community to any public project -- and a DBMS with hundreds of thousands of users is certainly public -- cannot be emphasised enough. It is the key to survival for such projects. Read our magazine, ask your questions on forums, and leave comments in the blog - just rock and roll with us.

Sincerely,

Alexey Kovyazin
Chief Editor
editor@ibdeveloper.com


Oldest Active


On Silly Questionsand Idiotic Outcomes

Firebird owes its existence to the community's support lists. Though it's all ancient history now, if it hadn't been for the esprit de corps among the regulars of the old InterBase support list hosted by mers.com in the nineties, Borland's shock-horror decision to kill off InterBase development at the end of 1999 would have been the end of it, for the English-speaking users at least.

Firebird's support lists on Yahoo and Sourceforge were born out of the same cataclysm of fatal fire and regeneration that caused the phoenix, drawn from the fire against extreme adversity, to take flight as Firebird in July 2000. Existing and new users, both of Firebird and of the rich selection of drivers and tools that spun off from and for it, depend heavily on the lists to get up to speed with the evolution of all of these software products.

Newbies often observe that there is just nothing like our lists for any other software, be it open source or not. One of the things that I think makes the Firebird lists stand out from the crowd is that, once gripped by the power of the software, our users never leave the lists. Instead, they stick around, learn hard, and stay ready and willing to impart to others what they have learnt. After nearly six years of this, across a dozen lists, our concentrated mix of expertise and fidelity is hard to beat.

Now, doesn't all this make us all feel warm and fuzzy? For those of us in the hot core of this voluntary support infrastructure, the answer has to be, unfortunately, "not all the time". There are some bugbears that can make the most patient person see red. I'm using this column to draw attention to some of the worst.

First and worst is silly questions. You have all seen them. Perhaps you even posted them. Subject: "Help!" Body: "Every time I try to connect to Firebird I get the message 'Connection refused'. What am I doing wrong?" What follows from that is a frustrating and tedious game for responders. What version of Firebird are you using? Which server model? Which platform? Which tool? What connection path? (By connection path we mean...)

Everyone's time is valuable. Nobody has the luxury of being able to sit around all day playing this game. If we weren't willing to help, we wouldn't be there. But you make it hard, or even impossible, when you post problems with no descriptions. You waste our time. You waste your time. We get frustrated because you don't provide the facts; you get frustrated because we can't read your mind. And the list gets filled with noise.

If there's something I've learnt in more than 12 years of on-line software support, it is that a problem well described is a problem solved. If a person applies time and thought to presenting a good description of the problem, chances are the solution will hop up and punch him on the nose. Even if it doesn't, good descriptions produce the fastest right answers from list contributors. Furthermore, those good descriptions and their solutions form a powerful resource in the archives for others coming along behind.

Off-topic questions are another source of noise and irritation in the lists. At the Firebird website we have bent over backwards to make it very clear which lists are appropriate for which areas of interest. An off-topic posting is easily forgiven if the poster politely complies with a request from a list regular to move the question to x list. When these events instead become flame threads, or when the same people persistently reoffend, they are an extreme burden on everyone.

In the highly tedious category we have the kind of question that goes like this. Subject: "Bug in Firebird SQL". Body: "This statement works in MSSQL/Access/MySQL/insert any non-standard DBMS name here. It doesn't work in Firebird. Where should I report this bug?" To be fair, George Bernard Shaw wasn't totally right when he said "Ignorance is a sin." However, you do at least owe it to yourself to know what you are talking about, and not to present your problem as an attack on our software. Whether intended or not, it comes over as a troll and you come across as a fool. There's a strong chance that your attitude will discourage anyone from bothering with you at all. It's no-win, sure, but it's also a reflection of human nature. You reap what you sow.

In closing this little tirade, I just want to say that I hope I've struck a chord with those of our readership who struggle to get satisfaction from their list postings. It will be a rare question indeed that has no answer. Those rarities, well-presented, become excellent bug reports. If you're not getting an answer, there's a high chance that you asked a silly question.

Helen E. M. Borrie
helebor@tpg.com.au


Cover story


Locking, Firebird, and the Lock Table

Author: Ann W. Harrison, aharrison@ibphoenix.com

News & Events

Firebird Worldwide Conference 2005
The Firebird conference will take place at the Hotel Olsanka in Prague, Czech Republic, from the evening of Sunday the 13th of November (opening session) until the evening of Tuesday the 15th of November (closing session).

One of the best-known facts about Firebird is that it uses multi-version concurrency control instead of record locking. For databases that manage concurrency through record locks, understanding the mechanisms of record locking is critical to designing a high-performance, high-concurrency application. In general, record locking is irrelevant to Firebird. However, Firebird uses locks to maintain internal consistency and for interprocess coordination. Understanding how Firebird does (and does not) use locks helps with system tuning and anticipating how Firebird will behave under load.

I'm going to start by describing locking abstractly, then Firebird locking more specifically, then the controls you have over the Firebird lock table and how they can affect your database performance. So if you just want to learn about tuning, skip to the end of the article.

Concurrency control

Concurrency control in database systems has three basic functions: preventing concurrent transactions from overwriting each others' changes, preventing readers from seeing uncommitted changes, and giving a consistent view of the database to running transactions. Those three functions can be implemented in various ways. The simplest is to serialize transactions – allowing each transaction exclusive access to the database until it finishes. That solution is neither interesting nor desirable.

A common solution is to allow each transaction to lock the data it uses, keeping other transactions from reading data it changes and from changing data it reads. Modern databases generally lock records. Firebird provides concurrency control without record locks by keeping multiple versions of records, each marked with the identifier of the transaction that created it. Concurrent transactions don't overwrite each other's changes, because the system will not allow a transaction to change a record if the most recent version was created by a concurrent transaction. Readers never see uncommitted data, because the system will not return record versions created by concurrent transactions. Readers see a consistent version of the database, because the system returns only the data committed when they start – allowing other transactions to create newer versions. Readers don't block writers.
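To make the first of those rules concrete, here is a sketch of the visible behavior in two concurrent sessions; the table, column and values are invented for the illustration, and the exact error text varies between versions:

Session A:  update T set F = 1 where ID = 10;   /* succeeds; A has not yet committed */
Session B:  update T set F = 2 where ID = 10;   /* is refused (or waits), with an error
                                                   like "update conflicts with concurrent
                                                   update" */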

A lock sounds like a very solid object, but in database systems a lock is anything but solid. "Locking" is a shorthand description of a protocol for reserving resources. Databases use locks to maintain consistency while allowing concurrent, independent transactions to update distinct parts of the database. Each transaction reserves the resources – tables, records, data pages – that it needs to do its work. Typically, the reservation is made in memory to control a resource on disk, so the cost of reserving the resource is not significant compared with the cost of reading or changing the resource.

Locking example

"Resource" is a very abstract term. Let's start by talking about locking tables. Firebird does lock tables, but normally it locks them only to prevent catastrophes, like having one transaction drop a table that another transaction is using.

Before a Firebird transaction can access a table, it must get a lock on the table. The lock prevents other transactions from dropping the table while it is in use. When a transaction gets a lock on a table, Firebird makes an entry in its table of locks, indicating the identity of the transaction that has the lock, the identity of the table being locked, and a function to call if another transaction wants an incompatible lock on the table. The normal table lock is shared – other transactions can lock the same table in the same way.

Before a transaction can delete a table, it must get an exclusive lock on the table. An exclusive lock is incompatible with other locks. If any transaction has a lock on the table, the request for an exclusive lock is denied and the drop table statement fails. Returning an immediate error is one way of dealing with conflicting lock requests. The other is to put the conflicting request on a list of unfulfilled requests and let it wait for the resource to become available.

In its simplest form, that is how locking works. All transactions follow the formal protocol of requesting a lock on a resource before using it. Firebird maintains a list of locked resources, a list of requests for locks on resources – satisfied or waiting – and a list of the owners of lock requests. When a transaction requests a lock that is incompatible with existing locks on a resource,



Firebird either denies the new request or puts it on a list to wait until the resource is available. Internal lock requests specify whether they wait or receive an immediate error on a case-by-case basis. When a transaction starts, it specifies whether it will wait for locks that it acquires on tables, etc.

Lock modes

For concurrency and read committed transactions, Firebird locks tables for shared read or shared write. Either mode says, "I'm using this table, but you are free to use it too." Consistency mode transactions follow different rules. They lock tables for protected read or protected write. Those modes say, "I'm using the table and no one else is allowed to change it until I'm done." Protected read is compatible with shared read and other protected read transactions. Protected write is only compatible with shared read.

The important concept about lock modes is that locks are more subtle than mutexes – locks allow resource sharing, as well as protecting resources from incompatible use.
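For illustration, a transaction can ask for the stronger, protected-mode table locks explicitly through the RESERVING clause of SET TRANSACTION; a sketch only, with TEST standing in for any table name:

SET TRANSACTION WAIT ISOLATION LEVEL SNAPSHOT
    RESERVING TEST FOR PROTECTED WRITE;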

Two-phase locking vs. transient locking

The table locks that we've been describing follow a protocol known as two-phase locking, which is typical of locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release. They cannot overlap.

When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.

Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.

Firebird page locks

One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.

In general database theory, a transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.

In Firebird, internally, each time a transaction changes a page, it changes that page – and the physical structure of the database as a whole – from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page-level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.

The Firebird lock table

When all access to a database is done in a single process – as is the case with most database systems – locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.

The Firebird lock manager

We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and to one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.

Conflicting lock requests

When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens: either the requesting transaction gets an immediate error, or the request is put on a list of waiting requests and the transactions that hold conflicting locks on the resource are notified of the conflicting request.

News & Events

Firebird Conference 2005
Registering for the Conference · Call for papers · Sponsoring the Firebird Conference
http://firebird-conference.com


Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.

Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.

Locks as interprocess communication

Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be – and is – useful for a number of purposes outside the area that's normally considered database locking.

For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important, since as soon as an index becomes active, every transaction must help maintain it – making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.

Firebird locking summary

Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases, because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.

Lock table specifics

The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests – generally lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource – whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose, the lock table includes a hash table for access to lock blocks based on the name of the resource being locked.

A quick refresher on hashing

A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function, called the hash function, into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.

A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
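A rough way to put numbers on this (an approximation, not exact lock-manager accounting):

    average chain length  ~  lock blocks / hash slots
    101 slots at an average chain of 15   =>  about 1,500 lock blocks
    1,500 lock blocks over 499 slots      =>  chains of about 3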

Adjusting the lock table to improve performance

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.

The tool for checking the lock table is fb_lock_print, which is a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:

LOCK_HEADER BLOCK
    Version: 114, Active owner: 0, Length: 262144, Used: 85740
    Semmask: 0x0, Flags: 0x0001
    Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
    Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
    Acquires: 21048, Acquire blocks: 0, Spin count: 0
    Mutex wait: 10.3%
    Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

... The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:

    Mutex wait: 10.3%
    Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.

If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime.

Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499

The change will not take effect until all connections to all databases on the server machine shut down.

If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

    Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading # and increase the value:

LockMemSize = 262144

to this

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.

News & Events

PHP Server
One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP. One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in. PHPServer shares all the qualities of Firebird: it is a capable, compact, easy to install and easy to manage solution. PHPServer is a free download. Read more at: www.fyracle.org/phpserver.html

Server internals


Inside BLOBs

This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.

Authors: Dmitri Kouzmenko, kdv@ib-aid.com; Alexey Kovyazin, ak@ib-aid.com

How the serverworks with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for storage of data that cannot be placed in fields of other types – for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level, and at the same time has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.


Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.



The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.

BLOB Page

The BLOB page consists of the following parts. The special header contains the following information:

• The number of the first BLOB page in this BLOB. It is used to check that pages belong to one BLOB.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on a page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.

Maximum BLOB size

As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.
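For the curious, a back-of-envelope version of that calculation (the usable-bytes-per-page and 4-byte page-number figures are simplifying assumptions, not exact on-disk layouts): with a 4,096-byte page holding roughly 4,000 bytes of data, or about 1,000 four-byte page numbers,

    one pointer page   ->  about 1,000 data pages
    one quasi-record   ->  about 1,000 pointer pages
    maximum BLOB size  ~   1,000 x 1,000 x 4,000 bytes  ~  4 GB

which is the same order as the ULONG limit just mentioned.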

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.
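For illustration, the parameter appears in DDL like this; a sketch with invented table and column names, and an arbitrary segment size:

CREATE TABLE DOCS_ARCHIVE (
    ID   INTEGER NOT NULL,
    BODY BLOB SUB_TYPE TEXT SEGMENT SIZE 512
);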

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.

TestBed


Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Vlad Horsun, hvlad@users.sourceforge.net; Alexey Kovyazin, ak@ib-aid.com

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
    ID          NUMERIC(18,2),
    NAME        VARCHAR(120),
    DESCRIPTION VARCHAR(250),
    CNT         INTEGER,
    QRT         DOUBLE PRECISION,
    TS_CHANGE   TIMESTAMP,
    TS_CREATE   TIMESTAMP,
    NOTES       BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.

You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.

After that, perform a backup of this database and you will be in the same position as if you had downloaded the "ready for use" backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, 512 MB RAM and an 80GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular one-pass update without the NO SAVEPOINT option.

For convenience, the important commands are clarified with comments.

connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey"; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on;  /* enable time statistics for performed statements */
set stat on;  /* enables writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option.


connect "C:\TEST\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements – we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "C:\TEST\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

News & Events

Fyracle 0.8.9
Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open-source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble. Read more at: www.janus-software.com

IBAnalyst 1.9
IBSurgeon has issued a new version of IBAnalyst. It can now better analyze InterBase or Firebird database statistics by using metadata information (in this case a connection to the database is required). IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance. www.ibsurgeon.com/news.html

You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3

News & Events

FastReport 3.19
The new version of the famous FastReport is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently, and provides all the tools you need to develop reports. All variants of FastReport® 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0-360 degrees; a memo object supporting simple html tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.

TECT Software presents
New versions of nice utilities from TECT Software, www.tectsoft.net. SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found at http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx. And FBMail, Firebird EMail, a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found at http://www.tectsoft.net/Products/Firebird/FBMail.aspx.

This is an excerpt from the one-pass script with NO SAVEPOINT enabled:

CopyTiger 1.00
CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger, see www.microtec.fr/copycat/ct. Download an evaluation version.


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs) the raw test results are rather large, so they are summarised in Table 1.

Table 1: Test results for 5 sequential UPDATEs

News & Events

SQLHammer 1.5 Enterprise Edition
The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal. www.metadataforge.com


For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT

In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.

Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage, we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. ;-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.
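In practice this means keeping an eye on the transaction markers and sweeping when the gap grows; a sketch using the standard command-line tools and the database paths from the test setup:

gstat -h C:\TEST\testsp2.ib
    (compare "Oldest transaction" with "Next transaction" in the output)
gfix -sweep -user SYSDBA -password masterkey C:\TEST\testsp2.ib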

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE - only one record version exists; the Undo log is empty.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction - the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log - in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception - memory usage does not grow any further.


Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

(The original design of InterBase implemented the second UPDATE another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article.)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE, no record version is created. From here on, the update just replaces record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The following figure (figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only operations we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN … END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in figure 7.
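A minimal SQL sketch of that scenario (the table and column names are invented for illustration):

update TEST set F1 = F1 + 1;  /* UPDATE1: backversion goes to the transaction-level savepoint */
savepoint Savepoint1;
update TEST set F1 = F1 + 1;  /* UPDATE2: backversion is kept in Savepoint1's own Undo log */
/* rollback to savepoint Savepoint1;  -- would undo UPDATE2 only */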

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had reinvented, on my own, a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for possible inconvenience, and promise to add any necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described here is not appropriate for every task. The same is true of OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, and the like.

And, last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development – if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when it comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS.



They are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…

As a result, everyone continues using RDBMS, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] – UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development.

At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex.

Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway.)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hum. But let's not talk about such gloomy things.) Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain some additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that they "conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (pure surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
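A minimal sketch in InterBase/Firebird SQL (the generator name here is illustrative):

create generator g_oid;
commit;
/* fetch a new unique OID value: */
select gen_id(g_oid, 1) from rdb$database;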

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier.


Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information – from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching of any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
)

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100
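For illustration, registering a new object type might look like this hypothetical sketch (the names and values are invented; g_oid is the generator assumed earlier):

/* the type description is itself an object with ClassId = -100 */
insert into OBJECTS (OID, ClassId, Name, Description)
values (gen_id(g_oid, 1), -100, 'ORDER', 'Sales order document');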

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact enables us to find out easily what that object is, using a standard method.

Quite often, only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which already are in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? Likely no less than half. Thus you may consider that you have already implemented half of these dictionaries – that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId


if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing of more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
)

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
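Such a view might be sketched like this (assuming the tables defined above):

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from Objects o join Orders ord on o.OID = ord.OID;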

If an order has a "lines" section – and a real-world order should definitely have such a section – we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID,  /* reference to the order object - 1:M relation */
  item_id TOID,    /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount * cost)
)

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID,  /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
)

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID,  /* a link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
)

which describes type metadata – an attribute set (their names and types) for each object type.

All attributes for a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining, or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease in implementing object inheritance; a very simple structure for the database – a significant benefit, since with this approach the number of tables would not increase no matter how many different types were stored in a database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with the previous method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, on certain object attributes – or is likely to have such processing added in the future – it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued…

FastReport 3 – a new generation of reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix reports support; supports most popular DB-engines.

Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).

Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
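The generated triggers might look roughly like this hypothetical sketch (CopyCat's actual table, column and trigger names will differ):

create trigger orders_log_upd for ORDERS
after update as
begin
  /* one log line per sub-node; here a single hypothetical sub-node 'NODE_B' */
  insert into RPL$LOG (TABLE_NAME, PK_VALUE, LOGIN)
  values ('ORDERS', old.OID, 'NODE_B');
end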

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and will therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field).


But if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
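For instance, a generator-based synchronization method could be as simple as the following SQL clause (the generator name is invented for illustration):

select gen_id(G_ORDERS_OID, 1) from RDB$DATABASE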

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
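Resolving a conflict might then look like this hypothetical statement (the conflicts-table and other column names are invented; only CHOSEN_USER is taken from the description above):

update RPL$CONFLICTS
set CHOSEN_USER = 'NODE_A'  /* the node holding the correct version */
where ID = 1;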

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B have the correct value for the field, since neither take into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes.


When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
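A hypothetical sketch of such a stored procedure (all names and types are invented):

create procedure ADD_TO_STOCK (PRODUCT_ID integer, DELTA integer)
as
begin
  /* replicating this call replicates the change (+DELTA) rather than the
     resulting value, so changes from several nodes merge correctly */
  update STOCK set CURRENT_VALUE = CURRENT_VALUE + :DELTA
  where PRODUCT_ID = :PRODUCT_ID;
end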

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each of these products.

1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters, and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method.


Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server administrator tool (CTAdmin)

• Configuration wizard for setting up links to master/slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150



Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigured hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.
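Statistics can be gathered with the command-line gstat tool; for example (a sketch, with an illustrative database path and credentials):

gstat -r -i C:\data\mydb.gdb -user SYSDBA -password masterkey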


Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database.

Figure 1. Summary of database statistics


[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".
[3] Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves "Oldest active" forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.


The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4k, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Write parameter is set to OFF, and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
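The sweep interval itself can be changed, or automatic sweeping disabled, with the gfix tool; for example (a sketch, with an illustrative database path and credentials):

gfix -h 0 mydb.gdb -user SYSDBA -password masterkey

Here 0 disables automatic sweeping; a positive number sets the threshold.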

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transactions count. The statistics do not tell if there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the "oldest active" gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from "Next transaction" divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications produce a lighter database load than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables with fragmentation caused by update/delete or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2. Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or by bulk deletes. See the rows in which the MaxVers column is marked in blue. This shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications can tell that this is really bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination, you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query – searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you "bad" indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments to the "Temporary tables" article

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, Foreign Keys, Triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn

volkerrehn@bigpond.com


Miscellaneous


Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now

To receive future issue notifications, send email to

subscribe@ibdeveloper.com

Magazine CD

Donations


Oldest Active


On Silly Questions and Idiotic Outcomes

Firebird owes its existence to the community's support lists. Though it's all ancient history now, if it hadn't been for the esprit de corps among the regulars of the old InterBase support list hosted by mers.com in the nineties, Borland's shock-horror decision to kill off InterBase development at the end of 1999 would have been the end of it, for the English-speaking users at least.

Firebird's support lists on Yahoo and Sourceforge were born out of the same cataclysm of fatal fire and regeneration that caused the phoenix, drawn from the fire against extreme adversity, to take flight as Firebird in July 2000. Existing and new users, both of Firebird and of the rich selection of drivers and tools that spun off from and for it, depend heavily on the lists to get up to speed with the evolution of all of these software products.

Newbies often observe that there is just nothing like our lists for any other software, be it open source or not. One of the things that I think makes the Firebird lists stand out from the crowd is that, once gripped by the power of the software, our users never leave the lists. Instead, they stick around, learn hard, and stay ready and willing to impart to others what they have learnt. After nearly six years of this, across a dozen lists, our concentrated mix of expertise and fidelity is hard to beat.

Now, doesn't all this make us all feel warm and fuzzy? For those of us in the hot core of this voluntary support infrastructure the answer has to be, unfortunately, "Not all the time." There are some bugbears that can make the most patient person see red. I'm using this column to draw attention to some of the worst.

First and worst is silly questions. You have all seen them. Perhaps you even posted them. Subject: "Help!" Body: "Every time I try to connect to Firebird I get the message 'Connection refused'. What am I doing wrong?" What follows from that is a frustrating and tedious game for responders. What version of Firebird are you using? Which server model? Which platform? Which tool? What connection path? (By "connection path" we mean...)

Everyone's time is valuable. Nobody has the luxury of being able to sit around all day playing this game. If we weren't willing to help, we wouldn't be there. But you make it hard, or even impossible, when you post problems with no descriptions. You waste our time. You waste your time. We get frustrated because you don't provide the facts; you get frustrated because we can't read your mind. And the list gets filled with noise.

If there's something I've learnt in more than 12 years of on-line software support, it is that a problem well described is a problem solved. If a person applies time and thought to presenting a good description of the problem, chances are the solution will hop up and punch him on the nose. Even if it doesn't, good descriptions produce the fastest right answers from list contributors. Furthermore, those good descriptions and their solutions form a powerful resource in the archives for others coming along behind.

Off-topic questions are another source of noise and irritation in the lists. At the Firebird website we have bent over backwards to make it very clear which lists are appropriate for which areas of interest. An off-topic posting is easily forgiven if the poster politely complies with a request from a list regular to move the question to x list. When these events instead become flame threads, or when the same people persistently reoffend, they are an extreme burden on everyone.

In the highly tedious category we have the kind of question that goes like this. Subject: "Bug in Firebird SQL". Body: "This statement works in MSSQL/Access/MySQL/insert any non-standard DBMS name here. It doesn't work in Firebird. Where should I report this bug?" To be fair, George Bernard Shaw wasn't totally right when he said "Ignorance is a sin." However, you do at least owe it to yourself to know what you are talking about, and not to present your problem as an attack on our software. Whether intended or not, it comes over as a troll, and you come across as a fool. There's a strong chance that your attitude will discourage anyone from bothering with you at all. It's no-win, sure, but it's also a reflection of human nature. You reap what you sow.

In closing this little tirade, I just want to say that I hope I've struck a chord with those of our readership who struggle to get satisfaction from their list postings. It will be a rare question indeed that has no answer. Those rarities, well-presented, become excellent bug reports. If you're not getting an answer, there's a high chance that you asked a silly question.

Helen E. M. Borrie
helebor@tpg.com.au


Cover story


Locking, Firebird, and the Lock Table

Author: Ann W. Harrison, aharrison@ibphoenix.com

News & Events: Firebird Worldwide Conference 2005

The Firebird conference will take place at the Hotel Olsanka in Prague, Czech Republic, from the evening of Sunday the 13th of November (opening session) until the evening of Tuesday the 15th of November (closing session). Registration, the call for papers and sponsorship details are at http://firebird-conference.com

One of the best-known facts about Firebird is that it uses multi-version concurrency control instead of record locking. For databases that manage concurrency through record locks, understanding the mechanisms of record locking is critical to designing a high performance, high concurrency application. In general, record locking is irrelevant to Firebird. However, Firebird uses locks to maintain internal consistency and for interprocess coordination. Understanding how Firebird does (and does not) use locks helps with system tuning and anticipating how Firebird will behave under load.

I'm going to start by describing locking abstractly, then Firebird locking more specifically, then the controls you have over the Firebird lock table and how they can affect your database performance. So if you just want to learn about tuning, skip to the end of the article.

Concurrency control

Concurrency control in database systems has three basic functions: preventing concurrent transactions from overwriting each others' changes, preventing readers from seeing uncommitted changes, and giving a consistent view of the database to running transactions. Those three functions can be implemented in various ways. The simplest is to serialize transactions – allowing each transaction exclusive access to the database until it finishes. That solution is neither interesting nor desirable.

A common solution is to allow each transaction to lock the data it uses, keeping other transactions from reading data it changes and from changing data it reads. Modern databases generally lock records. Firebird provides concurrency control without record locks by keeping multiple versions of records, each marked with the identifier of the transaction that created it. Concurrent transactions don't overwrite each other's changes, because the system will not allow a transaction to change a record if the most recent version was created by a concurrent transaction. Readers never see uncommitted data, because the system will not return record versions created by concurrent transactions. Readers see a consistent version of the database, because the system returns only the data committed when they start – allowing other transactions to create newer versions. Readers don't block writers.

A lock sounds like a very solid object, but in database systems a lock is anything but solid. "Locking" is a shorthand description of a protocol for reserving resources. Databases use locks to maintain consistency while allowing concurrent, independent transactions to update distinct parts of the database. Each transaction reserves the resources – tables, records, data pages – that it needs to do its work. Typically the reservation is made in memory to control a resource on disk, so the cost of reserving the resource is not significant compared with the cost of reading or changing the resource.

Locking example

"Resource" is a very abstract term. Let's start by talking about locking tables. Firebird does lock tables, but normally it locks them only to prevent catastrophes, like having one transaction drop a table that another transaction is using.

Before a Firebird transaction can access a table, it must get a lock on the table. The lock prevents other transactions from dropping the table while it is in use. When a transaction gets a lock on a table, Firebird makes an entry in its table of locks, indicating the identity of the transaction that has the lock, the identity of the table being locked, and a function to call if another transaction wants an incompatible lock on the table. The normal table lock is shared – other transactions can lock the same table in the same way.

Before a transaction can delete a table, it must get an exclusive lock on the table. An exclusive lock is incompatible with other locks. If any transaction has a lock on the table, the request for an exclusive lock is denied; the drop table statement fails. Returning an immediate error is one way of dealing with conflicting lock requests. The other is to put the conflicting request on a list of unfulfilled requests and let it wait for the resource to become available.
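A simple way to observe this is with two concurrent isql sessions (a sketch, assuming a table named TEST; the exact error text and the moment it surfaces vary between versions):

/* session 1: an open transaction that has read the table holds a shared lock on it */
SELECT COUNT(*) FROM TEST;

/* session 2: the exclusive lock needed here conflicts with session 1's shared lock */
DROP TABLE TEST;
/* denied with an "object is in use" error while session 1's transaction stays open */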

In its simplest form, that is how locking works. All transactions follow the formal protocol of requesting a lock on a resource before using it. Firebird maintains a list of locked resources, a list of requests for locks on resources – satisfied or waiting – and a list of the owners of lock requests. When a transaction requests a lock that is incompatible with existing locks on a resource, Firebird either denies the new request or puts it on a list to wait until the resource is available. Internal lock requests specify whether they wait or receive an immediate error on a case-by-case basis. When a transaction starts, it specifies whether it will wait for locks that it acquires on tables, etc.

Lock modes

For concurrency and read committed transactions, Firebird locks tables for shared read or shared write. Either mode says "I'm using this table, but you are free to use it too." Consistency mode transactions follow different rules. They lock tables for protected read or protected write. Those modes say "I'm using the table and no one else is allowed to change it until I'm done." Protected read is compatible with shared read and other protected read transactions. Protected write is only compatible with shared read.
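For reference, this is roughly how the different modes are requested from isql (a sketch; TEST is a hypothetical table being reserved):

SET TRANSACTION WAIT SNAPSHOT;                /* concurrency mode: shared table locks */
SET TRANSACTION READ COMMITTED;               /* read committed: shared table locks as well */
SET TRANSACTION SNAPSHOT TABLE STABILITY;     /* consistency mode: protected table locks */
SET TRANSACTION NO WAIT
    RESERVING TEST FOR PROTECTED WRITE;       /* reserve one table explicitly at startup */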

The important concept about lock modes is that locks are more subtle than mutexes – locks allow resource sharing, as well as protecting resources from incompatible use.

Two-phase locking vs transient locking

The table locks that we've been describing follow a protocol known as two-phase locking, which is typical of locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release; they cannot overlap.

When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.

Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.

Firebird page locks

One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.

In general database theory, a transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.

In Firebird, internally, each time a transaction changes a page it changes that page – and the physical structure of the database as a whole – from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.

The Firebird lock table

When all access to a database is done in a single process – as is the case with most database systems – locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.

The Firebird lock manager

We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS and one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.

Conflicting lock requests

When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens. Either the requesting transaction gets an immediate error, or the request is put on a list of waiting requests and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.

Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.

Locks as interprocess communication

Lock management requires a high speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be – and is – useful for a number of purposes outside the area that's normally considered database locking.

For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important since, as soon as an index becomes active, every transaction must help maintain it – making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.
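You can watch this negotiation from the outside with two isql sessions (a sketch; TEST and the index name are hypothetical):

/* session 1: leaves a transaction open that has referenced TEST */
SELECT COUNT(*) FROM TEST;

/* session 2: needs the exclusive lock on TEST's index existence */
CREATE INDEX IDX_TEST_NAME ON TEST (NAME);
/* completes once the transactions holding shared locks reach a point where
   they can accept the new index and re-request their locks */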

Firebird locking summary

Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases, because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.

Lock table specifics

The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests – generally lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource – whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose the lock table includes a hash table for access to lock blocks, based on the name of the resource being locked.

A quick refresher on hashing

A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function called the hash function into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.

A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.

Adjusting the lock table to improve performance

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.

The tool for checking the lock table is fb_lock_print, which is a command line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:

LOCK_HEADER BLOCK
Version: 114, Active owner: 0, Length: 262144, Used: 85740
Semmask: 0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30
...

The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:

Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.

If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime.

Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499

The change will not take effect until all connections to all databases on the server machine shut down.

If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading # and increase the value, changing this:

#LockMemSize = 262144

to this:

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.

News & Events: PHP Server

One of the more interesting recent developments in information technology has been the rise of browser based applications, often referred to by the acronym LAMP. One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in. PHPServer shares all the qualities of Firebird: it is a capable, compact, easy to install and easy to manage solution. PHPServer is a free download. Read more at: www.fyracle.org/phpserver.html

Server internals


Inside BLOBs

How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for the storage of data that cannot be placed in fields of other types - for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and at the same time has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.

(This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin, ak@ib-aid.com, and Dmitri Kouzmenko, kdv@ib-aid.com, which will be published in 2006.)

Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB-record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.


The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus, a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB-pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.

BLOB Page

The BLOB page consists of the following parts.

The special header contains the following information:

• The number of the first blob page in this blob. It is used to check that pages belong to one blob.
• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.
• The length of data on a page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.

Maximum BLOB size

As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximal size will be equal to 4 gigabytes.
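As a rough sketch of that calculation (the exact page header sizes vary by version, so the figures below are illustrative assumptions): with a 4096-byte page, a BLOB page holds roughly 4050 bytes of data, or about 1000 four-byte page references, so a three-level structure tops out at about:

1000 pointer pages x 1000 references x 4050 bytes ≈ 4 GB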

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.
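For illustration, this is where the parameter appears in DDL (a hypothetical table; as explained above, the SEGMENT SIZE clause only affects the buffer size GPRE declares, not on-disk storage):

CREATE TABLE DOCS (
  ID   INTEGER NOT NULL,
  BODY BLOB SUB_TYPE TEXT SEGMENT SIZE 512
);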

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.

TestBed


Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Vlad Horsun, hvlad@users.sourceforge.net, and Alexey Kovyazin, ak@ib-aid.com

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding is NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
  ID          NUMERIC(18,2),
  NAME        VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT         INTEGER,
  QRT         DOUBLE PRECISION,
  TS_CHANGE   TIMESTAMP,
  TS_CREATE   TIMESTAMP,
  NOTES       BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first, and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.

You can download a backup of the test database ready to use from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.

After that, perform a backup of this database, and you will be in the same position as if you had downloaded the "ready for use" backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer: a Pentium-4 2 GHz with 512 MB RAM and an 80GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular one-pass update, without the NO SAVEPOINT option.

For convenience, the important commands are clarified with comments.

connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey"; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on;  /* enable time statistics for performed statements */
set stat on;  /* enables writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option.


connect "C:\TEST\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of the single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements - we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "C:\TEST\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

News & Events: Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble. Read more at: www.janus-software.com

News amp Events

IBAnalyst 19

IBSurgeon has issued newversion of IBAnalyst

Now it can better analyzeInterBase or Firebird data-base statistics with usingmetadata information (in thiscase connection to databaseis required)

IBAnalyst is a tool that assistsa user to analyze in detailFirebird or InterBase data-base statistics and identifypossible problems with data-base performance mainte-nance and how an applica-tion interacts with the data-base

It graphically displays data-base statistics and can thenautomatically make intelli-gent suggestions aboutimproving database per-formance and databasemaintenance

wwwibsurgeoncomnewshtml


You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3

News & Events: Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently, and provides all the tools you need to develop reports. All variants of FastReport 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf output; dot matrix report support; support for most popular DB engines; full WYSIWYG; text rotation 0-360 degrees; a memo object supporting simple html tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.

News & Events: TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table. Full details can be found here: http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail, Firebird EMail (FBMail), is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here: http://www.tectsoft.net/Products/Firebird/FBMail.aspx

This is an excerpt from the one-pass script with NO SAVEPOINT enabled:

News & Events: CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger, and to download an evaluation version, see: www.microtec.fr/copycat/ct


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs) the raw test results are rather large.

[Table 1: Test results for 5 sequential UPDATEs]

News & Events: SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal. www.metadataforge.com


For convenience, the results are tabulated in Table 1, above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

[Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT]

In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.

[Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT]

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are shown in Figures 1 and 2.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo Log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. ;-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see the "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.
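A manual sweep can be run with gfix, for example (a sketch against one of this article's test databases):

gfix -sweep -user SYSDBA -password masterkey C:\TEST\testsp2.ib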

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

[Figure 3: Initial situation before any UPDATE - only the one record version exists. The Undo log is empty.]

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction - the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


[Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log]

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log - in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

[Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log]

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception - memory usage does not grow any further.


[Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one]

[Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint]

The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. ;)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE, no record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The preceding figure (Figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.
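In isql terms the sequence would look something like this (a sketch reusing the TEST table from the scripts above):

update TEST set QRT = QRT+1;       /* UPDATE1, under the transaction-level savepoint */
SAVEPOINT Savepoint1;
update TEST set QRT = QRT+1;       /* UPDATE2: its backversions go to Savepoint1's undo log */
ROLLBACK TO SAVEPOINT Savepoint1;  /* undoes UPDATE2 only; UPDATE1 remains pending */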

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.

Development area


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience, and promise to add the necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, too, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

Last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.
2. They are quite fast (if you use them according to certain known standards).
3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.
4. They are based upon a strong mathematical theory.
5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS: they are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMS, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple - to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, and so on. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] - UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP - not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex. Nobody can change their date of birth - at least not their de facto date of birth - but birth dates are not unique anyway.)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hmm. But let's not talk about such gloomy things.) Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

What is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that they "conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases it probably makes sense to use "int64" or a combination of several integers.
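In InterBase/Firebird terms such a function is usually a generator; here is a minimal sketch (the generator name is illustrative, not from the article):

create generator G_OID;

/* fetch the next database-wide unique OID, e.g. from client code
   or from a BEFORE INSERT trigger */
select gen_id(G_OID, 1) from RDB$DATABASE;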

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information - from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact about all objects enables us to find out easily what that object is - using a standard method.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types - simple lookup dictionaries, for example - which do not have any attributes other than those which already are in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that no less than a half. Thus you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId

Development area

25

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

That works if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
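A sketch of such a view (the view name comes from the article; the column list is assumed from the query above):

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID;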

If an order has a "lines" section - and a real-world order should definitely have such a section - we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID,  /* reference to the order object - 1:M relation */
  item_id TOID,    /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost)
);

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: the implementation of the object-relational mapping system for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID,  /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID,  /* link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata - an attribute set (their names and types) for each object type.

All attributes for a particular object whose OID is known are retrieved by the query:

select attribute_id, value
from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id

Development area

26

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

In the context of this method you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure - a significant benefit, since with this approach the number of tables does not increase no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied - a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: the object retrieval logic is simple and fast, and no extra database structures are necessary - just a single BLOB field. You can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know - everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or is likely to do so in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued…

FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix reports support; support for the most popular DB-engines. Full WYSIWYG; text rotation 0..360 degrees; the memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc).
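To give an idea of the shape such logging can take, here is a minimal sketch for a hypothetical table CUSTOMERS with an integer primary key ID, covering updates only for brevity. The log table layout and all names except RPL$USERS (which is documented in Part 2) are illustrative assumptions, not CopyCat's actual generated metadata:

create table RPL$LOG (
  log_id     integer not null,
  table_name varchar(31),
  pk_value   varchar(64),
  sub_node   varchar(31),
  logged_at  timestamp default CURRENT_TIMESTAMP
);

create generator G_RPL_LOG;

set term ^ ;
create trigger CUSTOMERS_RPL for CUSTOMERS
after update as
  declare variable sub varchar(31);
begin
  /* one log line per sub-node, skipping the connection that made the
     change (see the multi-node and two-way replication sections) */
  for select login from RPL$USERS where login <> user into :sub do
    insert into RPL$LOG (log_id, table_name, pk_value, sub_node)
    values (gen_id(G_RPL_LOG, 1), 'CUSTOMERS', old.ID, :sub);
end ^
set term ; ^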

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to keep changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
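A typical generator-based synchronization clause might look like this (the generator name is an assumption for the example):

select gen_id(G_ORDERS_ID, 1) from RDB$DATABASE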

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
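Resolving a conflict could then be as simple as one statement; the table and column names other than CHOSEN_USER are hypothetical here:

update RPL$CONFLICTS
set CHOSEN_USER = 'HEAD_OFFICE'
where TABLE_NAME = 'CUSTOMERS' and PK_VALUE = '1001';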

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other; since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication": that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
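A minimal sketch of such a procedure (the table and procedure names, and the column names, are illustrative):

set term ^ ;
create procedure ADD_TO_STOCK (product_id integer, delta integer)
as
begin
  /* apply the change as a delta, so that replicated calls merge correctly */
  update STOCK set current_stock = current_stock + :delta
  where product_id = :product_id;
end ^
set term ; ^

In the example above, node A would call ADD_TO_STOCK with a delta of 1 and node B with a delta of 2; replicating the two calls instead of the column values leaves every node with the correct total of 48.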

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++Builder components

2. As a standalone Win32 replication tool

We will now present each of these products.

1. CopyCat Delphi / C++Builder Component Set

CopyCat can be obtained as a set of Delphi / C++Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
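The list itself is just rows in that table; something like the following, where the column name is an assumption:

insert into RPL$USERS (login) values ('NODE_B');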

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs, but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community - that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigured hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications - transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat - the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database or at least to identify the cause of performance problems.

Long story short, the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database.

Figure 1: Summary of database statistics

Notes:

1. InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.

2. "Oldest transaction" is the same "oldest interesting transaction" mentioned everywhere; gstat output does not show this transaction as interesting.

3. Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4k, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple - using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off (see note 1).

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest (note 2) and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot is the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active (note 3) is the oldest currently active transaction.

• The Next transaction is the transaction number that will be assigned to the next new transaction.

• Active transactions: IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day: this is calculated from the Next transaction value, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
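For example, if the Next transaction counter stands at 1,500,000 and the database was created 100 days before the statistics were taken, IBAnalyst would report roughly 15,000 transactions per day (the numbers here are invented for illustration).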

As you have already learned, if there are any warnings, they are shown as colored lines with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2: Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information, you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If the statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 - i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched for in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and includes them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance, and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback

Comments to "Temporary tables"

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up the tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com


Miscellaneous

Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird - the free, independent, open source relational database server that emerged in 2000. Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now! To receive future issues notifications, send email to subscribe@ibdeveloper.com

Cover story

Locking, Firebird, and the Lock Table

Author: Ann W. Harrison, aharrison@ibphoenix.com

News & Events

Firebird Worldwide Conference 2005

The Firebird conference will take place at the Hotel Olsanka in Prague, Czech Republic, from the evening of Sunday the 13th of November (opening session) until the evening of Tuesday the 15th of November (closing session).

Registration, the call for papers, sponsorship requests, the conference timetable and the conference papers are available at http://firebird-conference.com

One of the best-known facts about Firebird is that it uses multi-version concurrency control instead of record locking. For databases that manage concurrency through record locks, understanding the mechanisms of record locking is critical to designing a high performance, high concurrency application. In general, record locking is irrelevant to Firebird. However, Firebird uses locks to maintain internal consistency and for interprocess coordination. Understanding how Firebird does (and does not) use locks helps with system tuning and anticipating how Firebird will behave under load.

I'm going to start by describing locking abstractly, then Firebird locking more specifically, then the controls you have over the Firebird lock table and how they can affect your database performance. So, if you just want to learn about tuning, skip to the end of the article.

Concurrency control

Concurrency control in database systems has three basic functions: preventing concurrent transactions from overwriting each others' changes, preventing readers from seeing uncommitted changes, and giving a consistent view of the database to running transactions. Those three functions can be implemented in various ways. The simplest is to serialize transactions - allowing each transaction exclusive access to the database until it finishes. That solution is neither interesting nor desirable.

A common solution is to allow each transaction to lock the data it uses, keeping other transactions from reading data it changes and from changing data it reads. Modern databases generally lock records. Firebird provides concurrency control without record locks by keeping multiple versions of records, each marked with the identifier of the transaction that created it. Concurrent transactions don't overwrite each other's changes, because the system will not allow a transaction to change a record if the most recent version was created by a concurrent transaction. Readers never see uncommitted data, because the system will not return record versions created by concurrent transactions. Readers see a consistent version of the database, because the system returns only the data committed when they start - allowing other transactions to create newer versions. Readers don't block writers.

A lock sounds like a very solid object, but in database systems a lock is anything but solid. "Locking" is a shorthand description of a protocol for reserving resources. Databases use locks to maintain consistency while allowing concurrent, independent transactions to update distinct parts of the database. Each transaction reserves the resources - tables, records, data pages - that it needs to do its work. Typically, the reservation is made in memory to control a resource on disk, so the cost of reserving the resource is not significant compared with the cost of reading or changing the resource.

Locking example

"Resource" is a very abstract term. Let's start by talking about locking tables. Firebird does lock tables, but normally it locks them only to prevent catastrophes, like having one transaction drop a table that another transaction is using.

Before a Firebird transaction can access a table, it must get a lock on the table. The lock prevents other transactions from dropping the table while it is in use. When a transaction gets a lock on a table, Firebird makes an entry in its table of locks, indicating the identity of the transaction that has the lock, the identity of the table being locked, and a function to call if another transaction wants an incompatible lock on the table. The normal table lock is shared - other transactions can lock the same table in the same way.

Before a transaction can delete a table, it must get an exclusive lock on the table. An exclusive lock is incompatible with other locks. If any transaction has a lock on the table, the request for an exclusive lock is denied and the drop table statement fails. Returning an immediate error is one way of dealing with conflicting lock requests. The other is to put the conflicting request on a list of unfulfilled requests and let it wait for the resource to become available.

In its simplest form, that is how locking works. All transactions follow the formal protocol of requesting a lock on a resource before using it. Firebird maintains a list of locked resources, a list of requests for locks on resources - satisfied or waiting - and a list of the owners of lock requests. When a transaction requests a lock that is incompatible with existing locks on a resource, Firebird either denies the new request, or puts it on a list to wait until the resource is available. Internal lock requests specify whether they wait or receive an immediate error on a case-by-case basis. When a transaction starts, it specifies whether it will wait for locks that it acquires on tables, etc.

Lock modes

For concurrency and read-committed transactions, Firebird locks tables for shared read or shared write. Either mode says: "I'm using this table, but you are free to use it too." Consistency mode transactions follow different rules. They lock tables for protected read or protected write. Those modes say: "I'm using the table, and no one else is allowed to change it until I'm done." Protected read is compatible with shared read and other protected read transactions. Protected write is only compatible with shared read.

The important concept about lock modes is that locks are more subtle than mutexes - locks allow resource sharing, as well as protecting resources from incompatible use.
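For reference, these modes surface in SQL through the transaction isolation level and the RESERVING clause; a brief sketch (the table name is illustrative):

/* consistency mode: the engine takes protected table locks */
set transaction isolation level snapshot table stability;

/* a concurrency transaction explicitly reserving one table */
set transaction wait isolation level snapshot
    reserving ORDERS for protected write;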

Two-phase locking vs. transient locking

The table locks that we've been describing follow a protocol known as two-phase locking, which is typical of locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release. They cannot overlap.

When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.

Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.

Firebird page locks

One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.

In general database theory, a transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.

In Firebird, internally, each time a transaction changes a page, it changes that page – and the physical structure of the database as a whole – from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page-level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.

The Firebird lock table

When all access to a database is done in a single process – as is the case with most database systems – locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.

The Firebird lock manager

We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and to one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.

Conflicting lock requests

When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens. Either the requesting transaction gets an immediate error, or the request is put on a list of waiting requests, and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.

Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.

Locks as interprocess communication

Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be – and is – useful for a number of purposes outside the area that's normally considered database locking.

For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important, since as soon as an index becomes active, every transaction must help maintain it – making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.

Firebird locking summary

Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases, because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.

Lock table specifics

The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests – generally, lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource – whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose, the lock table includes a hash table for access to lock blocks based on the name of the resource being locked.

A quick refresher on hashing

A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function called the hash function into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.

A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.

Adjusting the lock table to improve performance

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.

The tool for checking the lock table is fb_lock_print, which is a command line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:

LOCK_HEADER BLOCK
Version: 114, Active owner: 0, Length: 262144, Used: 85740
Semmask: 0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

… The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:

Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.

If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick, but not terribly efficient. It works best if the number of hash slots is prime.

Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499

The change will not take effect until all connections to all databases on the server machine shut down.

If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading # and increase the value:

#LockMemSize = 262144

to this

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.

News & Events

PHP Server

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP. One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in. PHPServer shares all the qualities of Firebird: it is a capable, compact, easy to install and easy to manage solution. PHPServer is a free download. Read more at: www.fyracle.org/phpserver.html

Server internals


Inside BLOBs

How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for storage of data that cannot be placed in fields of other types – for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and, at the same time, has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.

This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.

Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB-record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.

Author: Dmitri Kouzmenko, kdv@ib-aid.com
Author: Alexey Kovyazin, ak@ib-aid.com


The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus, a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.

BLOB Page

The BLOB page consists of the following parts.

The special header contains the following information:

• The number of the first BLOB page in this BLOB. It is used to check that pages belong to one BLOB.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on the page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.

Maximum BLOB size

As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximal size will be equal to 4 gigabytes.
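As a rough illustration of that theoretical calculation (our own back-of-envelope arithmetic, ignoring page and record header overhead): with a 4096-byte page, a BLOB pointer page can hold roughly 1000 four-byte page numbers, and a quasi-record can reference roughly 1000 pointer pages, so the three-level structure could address about 1000 × 1000 × 4096 bytes, i.e. roughly 4 gigabytes. That is the same order of magnitude as the 4-gigabyte ULONG limit, which is why the length counter, rather than the page structure, is the effective ceiling.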

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.
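For instance, the two columns below are stored and perform identically, whatever the declared segment size (a minimal sketch; the table and column names are our own):

CREATE TABLE DOCS (
  BODY1 BLOB SUB_TYPE TEXT SEGMENT SIZE 512, /* explicit segment size */
  BODY2 BLOB SUB_TYPE TEXT                   /* default segment size: 80 bytes */
);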

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.

TestBed


Testing the NO SAVEPOINT feature in InterBase 7.5.1

Author: Vlad Horsun, hvlad@users.sourceforge.net
Author: Alexey Kovyazin, ak@ib-aid.com

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding is NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database, and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.

You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.

After that, perform a backup of this database, and you will be in the same position as if you had downloaded the "ready for use" backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, 512 MB RAM and an 80 GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular one-pass update, without the NO SAVEPOINT option.

For convenience, the important commands are clarified with comments.

connect 'C:\TEST\testsp1.ib' USER 'SYSDBA' Password 'masterkey'; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on;  /* enable time statistics for performed statements */
set stat on;  /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:


connect 'C:\TEST\testsp2.ib' USER 'SYSDBA' Password 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements – we recommend using five. The script for testing without NO SAVEPOINT would be:

connect 'C:\TEST\testsp3.ib' USER 'SYSDBA' Password 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

News & Events

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open-source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble. Read more at: www.janus-software.com

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics using metadata information (in this case a connection to the database is required). IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance, and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance. www.ibsurgeon.com/news.html

You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3

News & Events

Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain: a visual report designer with rulers, guides and zooming, a wizard for basic types of report, export filters for html, tiff, bmp, jpg, xls and pdf outputs, dot-matrix report support, support for the most popular DB engines, full WYSIWYG, text rotation 0..360 degrees, a memo object supporting simple html-tags (font color, b, i, u, sub, sup), improved stretching (StretchMode, ShiftMode properties), access to DB fields, styles, text flow, URLs and anchors. Trial / Buy Now

News & Events

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found here: http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail: Firebird EMail (FBMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here: http://www.tectsoft.net/Products/Firebird/FBMail.aspx

This is an excerpt from the one-pass script with NO SAVEPOINT enabled:

News & Events

CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger see: www.microtec.fr/copycat/ct

Download an evaluation version.


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs), the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs.

News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal. www.metadataforge.com


For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT.


In the second UPDATE we start to see the difference. With default transaction settings, this UPDATE takes a very long time – 47 seconds – compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings, we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and continued growth of the writes parameter.

Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT.

With NO SAVEPOINT usage, we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are shown in the figures.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints – no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. Sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE – only the one record version exists; the Undo log is empty.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and the new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction – the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the most simple situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN … END) savepoint is used.

To preserve on-disk integrity (remember the 'careful write' principle!), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see – there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.


Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

The original design of InterBase implemented the second UPDATE another way, but sometime after IB6, Borland changed the original behavior, and we see what we see now. But this theme is for another article. :)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE, no record version is created. From here on, the update just replaces record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only operations we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN … END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.
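In ISQL terms, the scenario might look like this (a minimal sketch, assuming the TEST table from the scripts above; the savepoint name is our own):

update TEST set ID = ID+1;          /* UPDATE1 - logged against the transaction-level savepoint */
SAVEPOINT Savepoint1;
update TEST set ID = ID+1;          /* UPDATE2 - its backversions go to Savepoint1's own undo log */
ROLLBACK TO SAVEPOINT Savepoint1;   /* undoes only UPDATE2 */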

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.

Development area


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I, on my own, had reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for possible inconvenience and promise to add any necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out unimportant elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. This is still true for OOP as a whole, too, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proven data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development – if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know, all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS. They are unusual and slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…

As a result, everyone continues using RDBMS, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] – UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex). Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway.

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hum, but let's not talk about such gloomy things :). Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain some additional information about an object, except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
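In InterBase/Firebird, for instance, a generator fits this purpose naturally (a minimal sketch; the generator name g_oid is our own choice):

create generator g_oid;

/* obtain a new OID, e.g. from client code or in a trigger */
select gen_id(g_oid, 1) from RDB$DATABASE;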

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information – from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching of any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this standard set of attributes enables us to find out easily what that object is, using a standard method.

Quite often, only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case, we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which already are in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus, you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId


if you know the ClassId for this dictionary. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
)

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view, "orders_view", and make everything look as it always did – see the sketch below.
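A minimal sketch of such a view (our own illustration, built on the tables defined above):

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from Objects o join Orders ord on o.OID = ord.OID;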

If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it with the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to the order object - 1:M relation */
  item_id TOID,   /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost)
)

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
)

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* here is a link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
)

which describes the type metadata – an attribute set (their names and types) for each object type.

All attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id
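To make the layout concrete, here is how the two non-standard attributes of an "Order" object might be stored under this method (a hypothetical illustration; the OID values and attribute numbering are invented):

/* order object 1001: attribute 1 = customer (an OID), attribute 2 = order amount */
insert into attributes (OID, attribute_id, value) values (1001, 1, '507');
insert into attributes (OID, attribute_id, value) values (1001, 2, '149.90');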


In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
    left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
    left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease in implementing object inheritance; a very simple database structure – a significant benefit, since with this approach the number of tables would not increase no matter how many different types were stored in a database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, and no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
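As an illustration only (the serialization format is entirely up to the application), the storage for this method could be nothing more than one BLOB field added to the OBJECTS table itself:

/* a single BLOB field holds the serialized object (dfm, XML, ...) */
alter table OBJECTS add Data BLOB;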

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know – everyone already knows them :). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or is likely to be extended in the future to use certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued…

FastReport 3 – a new generation of reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix reports support; support for the most popular DB-engines. Full WYSIWYG, text rotation 0..360 degrees, memo object supports simple html-tags (font color, b, i, u, sub, sup), improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr (www.microtec.fr)

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
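The exact structures are CopyCat's internals, but the idea can be sketched roughly as follows. All names here are hypothetical (except RPL$USERS, which is mentioned in Part 2), and a real log would carry more columns:

create table RPL$LOG (
    log_id     integer not null primary key,
    login      varchar(31),  /* sub-node this log line is destined for */
    table_name varchar(31),
    pk_value   varchar(120)
);

set term ^ ;
/* one log line per sub-node; the connected user is skipped so that
   changes never bounce back to their originator (see the section
   "Two-way replication" below); assumes a generator GEN_RPL_LOG
   and a replicated table CUSTOMERS with primary key CUST_ID */
create trigger CUSTOMERS_RPL for CUSTOMERS
after update
as
declare variable node_name varchar(31);
begin
    for select u.user_name from RPL$USERS u
        where u.user_name <> current_user
        into :node_name
    do
        insert into RPL$LOG (log_id, login, table_name, pk_value)
        values (gen_id(GEN_RPL_LOG, 1), :node_name, 'CUSTOMERS', new.CUST_ID);
end ^
set term ; ^

Similar triggers would cover inserts and deletes.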

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged for other sub-nodes, but not for the originating node itself.

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases, this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
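For example, for a key normally fed by a generator, the synchronization method could be an ordinary generator call along these lines (the generator name is illustrative):

select gen_id(GEN_CUSTOMER_ID, 1) from RDB$DATABASE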

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
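Resolving a conflict could then look something like the following sketch – CHOSEN_USER comes from the description above, while the table and key names are guesses for illustration:

update RPL$CONFLICTS
set CHOSEN_USER = 'NODE_A'  /* the node whose version of the record should win */
where CONFLICT_ID = 1;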

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
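As a sketch of the idea (the names are illustrative, not CopyCat's API), such a procedure would apply stock changes as deltas, so that replicated calls from different nodes merge instead of overwriting each other:

set term ^ ;
create procedure ADD_TO_STOCK (product_id integer, delta integer)
as
begin
    /* replicating the call (product_id, delta) rather than the final
       STOCK_VALUE lets concurrent changes merge: node A's +1 and
       node B's +2 both apply, turning 45 into 48 on every node */
    update STOCK
    set STOCK_VALUE = STOCK_VALUE + :delta
    where PRODUCT_ID = :product_id;
end ^
set term ; ^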

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters, and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables), and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form, and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy to use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master / slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you will know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

Figure 1. Summary of database statistics

1 – InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 – "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
3 – Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves "Oldest active" forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

The summary, shown in Figure 1, provides general information about


your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: all figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback, and the server cannot undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• just restored your database;
• performed a backup (gbak -b db.gdb) without the -g switch;
• recently performed a manual sweep (gfix -sweep).

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications generate less database load than usual (users are at lunch, or it's a quiet time in the business day).
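For reference, a full set of statistics can be gathered from the command line along these lines (the database file name is illustrative):

gstat -a -r -user SYSDBA -password masterkey mydb.gdb > mydb_stats.txt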


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables, with fragmentation caused by updates/deletes or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2. Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue. This shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications tell us that this really is bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.

• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched for in a particular query – searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster; but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in a table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments to the "Temporary tables" article

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, Foreign Keys, Triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn

volker.rehn@bigpond.com

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor


Miscellaneous


This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.



Locking, Firebird, and the Lock Table

Firebird either denies the new request or puts it on a list to wait until the resource is available. Internal lock requests specify whether they wait or receive an immediate error on a case-by-case basis. When a transaction starts, it specifies whether it will wait for locks that it acquires on tables, etc.

Lock modes

For concurrency and read committed transactions, Firebird locks tables for shared read or shared write. Either mode says "I'm using this table, but you are free to use it too." Consistency mode transactions follow different rules. They lock tables for protected read or protected write. Those modes say "I'm using the table and no one else is allowed to change it until I'm done." Protected read is compatible with shared read and other protected read transactions. Protected write is only compatible with shared read.

The important concept about lock modes is that locks are more subtle than mutexes – locks allow resource sharing, as well as protecting resources from incompatible use.

Two-phase locking vs. transient locking

The table locks that we've been describing follow a protocol known as two-phase locking, which is typical of locks taken by transactions in database systems. Databases that use record locking for consistency control always use two-phase record locks. In two-phase locking, a transaction acquires locks as it proceeds and holds the locks until it ends. Once it releases any lock, it can no longer acquire another. The two phases are lock acquisition and lock release. They cannot overlap.

When a Firebird transaction reads a table, it holds a lock on that table until it ends. When a concurrency transaction has acquired a shared write lock to update a table, no consistency mode transaction will be able to get a protected lock on that table until the transaction with the shared write lock ends and releases its locks. Table locking in Firebird is two-phase locking.

Locks can also be transient, taken and released as necessary during the running of a transaction. Firebird uses transient locking extensively to manage physical access to the database.

Firebird page locks

One major difference between Firebird and most other databases is Firebird's Classic mode. In Classic mode, many separate processes share write access to a single database file. Most databases have a single server process, like SuperServer, that has exclusive access to the database and coordinates physical access to the file within itself. Firebird coordinates physical access to the database through locks on database pages.

In general database theory, a transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.

In Firebird, internally, each time a transaction changes a page, it changes that page – and the physical structure of the database as a whole – from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page level locking is transient. Transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.

The Firebird lock table

When all access to a database is done in a single process – as is the case with most database systems – locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.

The Firebird lock manager

We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.

Conflicting lock requests

When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens. Either the requesting transaction gets an immediate error, or the request is put on a list of waiting requests, and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.

Firebird Conference: registering for the conference, call for papers, sponsoring – http://firebird-conference.com

Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.

Locks as interprocess communication

Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be – and is – useful for a number of purposes outside the area that's normally considered database locking.

For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important, since as soon as an index becomes active, every transaction must help maintain it – making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.

Firebird locking summary

Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases, because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.

Lock table specifics

The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests – generally lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource – whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose the lock table includes a hash table for access to lock blocks, based on the name of the resource being locked.

A quick refresher on hashing

A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function called the hash function into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.

A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.

Adjusting the lock table to improve performance

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.

The tool for checking the lock table is fb_lock_print, which is a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file, and open the file with an editor.
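A minimal invocation – assuming the current directory is the Firebird bin directory – is simply:

fb_lock_print > locks.txt

You'll see output that starts something like this: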

LOCK_HEADER BLOCK
Version: 114, Active owner: 0, Length: 262144, Used: 85740
Semmask: 0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/ 15/ 30
...

The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example these values indicate a problem:

Mutex wait: 10.3%
Hash slots: 101
Hash lengths (min/avg/max): 3/ 15/ 30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.

If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime.

Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499

The change will not take effect until all connections to all databases on the server machine shut down.

If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading # and increase the value:

#LockMemSize = 262144

to this:

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.

News & Events

PHP Server

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP.

One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in.

PHPServer shares all the qualities of Firebird: it is a capable, compact, easy to install and easy to manage solution.

PHPServer is a free download. Read more at:

www.fyracle.org/phpserver.html


Inside BLOBs

This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.

Authors: Dmitri Kouzmenko, kdv@ib-aid.com, and Alexey Kovyazin, ak@ib-aid.com

How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for the storage of data that cannot be placed in fields of other types - for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and at the same time has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.

Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB-record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.


The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.

BLOB Page

The blob page consists of the following parts.

The special header contains the following information:

• The number of the first BLOB page in this BLOB. It is used to check that pages belong to one BLOB.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on a page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.

Maximum BLOB size

As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.
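A rough back-of-envelope version of that calculation, assuming a 4096-byte page and ignoring the exact page-header overhead (so the real figures are somewhat smaller):

page-number references per page:  ~4096 / 4 bytes  = ~1000
data bytes per BLOB data page:    ~4000            (page size minus header)

three-level limit: ~1000 pointer-page slots in the quasi-record
                 x ~1000 data-page slots per pointer page
                 x ~4000 bytes per data page  =  roughly 4 terabytes (theoretical)

ULONG length field: 2^32 bytes = 4 gigabytes (the practical ceiling)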

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.
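For example, the following DDL (the table and column names are just illustrative) declares a text BLOB with the default segment size spelled out; any other value after SEGMENT SIZE would store the data on disk in exactly the same way:

CREATE TABLE DOCS (
  ID   INTEGER NOT NULL PRIMARY KEY,
  BODY BLOB SUB_TYPE TEXT SEGMENT SIZE 80
);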

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.


Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Vlad Horsun (hvlad@users.sourceforge.net) and Alexey Kovyazin (ak@ib-aid.com)

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1; page size = 4096, character encoding is NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
)

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.

You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times, as sketched below.
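A minimal sketch of that population step (assuming, as the earlier example suggests, that INSERTRECS takes the number of records to insert):

SELECT * FROM INSERTRECS(100000);
COMMIT;
INSERT INTO TEST2DROP SELECT * FROM TEST;
INSERT INTO TEST2DROP SELECT * FROM TEST;
INSERT INTO TEST2DROP SELECT * FROM TEST;
COMMIT;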

After that, perform a backup of this database, and you will be in the same position as if you had downloaded the "ready for use" backup.


Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, 512 MB RAM and an 80GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular one-pass update, without the NO SAVEPOINT option.

For convenience, the important commands are clarified with comments.

connect "C:\TEST\testsp1.ib" USER 'SYSDBA' PASSWORD 'masterkey'; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:


connect "C:\TEST\testsp2.ib" USER 'SYSDBA' PASSWORD 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements; we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "C:\TEST\testsp3.ib" USER 'SYSDBA' PASSWORD 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.

Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open-source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble.

Read more at: www.janus-software.com


IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics by using metadata information (in this case a connection to the database is required).

IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.

www.ibsurgeon.com/news.html


You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3

Fast Report 3.19

The new version of the famous FastReport is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently, and provides all the tools you need to develop reports. All variants of FastReport® 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf output; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation from 0 to 360 degrees; a memo object supporting simple html tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.


TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table. Full details can be found here:
http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail, Firebird EMail, is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here:
http://www.tectsoft.net/Products/Firebird/FBMail.aspx

This is an excerpt from the one-pass script with NO SAVEPOINT enabled:

CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger, and to download an evaluation version, see www.microtec.fr/copycat/ct


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs) the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com


For convenience, the results are tabulated above (Table 1).

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT


In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.

Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo Log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.
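A manual sweep can be run with the gfix utility; for instance, against one of our test databases (the path matches the test setup above and is purely illustrative):

gfix -sweep -user SYSDBA -password masterkey C:\TEST\testsp2.ib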

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE - only one record version exists; the Undo log is empty

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction: the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4: UPDATE 1 creates a small delta version on disk and puts the record number into the UNDO log

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE 1 into the Undo log

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.


Figure 6: The third UPDATE overwrites the UPDATE 1 version in the Undo log with the UPDATE 2 version, and its own version is written to disk as the latest one

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint

The original design of InterBase implemented the second UPDATE another way, but sometime after InterBase 6 Borland changed the original behavior, and we see what we see now. But that theme is for another article. :-)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, the update just replaces the record data on disk with the newest version and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The following figure (Figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only actions we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE 2 we would have the situation illustrated in Figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky (vlad@contek.ru)

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience and promise to add the necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there have been several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS: they are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple: to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP - not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex). Nobody can change their date of birth - at least not their de facto date of birth. But birth dates are not unique anyway.

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

What is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to the [1], [2] and [4] articles.

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
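In InterBase/Firebird, the natural tool for such a function is a generator; a minimal sketch (the generator name is illustrative):

create generator g_oid;

/* each new object takes the next value from the single OID sequence */
select gen_id(g_oid, 1) from rdb$database;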

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list - basically a table (let us call it "CLASSES"). This table may include any kind of information: from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have the mandatory attributes "OID" and "ClassId", and the desirable attributes "Name", "Description", "creation_date", "change_date" and "owner". Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So, let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
)

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100 (you need to add a description of the known OID). Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types - simple lookup dictionaries, for example - which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus, you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId


if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS where ClassId =
  (select OID from OBJECTS
   where ClassId = -100 and Name = 'LookupName')

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2))

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all the attributes of an "Order" with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.

If an order has a "lines" section - and a real-world order should definitely have such a section - we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table with a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to the order object - a 1:M relation */
  item_id TOID, /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost))

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
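For instance, nothing stops you from writing ordinary set-oriented SQL over these tables; a hypothetical report query:

select ord.customer, sum(ord.sum_total) as total_ordered
from Orders ord
group by ord.customer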

Nevertheless, there are two main disadvantages: the implementation of the object-relational mapping system for this method is less than simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id))

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* here is a link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id))

which describes the type metadata: an attribute set (names and types) for each object type.

All the attributes of a particular object, when its OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all the attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure - a significant benefit, since with this approach the number of tables does not increase no matter how many different types are stored in the database.
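To make this concrete, here is a hypothetical sketch of storing one such order under method 2; all the OID, ClassId and attribute_id values are invented for the example (attribute 1 being the customer link, attribute 2 the amount):

/* the order number and comments go into the OBJECTS record */
insert into OBJECTS (OID, ClassId, Name, Description)
values (1001, 300, '2005-0017', 'urgent delivery');

/* the customer (a link to another object) and the amount become attribute rows */
insert into attributes (OID, attribute_id, value) values (1001, 1, '507');
insert into attributes (OID, attribute_id, value) values (1001, 2, '1500.00');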

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database - in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary - just a single BLOB field - and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know - everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping - or is likely to have it added in the future - using certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued...


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve (jonathan@microtec.fr), www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.
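A minimal sketch of the kind of trigger this implies is shown below. The log table, its columns, the RPL$USERS column name and the sample CUSTOMERS table are all hypothetical (CopyCat's actual generated metadata differs), and the exclusion of the originating user is explained in the next section:

create trigger log_customers for CUSTOMERS
after update as
begin
  /* one log line per sub-node, except the user who made the change */
  insert into RPL$LOG (TABLE_NAME, PK_VALUE, TARGET_NODE)
  select 'CUSTOMERS', new.ID, u.NODE_NAME
  from RPL$USERS u
  where u.NODE_NAME <> user;
end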

Two-way replication

One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields for which uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
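Resolving a conflict can thus be as simple as one UPDATE; the table and column names here, apart from CHOSEN_USER, are invented for the example:

update RPL$CONFLICTS
set CHOSEN_USER = 'HEAD_OFFICE'
where CONFLICT_ID = 42;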

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, for a certain product with a current stock value of 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task, to put it mildly. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
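As an illustration (a minimal sketch with invented procedure and column names), each node would modify stock only through a procedure like the following, and replicate the calls instead of the rows:

create procedure ADD_STOCK (PRODUCT_ID integer, DELTA integer)
as
begin
  /* apply the relative change, not the absolute value */
  update STOCK set QUANTITY = QUANTITY + :DELTA
  where PRODUCT_ID = :PRODUCT_ID;
end

Replaying node A's call (+1) and node B's call (+2) on each node then yields 48 everywhere - the correctly merged result.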

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables), and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all the databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master/slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MS SQL, MySQL, NexusDB, ...) as well as a Linux/Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Development area

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked in gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Figure 1: Summary of database statistics

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next we can see that the Forced Write parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
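For reference, the sweep interval itself is set with gfix; for example, to disable automatic sweeping entirely (the database path and credentials here are for illustration):

gfix -housekeeping 0 -user SYSDBA -password masterkey C:\data\mydb.ib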

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] The Oldest transaction is the same Oldest interesting transaction mentioned everywhere. Gstat output does not show this transaction as "interesting".
[3] Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications generate less database load than usual (users are at lunch, or it's a quiet time in the business day).
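In practice, then, you capture statistics with gstat (or through the Services API) while the system is under its typical working load. A minimal sketch, with the database path and output file chosen purely for illustration:

gstat -h C:\data\mydb.ib
gstat -a -r C:\data\mydb.ib > mydb_stat.txt

The first command prints only the header page (transaction counters, sweep interval and so on); the second gathers full table and index statistics, with -r adding the record-version figures that IBAnalyst uses.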


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2: Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell that this really is bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments to the "Temporary tables" article

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volker.rehn@bigpond.com


Miscellaneous

This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000. Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.


Cover story

Locking, Firebird, and the Lock Table

A transaction is a set of steps that transform the database from one consistent state to another. During that transformation, the resources held by the transaction must be protected from incompatible changes by other transactions. Two-phase locks are that protection.

In Firebird, internally, each time a transaction changes a page it changes that page – and the physical structure of the database as a whole – from one consistent state to another. Before a transaction reads or writes a database page, it locks the page. When it finishes reading or writing, it can release the lock without compromising the physical consistency of the database file. Firebird page-level locking is transient: transactions acquire and release page locks throughout their existence. However, to prevent deadlocks, a transaction must be able to release all the page locks it holds before acquiring a lock on a new page.

The Firebird lock table

When all access to a database is done in a single process – as is the case with most database systems – locks are held in the server's memory and the lock table is largely invisible. The server process extends or remaps the lock information as required. Firebird, however, manages its locks in a shared memory section. In SuperServer, only the server uses that shared memory area. In Classic, every database connection maps the shared memory, and every connection can read and change the contents of the memory.

The lock table is a separate piece of shared memory. In SuperServer, the lock table is mapped into the server process. In Classic, each process maps the lock table. All databases on a server computer share the same lock table, except those running with the embedded server.

The Firebird lock manager

We often talk about the Firebird Lock Manager as if it were a separate process, but it isn't. The lock management code is part of the engine, just like the optimizer, parser and expression evaluator. There is a formal interface to the lock management code, which is similar to the formal interface to the distributed lock manager that was part of VAX/VMS, and to one of the interfaces to the Distributed Lock Manager from IBM.

The lock manager is code in the engine. In Classic, each process has its own lock manager. When a Classic process requests or releases a lock, its lock management code acquires a mutex on the shared memory section and changes the state of the lock table to reflect its request.

Conflicting lock requests

When a request is made for a lock on a resource that is already locked in an incompatible mode, one of two things happens. Either the requesting transaction gets an immediate error, or the request is put on a list of waiting requests, and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.

Firebird Conference
Registering for the Conference · Call for papers · Sponsoring the Firebird Conference
http://firebird-conference.com

Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.

Locks as interprocess communication

Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be – and is – useful for a number of purposes outside the area that's normally considered database locking.

For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important, since as soon as an index becomes active, every transaction must help maintain it – making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.

Firebird locking summary

Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases, because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.

Lock table specifics

The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests – generally lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource – whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose the lock table includes a hash table for access to lock blocks, based on the name of the resource being locked.

A quick refresher on hashing

A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function, called the hash function, into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.

A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and another name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function: changing the width changes the result of the function and invalidates all existing entries in the hash table.
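To get a feel for the arithmetic (the lock counts here are invented for illustration):

   3,000 lock names / 101 hash slots  =  ~30 hash blocks per chain on average
   3,000 lock names / 499 hash slots  =  ~6 hash blocks per chain

Each extra block in a chain is another pointer to follow and another name to compare on every lookup, which is why the guidance below recommends acting when the average chain length goes above about 10.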

Adjusting the lock table to improve performance

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically, and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.

The tool for checking the lock table is fb_lock_print, which is a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:

LOCK_HEADER BLOCK
Version: 114, Active owner: 0, Length: 262144, Used: 85740
Semmask: 0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30
...

The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:

Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.

If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime.

Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading "#". Choose a value that is a prime number less than 2048:

LockHashSlots = 499

The change will not take effect until all connections to all databases on the server machine shut down.

If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading "#" and increase the value:

#LockMemSize = 262144

to this:

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.

News & Events

PHP Server

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP. One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in. PHPServer shares all the qualities of Firebird: it is a capable, compact, easy-to-install and easy-to-manage solution. PHPServer is a free download. Read more at: www.fyracle.org/phpserver.html

Server internals

Inside BLOBs

This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.

Authors: Dmitri Kouzmenko, kdv@ib-aid.com
         Alexey Kovyazin, ak@ib-aid.com

How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for storage of data that cannot be placed in fields of other types – for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level, and at the same time has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.

Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB-record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.


The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.

The BLOB page

The BLOB page consists of the following parts. The special header contains the following information:

• The number of the first BLOB page in this BLOB. It is used to check that pages belong to one BLOB.
• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.
• The length of data on the page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.

Maximum BLOB size

As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it); in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.
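As a rough worked example for this database's 4096-byte pages (the ~30-byte per-page header overhead is an assumption for illustration; exact values depend on the ODS version):

   data bytes per BLOB page        ~ 4096 - 30 = ~4066
   page pointers per pointer page  ~ 4066 / 4  = ~1016
   theoretical maximum             ~ 1016 x 1016 x 4066 bytes = ~4.2 GB

which is already beyond the 4 GB that a ULONG length can express, so the ULONG limit is what actually binds.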

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
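For illustration, this is all the segment size amounts to in a declaration (the sub-type and the 512 value are arbitrary examples):

CREATE TABLE ARTICLES (
    ID   INTEGER NOT NULL,
    BODY BLOB SUB_TYPE TEXT SEGMENT SIZE 512
);

The BLOB is stored on disk in exactly the same way whether SEGMENT SIZE is 80, 512, or left at the default.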

TestBed

Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Vlad Horsun, hvlad@users.sourceforge.net
         Alexey Kovyazin, ak@ib-aid.com

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1; page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
  ID          NUMERIC(18,2),
  NAME        VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT         INTEGER,
  QRT         DOUBLE PRECISION,
  TS_CHANGE   TIMESTAMP,
  TS_CREATE   TIMESTAMP,
  NOTES       BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.

You can download a backup of the test database ready to use from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 MB) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 KB).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.

After that, perform a backup of this database and you will be in the same position as if you had downloaded the "ready for use" backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium 4 2 GHz, 512 MB RAM and an 80 GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular, one-pass update without the NO SAVEPOINT option.

For convenience, the important commands are clarified with comments:

connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;      /* drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* walk down all records in TEST to place them into cache */
commit;
set time on;               /* enable time statistics for performed statements */
set stat on;               /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option.


connect "C:\TEST\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs we added several UPDATE statements; we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "C:\TEST\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

News & Events

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble. Read more at: www.janus-software.com

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics, using metadata information (in this case a connection to the database is required). IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance. www.ibsurgeon.com/news.html

You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in C:\TEST\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input C:\TEST\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time= 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time= 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3

News & Events

Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently, and provides all the tools you need to develop reports. All variants of FastReport® 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0-360 degrees; a memo object supporting simple html tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors. Trial / Buy Now

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a Stored Procedure Generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found here: http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail, Firebird EMail (FBMail), a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here: http://www.tectsoft.net/Products/Firebird/FBMail.aspx

This is an excerpt from the one-pass script with NO SAVEPOINT enabled:

News & Events

CopyTiger 1.0.0

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger see www.microtec.fr/copycat/ct, where you can download an evaluation version.


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time= 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time= 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs) the raw test results are rather large, so they are summarized in Table 1.

Table 1: Test results for 5 sequential UPDATEs

News amp Events

SQLHammer 15Enterprise Edition

The interestingdeveloper toolSQLHammer

now hasan Enterprise Edition

SQLHammer isa high-performance tool

for rapiddatabase development

and administration

It includes a commonintegrated development

environmentregistered custom

componentsbased on the

Borland Package Librarymechanism

and an Open Tools APIwritten in

DelphiObject Pascal

wwwmetadataforgecom

News amp Events


For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT

In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time, 47 seconds, compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.

Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are shown in the figures above.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this NO SAVEPOINT magic? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism for managing backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.
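The same machinery is exposed explicitly at the SQL level; a minimal sketch (the table and savepoint names are arbitrary):

INSERT INTO TEST (ID) VALUES (1);
SAVEPOINT SP1;
UPDATE TEST SET ID = 2 WHERE ID = 1;
ROLLBACK TO SAVEPOINT SP1; /* backs out the UPDATE, using the undo log kept for SP1 */
COMMIT;                    /* the INSERT survives and is made permanent */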

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints – no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
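As a sketch of the DSQL flavor, the option is simply appended to the usual transaction clauses (check the release notes for the authoritative syntax):

SET TRANSACTION READ WRITE WAIT ISOLATION LEVEL SNAPSHOT NO SAVEPOINT;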

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE – only one record version exists, and the Undo log is empty.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction – the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the undo log.

This approach is very fast and economical on memory usage. The engine does not waste many resources to handle this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the back-version numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at how the undo log differs when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (from transaction 1) and the new version (from transaction 2, update 2), store it somewhere on disk, fetch the current full version (from transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with back-pointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5: The second UPDATE creates a new delta back-version for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.


Figure 6: The third UPDATE overwrites the UPDATE1 version in the undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding back-version in the undo log of that savepoint.

(The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But that theme is for another article.)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected the third and following UPDATEs to do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the undo log during the sequential updates. When NO SAVEPOINT is enabled, the only actions we need to perform are replacing the version on disk and updating the original back-version. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a back-version for each associated record version in the undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.
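For reference, the explicit savepoint scenario looks like this in SQL (a sketch; the table TEST and its column are placeholders):

UPDATE TEST SET NAME = 'first';     /* UPDATE1: logged in the transaction-level savepoint */
SAVEPOINT Savepoint1;
UPDATE TEST SET NAME = 'second';    /* UPDATE2: back-versions go into Savepoint1's undo log */
ROLLBACK TO SAVEPOINT Savepoint1;   /* undoes only the changes made after Savepoint1 */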

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I have missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience and promise to add the necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The database structures described here have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proven data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you can see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there have been several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS: they are unusual and


slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple: to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relational processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, and so on. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness within a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have

any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex). Nobody can change their date of birth - at least not their de facto date of birth. But birth dates are not unique anyway.

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

What is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
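In InterBase/Firebird, for example, such a function is naturally built from a generator; a minimal sketch (the generator name is illustrative):

create generator GEN_OID;

/* Each call returns a new OID, unique across the whole database: */
select gen_id(GEN_OID, 1) from RDB$DATABASE;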

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is


unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information - from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have the mandatory attributes "OID" and "ClassId", and the desirable attributes "Name", "Description", "creation_date", "change_date" and "owner". Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
    OID            TOID primary key,
    ClassId        TOID,
    Name           varchar(32),
    Description    varchar(128),
    Deleted        smallint,
    Creation_date  timestamp default CURRENT_TIMESTAMP,
    Change_date    timestamp default CURRENT_TIMESTAMP,
    Owner          TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table itself; that is, a type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) The list of all persistent object types that are known to the system is then sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types - simple lookup dictionaries, for example - which do not have any attributes other than those which already are in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId


if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = 'LookupName')

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
    OID       TOID primary key,
    customer  TOID,
    sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
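Such a view might look like this (a sketch based on the query above; "orders_view" is just the name suggested in the text):

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from Objects o join Orders ord on o.OID = ord.OID;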

If an order has a "lines" section - and a real-world order should definitely have such a section - we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
    id        integer not null primary key,
    object_id TOID,            /* reference to the order object - 1:M relation */
    item_id   TOID,            /* reference to the ordered item */
    amount    numeric(15,4),
    cost      numeric(15,4),
    sum       numeric(15,4) computed by (amount * cost)
);

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: the implementation of the object-relational mapping system for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
    OID          TOID,              /* link to the master-object of this attribute */
    attribute_id integer not null,
    value        varchar(256),
    constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
    OID            TOID,            /* link to the description-object of the type */
    attribute_id   integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata: the attribute set (names and types) for each object type.

All attributes for a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid
  and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all the attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure - a significant benefit, since with this approach the number of tables does not increase no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are used for every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary - just a single BLOB field - and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know - everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or is likely to acquire it in the future using certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage and all operations are performed on run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice due to the speed and ease of implementation.

To be continued...

FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; support for most popular DB engines.

Full WYSIWYG, text rotation 0-360 degrees, memo object supports simple html-tags (font color, b, i, u, sub, sup), improved stretching (StretchMode, ShiftMode properties).

Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve (jonathan@microtec.fr), www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
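As a purely illustrative sketch - not CopyCat's actual schema - a log table and trigger pair for a replicated ORDERS table could look like this:

create generator GEN_RPL_LOG;

create table RPL_LOG (
    id         integer not null primary key,
    node_name  varchar(31),   /* the sub-node this log line is destined for */
    table_name varchar(31),
    pk_value   varchar(64),
    operation  char(1)        /* 'I', 'U' or 'D' */
);

set term ^ ;
create trigger ORDERS_LOG_AU for ORDERS
after update as
begin
    /* A real implementation would insert one line per sub-node, as
       described in the next section. */
    insert into RPL_LOG (id, node_name, table_name, pk_value, operation)
    values (gen_id(GEN_RPL_LOG, 1), 'NODE_A', 'ORDERS', old.ID, 'U');
end ^
set term ; ^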

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged for other sub-nodes but not for the originating node itself.

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large and therefore not very well suited for a primary key field), but if you have an existing


database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields for which uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
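For instance (an illustrative sketch, not CopyCat's required syntax), a generator-based synchronization method for an ORDERS primary key could boil down to an SQL clause like this, evaluated on the server to obtain the next unique value:

create generator GEN_ORDERS_PK;

select gen_id(GEN_ORDERS_PK, 1) from RDB$DATABASE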

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put into this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication": a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an


unreplicatable table (like the one in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
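Such a procedure might look like the following sketch (the STOCK table and its quantity column are assumptions taken from the example above, not part of CopyCat):

set term ^ ;
create procedure ADD_STOCK (product_id integer, delta integer)
as
begin
    /* Applying the change (delta) instead of the absolute value lets
       increments replicated from different nodes merge correctly. */
    update STOCK set quantity = quantity + :delta
    where product_id = :product_id;
end ^
set term ; ^

Each node then calls, for example, execute procedure ADD_STOCK(1001, 2) locally, and only the call itself is logged and replicated.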

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++Builder components.

2. As a standalone Win32 replication tool.

We will now present each of these products.

1. CopyCat Delphi / C++Builder Component Set

CopyCat can be obtained as a set of Delphi / C++Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator"


columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple and intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, and one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat - the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Okay, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available during the browsing of statistics; it also includes hints, comments and recommendation reports.

Figure 1: Summary of database statistics

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot is the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ is the oldest currently active transaction.

• The Next transaction is the transaction number that will be assigned to the next new transaction.

• Active transactions: IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day: this is calculated as the Next transaction value divided by the number of days passed between the creation of the database and the point when the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• just restored your database;

• performed a backup (gbak -b db.gdb) without the -g switch;

• recently performed a manual sweep (gfix -sweep).

Statistics gathered on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications put less load on the database than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or is the result of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken via the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination, you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 - i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on foreign keys every time you view statistics.
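If you want to check the diversity of an indexed column yourself, a query along these lines gives the totals that IBAnalyst reports (the table and column names are placeholders):

select count(*) as total_keys,
       count(distinct ADDRESS_TYPE) as unique_values
from ADDR;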

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. A large number of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944            92
DICT_PRICE        30       1992            45
DOCS               9       2225            64
N_PART         13835      72594            83
REGISTR_NC       241       4085            56
SKL_NC          1640       7736           170
STAT_QUICK     17649      85062           110
UO_LOCK          283       8490           144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Footnotes:

¹ InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.

² The Oldest transaction is the same "oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".

³ Really, it is the oldest transaction that was active when the oldest currently active transaction started (because only a new transaction start moves the Oldest active forward). In production systems with regular transactions it can be considered the currently oldest active transaction.

Readers feedback

Comments to "Temporary tables"

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is regarding temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn

volkerrehn@bigpond.com



Cover story


Locking, Firebird, and the Lock Table

by Ann W. Harrison

put on a list of waiting requests, and the transactions that hold conflicting locks on the resource are notified of the conflicting request. Part of every lock request is the address of a routine to call when the lock interferes with another request for a lock on the same object. Depending on the resource, the routine may cause the lock to be released or require the new request to wait.

Transient locks, like the locks on database pages, are released immediately. When a transaction requests a page lock and that page is already locked in an incompatible mode, the transaction or transactions that hold the lock are notified and must complete what they are doing and release their locks immediately. Two-phase locks, like table locks, are held until the transaction that owns the lock completes. When the conflicting lock is released and the new lock is granted, the transaction that had been waiting can proceed.

Locks as interprocess communication

Lock management requires a high-speed, completely reliable communication mechanism between transactions, including transactions in different processes. The actual mechanism varies from platform to platform, but for the database to work, the mechanism must be fast and reliable. A fast, reliable interprocess communication mechanism can be - and is - useful for a number of purposes outside the area that's normally considered database locking.

For example, Firebird uses the lock table to notify running transactions of the existence of a new index on a table. That's important since, as soon as an index becomes active, every transaction must help maintain it - making new entries when it stores or modifies data, removing entries when it modifies or deletes data.

When a transaction first references a table, it gets a lock on the existence of indexes for the table. When another transaction wants to create a new index on that table, it must get an exclusive lock on the existence of indexes for the table. Its request conflicts with existing locks, and the owners of those locks are notified of the conflict. When those transactions are in a state where they can accept a new index, they release their locks and immediately request new shared locks on the existence of indexes for the table. The transaction that wants to create the index gets its exclusive lock, creates the index, and commits, releasing its exclusive lock on the existence of indexing. As other transactions get their new locks, they check the index definitions for the table, find the new index definition, and begin maintaining the index.

Firebird locking summary

Although Firebird does not lock records, it uses locks extensively to isolate the effects of concurrent transactions. Locking and the lock table are more visible in Firebird than in other databases because the lock table is a central communication channel between the separate processes that access the database in Classic mode. In addition to controlling access to database objects like tables and data pages, the Firebird lock manager allows different transactions and processes to notify each other of changes to the state of the database, new indexes, etc.

Lock table specifics

The Firebird lock table is an in-memory data area that contains four primary types of blocks. The lock header block describes the lock table as a whole and contains pointers to lists of other blocks and free blocks. Owner blocks describe the owners of lock requests - generally lock owners are transactions, connections, or the SuperServer. Request blocks describe the relationship between an owner and a lockable resource - whether the request is granted or pending, the mode of the request, etc. Lock blocks describe the resources being locked.

To request a lock, the owner finds the lock block, follows the linked list of requests for that lock, and adds its request at the end. If other owners must be notified of a conflicting request, they are located through the request blocks already in the list. Each owner block also has a list of its own requests. The performance-critical part of locking is finding lock blocks. For that purpose the lock table includes a hash table for access to lock blocks based on the name of the resource being locked.

A quick refresher on hashing

A hash table is an array with linked lists of duplicates and collisions hanging from it. The names of lockable objects are transformed by a function called the hash function into the offset of one of the elements of the array. When two names transform to the same offset, the result is a collision. When two locks have the same name, they are duplicates and always collide.

In the Firebird lock table, the array of the hash table contains the address of a hash block. Hash blocks contain the original name, a collision pointer, a duplicate pointer, and the address of the lock block that corresponds to the name. The collision pointer contains the


address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.

A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and another name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.
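To make the lookup concrete, here is a small illustrative sketch in C. It is not Firebird's code - the type names, the fixed slot count and the hash function are all invented for the example - but it shows why each collision costs an extra pointer chase and string comparison, and why the slot count is baked into the hash function itself:

#include <string.h>

#define HASH_SLOTS 101              /* width is fixed: it is part of the hash */

struct lock_block;                  /* describes the locked resource */

typedef struct hash_block {
    const char        *name;        /* original resource name */
    struct hash_block *collision;   /* next block whose name hit this slot */
    struct hash_block *duplicate;   /* next block with exactly the same name */
    struct lock_block *lock;        /* lock block corresponding to the name */
} hash_block;

static hash_block *slots[HASH_SLOTS];

/* placeholder hash function: the real one differs, but it too folds in
   the slot count, so changing the width invalidates every entry */
static unsigned hash_name(const char *name)
{
    unsigned h = 0;
    while (*name != '\0')
        h = h * 31u + (unsigned char)*name++;
    return h % HASH_SLOTS;
}

/* each collision adds one more pointer to follow and name to compare */
struct lock_block *find_lock(const char *name)
{
    const hash_block *hb;
    for (hb = slots[hash_name(name)]; hb != 0; hb = hb->collision)
        if (strcmp(hb->name, name) == 0)
            return hb->lock;        /* duplicates hang off hb->duplicate */
    return 0;                       /* no lock block with this name yet */
}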

Adjusting the lock table to improve performance

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table - normally all databases on the machine - before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.

The tool for checking the lock table is fb_lock_print, which is a command-line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:

LOCK_HEADER BLOCK
Version:114, Active owner:0, Length:262144, Used:85740
Semmask:0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

... The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example these values indicate a problem:

Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.

If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick but not terribly efficient. It works best if the number of hash slots is prime.

Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499

The change will not take effect until all connections to all databases on the server machine shut down.

If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version:114, Active owner:0, Length:262144, Used:85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading # and increase the value from this:

#LockMemSize = 262144

to this:

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine shut down.


Server internals

Inside BLOBs

by Dmitri Kouzmenko, kdv@ib-aid.com, and Alexey Kovyazin, ak@ib-aid.com

This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.

How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for storage of data that cannot be placed in fields of other types - for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level and at the same time has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.

Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB-record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.



The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used - a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type - the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.

BLOB Page

The BLOB page consists of the following parts.

The special header contains the following information:

• The number of the first blob page in this blob. It is used to check that pages belong to one blob.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on a page. As a page may or may not be filled to the full extent, the length of actual data is indicated in the header.

Maximum BLOB size

As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximal size will be equal to 4 gigabytes.
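As a rough sketch of that calculation, assuming 4-byte page numbers and ignoring header overhead: the quasi-record can reference about page_size/4 BLOB pointer pages, each pointer page can reference about page_size/4 data pages, and each data page holds a little under page_size bytes. For a 4096-byte page that gives roughly 1024 x 1024 x 4096 = 2^32 bytes, i.e. about 4 gigabytes, and the ceiling grows quickly with page size (a 16384-byte page would allow around 4096 x 4096 x 16384 bytes, about 256 gigabytes) - which is why the 4-gigabyte ULONG limit just mentioned is the one that actually binds.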

Moreover, in reality this practical limit is reduced further if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.

TestBed


Testing the NO SAVEPOINT feature in InterBase 7.5.1

by Vlad Horsun, hvlad@users.sourceforge.net, and Alexey Kovyazin, ak@ib-aid.com

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
)

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connecting. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.

You can download a backup of the test database ready to use from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 MB) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 KB).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure and then copy all of them to TEST2DROP three or four times.

After that, perform a backup of this database and you will be in the same position as if you had downloaded the "ready for use" backup.



Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer: a 2 GHz Pentium-4 with 512 MB RAM and an 80 GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular one-pass update without the NO SAVEPOINT option.

For convenience, the important commands are clarified with comments.

connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey"; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:


connect "C:\TEST\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements - we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "C:\TEST\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;


You can download all the scripts and the raw results of their execution from http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3


This is an excerpt from the one-pass script with NO SAVEPOINT enabled:


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs) the raw test results are rather large.

Table 1. Test results for 5 sequential UPDATEs (the full figures are in the raw results download linked above).


For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1. Time to perform UPDATEs with and without NO SAVEPOINT.


In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and a growing writes count.

Figure 2. Memory usage while performing UPDATEs with and without NO SAVEPOINT.

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The


old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
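At the API level, starting such a transaction might look like the following C sketch. It is based on the release notes' description: the isc_tpb_no_savepoint constant is assumed to be declared in the InterBase 7.5.1 ibase.h, and error handling is reduced to returning the status code:

#include <ibase.h>

/* start a read-write concurrency transaction with undo logging
   disabled via the new TPB option (a sketch, not production code) */
ISC_STATUS start_no_savepoint(isc_db_handle *db, isc_tr_handle *trans)
{
    ISC_STATUS status[20];
    static char tpb[] = {
        isc_tpb_version3,
        isc_tpb_write,
        isc_tpb_concurrency,
        isc_tpb_no_savepoint        /* new in InterBase 7.5.1 */
    };

    isc_start_transaction(status, trans, 1, db,
                          (unsigned short) sizeof(tpb), tpb);
    return status[1];               /* 0 on success */
}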

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. Sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3. Initial situation before any UPDATE - only one record version exists; the Undo log is empty.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction - the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4. UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log - in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, in which only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle!) the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5. The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception - memory usage does not grow any further.


Figure 6. The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Figure 7. If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But that theme is for another article. :)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE, no new record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. That is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.

Development area


Object-Oriented Development in RDBMS, Part 1

by Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience and promise to add the necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know, all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS. They are unusual and



slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple - to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers at run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP - not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change - including the vehicle engine number, network card number, name, passport number, social security card number and even sex. Nobody can change their date of birth - at least not their de facto date of birth. But birth dates are not unique anyway. :)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things :). Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain any additional information about an object other than a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to the [1], [2] and [4] articles.

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
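In InterBase/Firebird, the natural implementation of such a function is a generator shared by all object tables. A minimal sketch, assuming the OBJECTS table defined in the next section (the generator and trigger names are mine):

create generator g_oid;

set term ^ ;
/* every object record takes its key from the same generator,
   so OIDs stay unique across the whole database */
create trigger bi_objects for OBJECTS
active before insert position 0 as
begin
  if (new.OID is null) then
    new.OID = gen_id(g_oid, 1);
end ^
set term ; ^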

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is


unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information - from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object: the physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have the mandatory attributes "OID" and "ClassId", and the desirable attributes "Name", "Description", "creation_date", "change_date" and "owner". Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is - using a standard method.

Quite often, only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types - simple lookup dictionaries, for example - which do not have any attributes other than those which are already in OBJECTS: usually a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that no less than half are of this kind. Thus you may consider that you have already implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student. :)


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId


if you know the ClassId for this dictionary's elements. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)
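To illustrate, populating such a dictionary also needs nothing beyond OBJECTS. In this sketch the OID values and the "Currency" dictionary are invented:

/* register the dictionary type itself as an object of the
   type-description class (ClassId = -100) */
insert into OBJECTS (OID, ClassId, Name, Description)
values (1001, -100, 'Currency', 'Simple currency dictionary');

/* its elements are ordinary objects whose ClassId is the dictionary's OID */
insert into OBJECTS (OID, ClassId, Name, Description)
values (1002, 1001, 'USD', 'US dollar');

insert into OBJECTS (OID, ClassId, Name, Description)
values (1003, 1001, 'EUR', 'Euro');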

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
)

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object - being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
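Such a view might be defined as in this sketch (the view and column names are assumptions):

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from OBJECTS o join Orders ord on ord.OID = o.OID;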

If an order has a ldquolinesrdquo section and a real-world ordershould definitely have such a section we can create sep-arate table for it call it eg ldquoorder_linesrdquo and relate itwith the ldquoOrdersrdquo table by a relation 1M

create table order_lines (id integer not null primary keyobject_id TOID reference to order object ndash 1M relation item_id TOID reference to ordered item amount numeric(154)cost numeric(154)sum numeric(154) computed by (amountcost))

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID,                  /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
)

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID,                  /* link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
)

which describes type metadata - an attribute set (their names and types) for each object type.

All attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id


In the context of this method, you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure – a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in a database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
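As an illustration only (no such table is defined in the article), the entire storage structure for Method 3 could be as simple as:

create table objects_blob (
  OID TOID primary key, /* link to the OBJECTS table */
  data BLOB             /* serialized object: dfm, XML, or any custom format */
);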

It was not difficult to come to the following conclusions (well, I know – everyone already knows them :-)). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or if such processing is likely to be added in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued…

FastReport 3 – new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix reports support; support for most popular DB-engines. Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com

Development area


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
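CopyCat's actual metadata is not shown in the article; a hypothetical log table illustrating the idea (all names invented) could look like this:

create table rpl$log (
  login       varchar(31),  /* sub-node this log line is destined for */
  table_name  varchar(31),
  pk_value    varchar(250), /* primary key value(s) of the changed record */
  change_type char(1)       /* 'I' = insert, 'U' = update, 'D' = delete */
);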

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux, or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
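Putting the two ideas together, a replication trigger might log one line per sub-node while skipping the connected user. The sketch below is hypothetical: CopyCat generates its own triggers, only the RPL$USERS table name is mentioned later in the article, and its LOGIN column is invented here.

create trigger test_ai for test
after insert
as
begin
  /* one log line per sub-node, except the node that made the change;
     CURRENT_USER is the name the replicating node logged in with */
  insert into rpl$log (login, table_name, pk_value, change_type)
  select u.login, 'TEST', new.id, 'I'
  from rpl$users u
  where u.login <> current_user;
end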

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases, this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
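For instance, a generator-based synchronization method could be expressed as an ordinary SQL clause like the following (GEN_ORDERS_ID is a hypothetical generator on the parent database):

select gen_id(GEN_ORDERS_ID, 1) from RDB$DATABASE;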

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction, and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product, and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B have the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat.

Prepare databases

1) Open Delphi, and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters, and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables), and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form, and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs, but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy to use installer

• Independent server administrator tool (CTAdmin)

• Configuration wizard for setting up links to master/slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Development area

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Okay, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database.

Figure 1. Summary of database statistics

1 - InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 - "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
3 - Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.


The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Write parameter is set to OFF, and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off.¹

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
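The article does not show the command for this; assuming the standard gfix switches, setting the sweep interval to 0 would look something like this (the database path and credentials are illustrative):

gfix -h 0 -user SYSDBA -password masterkey C:\data\mydb.ib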

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from "Next transaction", divided by the number of days passed since the creation of the database to the point when the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example, when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables, with fragmentation caused by updates/deletes or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2. Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or by bulk deletes. See the rows in which the MaxVers column is marked in blue. This shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications can tell that this is really bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query - searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you "bad" indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback

Comments to the "Temporary tables" article

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fall-back method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, Foreign Keys, Triggers etc. were all bad and unnecessary, because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com


Miscellaneous


Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now!

To receive future issues notifications, send email to subscribe@ibdeveloper.com



Cover story


Locking, Firebird, and the Lock Table (continued)

…address of a hash block whose name hashed to the same value. The duplicate pointer contains the address of a hash block that has exactly the same name.

A hash table is fast when there are relatively few collisions. With no collisions, finding a lock block involves hashing the name, indexing into the array, and reading the pointer from the first hash block. Each collision adds another pointer to follow and name to check. The ratio of the size of the array to the number of locks determines the number of collisions. Unfortunately, the width of the array cannot be adjusted dynamically, because the size of the array is part of the hash function. Changing the width changes the result of the function and invalidates all existing entries in the hash table.

Adjusting the lock table to improve performance

The size of the hash table is set in the Firebird configuration file. You must shut down all activity on all databases that share the hash table – normally all databases on the machine – before changes take effect. The Classic architecture uses the lock table more heavily than SuperServer. If you choose the Classic architecture, you should check the load on the hash table periodically and increase the number of hash slots if the load is high. The symptom of an overloaded hash table is sluggish performance under load.

The tool for checking the lock table is fb_lock_print, which is a command line utility in the bin directory of the Firebird installation tree. The full lock print describes the entire state of the lock table and is of limited interest. When your system is under load and behaving badly, invoke the utility with no options or switches, directing the output to a file. Open the file with an editor. You'll see output that starts something like this:

LOCK_HEADER BLOCK
Version: 114, Active owner: 0, Length: 262144, Used: 85740
Semmask: 0x0, Flags: 0x0001
Enqs: 18512, Converts: 490, Rejects: 0, Blocks: 0
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 21048, Acquire blocks: 0, Spin count: 0
Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

…The seventh and eighth lines suggest that the hash table is too small and that it is affecting system performance. In the example, these values indicate a problem:

Mutex wait: 10.3%
Hash slots: 101, Hash lengths (min/avg/max): 3/15/30

In the Classic architecture, each process makes its own changes to the lock table. Only one process is allowed to update the lock table at any instant. When updating the lock table, a process holds the table's mutex. A non-zero mutex wait indicates that processes are blocked by the mutex and forced to wait for access to the lock table. In turn, that indicates a performance problem inside the lock table, typically because looking up a lock is slow.

If the hash lengths are more than min 5, avg 10, or max 30, you need to increase the number of hash slots. The hash function used in Firebird is quick, but not terribly efficient. It works best if the number of hash slots is a prime number.

Change this line in the configuration file:

#LockHashSlots = 101

Uncomment the line by removing the leading #. Choose a value that is a prime number less than 2048:

LockHashSlots = 499

The change will not take effect until all connections to all databases on the server machine have been shut down.

If you increase the number of hash slots, you should also increase the lock table size. The second line of the lock print:

Version: 114, Active owner: 0, Length: 262144, Used: 85740

tells you how close you are to running out of space in the lock table. The Version and Active owner are uninteresting. The Length is the maximum size of the lock table. Used is the amount of space currently allocated for the various block types and the hash table. If the amount used is anywhere near the total length, uncomment this parameter in the configuration file by removing the leading # and increase the value:

#LockMemSize = 262144

to this:

LockMemSize = 1048576

The value is in bytes. The default lock table is about a quarter of a megabyte, which is insignificant on modern computers. Changing the lock table size will not take effect until all connections to all databases on the server machine have been shut down.

News & Events

PHP Server

One of the more interesting recent developments in information technology has been the rise of browser-based applications, often referred to by the acronym LAMP. One key hurdle for broad use of the LAMP technology for mid-market solutions was that it was never easy to configure and manage. PHPServer changes that: it can be installed with just four clicks of the mouse, and support for Firebird is compiled in. PHPServer shares all the qualities of Firebird: it is a capable, compact, easy to install and easy to manage solution. PHPServer is a free download.

Read more at: www.fyracle.org/phpserver.html

Server internals


Inside BLOBs

This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.

Authors: Dmitri Kouzmenko, kdv@ib-aid.com, and Alexey Kovyazin, ak@ib-aid.com

How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for the storage of data that cannot be placed in fields of other types - for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level, and at the same time has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.


Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB-record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.



The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus, a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.

BLOB Page

The BLOB page consists of the following parts:

The special header contains the following information:

• The number of the first blob page in this blob. It is used to check that pages belong to one blob.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on the page. As a page may or may not be filled to its full extent, the length of the actual data is indicated in the header.

Maximum BLOB size

As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.
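As a rough back-of-the-envelope estimate of our own (ignoring page headers): with a 4096-byte page and 4-byte page numbers, one BLOB pointer page holds about 1000 references, so a three-level BLOB can address about 1000 × 1000 data pages, i.e. roughly 10^6 × 4 KB ≈ 4 GB – the same order of magnitude as the ULONG limit mentioned above.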

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.
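For completeness, this is how a segment size is declared in DDL (a hypothetical table; as explained above, the value is safe to leave at its default):

create table DOCS (
  ID   integer not null primary key,
  BODY blob sub_type text segment size 80
);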

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.

TestBed

Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Alexey Kovyazin, ak@ib-aid.com, and Vlad Horsun, hvlad@users.sourceforge.net

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding is NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
)

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connecting. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.

You can download a backup of the test database ready to use from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.

After that, perform a backup of this database, and you will be in the same position as if you had downloaded the "ready for use" backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer with a Pentium-4 2 GHz, with 512 MB RAM and an 80GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular one-pass update, without the NO SAVEPOINT option.

For convenience, the important commands are annotated with comments.

connect "C:\TEST\testsp1.ib" USER "SYSDBA" Password "masterkey"; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on;  /* enable time statistics for performed statements */
set stat on;  /* enables writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE, with the NO SAVEPOINT option:


connect "C:\TEST\testsp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements - we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "C:\TEST\testsp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

News & Events

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.

Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble.

Read more at: www.janus-software.com

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics using metadata information (in this case a connection to the database is required).

IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.

www.ibsurgeon.com/news.html

You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in C:\TEST\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input C:\TEST\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3

News & Events

Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain:

• Visual report designer with rulers, guides and zooming, wizard for basic types of report, export filters for html, tiff, bmp, jpg, xls, pdf outputs
• Dot matrix reports support, support for most popular DB-engines
• Full WYSIWYG, text rotation 0..360 degrees, memo object supports simple html-tags (font color, b, i, u, sub, sup), improved stretching (StretchMode, ShiftMode properties)
• Access to DB fields, styles, text flow, URLs, Anchors

Trial / Buy Now

News & Events

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a Stored Procedure Generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found here:
http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail: Firebird EMail (FBMail) is a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here:
http://www.tectsoft.net/Products/Firebird/FBMail.aspx

News & Events

CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger see www.microtec.fr/copycat/ct

Download an evaluation version.

This is an excerpt from the one-pass script with NO SAVEPOINT enabled:

CopyTiger 100

CopyTigera new productfrom Microtecis an advanced

databasereplication amp

synchronisationtool for

InterBaseFirebird

powered bytheir CopyCatcomponent set(see the articleabout CopyCat

in this issue)

For moreinformation about

CopyTiger seewwwmicrotecfr

copycatct

Downloadan evaluation

version


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs), the raw test results are rather large.

Table 1. Test results for 5 sequential UPDATEs

News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com


For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1. Time to perform UPDATEs with and without NO SAVEPOINT

In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time (47 seconds) compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and a growth of the writes parameter.

Figure 2. Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed a new version of that record is produced. The


old version does not disappear immediately but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is written as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.
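As an illustrative sketch of that last point (assuming the TEST table from the test scripts), the second statement below is the one that forces a full backversion to be written, because it touches the same record twice within one transaction:

SET TRANSACTION;
update TEST set QRT = QRT + 1 where ID = 1; /* first change: a small delta backversion */
update TEST set QRT = QRT + 1 where ID = 1; /* same record, same transaction: full backversion */
commit;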

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo Log is a mechanism to manage backversions for savepoints in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints – no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First and most obvious is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. Sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3. Initial situation before any UPDATE: only the one record version exists, and the Undo log is empty.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and the new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction – the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4. UPDATE 1 creates a small delta version on disk and puts the record number into the Undo log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5. The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE 1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.


Figure 6. The third UPDATE overwrites the UPDATE 1 version in the Undo log with the UPDATE 2 version, and its own version is written to disk as the latest one.

Figure 7. If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

The original design of InterBase implemented the second UPDATE another way, but sometime after InterBase 6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. :-)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The following figure (Figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE 2 we would have the situation illustrated in Figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
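For reference, a minimal isql sketch of such an explicit savepoint frame (the table and savepoint names are only illustrative):

SET TRANSACTION;
update TEST set QRT = QRT + 1; /* UPDATE 1, under the transaction-level savepoint */
SAVEPOINT Savepoint1;
update TEST set QRT = QRT + 1; /* UPDATE 2: its backversions go to Savepoint1's undo log */
ROLLBACK TO SAVEPOINT Savepoint1; /* undoes only UPDATE 2 */
commit;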

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I on my own had reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for possible inconvenience, and promise to add any necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS. They are unusual and


slow, there are no standards for data access, and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs), in [2] – UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex).

Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway. :)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm. But let's not talk about such gloomy things :). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain some additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to the [1], [2] and [4] articles.

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
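In InterBase/Firebird the natural way to implement such a function is a generator; a minimal sketch (the generator name is illustrative):

create generator g_oid;

/* each call returns the next database-wide unique OID value */
select gen_id(g_oid, 1) from rdb$database;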

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is


unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information – from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100
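Registering a new persistent type is then just an insert into OBJECTS; a hypothetical example (the name and description are invented, and -100 marks the row as a type description, as above):

insert into OBJECTS (OID, ClassId, Name, Description)
values (gen_id(g_oid, 1), -100, 'Order', 'Sales order document type');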

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact about all objects enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which already are in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student. :)


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId


if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = 'LookupName')

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment, ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
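Such a view is straightforward; a sketch (it simply wraps the query above, and the view name comes from the text):

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from Objects o join Orders ord on o.OID = ord.OID;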

If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to the order object - 1:M relation */
  item_id TOID,   /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost)
);

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* a link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata – an attribute set (their names and types) for each object type.

All attributes for a particular object where the OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID and
      o.ClassId = ca.OID and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure - a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are used for every attribute, instead of just the one field described in Method 1.
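To make the layout concrete, here is a hypothetical picture of how the "Order" from the earlier example might be stored under Method 2 (all OID and attribute_id values are invented for illustration):

/* metadata: the "Order" type; its description-object has OID = 10 */
insert into class_attributes values (10, 1, 'customer', 1);
insert into class_attributes values (10, 2, 'sum_total', 2);

/* data: one order object with OID = 1001 and ClassId = 10 */
insert into OBJECTS (OID, ClassId, Name, Description)
values (1001, 10, '2005-17', 'urgent delivery');
insert into attributes values (1001, 1, '507');     /* the customer's OID */
insert into attributes values (1001, 2, '1149.50'); /* the order amount */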

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know – everyone already knows them :). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or is likely to in the future, using certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice due to the speed and ease of implementation.

To be continued...

FastReport 3 - new generation of the reporting tools

• Visual report designer with rulers, guides and zooming, wizards for base type reports, export filters to html, tiff, bmp, jpg, xls, pdf
• Dot matrix reports support, support for most popular DB-engines
• Full WYSIWYG, text rotation 0..360 degrees, memo object supports simple html-tags (font color, b, i, u, sub, sup), improved stretching (StretchMode, ShiftMode properties)
• Access to DB fields, styles, text flow, URLs, Anchors

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/FireBird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
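As a rough sketch of how such logging triggers can work (the structures below are hypothetical - CopyCat generates its own log tables and triggers, and the rpl_log / rpl_users names are invented): one log line is written per sub-node, skipping the connection that made the change:

set term ^;
create trigger orders_au_rpl for orders
active after update position 0 as
begin
  /* one log line per sub-node, except the user (node) that made the change */
  insert into rpl_log (table_name, pk_value, sub_node)
  select 'ORDERS', old.id, u.node_name
  from rpl_users u
  where u.node_name <> current_user;
end^
set term ;^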

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing


database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
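For instance, a generator-based synchronization clause for a key field might be as simple as (the generator name is illustrative):

select gen_id(g_orders_id, 1) from rdb$database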

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an


unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
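A minimal sketch of such a procedure (the STOCK column names are assumed from the example above): the nodes then replicate calls such as add_stock(:product, 1) rather than the resulting row values:

set term ^;
create procedure add_stock (product_id integer, delta integer) as
begin
  /* apply the change as a delta, so concurrent changes merge instead of overwriting */
  update STOCK set qty = qty + :delta where product = :product_id;
end^
set term ;^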

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++Builder components
2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++Builder Component Set

CopyCat can be obtained as a set of Delphi / C++Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator"


columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy to use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master / slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:
http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150



Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked in gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about

Figure 1. Summary of database statistics

1. InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2. The Oldest transaction is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
3. Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.


your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4k, there is no warning displayed, as this page size is okay.

Next we can see that the Forced Write parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off.[1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest[2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active[3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell if there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications make less database load than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they always work with transactions and data correctly: they do not create sweep gaps, do not accumulate a lot of active transactions, do not keep long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications with only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst (Figure 2: Table statistics).

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: it shows that only one version per record is stored, and consequently the versions were caused by bulk deletes. Together, both indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or is the result of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If the statistics were taken by the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination, you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red, because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database, in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback

Comments to "Temporary tables"

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in Issue 1. One of them impressed me, and with the permission of its author I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another "myth box" for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, Foreign Keys, Triggers etc. were all bad and unnecessary, because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com



Server internals

Inside BLOBs

Authors: Dmitri Kouzmenko, kdv@ib-aid.com; Alexey Kovyazin, ak@ib-aid.com
This is an excerpt from the book "1000 InterBase & Firebird Tips & Tricks" by Alexey Kovyazin and Dmitri Kouzmenko, which will be published in 2006.

How the server works with BLOBs

The BLOB data type is intended for storing data of variable size. Fields of BLOB type allow for storage of data that cannot be placed in fields of other types – for example, pictures, audio files, video fragments, etc.

From the point of view of the database application developer, using BLOB fields is as transparent as it is for other field types (see the chapter "Data types" for details). However, there is a significant difference between the internal implementation mechanism for BLOBs and that for other data.

Unlike the mechanism used for handling other types of fields, the database engine uses a special mechanism to work with BLOB fields. This mechanism is transparently integrated with other record handling at the application level, and at the same time has its own means of page organization. Let's consider in detail how the BLOB-handling mechanism works.


Initially, the basic record data on the data page includes a reference to a "BLOB record" for each non-null BLOB field, i.e. to a record-like structure or quasi-record that actually contains the BLOB data. Depending on the size of the BLOB, this BLOB-record will be one of three types.

The first type is the simplest. If the size of the BLOB-field data is less than the free space on the data page, it is placed on the data page as a separate record of BLOB type.



The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to pages containing the actual BLOB data are stored in a quasi-record. Thus, a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.

BLOB Page

The blob page consists of the following parts.

The special header contains the following information:

• The number of the first blob page in this blob. It is used to check that pages belong to one blob.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on the page. As a page may or may not be filled to its full extent, the length of actual data is indicated in the header.

Maximum BLOB size

As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.
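As a rough back-of-the-envelope illustration (my own numbers, not taken from the book): with a 4096-byte page and 4-byte page numbers, one BLOB pointer page can hold roughly 4096/4 = 1024 page references, so a three-level structure could theoretically address about 1024 × 1024 ≈ one million data pages, i.e. on the order of 4 GB – which is why it is the ULONG length counter, rather than the page structure, that forms the practical ceiling.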

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So, if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.
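For illustration, here is a hypothetical table declaring BLOB columns; the SEGMENT SIZE clause is accepted but, as explained above, affects neither on-disk storage nor performance:

create table DOCUMENTS (
  ID      integer not null primary key,
  CONTENT blob sub_type 0 segment size 80,  /* binary data, explicit default segment size */
  NOTES   blob sub_type text                /* text BLOB, segment size simply omitted */
);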

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.

TestBed

Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Vlad Horsun, hvlad@users.sourceforge.net; Alexey Kovyazin, ak@ib-aid.com

In Issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
  ID          NUMERIC(18,2),
  NAME        VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT         INTEGER,
  QRT         DOUBLE PRECISION,
  TS_CHANGE   TIMESTAMP,
  TS_CREATE   TIMESTAMP,
  NOTES       BLOB
)

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well described in the InterBase documentation.

You can download a backup of the test database, ready to use, from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records, and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.

After that, perform a backup of this database, and you will be in the same position as if you had downloaded the "ready for use" backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer: Pentium-4 2 GHz, 512 MB RAM and an 80 GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular one-pass update, without the NO SAVEPOINT option.

For convenience, the important commands are annotated with comments.

connect "C:\TEST\testsp1.ib" USER "SYSDBA" PASSWORD "masterkey"; /* connect */
drop table TEST2DROP; /* drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* walk down all records in TEST to place them into cache */
commit;
set time on; /* enable time statistics for performed statements */
set stat on; /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE, with the NO SAVEPOINT option:


connect "C:\TEST\testsp2.ib" USER "SYSDBA" PASSWORD "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements – we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "C:\TEST\testsp3.ib" USER "SYSDBA" PASSWORD "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

News & Events

Fyracle 0.8.9
Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java. Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble. Read more at www.janus-software.com

IBAnalyst 1.9
IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics, using metadata information (in this case a connection to the database is required). IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics, and can then automatically make intelligent suggestions about improving database performance and database maintenance. www.ibsurgeon.com/news.html

You can download all the scripts, and the raw results of their execution, from this location: http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time= 4.63 sec
Buffers = 2048
Reads = 2111
Writes 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time= 0.03 sec
Buffers = 2048
Reads = 1
Writes 942
Fetches = 3

News & Events

FastReport 3.19
The new version of the famous FastReport is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain: visual report designer with rulers, guides and zooming; wizard for basic types of report; export filters to html, tiff, bmp, jpg, xls, pdf outputs; dot-matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0..360 degrees; memo object supporting simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors. Trial/Buy Now.

TECT Software presents
New versions of nice utilities from TECT Software, www.tectsoft.net. SPGen, a Stored Procedure Generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found here: http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx. And FBMail, Firebird EMail, a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here: http://www.tectsoft.net/Products/Firebird/FBMail.aspx

This is an excerpt from the one-pass script with NO SAVEPOINT enabled:

News & Events

CopyTiger 1.00
CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger, and to download an evaluation version, see www.microtec.fr/copycat/ct


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time= 4.72 sec
Buffers = 2048
Reads = 2102
Writes 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time= 0.15 sec
Buffers = 2048
Reads = 1
Writes 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs), the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs

News & Events

SQLHammer 1.5 Enterprise Edition
The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal. www.metadataforge.com


For convenience, the results are tabulated above (Table 1).

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT

In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time – 47 seconds – compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth of the writes parameter.

Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage, we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are shown in Figures 1 and 2.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints – no problem. ;-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors, and for the new TPB option, can be found in the 7.5.1 release notes.
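In isql or DSQL this looks like an ordinary transaction statement with one extra keyword pair; a sketch, assuming (as the release notes describe) that NO SAVEPOINT simply combines with the usual SET TRANSACTION parameters:

SET TRANSACTION WAIT ISOLATION LEVEL SNAPSHOT NO SAVEPOINT;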

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. Sweep is needed to advance the OIT and back out dead record versions.
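For reference, a manual sweep of one of our test databases would look like this (using the credentials and paths assumed earlier):

gfix -sweep -user SYSDBA -password masterkey C:\TEST\testsp2.ib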

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE – only one record version exists; the Undo log is empty

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and the new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction – the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.

Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.


Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint

(The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article.)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no record version is created. From here on, the update just replaces record data on disk with the newest one, and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.
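As a hedged illustration (the savepoint name is arbitrary), the situation in Figure 7 corresponds to a script like this:

update TEST set QRT = QRT+1;       /* UPDATE1: tracked by the transaction-level savepoint */
SAVEPOINT Savepoint1;
update TEST set QRT = QRT+1;       /* UPDATE2: its backversions go to Savepoint1's undo log */
ROLLBACK TO SAVEPOINT Savepoint1;  /* undoes only UPDATE2 */
commit;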

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should, of course, be used with caution. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.

Development area

Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography, and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for possible inconvenience, and promise to add any necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, too, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And, last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development – if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines that is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS: they are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMS, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs); in [2] – UINs (Unique Identification Numbers), or "hyperkey". Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class), but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex. Nobody can change their date of birth – at least not their de facto date of birth – but birth dates are not unique anyway.)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm... but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain some additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to the [1], [2] and [4] articles.

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
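A minimal sketch of such a generating function in InterBase/Firebird SQL (all names here are my own invention):

create generator G_OID;

set term ^ ;
create procedure GET_OID returns (NewOID integer)
as
begin
  /* one generator serves the whole database, so OIDs are unique across all classes */
  NewOID = gen_id(G_OID, 1);
end ^
set term ; ^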

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object – keeping in mind that it is unique within the database – we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information – from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object: the physical deletion of records in a database full of direct and indirect links is often, to put it mildly, a very complicated task.

Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID           TOID primary key,
  ClassId       TOID,
  Name          varchar(32),
  Description   varchar(128),
  Deleted       smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date   timestamp default CURRENT_TIMESTAMP,
  Owner         TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100;
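To make OID assignment automatic, the classic generator-plus-trigger idiom can be attached to OBJECTS (a sketch; the trigger name, and the G_OID generator from the earlier sketch, are my own assumptions):

set term ^ ;
create trigger OBJECTS_BI for OBJECTS
active before insert as
begin
  if (new.OID is null) then
    new.OID = gen_id(G_OID, 1);
end ^
set term ; ^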

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact enables us to find out easily, using a standard method, what that object is.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case, we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types – simple lookup dictionaries, for example – which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? Likely not less than a half. Thus, you may consider that you have implemented half of these dictionaries – that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId


if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)

Later, I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID       TOID primary key,
  customer  TOID,
  sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment, ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
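A sketch of such a view (its name and column list are my own):

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID;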

If an order has a "lines" section – and a real-world order should definitely have such a section – we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id        integer not null primary key,
  object_id TOID, /* reference to the order object - 1:M relation */
  item_id   TOID, /* reference to the ordered item */
  amount    numeric(15,4),
  cost      numeric(15,4),
  sum       numeric(15,4) computed by (amount*cost)
);
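For example, an order's lines, together with the names of the ordered items taken from OBJECTS, could then be fetched like this (a sketch; :order_oid is a hypothetical parameter):

select ol.id, obj.Name as item_name, ol.amount, ol.cost, ol.sum
from order_lines ol, OBJECTS obj
where obj.OID = ol.item_id
  and ol.object_id = :order_oid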

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: the implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID          TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value        varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID            TOID, /* here is a link to a description-object of the type */
  attribute_id   integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes type metadata – an attribute set (names and types) for each object type.

All attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease in implementing object inheritance; a very simple database structure – a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in a database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with the previous method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database – in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, and no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
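To make the idea concrete, here is a minimal sketch of what such storage could look like (the table and column names are illustrative, not taken from a real schema):

/* Hypothetical storage for Method 3: the whole object state is
   serialized by the application into a single BLOB field. */
create table object_blobs (
  OID TOID not null primary key,  /* same object identifier as in OBJECTS */
  class_id integer,               /* optional: object type, for the loader */
  data blob sub_type 0            /* serialized object: dfm stream, XML, ... */
);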

It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or such processing is likely to be added in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued…

FastReport 3 – a new generation of reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix report support; support for most popular DB-engines. Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and will therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
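As an illustration of how such selective logging can work, here is a rough sketch of a log trigger; the RPL$LOG structure and the trigger code are simplified guesses for illustration, not CopyCat's actual generated metadata (RPL$USERS is the sub-node table mentioned later in this article):

/* Illustrative only: log one line per sub-node, skipping the node that
   performed the change (CURRENT_USER), so changes never bounce back. */
create trigger customers_rpl for customers
active after update position 0
as
begin
  insert into rpl$log (node_name, table_name, pk_value)
  select u.node_name, 'CUSTOMERS', old.id
  from rpl$users u
  where u.node_name <> current_user;
end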

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
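For example, a generator-based synchronization method could be as simple as the following SQL fragment (the generator name is hypothetical). It would be executed on the parent server to obtain a key value that is unique across nodes, and the result would replace the local key before the record is replicated:

/* Fetch the next cross-node unique key value from the parent server. */
select gen_id(gen_customer_id, 1) from rdb$database;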

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it is entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.
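In practice, resolving a conflict could then look like this (the table and column names below are a simplified guess at the structure described above, not CopyCat's actual schema):

/* Declare node A's version of the record to be the correct one; on the
   next replication cycle the record is replicated and the conflict closed. */
update rpl$conflicts
set chosen_user = 'NODE_A'
where conflict_id = 42;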

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
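A sketch of such a stored procedure for the STOCK example might look like this (names are illustrative; the point is that the call, with its arguments, is what gets logged and replayed on the other nodes):

/* Each node calls this procedure instead of updating STOCK directly.
   Replicating the calls replays the increments, so node A's +1 and
   node B's +2 both reach every copy of the table. */
create procedure add_to_stock (product_id integer, amount integer)
as
begin
  update stock
  set current_value = current_value + :amount
  where product_id = :product_id;
end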

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++ Builder component set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec offers a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy to use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master/slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked in gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.
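If you want to look at the raw numbers yourself, database statistics are gathered with the command-line gstat tool; typical invocations look roughly like this (the database path and credentials here are illustrative):

gstat -h C:\data\mydb.ib
  (header page only: transaction counters, flags, sweep interval)

gstat -a -r -user SYSDBA -password masterkey C:\data\mydb.ib
  (full table and index statistics, including record version information)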

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available while browsing the statistics, and it also includes hints, comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database.

Figure 1: Summary of database statistics


[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] The Oldest transaction is the same Oldest Interesting Transaction that is mentioned everywhere; gstat output does not show this transaction as "interesting".
[3] Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest Active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.


The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
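For example (numbers invented for illustration): if Next transaction is 1,500,000 and the database was created 30 days before the statistics were taken, the report would show 1,500,000 / 30 = 50,000 transactions per day.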

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications generate a lighter database load than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2: Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination, you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.
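If you want to check the diversity of an indexed column yourself, a simple aggregate query shows the duplicate distribution (the table and column names here are invented for illustration):

/* How many rows share each value of the indexed column?
   A few very large groups mean long duplicate chains in the index. */
select addr_type, count(*)
from address
group by addr_type
order by 2 desc;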

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big – 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run a sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amounts of record versions usually slow down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback

Comments to the "Temporary tables" article

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me and, with the permission of its respective owner, I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn

volkerrehn@bigpond.com


Miscellaneous

Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird — the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now

To receive notifications about future issues, send email to subscribe@ibdeveloper.com



Inside BLOBs

The second type is used when the size of the BLOB is greater than the free space on the page. In this case, references to the pages containing the actual BLOB data are stored in a quasi-record. Thus, a two-level structure of BLOB-field data is used.

If the size of the BLOB-field contents is very large, a three-level structure is used – a quasi-record stores references to BLOB pointer pages, which contain references to the actual BLOB data.

The whole structure of BLOB storage (except for the quasi-record, of course) is implemented by one page type – the BLOB page type. Different types of BLOB pages differ from each other in the presence of a flag (value 0 or 1) defining how the server should interpret the given page.

BLOB Page

The BLOB page consists of the following parts. The special header contains the following information:

• The number of the first BLOB page in this BLOB. It is used to check that pages belong to one BLOB.

• A sequence number. This is important in checking the integrity of a BLOB. For a BLOB pointer page it is equal to zero.

• The length of data on the page. As a page may or may not be filled to its full extent, the length of actual data is indicated in the header.

Maximum BLOB size

As the internal structure for storing BLOB data can have only 3 levels of organization, and the size of a data page is also limited, it is possible to calculate the maximum size of a BLOB.

However, this is a theoretical limit (if you want, you can calculate it), but in practice the limit will be much lower. The reason for this lower limit is that the length of BLOB-field data is determined by a variable of ULONG type, i.e. its maximum size will be equal to 4 gigabytes.
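As a rough illustration of the theoretical limit (ignoring page headers and assuming 4-byte page numbers), for a 4096-byte page:

pointers per BLOB pointer page ≈ 4096 / 4 = 1024
addressable data pages ≈ 1024 pointer pages × 1024 pointers = 1,048,576
maximum BLOB size ≈ 1,048,576 × 4096 bytes ≈ 4 GB

so with this page size the structural limit is already of the same order as the 4-gigabyte ULONG limit.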

Moreover, in reality this practical limit is reduced if a UDF is to be used for BLOB processing. An internal UDF implementation assumes that the maximum BLOB size will be 2 gigabytes. So if you plan to have very large BLOB fields in your database, you should experiment with storing data of a large size beforehand.

The segment size mystery

Developers of database applications often ask what the Segment Size parameter in the definition of a BLOB is, why we need it, and whether or not we should set it when creating BLOB fields.

In reality, there is no need to set this parameter. Actually, it is a bit of a relic, used by the GPRE utility when pre-processing Embedded SQL. When working with BLOBs, GPRE declares a buffer of the specified size, based on the segment size. Setting the segment size has no influence over the allocation and the size of segments when storing the BLOB on disk. It also has no influence on performance. Therefore the segment size can be safely set to any value, but it is set to 80 bytes by default.

Information for those who want to know everything: the number 80 was chosen because 80 symbols could be allocated in alphanumeric terminals.
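For example, both of the following BLOB declarations are valid, and the data is stored in exactly the same way on disk; the segment size is merely recorded for tools such as GPRE (a minimal sketch):

create table docs (
  id integer not null primary key,
  body1 blob sub_type text segment size 80,
  body2 blob sub_type text
);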

TestBed

Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Vlad Horsun, hvlad@users.sourceforge.net, and Alexey Kovyazin, ak@ib-aid.com

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding is NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
  ID NUMERIC(18,2),
  NAME VARCHAR(120),
  DESCRIPTION VARCHAR(250),
  CNT INTEGER,
  QRT DOUBLE PRECISION,
  TS_CHANGE TIMESTAMP,
  TS_CREATE TIMESTAMP,
  NOTES BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure to insert them:

SELECT * FROM INSERTRECS(1000);

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connect. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well-described in the InterBase documentation.

You can download a backup of the test database ready to use from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.

After that, perform a backup of this database and you will be in the same position as if you had downloaded the "ready for use" backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer: a Pentium-4 2 GHz with 512 MB RAM and an 80 GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular one-pass update without the NO SAVEPOINT option. For convenience, the important commands are clarified with comments:

connect 'C:\TEST\testsp1.ib' USER 'SYSDBA' PASSWORD 'masterkey'; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on;  /* enable time statistics for performed statements */
set stat on;  /* enable writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:


connect 'C:\TEST\testsp2.ib' USER 'SYSDBA' PASSWORD 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements – we recommend using five. The script for testing without NO SAVEPOINT would be:

connect 'C:\TEST\testsp3.ib' USER 'SYSDBA' PASSWORD 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

News & Events

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.

Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble.

Read more at: www.janus-software.com

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics by using metadata information (in this case a connection to the database is required).

IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database.

It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.

www.ibsurgeon.com/news.html


You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command. Suppose you have the scripts located in C:\TEST\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input C:\TEST\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3

News & Events

FastReport 3.19

The new version of the famous FastReport is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain: visual report designer with rulers, guides and zooming; wizards for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot matrix report support; support for most popular DB-engines; full WYSIWYG; text rotation 0..360 degrees; memo object with support for simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields, styles, text flow, URLs, Anchors.

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net:

SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found at http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail, Firebird EMail (FBMail), a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found at http://www.tectsoft.net/Products/Firebird/FBMail.aspx


News & Events

CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger, and to download an evaluation version, see: www.microtec.fr/copycat/ct


This is an excerpt from the one-pass script with NO SAVEPOINT enabled:

SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs), the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs

News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com


For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT


In the second UPDATE we start to see the difference. With default transaction settings, this UPDATE takes a very long time – 47 seconds – compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.

Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage, we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.
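For reference, explicit savepoints look like this in SQL; the engine-managed implicit savepoints wrap each statement, and the transaction as a whole, in the same fashion (a minimal sketch):

insert into test (id) values (1);
savepoint sp1;
update test set id = 2 where id = 1;
rollback to savepoint sp1; /* undoes the update, keeps the insert */
commit;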

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINToption

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints – no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE – only one record version exists, and the Undo log is empty

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction – the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4. UPDATE 1 creates a small delta version on disk and puts the record number into the undo log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the undo log when an explicit (or enclosed BEGIN … END) savepoint is used.

To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5. The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE 1 into the undo log.

All this makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.


Figure 6. The third UPDATE overwrites the UPDATE 1 version in the undo log with the UPDATE 2 version, and its own version is written to disk as the latest one.

Figure 7. If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the undo log of that savepoint.

The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But that is a theme for another article.

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the undo log during the sequential updates. When NO SAVEPOINT is enabled, the only operations we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN … END savepoint framework, the engine has to store a backversion for each associated record version in the undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE 2 we would have the situation illustrated in Figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
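To make the sequence concrete, here is a hedged sketch in SQL (the table is invented; the savepoint name follows the article's example):

SAVEPOINT Savepoint1;

/* UPDATE 2 of the example: each record version it creates gets a
   backversion in Savepoint1's own undo log */
UPDATE TEST SET NAME = 'yyy';

/* undoing the work is done from that undo log */
ROLLBACK TO SAVEPOINT Savepoint1;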

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I on my own have reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that are discussing the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for possible inconvenience and promise to add any necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development – if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues… During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS: they are unusual and slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure…

As a result, everyone continues using RDBMS, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database design and access: speed, flexibility, and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] – UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class), but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex. Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway.)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad… hum. But let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain some additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
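In InterBase/Firebird the natural implementation of such a function is a generator; the sketch below is illustrative (the generator name is invented):

/* one generator serves OID values for all objects */
create generator g_oid;

/* fetch the next OID value */
select gen_id(g_oid, 1) from rdb$database;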

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information – from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching of any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
OID TOID primary key,
ClassId TOID,
Name varchar(32),
Description varchar(128),
Deleted smallint,
Creation_date timestamp default CURRENT_TIMESTAMP,
Change_date timestamp default CURRENT_TIMESTAMP,
Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact enables us to find out easily what that object is, using a standard method.

Quite often, only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus you may consider that you have implemented half of these dictionaries – that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
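For illustration only – with invented OID values, and ClassId = -100 marking type descriptions as described above – registering such a dictionary and one element of it could look like this:

/* register the "Currency" dictionary itself as a type */
insert into OBJECTS (OID, ClassId, Name, Description)
values (101, -100, 'Currency', 'Currencies lookup dictionary');

/* add one element; its ClassId is the OID of the type above */
insert into OBJECTS (OID, ClassId, Name, Description)
values (102, 101, 'USD', 'US Dollar');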


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId


if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing of more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
OID TOID primary key,
customer TOID,
sum_total NUMERIC(15,2)
)

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.

If an order has a "lines" section – and a real-world order should definitely have such a section – we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
id integer not null primary key,
object_id TOID, /* reference to order object - 1:M relation */
item_id TOID,   /* reference to ordered item */
amount numeric(15,4),
cost numeric(15,4),
sum numeric(15,4) computed by (amount*cost)
)

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
OID TOID, /* link to the master-object of this attribute */
attribute_id integer not null,
value varchar(256),
constraint attributes_pk primary key (OID, attribute_id)
)

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
OID TOID, /* link to a description-object of the type */
attribute_id integer not null,
attribute_name varchar(32),
attribute_type integer,
constraint class_attributes_pk primary key (OID, attribute_id)
)

which describes the type metadata – an attribute set (their names and types) for each object type.

All attributes for a particular object, whose OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure – a significant benefit, since with this approach the number of tables would not increase no matter how many different types were stored in a database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it also decreases with the growth of the attribute count of a type. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
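A minimal sketch of such storage, under the assumption of a separate table related 1:1 to OBJECTS (the table name and BLOB subtype are invented for illustration):

/* Method 3: the application serializes the whole object
   (dfm, XML, ...) into a single BLOB field */
create table OBJECT_DATA (
OID TOID primary key,
data blob sub_type 0
);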

It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, using certain object attributes – or is likely to in the future – it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued…

FastReport 3 – new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; support for most popular DB-engines.

Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).

Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to keep changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
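Putting these pieces together, the trigger logic can be sketched roughly as follows. This is only an illustration of the principle: the log table, its columns, the trigger and the ORDERS table with its ID key are invented names, not CopyCat's actual schema (only RPL$USERS, mentioned later in this article, is real, and its node_name column is an assumption):

/* hypothetical log table: one line per sub-node per change */
create table RPL_LOG (
node_name varchar(31),
table_name varchar(31),
pk_value integer
);

/* hypothetical trigger on a replicated ORDERS table: log the change
   for every sub-node except the connected user, so that changes
   never bounce back to their originator */
create trigger ORDERS_RPL for ORDERS
after update as
begin
  insert into RPL_LOG (node_name, table_name, pk_value)
    select U.node_name, 'ORDERS', old.ID
    from RPL$USERS U
    where U.node_name <> user;
end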

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
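For example, a generator-based synchronization method might be expressed as the following clause, evaluated on the server side (the generator name is invented for illustration):

/* produces a key value that is unique across all nodes,
   because only the parent server generates it */
select gen_id(g_orders_pk, 1) from rdb$database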

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B have the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication" – that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
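Such a procedure could be as simple as the sketch below (table and column names are invented). Continuing the example: if node A replicates the call ADD_TO_STOCK(..., 1) and node B the call ADD_TO_STOCK(..., 2), every node ends up with 48, whatever the order in which the calls arrive:

/* only the call and its arguments are replicated,
   never the resulting absolute stock value */
create procedure ADD_TO_STOCK (product_id integer, delta integer)
as
begin
  update STOCK
     set amount = amount + :delta
   where product_id = :product_id;
end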

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++Builder Component Set

CopyCat can be obtained as a set of Delphi / C++Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, and one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150



Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least in identifying the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Figure 1. Summary of database statistics

¹ InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.

² The Oldest transaction is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".

³ Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4k, there is no warning displayed, as this page size is okay.

Next we can see that the Forced Write parameter is set to OFF, and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off.¹

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest "interesting" transaction, because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transactions count. The statistics do not tell if there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2. Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables, with fragmentation caused by update/delete or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue. This shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications can tell that this is really bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query – searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing (using triggers) deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you "bad" indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in a table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments to the "Temporary tables" article

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn

volkerrehn@bigpond.com



TestBed

Testing the NO SAVEPOINT feature in InterBase 7.5.1

Authors: Vlad Horsun, hvlad@users.sourceforge.net,
and Alexey Kovyazin, ak@ib-aid.com

In issue 1 we published Dmitri Yemanov's article about the internals of savepoints. While that article was still on the desk, Borland announced the release of InterBase 7.5.1, introducing, amongst other things, a NO SAVEPOINT option for transaction management. Is this an important improvement for InterBase? We decided to give this implementation a close look and test it some, to discover what it is all about.

Testing NO SAVEPOINT

In order to analyze the problem that the new transaction option was intended to address, and to assess its real value, we performed several very simple SQL tests. The tests are all 100% reproducible, so you will be able to verify our results easily.

Database for testing

The test database file was created in InterBase 7.5.1, page size = 4096, character encoding NONE. It contains two tables, one stored procedure and three generators.

For the test we will use only one table, with the following structure:

CREATE TABLE TEST (
   ID NUMERIC(18,2),
   NAME VARCHAR(120),
   DESCRIPTION VARCHAR(250),
   CNT INTEGER,
   QRT DOUBLE PRECISION,
   TS_CHANGE TIMESTAMP,
   TS_CREATE TIMESTAMP,
   NOTES BLOB
);

This table contains 100,000 records, which will be updated during the test. The stored procedure and the generators are used to fill the table with test data. You can increase the quantity of records in the test table by calling the stored procedure that inserts them:

SELECT * FROM INSERTRECS(1000);
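The source of INSERTRECS ships with the database-creation script. Purely as an illustration, a minimal selectable procedure of this kind might look like the sketch below; the generator name GEN_TEST_ID and the filler values are our assumptions, not the actual test code.

SET TERM ^ ;
CREATE PROCEDURE INSERTRECS (CNT INTEGER)
RETURNS (RECS_ADDED INTEGER)
AS
BEGIN
  RECS_ADDED = 0;
  WHILE (RECS_ADDED < CNT) DO
  BEGIN
    /* hypothetical filler data; the real script generates varied values */
    INSERT INTO TEST (ID, NAME, DESCRIPTION, CNT, QRT, TS_CHANGE, TS_CREATE)
    VALUES (GEN_ID(GEN_TEST_ID, 1), 'Test name', 'Test description', 0, 0,
            CURRENT_TIMESTAMP, CURRENT_TIMESTAMP);
    RECS_ADDED = RECS_ADDED + 1;
  END
  SUSPEND;
END ^
SET TERM ; ^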

The second table, TEST2DROP, has the same structure as the first and is filled with the same records as TEST:

INSERT INTO TEST2DROP SELECT * FROM TEST;

As you will see, the second table will be dropped immediately after connecting. We are just using it as a way to increase database size cheaply: the pages occupied by the TEST2DROP table will be released for reuse after we drop the table. With this trick we avoid the impact of database file growth on the test results.

Setting up the test environment

All that is needed to perform this test is the trial installation package of InterBase 7.5.1, the test database and an SQL script.

Download the InterBase 7.5.1 trial version from www.borland.com. The installation process is obvious and well described in the InterBase documentation.

You can download a backup of the test database ready to use from http://www.ibdeveloper.com/issue2/testspbackup.zip (~4 Mb) or, alternatively, an SQL script for creating it from http://www.ibdeveloper.com/issue2/testspdatabasescript.zip (~1 Kb).

If you download the database backup, the test tables are already populated with records and you can proceed straight to the section "Preparing to test" below.

If you choose instead to use the SQL script, you will create the database yourself. Make sure you insert 100,000 records into table TEST using the INSERTRECS stored procedure, and then copy all of them to TEST2DROP three or four times.

After that, perform a backup of this database, and you will be in the same position as if you had downloaded the "ready for use" backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer: a Pentium-4 2 GHz with 512 MB RAM and an 80 GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib
gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular one-pass update, without the NO SAVEPOINT option.

For convenience, the important commands are clarified with comments:

connect 'C:\TEST\testsp1.ib' USER 'SYSDBA' PASSWORD 'masterkey'; /* Connect */
drop table TEST2DROP; /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test; /* Walk down all records in TEST to place them into cache */
commit;
set time on;  /* enable time statistics for performed statements */
set stat on;  /* enable writes/fetches/memory statistics */
commit;
/* perform the bulk update of all records in the TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:


connect 'C:\TEST\testsp2.ib' USER 'SYSDBA' PASSWORD 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements; we recommend using five. The script for testing without NO SAVEPOINT would be:

connect 'C:\TEST\testsp3.ib' USER 'SYSDBA' PASSWORD 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;


You can download all the scripts and the raw results of their execution from http://www.ibdeveloper.com/issue2/testresults.zip.


How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes = 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes = 942
Fetches = 3


This is an excerpt from the one-pass script with NO SAVEPOINT enabled:


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs), the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs


For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT


In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time, 47 seconds, compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.

Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage, we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT performs? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints, no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE. Only one record version exists; the undo log is empty.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and the new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction: the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4: The first UPDATE creates a small delta version on disk and puts the record number into the undo log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from the first UPDATE into the undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.


Figure 6: The third UPDATE overwrites the first UPDATE's version in the undo log with the second UPDATE's version, and its own version is written to disk as the latest one.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the undo log of that savepoint.

(The original design of InterBase implemented the second UPDATE another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article.)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the undo log during the sequential updates. When NO SAVEPOINT is enabled, the only operations we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the undo log of that savepoint.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing the second UPDATE we would have the situation illustrated in Figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
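For readers who have not used explicit savepoints, the SQL surface is small; a minimal fragment (our own illustration, not part of the test suite) looks like this:

SAVEPOINT SP1;
update TEST set QRT = QRT + 1;
/* undo only the work done since the savepoint; the transaction stays open */
ROLLBACK TO SAVEPOINT SP1;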

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.

Development area


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I have missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience and promise to add the necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The database structures described have been simplified, in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development, if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know, all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS: they are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple: to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relational processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2] they are UINs (Unique Identification Numbers), or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex). Nobody can change their date of birth, at least not their de facto date of birth. But birth dates are not unique anyway.

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

What is more, nobody requires run-time pointers to contain some additional information about an object, except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
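In InterBase/Firebird, the natural implementation of such a function is a generator; a minimal sketch (the generator name is ours):

create generator GEN_OID;

/* fetch a new OID value, e.g. from client code or a BEFORE INSERT trigger */
select GEN_ID(GEN_OID, 1) from RDB$DATABASE;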

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have the mandatory attributes "OID" and "ClassId", and the desirable attributes "Name", "Description", "creation_date", "change_date" and "owner". Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table allows us to have a central, "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
   OID TOID primary key,
   ClassId TOID,
   Name varchar(32),
   Description varchar(128),
   Deleted smallint,
   Creation_date timestamp default CURRENT_TIMESTAMP,
   Change_date timestamp default CURRENT_TIMESTAMP,
   Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OIDs.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact about all objects enables us to find out easily what that object is, using a standard method.

Quite often, only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than half. Thus you may consider that you have implemented half of these dictionaries; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.


A simple plain dictionary can be retrieved from the OBJECTS table by the following query, if you know the ClassId for this dictionary:

select OID, Name, Description from OBJECTS
where ClassId = :LookupId


If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)
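Populating such a dictionary likewise touches only OBJECTS. For instance, assuming -100 as the well-known "type of types" ClassId, literal OIDs for readability, and a hypothetical "system" owner with OID 0:

/* register the dictionary type itself */
insert into OBJECTS (OID, ClassId, Name, Description, Deleted, Owner)
values (1001, -100, 'DocType', 'Document types', 0, 0);

/* add two elements to the new dictionary */
insert into OBJECTS (OID, ClassId, Name, Description, Deleted, Owner)
values (1002, 1001, 'INV', 'Invoice', 0, 0);
insert into OBJECTS (OID, ClassId, Name, Description, Deleted, Owner)
values (1003, 1001, 'ORD', 'Order', 0, 0);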

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
   OID TOID primary key,
   customer TOID,
   sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of an "Order" with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from OBJECTS o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
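Such a view is a one-liner; a possible sketch (our naming, matching the query above):

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from OBJECTS o join Orders ord on ord.OID = o.OID;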

If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table with a 1:M relation:

create table order_lines (
   id integer not null primary key,
   object_id TOID, /* reference to the order object: a 1:M relation */
   item_id TOID,   /* reference to the ordered item */
   amount numeric(15,4),
   cost numeric(15,4),
   sum numeric(15,4) computed by (amount*cost)
);
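Reading a document back then spans both tables; for example, the lines of one order together with the item names could be fetched like this (a sketch; :order_id stands for the order's OID):

select l.item_id, o.Name as item_name, l.amount, l.cost
from order_lines l
join OBJECTS o on o.OID = l.item_id
where l.object_id = :order_id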

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
   OID TOID, /* link to the master-object of this attribute */
   attribute_id integer not null,
   value varchar(256),
   constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table:

create table class_attributes (
   OID TOID, /* link to a description-object of the type */
   attribute_id integer not null,
   attribute_name varchar(32),
   attribute_type integer,
   constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata: the attribute set (names and types) for each object type.
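For the "Order" example, the metadata could be filled like this (the OID 1001 for the Order type and the attribute_type codes are arbitrary values for this sketch):

/* the 'Order' type has two attributes beyond those in OBJECTS */
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (1001, 1, 'customer', 0);
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (1001, 2, 'sum_total', 1);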

All attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method, you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure. The last is a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in a database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary (just a single BLOB field), and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know, everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or is likely to add it in the future using certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.

To be continued...


Development area


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi/C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged for other sub-nodes, but not for the originating node itself.
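CopyCat generates its logging objects automatically, so the following is only a hand-written illustration of the principle, not CopyCat's actual generated schema (the log table, its columns, and the assumption that the RPL$USERS sub-node table has a "login" column are all ours): one log line is written per sub-node, and the connected user, i.e. the originator, is skipped.

create table RPL$CHANGES (
   login varchar(31),        /* sub-node this log line is destined for */
   table_name varchar(31),
   pk_value integer,         /* primary key of the changed record */
   change_date timestamp default CURRENT_TIMESTAMP
);

set term ^ ;
create trigger TEST_LOG for TEST
after update as
begin
  /* one line per sub-node; the connected user does not get its own change back */
  insert into RPL$CHANGES (login, table_name, pk_value)
  select u.login, 'TEST', new.ID
  from RPL$USERS u
  where u.login <> USER;
end ^
set term ; ^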

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields for which uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other; since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
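As an illustration (our own sketch, not CopyCat code, with assumed STOCK column names), the procedure for the example above could apply each change relative to the current value, so that replicated calls merge increments instead of overwriting totals:

set term ^ ;
create procedure ADD_TO_STOCK (product_id integer, delta integer)
as
begin
  /* apply the increment, whatever the current value is on this node */
  update STOCK set stock_value = stock_value + :delta
  where product_id = :product_id;
end ^
set term ; ^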

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi/C++Builder components

2. As a standalone Win32 replication tool

We will now present each of these products.

1. CopyCat Delphi/C++Builder component set

CopyCat can be obtained as a set of Delphi/C++Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat.

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the ldquoTablesrdquo tab for each table that you want to replicate set a pri-ority (relative to the other tables) and double-click on the ldquoPKn generatorrdquo

Development area

30

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

columns to (optionally) fill the primary key synchronization method Oncethese settings have been made set the ldquoCreatedrdquo field to Y so as to gen-erate the meta-data

6) On the ldquoProceduresrdquo tab set ldquoCreatedrdquo to Y for each procedure thatyou want to replicate after having set a priority

7) Apply all the generated SQL to all databases that should be replicated

8) For each database to be replicated set the list of sub-nodes (in theRPL$USERS table)

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs, but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replication between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...) as well as a Linux/Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things and performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available while browsing the statistics; it also includes hint comments and recommendation reports.

Figure 1: Summary of database statistics

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

Footnotes:
1 - InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 - The "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere; gstat output does not show this transaction as "interesting".
3 - Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction's start moves the Oldest Active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot is the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ is the oldest currently active transaction.

• The Next transaction is the transaction number that will be assigned to the next new transaction.

• Active transactions: IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day: this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications are placing a lighter load on the database than usual (users are at lunch, or it's a quiet time in the business day).
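Instead, take the snapshot at a moment of normal, representative load. A typical invocation for later analysis in IBAnalyst might look like this (the database path and output file name are illustrative):

gstat -a -r -user SYSDBA -password masterkey C:\data\mydb.ib > mydb_stats.txt

Here -a collects data and index statistics and -r adds record and version counts, which is the information the Table statistics view discussed below relies on.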


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. Together, both indications tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or the result of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If the statistics were taken via the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the longest duplicate chain (keys pointing to records with the same column value) is 25056, i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow and including them in the Recommendations report. Unfortunately, most of the "bad" indices are created automatically, to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes, or updates of the primary key, in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the "bad" indices on Foreign Keys every time you view statistics.
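As a hedged illustration of that workaround, a pair of triggers along these lines (the COLORS table, its ID primary key and the exception name are all hypothetical) would forbid deletes and primary-key changes in a lookup table, so the duplicate-heavy foreign-key index never has to garbage-collect changed keys:

CREATE EXCEPTION E_LOOKUP_PROTECTED
  'Rows of this lookup table may not be deleted, nor their keys changed';

SET TERM ^ ;
CREATE TRIGGER COLORS_BD FOR COLORS ACTIVE BEFORE DELETE AS
BEGIN
  /* no physical deletes from the lookup table */
  EXCEPTION E_LOOKUP_PROTECTED;
END ^
CREATE TRIGGER COLORS_BU FOR COLORS ACTIVE BEFORE UPDATE AS
BEGIN
  /* the primary key must stay immutable */
  IF (NEW.ID <> OLD.ID) THEN
    EXCEPTION E_LOOKUP_PROTECTED;
END ^
SET TERM ; ^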

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amounts of record versions usually slow down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions." Here is the list of tables with a version/record ratio greater than 3:

Table         Records   Versions   RecVers size
CLIENTS_PR    3388      10944      92
DICT_PRICE    30        1992       45
DOCS          9         2225       64
N_PART        13835     72594      83
REGISTR_NC    241       4085       56
SKL_NC        1640      7736       170
STAT_QUICK    17649     85062      110
UO_LOCK       283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback

Comments to "Temporary tables"

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its author I would like to publish it here.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another "myth box" for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB "kids" who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking from experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly, those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary, because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat-file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation, or regarding data whose structure is unknown in advance but which needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only possible through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com


Testing the NO SAVEPOINT

loaded "ready for use" backup.

Hardware is not a material issue for these tests, since we are only comparing performance with and without the NO SAVEPOINT option. Our test platform was a modest computer: a Pentium-4 2 GHz with 512 MB RAM and an 80GB Samsung HDD.

Preparing to test

A separate copy of the test database is used for each test case, in order to eliminate any interference between statements. We create four fresh copies of the database for this purpose. Supposing all files are in a directory called C:\TEST, simply create the four test databases from your test backup file:

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp1.ib

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp2.ib

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp3.ib

gbak -c -user SYSDBA -pass masterkey C:\TEST\testspbackup.gbk C:\TEST\testsp4.ib

SQL test scripts

The first script tests a regular one-pass update without the NO SAVEPOINT option

For convenience the important commands are clarified with comments

connect "C:\TEST\testsp1.ib" USER 'SYSDBA' Password 'masterkey'; /* Connect */
drop table TEST2DROP;       /* Drop table TEST2DROP to free database pages */
commit;
select count(*) from test;  /* Walk down all records in TEST to place them into cache */
commit;
set time on;                /* enable time statistics for performed statements */
set stat on;                /* enables writes/fetches/memory statistics */
commit;
/* perform bulk update of all records in TEST table */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

The second script tests performance for the same UPDATE with the NO SAVEPOINT option:


connect "C:\TEST\testsp2.ib" USER 'SYSDBA' Password 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT;  /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs we added several UPDATE statements - we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "C:\TEST\testsp3.ib" USER 'SYSDBA' Password 'masterkey';
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;
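For the fourth database, the same five UPDATEs are run with the option enabled; that is, by analogy with the second script, the block of UPDATEs is presumably preceded by:

SET TRANSACTION NO SAVEPOINT;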

News & Events

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.

Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble.

Read more at: www.janus-software.com

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics by using metadata information (in this case a connection to the database is required).

IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.

www.ibsurgeon.com/news.html

You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in C:\TEST\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input C:\TEST\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes 942
Fetches = 3

News & Events

Fast Report 3.19

The new version of the famous Fast Report is now out. FastReport® 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport® provides all the tools you need to develop reports. All variants of FastReport® 3 contain: a visual report designer with rulers, guides and zooming; a wizard for basic types of report; export filters for html, tiff, bmp, jpg, xls and pdf outputs; dot matrix report support; support for the most popular DB engines; full WYSIWYG; text rotation 0..360 degrees; a memo object supporting simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties); access to DB fields; styles; text flow; URLs; anchors.

Trial / Buy Now

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a Stored Procedure Generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found here:
http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail, Firebird EMail (FBMail), a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here:
http://www.tectsoft.net/Products/Firebird/FBMail.aspx

This is an excerpt from the one-pass script with NO SAVEPOINT enabled:

News & Events

CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger, see: www.microtec.fr/copycat/ct

Download an evaluation version.


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME=NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs) the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs

News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com


For convenience, the results are tabulated in Table 1.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT


In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.

Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINToption

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE - only one record version exists; the Undo log is empty.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction: the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log; in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, in which only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception: memory usage does not grow any further.


Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

(The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article.)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, the update just replaces the record data on disk with the newest version and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 above helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

In this case, memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
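To make the scenario concrete, here is a minimal ISQL sketch (reusing the TEST table from the test scripts above):

SET TRANSACTION;
update TEST set QRT = QRT+1;       /* UPDATE1: undo data goes to the transaction-level savepoint */
SAVEPOINT Savepoint1;
update TEST set QRT = QRT+1;       /* UPDATE2: backversions are kept in Savepoint1's own undo log */
ROLLBACK TO SAVEPOINT Savepoint1;  /* undoes UPDATE2 only, using that undo log */
commit;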

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I have missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any inconvenience, and promise to add the necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true of OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proven data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you can see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines that is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there have been several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS: they are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple: to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers at run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] UINs (Unique Identification Numbers) or "hyperkeys". Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model it does not matter whether the surrogate key is unique within a single table or within the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex. Nobody can change their date of birth - at least not their de facto date of birth. But birth dates are not unique anyway.)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

What is more, nobody requires run-time pointers to contain any additional information about an object other than a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases it probably makes sense to use "int64" or a combination of several integers.
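In InterBase/Firebird, for example, such a generating function is readily provided by a generator; a minimal sketch (the generator name is arbitrary):

create generator G_OID;

/* fetch the next database-wide unique OID */
select GEN_ID(G_OID, 1) from RDB$DATABASE;

Because a single generator serves every class, the values it hands out are unique across the whole database, which is exactly the strict uniqueness discussed above.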

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus each object has to have the mandatory attributes "OID" and "ClassId", and the desirable attributes "Name", "Description", "creation_date", "change_date" and "owner". Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So, let us design this table:

create domain TOID as integer not null;

create table OBJECTS(
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID)

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method. This is the second advantage... but let us look at basic usage first.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which are already in OBJECTS. Usually that is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than half. Thus you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.


A simple plain dictionary can be retrieved from the OBJECTS table by the following query, if you know the ClassId for this dictionary:

select OID, Name, Description from OBJECTS where ClassId = :LookupId


If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)
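As a hypothetical illustration, this is how a "Color" dictionary and two of its elements could be registered directly in OBJECTS (the OID values are arbitrary; Deleted and Owner receive placeholders, since the TOID domain is declared NOT NULL):

insert into OBJECTS (OID, ClassId, Name, Description, Deleted, Owner)
values (1001, -100, 'Color', 'Simple color dictionary', 0, 0);

insert into OBJECTS (OID, ClassId, Name, Description, Deleted, Owner)
values (1002, 1001, 'R', 'Red', 0, 0);

insert into OBJECTS (OID, ClassId, Name, Description, Deleted, Owner)
values (1003, 1001, 'G', 'Green', 0, 0);

The first lookup query above, with :LookupId = 1001 (or :LookupName = 'Color'), would then return the two color rows.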

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2))

which relates one-to-one to the OBJECTS table by theOID field The order number and ldquocommentsrdquo attrib-utes are stored in the ldquoNamerdquo and ldquoDescriptionrdquo fieldsof the OBJECTS table ldquoOrdersrdquo also refers to

"OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
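Such a view is not given in the article, but a minimal sketch of it might look like this:

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID;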

If an order has a "lines" section - and a real-world order should definitely have such a section - we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID,  /* reference to order object - 1:M relation */
  item_id TOID,    /* reference to ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost)
);

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2. (See [5]) All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID,  /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID,  /* link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes type metadata - an attribute set (names and types) for each object type.

All attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes where OID = :oid

or, with names of attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease of implementing object inheritance, and a very simple database structure - a significant benefit, since with this approach the number of tables does not increase no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied - a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary - just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.
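A minimal storage structure for this method might be the following sketch (the table and column names here are hypothetical):

create table OBJECT_BLOBS (
  OID TOID primary key,   /* link to OBJECTS */
  data blob sub_type 0    /* serialized object: dfm stream, XML document, etc. */
);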

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know - everyone already knows them :). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or is likely to in the future, using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued...

FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf. Dot matrix reports support; support for most popular DB-engines. Full WYSIWYG, text rotation 0..360 degrees, memo object supports simple html-tags (font color, b, i, u, sub, sup), improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr

PART 1. BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
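To illustrate the principle only (CopyCat's actual generated metadata is not shown in this article, so all names below are hypothetical), a log table and one such trigger for a replicated table TEST might look like this:

create table RPL$LOG (
  id         integer not null,
  table_name varchar(32),   /* table that was changed */
  pk_value   varchar(64)    /* primary key value of the changed record */
);

create trigger TEST_LOG_AU for TEST after update as
begin
  insert into RPL$LOG (id, table_name, pk_value)
  values (gen_id(GEN_RPL$LOG, 1), 'TEST', new.ID);
end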

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and will therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
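In trigger terms, the originator exclusion might be sketched as follows, extending the hypothetical RPL$LOG table above with a destination column; RPL$USERS is the sub-node list mentioned in Part 2, though its login column is an assumption:

create trigger TEST_LOG_AU2 for TEST after update as
begin
  /* one log line per sub-node, skipping the connected user,
     so a change never bounces back to its originator */
  insert into RPL$LOG (table_name, pk_value, login)
  select 'TEST', new.ID, u.login
  from RPL$USERS u
  where u.login <> USER;
end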

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.



Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
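For instance, a generator-based synchronization method could be as simple as the following clause (GEN_ORDERS_OID is a hypothetical generator living in the parent database):

select gen_id(GEN_ORDERS_OID, 1) from RDB$DATABASE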

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
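In practice, resolving a conflict might then boil down to one UPDATE of the conflicts table (a sketch; apart from CHOSEN_USER, the table and column names here are hypothetical):

update RPL$CONFLICTS
set CHOSEN_USER = 'NODE_A'   /* the node holding the correct version */
where conflict_id = 42;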

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication": a mechanism for logging stored procedure calls and replicating them to other nodes.


When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
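A sketch of such a procedure for the STOCK example might be (the names and table layout are assumptions, not CopyCat's actual conventions):

create procedure ADD_TO_STOCK (product_id integer, delta numeric(15,4))
as
begin
  /* replicate the change itself (+1, +2, ...) rather than the
     resulting value, so changes from different nodes merge correctly */
  update STOCK
  set stock_value = stock_value + :delta
  where product_id = :product_id;
end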

PART 2. GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products

1. CopyCat Delphi / C++Builder Component Set

CopyCat can be obtained as a set of Delphi / C++Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.



6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs, but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include

• Easy-to-use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master / slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community - that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150



Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago, our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications - transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat - the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both the positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.


Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available while browsing the statistics, and it also includes hint comments and recommendation reports.


The summary shown in Figure 1 provides general information about your database.

Figure 1. Summary of database statistics


[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] The Oldest transaction is the same Oldest interesting transaction mentioned everywhere. Gstat output does not show this transaction as interesting.
[3] Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be regarded as the currently oldest active transaction.


The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4k, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple - using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. [1]
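For reference, Forced Writes can be toggled with gfix; a typical call (the file name is just an example) would be:

gfix -write sync -user SYSDBA -password masterkey C:\data\mydb.gdb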

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the 'sweep gap' is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the 'oldest interesting transaction', because it freezes when a transaction is ended with rollback, and the server can not undo its changes at that moment.

• The Oldest snapshot - the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] - the oldest currently active transaction.

• The Next transaction - the transaction number that will be assigned to the next new transaction.

• Active transactions - IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day - this is calculated from Next transaction divided by the number of days passed since the creation of the database to the point when the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset (see the example below).
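For example (with hypothetical numbers): if Next transaction is 1 500 000 and the statistics were taken 30 days after the database was created, IBAnalyst would report 1 500 000 / 30 = 50 000 transactions per day.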

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
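When you do want to gather statistics for analysis, they can be collected with gstat and saved to a file for loading into IBAnalyst; for example (file names are illustrative):

gstat -r -user SYSDBA -password masterkey C:\data\mydb.gdb > mydb_stats.txt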


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables, with fragmentation caused by update/delete or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored and, consequently, this is caused by bulk deletes. So both indications together tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Figure 2. Table statistics

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still 'in use'. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken via the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 - i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster - but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are created automatically to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amounts of record versions usually slow down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are lots of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database, in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments to the "Temporary tables" article

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only available through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me and, with the permission of its author, I'd like to publish it.

Alexey Kovyazin, Chief Editor


Miscellaneous


Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird - the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now!

To receive future issue notifications, send email to subscribe@ibdeveloper.com



TestBed


Testing the NO SAVEPOINT

connect "C:\test\sp2.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
SET TRANSACTION NO SAVEPOINT; /* enable NO SAVEPOINT */
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Except for the inclusion of the SET TRANSACTION NO SAVEPOINT statement in the second script, both scripts are the same, simply testing the behavior of the engine in the case of a single bulk UPDATE.

To test sequential UPDATEs, we added several UPDATE statements - we recommend using five. The script for testing without NO SAVEPOINT would be:

connect "E:\test\sp3.ib" USER "SYSDBA" Password "masterkey";
drop table TEST2DROP;
commit;
select count(*) from test;
commit;
set time on;
set stat on;
commit;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
commit;
quit;

Fyracle 0.8.9

Janus has released a new version of Oracle-mode Firebird, Fyracle. Fyracle is a specialized build of Firebird 1.5: it adds temporary tables, hierarchical queries and a PL/SQL engine. Version 0.8.9 adds support for stored procedures written in Java.

Fyracle dramatically reduces the cost of porting Oracle-based applications to Firebird. Common usage includes the Compiere open source ERP package, mid-market deployments of Developer/2000 applications, and demo CDs of applications without license trouble.

Read more at: www.janus-software.com

News & Events

IBAnalyst 1.9

IBSurgeon has issued a new version of IBAnalyst. Now it can better analyze InterBase or Firebird database statistics by using metadata information (in this case a connection to the database is required).

IBAnalyst is a tool that assists a user to analyze in detail Firebird or InterBase database statistics and identify possible problems with database performance, maintenance and how an application interacts with the database. It graphically displays database statistics and can then automatically make intelligent suggestions about improving database performance and database maintenance.

www.ibsurgeon.com/news.html


You can download all the scripts and the raw results of their execution from this location: http://www.ibdeveloper.com/issue2/testresults.zip


How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL>input c:\test\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings:

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time = 4.63 sec
Buffers = 2048
Reads = 2111
Writes 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time = 0.03 sec
Buffers = 2048
Reads = 1
Writes 942
Fetches = 3

Fast Report 3.19

The new version of the famous FastReport is now out. FastReport 3 is an add-in component that gives your applications the ability to generate reports quickly and efficiently. FastReport provides all the tools you need to develop reports. All variants of FastReport 3 contain: visual report designer with rulers, guides and zooming, wizard for basic types of reports, export filters for html, tiff, bmp, jpg, xls, pdf outputs, dot matrix reports support, support for most popular DB-engines. Full WYSIWYG, text rotation 0..360 degrees, memo object supports simple html-tags (font color, b, i, u, sub, sup), improved stretching (StretchMode, ShiftMode properties), access to DB fields, styles, text flow, URLs, Anchors.

Trial / Buy Now

News & Events

TECT Software presents

New versions of nice utilities from TECT Software, www.tectsoft.net.

SPGen, a stored procedure generator for Firebird and InterBase, creates a standard set of stored procs for any table; full details can be found here:
http://www.tectsoft.net/Products/Firebird/FIBSPGen.aspx

And FBMail, Firebird EMail (FBMail), a cross-platform PHP utility which easily allows the sending of emails direct from within a database. This utility is specifically aimed at ISPs that support Firebird, but can easily be used on any computer which is connected to the internet. Full details can be found here:
http://www.tectsoft.net/Products/Firebird/FBMail.aspx


This is an excerpt from the one-pass script with NO SAVEPOINT enabled:

CopyTiger 1.00

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue).

For more information about CopyTiger see www.microtec.fr/copycat/ct

Download an evaluation version.


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs), the raw test results are rather large.

Table 1. Test results for 5 sequential UPDATEs

News & Events

SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal.

www.metadataforge.com



For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1. Time to perform UPDATEs with and without NO SAVEPOINT


In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.

Figure 2. Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.



In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINToption

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the 'interesting' category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3. Initial situation before any UPDATE - only one record version exists. Undo log is empty.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and the new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction - the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4. UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log - in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.

Figure 5. The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem" / "delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception - memory usage does not grow any further.


Figure 6. The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

Figure 7. If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

The original design of InterBase implemented the second UPDATE another way, but sometime after InterBase 6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. :-)

Why is the delta of memory usage zero The reason is that beyond the second UPDATEno record version is created From here on the update just replaces record data on diskwith the newest one and shifts the superseded version into the Undo log

A more interesting question is why we see an increase in disk reads and writes during thetest We would have expected that the third and following UPDATEs would do essentiallyequal numbers of read and writes to write the newest versions and move the previous onesto the undo log However we are actually seeing a growing count of writes We have noanswer for it but we would be pleased to know

The following figure (figure 6) helps to illustrate the situation in the Undo log during thesequential updates When NO SAVEPOINT is enabled the only pieces we need to performare replacing the version on disk and updating the original backversion It is fast as the firstUPDATE

Explicit SAVEPOINTWhen an UPDATE statement is going to be performed within its own explicit or implicit

BEGIN END savepoint framework the engine has to store a backversion for each associ-ated record version in the Undo log

For example if we used an explicit savepoint eg SAVEPOINT Savepoint1 upon perform-ing UPDATE2 we would have the situation illustrated in figure 7
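As a reminder of the syntax, here is a minimal isql sketch of such a sequence; the table and savepoint names are placeholders of mine, not taken from the test scripts:

SET TRANSACTION;
UPDATE TEST SET QRT = QRT + 1;     /* UPDATE1: undone via the transaction-level savepoint */
SAVEPOINT Savepoint1;
UPDATE TEST SET QRT = QRT + 1;     /* UPDATE2: backversions go to the Undo log of Savepoint1 */
ROLLBACK TO SAVEPOINT Savepoint1;  /* undoes UPDATE2 only, using that Undo log */
COMMIT;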

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.

Development area


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to the publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for the possible inconvenience and promise to add any necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. This is still true for OOP as a whole, too, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS: they are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...



As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple: to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relational processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs), in [2] UINs (Unique Identification Numbers), or "hyperkey". Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP, not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex).

Nobody can change their date of birth, at least not their de facto date of birth. But birth dates are not unique anyway. :)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hum, but let's not talk about such gloomy things :). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain some additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
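In InterBase/Firebird terms, such a function is naturally provided by a generator. A minimal sketch (the generator name is my own illustration, not from the article):

create generator GEN_OID;

/* each call returns a new database-wide unique OID value */
select gen_id(GEN_OID, 1) from rdb$database;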

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is.


This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have the mandatory attributes "OID" and "ClassId", and the desirable attributes "Name", "Description", "creation_date", "change_date" and "owner". Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching of any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table: that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100;

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact, being true for all objects, enables us to find out easily what that object is, using a standard method.

Quite often, only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case, we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types (simple lookup dictionaries, for example) which do not have any attributes other than those which already are in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus, you may consider that you have implemented half of these dictionaries; that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student. :)


A simple plain dictionary can be retrieved from the OBJECTS table by the following query, if you know the ClassId for this dictionary's elements:

select OID, Name, Description from OBJECTS where ClassId = :LookupId
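For instance (illustrative data of my own, not from the article), a small currency dictionary could live entirely in OBJECTS; OID 101 is the dictionary type itself, and its elements refer to it through ClassId:

insert into OBJECTS (OID, ClassId, Name, Description)
  values (101, -100, 'Currency', 'Currencies lookup dictionary');
insert into OBJECTS (OID, ClassId, Name, Description)
  values (102, 101, 'USD', 'US dollar');
insert into OBJECTS (OID, ClassId, Name, Description)
  values (103, 101, 'EUR', 'Euro');

Running the query above with :LookupId = 101 then returns the two currencies.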


If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing of more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.

If an order has a "lines" section (and a real-world order should definitely have such a section), we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to the order object: 1:M relation */
  item_id TOID,   /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost)
);

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
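For example (an illustrative query of my own, assuming the ordered items are also registered in OBJECTS), a readable order printout is a plain join:

select hdr.Name as order_number,
       item.Name as item_name,
       l.amount, l.cost, l.amount * l.cost as line_total
from order_lines l
join OBJECTS hdr on hdr.OID = l.object_id
join OBJECTS item on item.OID = l.item_id
where l.object_id = :order_oid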

Nevertheless, there are two main disadvantages: the implementation of the object-relational mapping system for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* here is a link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata: an attribute set (their names and types) for each object type.
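As an illustration (the OIDs, attribute ids and attribute_type codes below are my own assumptions, not from the article), an "Order" type with OID 200 and an order object with OID 1001 might be described like this:

insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
  values (200, 1, 'customer', 0);
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
  values (200, 2, 'sum_total', 1);

insert into attributes (OID, attribute_id, value) values (1001, 1, '57');      /* OID of the customer object */
insert into attributes (OID, attribute_id, value) values (1001, 2, '1499.90'); /* order amount stored as a string */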

All the attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all the attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease in implementing object inheritance; and a very simple database structure. A significant benefit is that with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further with the growth of the attribute count of a type. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied: either a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary (just a single BLOB field), and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know - everyone already knows them :). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each one. If your application involves massive relational data processing, like searching or grouping, or if such processing is likely to be added in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued...

FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; support for most popular DB-engines. Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
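As a rough sketch of the idea (the log table and its columns here are my own invention; CopyCat's actual generated metadata may differ), such a trigger might look like this for a CUSTOMER table, writing one log line per sub-node as described in the next section:

set term ^;
create trigger CUSTOMER_AU for CUSTOMER
active after update position 0 as
begin
  /* one log line per sub-node that must receive this change */
  insert into RPL$LOG (TABLE_NAME, PK_VALUE, NODE_NAME)
    select 'CUSTOMER', new.ID, u.NODE_NAME
    from RPL$USERS u
    where u.NODE_NAME <> user;
end^
set term ;^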

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes, towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database, and therefore no software is needed on nodes having no parent, which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to keep changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.



Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
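For instance (an illustrative clause of my own, not taken from the CopyCat documentation), a generator-based synchronization method could be as simple as:

select gen_id(GEN_CUSTOMER_ID, 1) from rdb$database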

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put into this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product, and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication", that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to create a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.


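A sketch of such an updating procedure (my own illustration of the idea, not actual CopyCat code; the STOCK column names are assumptions):

set term ^;
create procedure ADD_TO_STOCK (PRODUCT_ID integer, DELTA integer) as
begin
  /* the call (PRODUCT_ID, DELTA) is what gets logged and replicated,
     so each node applies the other nodes' deltas instead of raw values */
  update STOCK set QUANTITY = QUANTITY + :DELTA
  where PRODUCT_ID = :PRODUCT_ID;
end^
set term ;^

With this in place, node A would replicate ADD_TO_STOCK(..., 1) and node B ADD_TO_STOCK(..., 2), and both databases converge on a stock value of 48.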

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++Builder Component Set

CopyCat can be obtained as a set of Delphi / C++Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.



6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all the databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community: that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux/Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150



Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and to reconfigure the hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Okay, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database or at least in identifying the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both the positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Figure 1: Summary of database statistics

1 - InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
2 - The "Oldest transaction" is the same "oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
3 - Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves "Oldest active" forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.



Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
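For reference, the sweep interval can be changed (or automatic sweeping disabled) with gfix, and a manual sweep can be run the same way. A sketch, assuming a local database file and SYSDBA credentials; the first command sets the interval to 20000, the second disables automatic sweeping, the third forces an immediate sweep:

gfix -h 20000 -user SYSDBA -password masterkey mydb.gdb
gfix -h 0 -user SYSDBA -password masterkey mydb.gdb
gfix -sweep -user SYSDBA -password masterkey mydb.gdb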

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the "oldest active" gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from "next transaction" divided by the number of days passed since the creation of the database up to the point where the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications make less database load than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables, with fragmentation caused by update/delete or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications can tell that this is really bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing (using triggers) deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run a sweep manually (gfix -sweep) to decrease the TIP size."

Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. A large amount of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with a version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144
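Such a forced garbage-collection pass is just a full scan in isql, for example for the first table in the list above:

select count(*) from CLIENTS_PR;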

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and in identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and it will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments to the "Temporary tables" article

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its author I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature; if it is not present, a whole lot of unnecessary and complicated workarounds are required. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation or regarding data whose structure is unknown in advance but which needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (available only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volker.rehn@bigpond.com



Miscellaneous


Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird, the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now!

To receive future issue notifications, send email to subscribe@ibdeveloper.com

Magazine CD

Donations

Page 15: The InterBase and Firebird Developer Magazine, Issue 2, 2005

TestBed

16

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Testing the NO SAVEPOINT

How to perform the test

The easiest way to perform the test is to use isql's INPUT command.

Suppose you have the scripts located in c:\test\scripts:

>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> input c:\test\scripts\script01.sql;

Test results

The single bulk UPDATE

First, let's perform the test where the single-pass bulk UPDATE is performed.

This is an excerpt from the one-pass script with default transaction settings

SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10509940
Delta memory = 214016
Max memory = 10509940
Elapsed time= 4.63 sec
Buffers = 2048
Reads = 2111
Writes 375
Fetches = 974980

SQL> commit;
Current memory = 10340752
Delta memory = -169188
Max memory = 10509940
Elapsed time= 0.03 sec
Buffers = 2048
Reads = 1
Writes 942
Fetches = 3


This is an excerpt from the one-pass script with NO SAVEPOINT enabled

CopyTiger 100

CopyTigera new productfrom Microtecis an advanced

databasereplication amp

synchronisationtool for

InterBaseFirebird

powered bytheir CopyCatcomponent set(see the articleabout CopyCat

in this issue)

For moreinformation about

CopyTiger seewwwmicrotecfr

copycatct

Downloadan evaluation

version


SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time= 4.72 sec
Buffers = 2048
Reads = 2102
Writes 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time= 0.15 sec
Buffers = 2048
Reads = 1
Writes 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs), the raw test results are rather large.

Table 1. Test results for 5 sequential UPDATEs



For convenience the results are tabulated above

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1. Time to perform UPDATEs with and without NO SAVEPOINT

In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and a growing writes parameter.

Figure 2. Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The


old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First and most obvious is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3. Initial situation before any UPDATE - only one record version exists; the Undo log is empty

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction - the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Testing the NO SAVEPOINT

Figure 4. UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log - in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5. The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception - memory usage does not grow any further.


Figure 6. The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one

Figure 7. If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint

The original design of InterBase implemented the second UPDATE another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But that theme is for another article. :-)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE, no record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes - to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only operations needed are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
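As an illustration, here is a minimal isql sketch of the scenario just described; the savepoint name follows the example in the text, and the UPDATE statements are abridged:

SQL> update TEST set QRT = QRT + 1;     /* UPDATE1: undo data goes to the transaction-level savepoint */
SQL> savepoint Savepoint1;
SQL> update TEST set QRT = QRT + 1;     /* UPDATE2: backversions are also kept for Savepoint1 */
SQL> rollback to savepoint Savepoint1;  /* undoes UPDATE2 only, using that savepoint's Undo log */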

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.

Development area

Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had, on my own, reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience, and promise to add the necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles and, of course, I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, too, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know, all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS. They are unusual and


slow, there are no standards for data access, and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple - to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2] - UINs (Unique Identification Numbers), or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP - not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or within the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex. Nobody can change their date of birth - at least not their de facto date of birth. But birth dates are not unique anyway.)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hum. But let's not talk about such gloomy things.) Changes to some OIDs would immediately lead to changes in all identifiers and links and thus, as Mr. Scott Ambler wrote in [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain some additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that they "conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
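In InterBase/Firebird terms, a minimal sketch of such a function is a single database-wide generator (the name g_oid is ours, chosen for the example):

create generator g_oid;

/* every new object, whatever its class, draws its OID
   from the same database-wide sequence */
select gen_id(g_oid, 1) from rdb$database;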

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is


unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information - from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system, you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching for any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is - using a standard method.

Quite often, only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.
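In practice, this standard method boils down to one trivial query (a sketch; the parameter name is ours):

/* resolve any object link to a display name, whatever the object's class */
select Name, Description from OBJECTS where OID = :object_ref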

There are also types - simple lookup dictionaries, for example - which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that no less than a half. Thus you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId


if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.

If an order has a "lines" section - and a real-world order should definitely have such a section - we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to the order object - 1:M relation */
  item_id TOID,   /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost)
);

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata - an attribute set (their names and types) for each object type.

All attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value
from attributes
where OID = :oid

or, with the names of attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid
  and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method, you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure - a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in a database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further with the growth of the attribute count of a type. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are used for every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied - a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary - just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know - everyone already knows them :). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each one. If your application involves massive relational data processing, like searching or grouping, or if such processing is likely to be added in the future using certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except for work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued...


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
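Conceptually, the generated triggers resemble the following hand-written sketch (the log table RPL$LOG and its columns are our invention for illustration, not CopyCat's actual schema):

set term ^ ;
create trigger customers_log_upd for customers
active after update position 0 as
begin
  /* record which table and which row changed, so the
     replicator can later fetch and apply the change */
  insert into rpl$log (table_name, pk_value, operation)
  values ('CUSTOMERS', old.id, 'U');
end ^
set term ; ^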

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to keep changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields with GUID values.



Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
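For example, a generator-based synchronization method could be as simple as the following clause (the generator name is hypothetical):

select gen_id(g_orders_pk, 1) from rdb$database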

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product, and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an


unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
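A minimal sketch of the idea, with hypothetical names: all stock movements are funneled through a procedure, so that the replicated unit is the change itself rather than the resulting value:

set term ^ ;
create procedure add_to_stock (product_id integer, delta integer)
as
begin
  /* replaying the logged calls (+1 from node A, +2 from node B)
     yields 48 on every node, instead of a conflicting 46 or 47 */
  update stock
     set current_stock = current_stock + :delta
   where product_id = :product_id;
end ^
set term ; ^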

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products

1. CopyCat Delphi / C++Builder Component Set

CopyCat can be obtained as a set of Delphi / C++Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters, and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables), and double-click on the "PKn generator"


columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs, but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

- Easy-to-use installer

- Independent server administrator tool (CTAdmin)

- Configuration wizard for setting up links to master / slave databases

- Robust replication engine based on Microtec CopyCat

- Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

- Simple and intuitive control panel

- Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version: http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community - that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code: ibd-852-150



Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and to reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications - transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat - the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you will know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both the positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available while browsing the statistics, and it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database.

Figure 1. Summary of database statistics

Footnotes

[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.

[2] The "oldest transaction" is the same "oldest interesting transaction" mentioned everywhere; the gstat output does not show this transaction as interesting.

[3] Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the "oldest active" forward. In production systems with regular transactions, it can be considered as the currently oldest active transaction.


The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next we can see that the Forced Write parameter is set to OFF, and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple - using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off [1].

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative - which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

bull The Oldest transaction is theoldest non-committed transactionAny lower transaction numbers arefor committed transactions and norecord versions are available forsuch transactions Transaction num-bers higher than the oldest transac-tion are for transactions that can bein any state This is also called theoldest interesting transactionbecause it freezes when a transac-tion is ended with rollback andserver can not undo its changes atthat moment

bull The Oldest snapshot ndash the old-est active (ie not yet committed)transaction that existed at the startof the transaction that is currentlythe Oldest Active transactionIndicates lowest snapshot transac-tion number that is interested inrecord versions

bull The Oldest active3 ndash the oldestcurrently active transaction

bull The Next transaction ndash thetransaction number that will beassigned to new transaction

• Active transactions - IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transactions count. The statistics do not tell if there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day - this is calculated from Next transaction divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. (For example, if Next transaction is 1,200,000 and the database was created 100 days ago, this works out at 12,000 transactions per day.) This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• just restored your database

• performed a backup (gbak -b db.gdb) without the -g switch

• recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications put less load on the database than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst (Figure 2).

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell us that this really is bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics

were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 - i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query - searching by an index value that has a larger number of duplicates will be slower, searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.
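If you want to check such poor selectivity directly in the database, a simple aggregate query shows the duplicate chains. This is only an illustrative sketch - the real table and column behind ADDR_ADDRESS_IDX6 are not shown in the statistics, so the names here are hypothetical:

select CITY_ID, count(*) as dup_count
from ADDRESSES
group by CITY_ID
order by 2 desc;

The largest dup_count in the result corresponds to MaxDup, and the number of rows returned corresponds to Uniques.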

That is why IBAnalyst draws your attention to such indices, marking them red and yellow and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes, or updates of the primary key, in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
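A minimal sketch of that trigger-based workaround, for a hypothetical lookup table COUNTRY with primary key ID (the exception and trigger names are illustrative):

create exception E_LOOKUP_READONLY
  'Primary keys of lookup tables must not be changed or deleted';

set term ^ ;

create trigger COUNTRY_BU for COUNTRY
active before update position 0 as
begin
  /* block primary key changes, so dependent FK indices never need cleanup */
  if (new.ID <> old.ID) then
    exception E_LOOKUP_READONLY;
end ^

create trigger COUNTRY_BD for COUNTRY
active before delete position 0 as
begin
  exception E_LOOKUP_READONLY;
end ^

set term ; ^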

Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease the TIP size."

Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. A large amount of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records  Versions  RecVers size
CLIENTS_PR   3388     10944     92
DICT_PRICE   30       1992      45
DOCS         9        2225      64
N_PART       13835    72594     83
REGISTR_NC   241      4085      56
SKL_NC       1640     7736      170
STAT_QUICK   17649    85062     110
UO_LOCK      283      8490      144
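For instance, to try to force garbage collection on the most versioned table from this list, one could simply run:

select count(*) from STAT_QUICK;

As the report warns, this may take a long time, and it will achieve nothing while any active transaction is still interested in the old versions.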

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments to "Temporary tables"

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its author I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, Foreign Keys, Triggers etc. were all bad and unnecessary, because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volker.rehn@bigpond.com



News & Events: CopyTiger 1.0.0

CopyTiger, a new product from Microtec, is an advanced database replication & synchronisation tool for InterBase/Firebird, powered by their CopyCat component set (see the article about CopyCat in this issue). For more information about CopyTiger, and to download an evaluation version, see www.microtec.fr/copycat/ct


Testing the NO SAVEPOINT

SQL> SET TRANSACTION NO SAVEPOINT;
SQL> update TEST set ID = ID+1, QRT = QRT+1, NAME = NAME||'1', ts_change = CURRENT_TIMESTAMP;
Current memory = 10352244
Delta memory = 55296
Max memory = 10352244
Elapsed time = 4.72 sec
Buffers = 2048
Reads = 2102
Writes = 350
Fetches = 967154

SQL> commit;
Current memory = 10344052
Delta memory = -8192
Max memory = 10352244
Elapsed time = 0.15 sec
Buffers = 2048
Reads = 1
Writes = 938
Fetches = 3

Performance appears to be almost the same whether the NO SAVEPOINT option is enabled or not.

Sequential bulk UPDATE statements

With the multi-pass script (sequential UPDATEs), the raw test results are rather large.

Table 1: Test results for 5 sequential UPDATEs

News & Events: SQLHammer 1.5 Enterprise Edition

The interesting developer tool SQLHammer now has an Enterprise Edition. SQLHammer is a high-performance tool for rapid database development and administration. It includes a common integrated development environment, registered custom components based on the Borland Package Library mechanism, and an Open Tools API written in Delphi/Object Pascal. www.metadataforge.com


For convenience, the results are tabulated above.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1: Time to perform UPDATEs with and without NO SAVEPOINT

In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time - 47 seconds - compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth of the writes parameter.

Figure 2: Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small and virtually equal for each pass. The corresponding graphs are below.

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT does? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors, and for the new TPB option, can be found in the 7.5.1 release notes.

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First and most obvious is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE - only one record version exists, and the Undo log is empty

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and the new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction - the engine just records the numbers of all affected records into the bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4: UPDATE 1 creates a small delta version on disk and puts the record number into the UNDO log

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log - in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the simplest situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the 'careful write' principle!), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see - there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE 1 into the Undo log

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception - memory usage does not grow any further.



The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article. :)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE, no record version is created. From here on, the update just replaces record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know one.

The following figure (Figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Figure 6: The third UPDATE overwrites the UPDATE 1 version in the Undo log with the UPDATE 2 version, and its own version is written to disk as the latest one

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE 2 we would have the situation illustrated in Figure 7.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I on my own had reinvented a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for possible inconvenience, and promise to add any necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing of object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, too, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good - except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS: they are unusual and slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple - to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between the objects. At the same time we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs); in [2] - UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP - not only within a certain class), but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model, it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex. Nobody can change their date of birth - at least not their de facto date of birth. But birth dates are not unique anyway.)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hum. But let's not talk about such gloomy things.) Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain some additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (pure surrogate, unique at least within the database) should refer to the [1], [2] and [4] articles.

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
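In InterBase/Firebird the natural implementation of such a function is a generator. A minimal sketch (the generator name is illustrative; the TOID domain is defined in the next section):

create generator GEN_OID;

/* obtain a new OID, e.g. from client code or a BEFORE INSERT trigger */
select gen_id(GEN_OID, 1) from RDB$DATABASE;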

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list, which basically is a table (let us call it "CLASSES"). This table may include any kind of information - from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching of any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS(
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100 (you would need to add a description of this known OID). Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS
where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact about all objects enables us to find out easily what that object is - using a standard method.

Quite often, only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which already are in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId


if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.
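To illustrate how such a dictionary lives entirely in OBJECTS, here is a sketch with hypothetical OID values (in practice they would come from the OID generator):

/* register the dictionary type itself: an object of the type-description type */
insert into OBJECTS (OID, ClassId, Name, Description)
values (501, -100, 'Currency', 'Simple currency dictionary');

/* two elements of the new dictionary */
insert into OBJECTS (OID, ClassId, Name, Description)
values (502, 501, 'USD', 'US Dollar');

insert into OBJECTS (OID, ClassId, Name, Description)
values (503, 501, 'EUR', 'Euro');

/* the lookup query from above */
select OID, Name, Description from OBJECTS where ClassId = 501;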

Storing of more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.

If an order has a "lines" section - and a real-world order should definitely have such a section - we can create a separate table for it, call it e.g. "order_lines", and relate it with the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to the order object - 1:M relation */
  item_id TOID,   /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost)
);

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.
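For example, all the usual relational machinery - joins, aggregates, grouping - applies directly to the object data. A sketch over the tables just defined:

/* total quantity ordered per item, across all orders */
select ol.item_id, sum(ol.amount) as total_amount
from order_lines ol
group by ol.item_id;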

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* link to the description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes type metadata - an attribute set (their names and types) for each object type.

All attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with names of attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid
  and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id
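To make the storage format concrete, here is a sketch of how the "Order" from Method 1 might be stored under Method 2. The OID values and the attribute type codes are purely illustrative:

/* attribute metadata for the "Order" type (type object OID = 600) */
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (600, 1, 'customer', 0);

insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (600, 2, 'sum_total', 1);

/* one order object (OID = 601): each attribute becomes a row, stored as a string */
insert into attributes (OID, attribute_id, value)
values (601, 1, '734');     /* OID of the customer object, as a string */
insert into attributes (OID, attribute_id, value)
values (601, 2, '1499.90'); /* order amount, as a string */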


In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease in implementing object inheritance; a very simple structure for the database - a significant benefit, since with this approach the number of tables would not increase no matter how many different types were stored in a database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with the previous method 1, and it decreases further with the growth of the attribute count of a type. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied - a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary - just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know - everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or is likely to add it in the future using certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued...

FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; support for the most popular DB-engines. Full WYSIWYG, text rotation 0-360 degrees, memo object supports simple html-tags (font color, b, i, u, sub, sup), improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, Anchors. http://www.fast-report.com

Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and FireBird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc).

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/FireBird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and will therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
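To make the mechanism tangible, here is a rough sketch of the kind of log table and trigger described above. This is not CopyCat's actual generated code: the RPL$LOG table, the GEN_RPL_LOG generator, the CUSTOMERS table and the column names of RPL$USERS are all illustrative (only the RPL$USERS table itself is mentioned in the text):

create table RPL$LOG (
  ID integer not null primary key,
  TABLE_NAME varchar(32),
  PK_VALUE varchar(64),
  NODE_NAME varchar(32), /* the sub-node this log line is destined for */
  CHANGE_DATE timestamp default CURRENT_TIMESTAMP
);

create generator GEN_RPL_LOG;

set term ^ ;

/* one log line per sub-node, skipping the node the change came from */
create trigger CUSTOMERS_RPL for CUSTOMERS
active after update position 0 as
declare variable node varchar(32);
begin
  for select NODE_NAME from RPL$USERS
      where NODE_NAME <> USER
      into :node
  do
    insert into RPL$LOG (ID, TABLE_NAME, PK_VALUE, NODE_NAME)
    values (gen_id(GEN_RPL_LOG, 1), 'CUSTOMERS', old.ID, :node);
end ^

set term ; ^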

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
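For example, for an integer key, the synchronization method could simply be a generator-based SQL clause that CopyCat executes on the server side (the generator name here is hypothetical):

select gen_id(GEN_CUSTOMER_ID, 1) from RDB$DATABASE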

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
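Resolving a conflict can therefore be as simple as a single UPDATE. The table and column names in this sketch are hypothetical, since the article does not give CopyCat's actual conflict-table schema:

update RPL$CONFLICTS
set CHOSEN_USER = 'HEAD_OFFICE'
where CONFLICT_ID = 42;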

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
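A sketch of such a stored procedure for the STOCK example (the table and column names are hypothetical):

set term ^ ;

/* the replicated call carries the change itself (a delta), not the resulting value */
create procedure ADD_TO_STOCK (PRODUCT_ID integer, DELTA integer) as
begin
  update STOCK
  set STOCK_VALUE = STOCK_VALUE + :DELTA
  where PRODUCT_ID = :PRODUCT_ID;
end ^

set term ; ^

Replaying node A's call ADD_TO_STOCK(..., 1) and node B's call ADD_TO_STOCK(..., 2) on every node yields 48 everywhere - a result that replicating either node's final field value could not achieve.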

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to "Y" so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to "Y" for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all the databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
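The RPL$USERS table mentioned in step 8 holds the node's list of sub-nodes; the column name used below is an assumption for illustration only, so check the generated metadata for the real structure:

insert into RPL$USERS (LOGIN) values ('NODE_A');
insert into RPL$USERS (LOGIN) values ('NODE_B');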

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master/slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you will know that the statistical output looks like just a bunch of numbers and nothing else. Okay, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available while browsing the statistics, and it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database.

Figure 1. Summary of database statistics


[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.

[2] The Oldest transaction is the same Oldest Interesting Transaction that is mentioned everywhere; gstat output does not show this transaction as "interesting".

[3] Strictly, it is the oldest transaction that was active when the oldest transaction currently active started, because only a new transaction start moves the Oldest Active forward. In production systems with regular transactions it can be treated as the currently oldest active transaction.


The warnings and comments shown are based on knowledge carefully gathered from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, no warning is displayed: this page size is okay.

Next we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and the oldest snapshot transaction at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold causes a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from the Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications are putting less load than usual on the database (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst (Figure 2).

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, which tables are fragmented (with fragmentation caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2. Table statistics

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, so in this case the cause is bulk deletes. Taken together, both indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If the statistics were taken via the Services API with metadata info, you can see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched for in a particular query – searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are created automatically, to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run a sweep manually (gfix -sweep) to decrease the TIP size."

Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amounts of record versions usually slow down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are lots of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in those versions. Here is the list of tables with a version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144
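The report's advice about forcing garbage collection can be tried by hand. For example, for one of the heavily versioned tables listed above (run it in its own transaction, and be prepared for it to take a while):

select count(*) from STAT_QUICK;  /* full scan; lets the engine collect garbage versions as records are read */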

Summary

IBAnalyst is an invaluable tool that assists the user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and it will automatically make sensible suggestions about improving database performance and easing database maintenance.


Comments to the "Temporary tables" article

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its author I'd like to publish it here.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change concerns temp tables. I suggest you even create another "myth box" for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, Foreign Keys, Triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation or regarding data whose structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only available through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn

volker.rehn@bigpond.com




Testing the NO SAVEPOINT

For convenience, the results are tabulated in Table 1.

The first UPDATE statement has almost the same execution time with and without the NO SAVEPOINT option. However, memory consumption is reduced fivefold when we use NO SAVEPOINT.

Figure 1. Time to perform UPDATEs with and without NO SAVEPOINT

Table 1. Test results for 5 sequential UPDATEs

In the second UPDATE we start to see the difference. With default transaction settings this UPDATE takes a very long time – 47 seconds – compared to only 7210 ms with NO SAVEPOINT enabled. With default transaction settings we can see that memory usage is significant, whereas with NO SAVEPOINT no additional memory is used.

The third and all following UPDATE statements with default settings show equal time and memory usage values, and growth in the writes parameter.

Figure 2. Memory usage while performing UPDATEs with and without NO SAVEPOINT

With NO SAVEPOINT usage we observe that time/memory values and writes growth are all small, and virtually equal for each pass. The corresponding graphs are shown in Figures 1 and 2.
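To make the test scenario concrete, here is a hedged sketch of the kind of script being measured – the same bulk UPDATE repeated five times over one table inside a single transaction (the table and column names are invented; the actual test script is given earlier in the article):

update TEST set NAME = NAME;  /* pass 1 */
update TEST set NAME = NAME;  /* pass 2 */
update TEST set NAME = NAME;  /* pass 3 */
update TEST set NAME = NAME;  /* pass 4 */
update TEST set NAME = NAME;  /* pass 5 */
commit;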

Inside the UNDO log

So what happened during the execution of the test scripts? What is the secret behind this magic that NO SAVEPOINT performs? Is there any cost attached to these gains?

A few words about versions

You probably know already that InterBase is a multi-record-version database engine, meaning that each time a record is changed, a new version of that record is produced. The old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is written as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and the chain of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change, in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo Log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that make multiple passes over a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints – no problem. ;-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoint management. Syntax details for the latter flavors, and for the new TPB option, can be found in the 7.5.1 release notes.
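For illustration, a bulk-update script could request the option like this in ISQL. The table and column names are invented, and the exact SET TRANSACTION clause is assumed from the release notes' description rather than verified here:

set transaction no savepoint;
update TEST set NAME = NAME;
commit;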

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First, and most obvious, is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see the Release Notes for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. A sweep is needed to advance the OIT and back out the dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3. Initial situation before any UPDATE – only one record version exists, and the Undo log is empty

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see Figure 4). Since deltas store only the differences between the old and the new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction: the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4. UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log

This approach is very fast and economical on memory usage. The engine does not waste too many resources handling this undo log – in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We do not see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here it is worth noting that the example we are considering is the simplest situation, where only one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the "careful write" principle), the engine must compute a new delta between the old version (by transaction 1) and the new version (by transaction 2, update 2), store it somewhere on disk, fetch the current full version (by transaction 2, update 1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction and would not be used unless a rollback was required.

Figure 5. The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log

All this makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception – memory usage does not grow any further.


Figure 6. The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one

(The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But that theme is for another article.)

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no new record version is created. From here on, the update just replaces the record data on disk with the newest version and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes, to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

Figure 6 above helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only work we need to perform is replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log of that savepoint.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in Figure 7.

Figure 7. If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should, of course, be used with caution. The developer will need to take extra care over database housekeeping, particularly with respect to timely sweeping.


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had reinvented, on my own, a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for any possible inconvenience, and promise to add the necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and the disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and the other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and feels comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development – if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines that is written without OOP technologies. OOP languages have long been used for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, whether you implement it on a middle tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there have been several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those of an RDBMS: they are unusual and slow, there are no standards for data access, and there is no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMSs, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple – to develop a set of standard methods that will help us to simplify the process of tailoring the OO layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database, and how to implement links between objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility, and the power of relational processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In article [1] these identifiers are called OIDs (i.e. Object IDs); in [2], UINs (Unique Identification Numbers) or "hyperkeys". Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP – not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model it does not matter whether the surrogate key is unique within a single table or within the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to article [4]. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number, and even sex). Nobody can change their date of birth – at least not their de facto date of birth. But birth dates are not unique anyway.

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hmm, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain any additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that "they conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to articles [1], [2] and [4].

The simplest method of OID implementation in a relational database is a field of "integer" type, together with a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64", or a combination of several integers.
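In InterBase/Firebird, the generating function is naturally implemented with a generator; here is a minimal sketch (the generator name is arbitrary):

create generator GEN_OID;

/* obtain the next OID value, e.g. from client code or in a trigger: */
select gen_id(GEN_OID, 1) from RDB$DATABASE;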

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list – basically a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, or a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object: the physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have the mandatory attributes "OID" and "ClassId", and the desirable attributes "Name", "Description", "creation_date", "change_date" and "owner". Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It would not be a duplication of information, but it would still be a duplication of entities. Besides, a single table allows us to have a central, well-known entry point for searching for any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS (
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table: that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) The list of all persistent object types that are known to the system is then sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is – using a standard method.

Quite often only basic information is needed. For instance, we may need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types – simple lookup dictionaries, for example – which do not have any attributes beyond those which are already in OBJECTS. Usually that is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries there are in your accounting system? Probably no less than half of the tables. Thus you may consider that you have already implemented half of these dictionaries – that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.

A simple plain dictionary can be retrieved from the OBJECTS table by the following query, if you know the ClassId for this dictionary:

select OID, Name, Description from OBJECTS where ClassId = :LookupId

Development area

25

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total numeric(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from OBJECTS o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.

If an order has a "lines" section – and a real-world order should definitely have such a section – we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID,  /* reference to the order object - 1:M relation */
  item_id TOID,    /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount * cost)
);

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in organizing type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID,  /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID,  /* link to the description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata – an attribute set (names and types) for each object type.

All attributes of a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all the attributes in a single record by joining, or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but the customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease in implementing object inheritance; and a very simple database structure – a significant benefit, since with this approach the number of tables does not increase no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing much to comment on here. The advantages are obvious: the object retrieval logic is simple and fast; no extra database structures are necessary – just a single BLOB field; and you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful, and each has a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to in the future, using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, the data processing is not particularly complex and the data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except for work with the attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.

To be continued...



Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously, or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
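The article does not show CopyCat's actual generated metadata, but a minimal hand-written sketch of the mechanism just described might look like this. RPL$USERS is the sub-node list mentioned in Part 2 below; the log table, its columns, and the CUSTOMER table are hypothetical:

create table RPL$LOG (
  ID         integer not null primary key,
  TABLE_NAME varchar(31),
  PK_VALUE   varchar(64),
  SUB_NODE   varchar(31)  /* the sub-node this log line is destined for */
);

create generator GEN_RPL$LOG;

set term ^ ;
create trigger CUSTOMER_RPL for CUSTOMER
active after update as
begin
  /* one log line per sub-node, skipping the connected user,
     so a change never bounces back to the node it came from */
  insert into RPL$LOG (ID, TABLE_NAME, PK_VALUE, SUB_NODE)
  select gen_id(GEN_RPL$LOG, 1), 'CUSTOMER', new.ID, U.LOGIN
  from RPL$USERS U
  where U.LOGIN <> user;
end ^
set term ; ^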

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
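For instance, a generator-based synchronization method could boil down to a one-line SQL statement like the following (a sketch with illustrative names; CopyCat would execute the configured statement on the server and apply the result locally before replicating the record):

create generator GEN_CUSTOMER_ID;

/* the synchronization method configured for the primary key field: */
select gen_id(GEN_CUSTOMER_ID, 1) from RDB$DATABASE;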

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
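Hypothetically, resolving a conflict could then look like the statement below (the actual name and columns of CopyCat's conflicts table may differ):

update RPL$CONFLICTS
set CHOSEN_USER = 'HEAD_OFFICE'  /* the node holding the correct version */
where TABLE_NAME = 'CUSTOMER' and PK_VALUE = '1027';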

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication": that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
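Continuing the STOCK example, a sketch of such a procedure (with illustrative names) is shown below. Each node calls it to apply its change; the call itself is logged and replicated, and replaying the calls merges the changes on every node:

set term ^ ;
create procedure ADD_TO_STOCK (PRODUCT_ID integer, DELTA integer)
as
begin
  update STOCK set CURRENT_VALUE = CURRENT_VALUE + :DELTA
  where PRODUCT_ID = :PRODUCT_ID;
end ^
set term ; ^

/* Node A: execute procedure ADD_TO_STOCK(1027, 1);  45 -> 46
   Node B: execute procedure ADD_TO_STOCK(1027, 2);  45 -> 47
   After both calls have been replayed everywhere, all nodes hold 48. */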

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++Builder components

2. As a standalone Win32 replication tool

We will now present each of these products.

1. CopyCat Delphi / C++Builder Component Set

CopyCat can be obtained as a set of Delphi / C++Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master/slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community - that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications - transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat - the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

Figure 1: Summary of database statistics

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4k, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Write parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple - using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot - the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] - the oldest currently active transaction.

• The Next transaction - the transaction number that will be assigned to the next new transaction.

• Active transactions - IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell if there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the "oldest active" gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day - this is calculated from "Next transaction" divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

Footnotes:
[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
[3] Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves "Oldest active" forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

As you have already learned, if there are any warnings, they are shown as colored lines with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables, with fragmentation caused by update/delete or by blobs, and so on. You can see which tables are being updated frequently and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell that this is really bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (key pointing to records with the same column value) is 25056 - i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query - searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in a table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR      3388      10944            92
DICT_PRICE        30       1992            45
DOCS               9       2225            64
N_PART         13835      72594            83
REGISTR_NC       241       4085            56
SKL_NC          1640       7736           170
STAT_QUICK     17649      85062           110
UO_LOCK          283       8490           144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.


Comments to the "Temporary tables" article

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another "myth box" for it. It is the sentence:

"Surely most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary, because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross DB operations (only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com




Testing the NO SAVEPOINT

old version does not disappear immediately, but is retained as a backversion.

In fact, the first time a backversion is written to disk, it is as a delta version, which saves disk space and memory usage by writing out only the differences between the old and the new versions. The engine can rebuild the full old version from the new version and chains of delta versions. It is only if the same record is updated more than once within the same transaction that the full backversion is written to disk.

The UNDO log concept

You may recall from Dmitri's article how each transaction is implicitly enclosed in a frame of savepoints, each having its own undo log. This log stores a record of each change in sequence, ready for the possibility that a rollback will be requested.

A backversion materializes whenever an UPDATE or DELETE statement is performed. The engine has to maintain all these backversions in the undo log for the relevant savepoint.

So the Undo Log is a mechanism to manage backversions for savepoints, in order to enable the associated changes to be rolled back. The process of Undo logging is quite complex, and maintaining it can consume a lot of resources.

The NO SAVEPOINT option

The NO SAVEPOINT option in InterBase 7.5.1 is a workaround for the problem of performance loss during bulk updates that do multiple passes of a table. The theory is: if using the implicit savepoint management causes problems, then let's kill the savepoints. No savepoints - no problem. :-)

Besides ISQL, it has been surfaced as a transaction parameter in both DSQL and ESQL. At the API level, a new transaction parameter block (TPB) option, isc_tpb_no_savepoint, can be passed in the isc_start_transaction() function call to disable savepoints management. Syntax details for the latter flavors and for the new TPB option can be found in the 7.5.1 release notes.
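As a sketch of how this looks in practice (the exact syntax is documented in the 7.5.1 release notes, which this follows; the table name is hypothetical):

set transaction no savepoint;
update TEST set NAME = NAME || '1';  /* bulk update without the undo-log cost */
commit;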

The effect of specifying the NO SAVEPOINT transaction parameter is that no undo log will be created. However, along with the performance gain for sequential bulk updates, it brings some costs for transaction management.

First and most obvious is that with NO SAVEPOINT enabled, any error handling that relies on savepoints is unavailable. Any error during a NO SAVEPOINT transaction precludes all subsequent execution and leads to rollback (see "Release Notes" for InterBase 7.5 SP1, "New in InterBase 7.5.1", page 2-2).

Secondly, when a NO SAVEPOINT transaction is rolled back, it is marked as rolled back in the transaction inventory page. Record version garbage thereby gets stuck in the "interesting" category and prevents the OIT from advancing. Sweep is needed to advance the OIT and back out dead record versions.

Fuller details of the NO SAVEPOINT option are provided in the InterBase 7.5.1 Release Notes.

Initial situation

Consider the implementation details of the undo log. Figure 3 shows the initial situation.

Recall that we perform this test on a freshly-restored database, so it is guaranteed that only one version exists for any record.

Figure 3: Initial situation before any UPDATE - only one record version exists. The Undo log is empty.

The first UPDATE

The first UPDATE statement creates delta backversions on disk (see figure 4). Since deltas store only the differences between the old and new versions, they are quite small. This operation is fast, and it is easy work for the memory manager.

It is simple to visualize the undo log when we perform the first UPDATE/DELETE statement inside the transaction - the engine just records the numbers of all affected records into a bitmap structure. If it needs to roll back the changes associated with this savepoint, it can read the stored numbers of the affected records, then walk down to the version of each and restore it from the updated version and the backversion stored on disk.


Figure 4: UPDATE1 creates a small delta version on disk and puts the record number into the UNDO log.

This approach is very fast and economical on memory usage. The engine does not waste too many resources to handle this undo log - in fact, it reuses the existing multi-versioning mechanism. Resource consumption is merely the memory used to store the bitmap structure with the backversion numbers. We don't see any significant difference here between a transaction with the default settings and one with the NO SAVEPOINT option enabled.

The second UPDATE

When the second UPDATE statement is performed on the same set of records, we have a different situation.

Here is a good place to note that the example we are considering is the most simple situation, where only the one global (transaction-level) savepoint exists. We will also look at the difference in the Undo log when an explicit (or enclosed BEGIN ... END) savepoint is used.

To preserve on-disk integrity (remember the 'careful write' principle), the engine must compute a new delta between the old version (by transaction1) and the new version (by transaction2, update2), store it somewhere on disk, fetch the current full version (by transaction2, update1), put it into the in-memory undo log, replace it with the new full version (with backpointers set to the newly created delta), and erase the old, now superseded, delta. As you can see, there is much more work to do, both on disk and in memory.

The engine could write all intermediate versions to disk, but there is no reason to do so. These versions are visible only to the modifying transaction, and would not be used unless a rollback was required.

Figure 5: The second UPDATE creates a new delta backversion for transaction 1, erases from disk the delta version created by the first UPDATE, and copies the version from UPDATE1 into the Undo log.

This all makes hard work for the memory manager and the CPU, as you can see from the growth of the "max mem"/"delta mem" parameter values in the test that uses the default transaction parameters.

When NO SAVEPOINT is enabled, we avoid the expense of maintaining the Undo log. As a result, we see execution time, reads/writes and memory usage as low for subsequent updates as for the first.

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE, with one exception - memory usage does not grow any further.


Figure 6: The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2 version, and its own version is written to disk as the latest one.

The original design of InterBase implemented the second UPDATE in another way, but sometime after IB6 Borland changed the original behavior, and we see what we see now. But this theme is for another article.

Why is the delta of memory usage zero? The reason is that beyond the second UPDATE no record version is created. From here on, the update just replaces the record data on disk with the newest one and shifts the superseded version into the Undo log.

A more interesting question is why we see an increase in disk reads and writes during the test. We would have expected that the third and following UPDATEs would do essentially equal numbers of reads and writes - to write the newest versions and move the previous ones to the undo log. However, we are actually seeing a growing count of writes. We have no answer for it, but we would be pleased to know.

The following figure (figure 6) helps to illustrate the situation in the Undo log during the sequential updates. When NO SAVEPOINT is enabled, the only pieces of work we need to perform are replacing the version on disk and updating the original backversion. It is as fast as the first UPDATE.

Explicit SAVEPOINT

When an UPDATE statement is going to be performed within its own explicit or implicit BEGIN ... END savepoint framework, the engine has to store a backversion for each associated record version in the Undo log.

For example, if we used an explicit savepoint, e.g. SAVEPOINT Savepoint1, upon performing UPDATE2 we would have the situation illustrated in figure 7.

Figure 7: If we have an explicit SAVEPOINT, each new record version associated with it will have a corresponding backversion in the Undo log of that savepoint.

In this case, the memory consumption would be expected to increase each time an UPDATE occurs within the explicit savepoint's scope.
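For comparison, this is the explicit-savepoint shape referred to above, using standard InterBase/Firebird savepoint syntax (table and savepoint names are illustrative):

savepoint Savepoint1;
update TEST set NAME = NAME || '1';  /* backversions go to Savepoint1's undo log */
rollback to savepoint Savepoint1;    /* undoes only the update above */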

Summary

The new transaction option NO SAVEPOINT can solve the problem of excessive resource usage growth that can occur with sequential bulk updates. It should be beneficial when applied appropriately. Because the option can create its own problems by inhibiting the advance of the OIT, it should be used with caution, of course. The developer will need to take extra care about database housekeeping, particularly with respect to timely sweeping.


Object-Oriented Development in RDBMS, Part 1

Author: Vladimir Kotlyarevsky, vlad@contek.ru

Thanks and apologies

This article is mostly a compilation of methods that are already well-known, though many times it turned out that I had reinvented, on my own, a well-known and quite good wheel. I have endeavored to provide readers with links to publications I know of that discuss the problem. However, if I missed someone's work in the bibliography and thus violated copyright, please drop me a message at vlad@contek.ru. I apologize beforehand for possible inconvenience, and promise to add any necessary information to the article.

The sources in the bibliography are listed in the order of their appearance in my mind.

The described database structures have been simplified in order to illustrate the problems under consideration as clearly as possible, leaving out less important elements. I have tested the viability of all the methods taken from the articles, and of course I have tested all my own methods.

Mixing object-oriented programming and RDBMS use is always a compromise. I have endeavored to recommend several approaches in which this compromise is minimised for both components. I have also tried to describe both the advantages and disadvantages of such a compromise.

I should make it clear that the object approach to database design as described is not appropriate for every task. The same is true for OOP as a whole, too, no matter what OOP apologists may say. I would recommend using it for such tasks as document storage and processing, accounting, etc.

And last but not least, I am very thankful to Dmitry Kuzmenko, Alexander Nevsky and other people who helped me in writing this article.

The problem statement

What is the problem?

Present-day relational databases were developed in times when the sun shone brighter, the computers were slower, mathematics was in favour, and OOP had yet to see the light of day. Due to that fact, most RDBMSs have the following characteristics in common:

1. Everyone got used to them and felt comfortable with them.

2. They are quite fast (if you use them according to certain known standards).

3. They use SQL, which is an easy, comprehensible and time-proved data manipulation method.

4. They are based upon a strong mathematical theory.

5. They are convenient for application development - if you develop your applications just like 20-30 years ago.

As you see, almost all these characteristics sound good, except probably the last one. Today you can hardly find a software product (in almost any area) consisting of more than a few thousand lines which is written without OOP technologies. OOP languages have been used for a long time for building visual forms, i.e. in UI development. It is also quite usual to apply OOP at the business logic level, if you implement it either on a middle-tier or on a client. But things fall apart when the deal comes closer to the data storage issues... During the last ten years there were several attempts to develop an object-oriented database system, and as far as I know all those attempts were rather far from being successful. The characteristics of an OODBMS are the antithesis of those for an RDBMS: they are unusual and slow, there are no standards for data access and no underlying mathematical theory. Perhaps the OOP developer feels more comfortable with them, although I am not sure...

As a result, everyone continues using RDBMS, combining object-oriented business logic and domain objects with relational access to the database where these objects are stored.

What do we need?

The thing we need is simple - to develop a set of standard methods that will help us to simplify the process of tailoring the OO-layer of business logic and a relational storage together. In other words, our task is to find out how to store objects in a relational database and how to implement links between the objects. At the same time, we want to keep all the advantages provided by the relational database: design and access speed, flexibility and the power of relation processing.

RDBMS as an object storage

First, let's develop a database structure that would be suitable for accomplishing the specified task.

The OID

All objects are unique, and they must be easily identifiable. That is why all the objects stored in the database should have unique ID-keys from a single set (similar to object pointers in run-time). These identifiers are used to link to an object, to load an object into a run-time environment, etc. In the [1] article these identifiers are called OIDs (i.e. Object IDs), in [2] - UINs (Unique Identification Numbers) or hyperkeys. Let us call them OIDs, though "hyperkey" is also quite a beautiful word, isn't it?

First of all, I would like to make a couple of points concerning key uniqueness. Database developers who are used to the classical approach to database design would probably be quite surprised at the idea that sometimes it makes sense to make a table key unique not only within a single table (in terms of OOP - not only within a certain class) but also within the whole database (all classes). However, such strict uniqueness offers important advantages, which will become obvious quite soon. Moreover, it often makes sense to provide complete uniqueness in a Universe, which provides considerable benefits in distributed databases and replication development. At the same time, strict uniqueness of a key within the database does not have any disadvantages. Even in the pure relational model it does not matter whether the surrogate key is unique within a single table or the whole database.

OIDs should never have any real-world meaning. In other words, the key should be completely surrogate. I will not list here all the pros and cons of surrogate keys in comparison with natural ones; those who are interested can refer to the [4] article. The simplest explanation is that everything dealing with the real world may change (including the vehicle engine number, network card number, name, passport number, social security card number and even sex. Nobody can change their date of birth - at least not their de facto date of birth. But birth dates are not unique anyway.)

Remember the maxim: everything that can go bad will go bad (consequently, everything that cannot go bad... hum, but let's not talk about such gloomy things). Changes to some OIDs would immediately lead to changes in all identifiers and links, and thus, as Mr. Scott Ambler wrote [1], could result in a "huge maintenance nightmare". As for the surrogate key, there is no need to change it, at least in terms of dependency on the changing world.

And what is more, nobody requires run-time pointers to contain some additional information about an object except for a memory address. However, there are some people who vigorously reject the usage of surrogates. The most brilliant argument against surrogates I've ever heard is that they "conflict with the relational theory". This statement is quite arguable, since surrogate keys in some sense are much closer to that theory than natural ones.

Those who are interested in stronger evidence supporting the use of OIDs with the characteristics described above (purely surrogate, unique at least within the database) should refer to the [1], [2] and [4] articles.

The simplest method of OID implementation in a relational database is a field of "integer" type and a function for generating unique values of this type. In larger or distributed databases, it probably makes sense to use "int64" or a combination of several integers.
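A minimal sketch of such a generating function, using a standard InterBase/Firebird generator (the trigger targets the OBJECTS table defined below; generator and trigger names are illustrative):

create generator GEN_OID;

set term ^ ;
create trigger OBJECTS_BI for OBJECTS
active before insert as
begin
  if (new.OID is null) then
    new.OID = gen_id(GEN_OID, 1);  /* one sequence for the whole database */
end ^
set term ; ^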

ClassId

All objects stored in a database should have a persistent analogue of RTTI, which must be immediately available through the object identifier. Then, if we know the OID of an object, keeping in mind that it is unique within the database, we can immediately figure out what type the object is. This is the first advantage of OID uniqueness. Such an objective may be accomplished by a ClassId object attribute, which refers to the known types list - which basically is a table (let us call it "CLASSES"). This table may include any kind of information, from just a simple type name to detailed type metadata necessary for the application domain.

Name, Description, creation_date, change_date, owner

Imagine a file system where the user has to remember the handles or the inodes of files instead of their filenames. It is frequently convenient, though not always necessary, to have an object naming method that is independent of the object type. For example, each object may have a short name and a long name. It is also sometimes convenient to have other object attributes for different purposes, akin to file attributes, such as "creation_date", "change_date" and "owner" (the "owner" attribute can be a string containing the owner's name, as well as a link to an object of the "user" type). The "Deleted" attribute is also necessary, as an indicator of the unavailability of an object. The physical deletion of records in a database full of direct and indirect links is often a very complicated task, to put it mildly.

Thus, each object has to have mandatory attributes ("OID" and "ClassId") and desirable attributes ("Name", "Description", "creation_date", "change_date" and "owner"). Of course, when developing an application system you can always add extra attributes specific to your particular requirements. This issue will be considered a little later.

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching of any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS(
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
)

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100. (You need to add a description of the known OID.) Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS
where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this fact enables us to find out easily what that object is - using a standard method.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case, we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types - simple lookup dictionaries, for example - which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus, you may consider that you have already implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.
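As a quick sketch, registering such a dictionary and one element of it requires nothing but two rows in OBJECTS (the OID values here are illustrative, not generated):

/* the dictionary type itself is an object with ClassId = -100 */
insert into OBJECTS (OID, ClassId, Name, Description)
values (101, -100, 'Currency', 'Dictionary of currencies');

/* a dictionary element: its ClassId is the dictionary's own OID */
insert into OBJECTS (OID, ClassId, Name, Description)
values (102, 101, 'USD', 'US Dollar');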


A simple plain dictionary can be retrieved from the OBJECTS table by the following query, if you know the ClassId for this dictionary:

select OID, Name, Description from OBJECTS
where ClassId = LookupId


If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing of more complexobjects

It is clear that some objects in realdatabases are more complex thanthose which can be stored in theOBJECTS table The method for stor-ing them depends on the applicationdomain and the objectrsquos internalstructure Letrsquos look at three well-known methods of object-relationalmapping

Method 1 The objects are stored justas in a standard relational databasewith the type attributes mapped totable attributes For example docu-ment objects of the ldquoOrderrdquo typewith such attributes as order num-ber comments customer andorder amount are stored in thetable Orders

create table Orders (OID TOID primary key customer TOID sum_total NUMERIC(152))

which relates one-to-one to the OBJECTS table by theOID field The order number and ldquocommentsrdquo attrib-utes are stored in the ldquoNamerdquo and ldquoDescriptionrdquo fieldsof the OBJECTS table ldquoOrdersrdquo also refers to

ldquoOBJECTSrdquo via the ldquocustomerrdquo field since a customer isalso an object being for example a part of the Part-ners dictionary You can retrieve all attributes ofldquoOrderrdquo type with the following query

select oOID oName as Number oDescription as Comment ordcustomer ordsum_totalfrom Objects o Orders ordwhere oOID = ordOID and ordOID = id

As you see everything is simple and usual You couldalso create a view ldquoorders_viewrdquo and make everythinglook as it always did

If an order has a ldquolinesrdquo section and a real-world ordershould definitely have such a section we can create sep-arate table for it call it eg ldquoorder_linesrdquo and relate itwith the ldquoOrdersrdquo table by a relation 1M

create table order_lines (
    id        integer not null primary key,
    object_id TOID,  /* reference to the order object - 1:M relation */
    item_id   TOID,  /* reference to the ordered item */
    amount    numeric(15,4),
    cost      numeric(15,4),
    sum       numeric(15,4) computed by (amount * cost)
);
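Thanks to the single entry point, an order's lines can be listed together with human-readable item names without knowing what type the items are - a sketch (the parameter name is illustrative; the computed "sum" column is omitted here because SUM is a reserved word in SQL):

select ol.item_id, obj.Name as item, ol.amount, ol.cost
  from order_lines ol
  join OBJECTS obj on obj.OID = ol.item_id
 where ol.object_id = :order_id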

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
    OID          TOID,  /* link to the master-object of this attribute */
    attribute_id integer not null,
    value        varchar(256),
    constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
    OID            TOID,  /* link to a description-object of the type */
    attribute_id   integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes type metadata - an attribute set (their names and types) for each object type.
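To make the metadata more concrete: for the "Order" type of Method 1, the content of class_attributes might look as follows - a sketch in which the type's OID (2001) and the attribute_type codes are assumptions; the attribute_id values 1 and 2 match the join query used below:

insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (2001, 1, 'customer', 1 /* assumed code for a reference */);

insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (2001, 2, 'sum_total', 2 /* assumed code for a numeric */);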

All attributes of a particular object whose OID is known are retrieved by the query

select attribute_id, value from attributes where OID = :oid

or, with names of attributes:

select a.attribute_id, ca.attribute_name, a.value
  from attributes a, class_attributes ca, objects o
 where a.OID = :oid and a.OID = o.OID
   and o.ClassId = ca.OID
   and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
  from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
 where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure - a significant benefit, since with this approach the number of tables would not increase no matter how many different types were stored in a database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with method 1, and it decreases further with the growth of the attribute count of a type. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied - a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary - just a single BLOB field - and you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know - everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing, like searching or grouping, or such processing is likely to be added in the future using certain object attributes, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued...

FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; supports most popular DB engines.

Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).

Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++ Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself, rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and will therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
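Putting the logging and sub-node ideas together, a trigger of the kind the article describes might look roughly like this. This is a hypothetical sketch: the article does not show CopyCat's real schema, so the log table, its columns and the generator name are assumptions; only the RPL$USERS table is mentioned in the text (see Part 2):

create trigger CUSTOMERS_RPL for CUSTOMERS
after update position 0
as
declare variable node varchar(32);
begin
  /* one log line per sub-node, skipping the node that made the change */
  for select NODE_NAME from RPL$USERS
      where NODE_NAME <> user into :node
  do
    insert into RPL$LOG (ID, NODE_NAME, TABLE_NAME, PK_VALUE)
    values (gen_id(G_RPL_LOG, 1), :node, 'CUSTOMERS', new.ID);
end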

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
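For instance, a generator-based synchronization method could be an SQL clause as simple as this (the generator name is illustrative):

select gen_id(GEN_CUSTOMER_ID, 1) from RDB$DATABASE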

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B have the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
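Here is a hypothetical sketch of such a procedure for the STOCK example (table and column names are assumptions). If the procedure calls, rather than the row values, are replicated, node A's call with delta 1 and node B's call with delta 2 both get applied everywhere, and every node ends up with the correct value 48 (45 + 1 + 2):

create procedure ADD_TO_STOCK (PID integer, DELTA numeric(15,4))
as
begin
  /* merge a stock movement instead of overwriting the absolute value */
  update STOCK set QTY = QTY + :DELTA where PRODUCT_ID = :PID;
end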

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy to use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master / slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community - that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications - transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat - the tool that gives database statistics information. If you have ever looked in gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Figure 1: Summary of database statistics

[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".
[3] Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves "Oldest active" forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4k, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Write parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple - using asynchronous writes can cause database corruption in cases of power, OS or server failure.
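Forced Writes can be switched back on with gfix (file name and credentials are illustrative):

gfix -write sync -user SYSDBA -password masterkey mydb.gdb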

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
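The sweep interval can be changed, or automatic sweeping disabled as mentioned above, with gfix - a sketch with illustrative file name and credentials:

gfix -housekeeping 0 -user SYSDBA -password masterkey mydb.gdb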

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot - the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] - the oldest currently active transaction.

• The Next transaction - the transaction number that will be assigned to a new transaction.

• Active transactions - IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell if there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day - this is calculated from "Next transaction" divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also correct that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).
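For example, a full statistics snapshot including record-version information can be captured at a busy moment like this (the file names are illustrative; -a analyzes data and index pages, -r adds record and version statistics):

gstat -a -r mydb.gdb > mydb_stats.txt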


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables, with fragmentation caused by update/delete or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2: Table statistics

In this database example there are several activities. First of all, yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record or by bulk deletes. See the rows in which the MaxVers column is marked in blue. This shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications can tell that this is really bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 - i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query - searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the "bad" indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports > View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size.

Here is another quote, from the table/indices part of the report:

Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in a table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:

Table        Records   Versions   RecVers size
CLIENTS_PR     3388      10944      92
DICT_PRICE       30       1992      45
DOCS              9       2225      64
N_PART        13835      72594      83
REGISTR_NC      241       4085      56
SKL_NC         1640       7736     170
STAT_QUICK    17649      85062     110
UO_LOCK         283       8490     144
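For instance, a full scan of one of the tables listed above forces the engine to visit every record and collect garbage where possible:

select count(*) from STAT_QUICK;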

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments to the "Temporary tables" article

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
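For illustration, this is the SQL-standard GTT syntax that Firebird eventually adopted in version 2.1, after this letter was written (table and column names are illustrative); ON COMMIT PRESERVE ROWS keeps the data for the lifetime of the connection, i.e. per session:

create global temporary table SESSION_WORKSPACE (
    ITEM_ID integer,
    QTY     numeric(15,2)
) on commit preserve rows;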

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while ;-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn

volkerrehn@bigpond.com

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me and, with the permission of its respective owner, I'd like to publish it.

Alexey Kovyazin, Chief Editor


Miscellaneous


Invitations, Subscription, Donations, Sponsorships

This is the first official book on Firebird - the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now!

To receive future issues notifications, send email to subscribe@ibdeveloper.com

Magazine CD

Donations

Page 19: The InterBase and Firebird Developer Magazine, Issue 2, 2005

TestBed

20

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Testing the NO SAVEPOINT

Figure 4

UPDATE1 create small delta version on disk and put record number into UNDO log

This approach is very fast and economical on memory usage The enginedoes not waste too many resources to handle this undo log ndash in fact it reusesthe existing multi-versioning mechanism Resource consumption is merelythe memory used to store the bitmap structure with the backversion num-bers We dont see any significant difference here between a transactionwith the default settings and one with the NO SAVEPOINT option enabled

The second UPDATE

When the second UPDATE statement is performed on the same set of records we have adifferent situation

Here is a good place to note that the example we are considering is the most simple situa-tion where only the one global (transaction level) savepoint exists We will also look at thedifference in the Undo log when an explicit (or enclosed BEGINhellip END) savepoint is used

To preserve on-disk integrity (remember the lsquocareful writersquo principle ) the engine mustcompute a new delta between the old version (by transaction1) and new version (by trans-action2 update2) store it somewhere on disk fetch the current full version (by transac-tion2 update1) put it into the in-memory undo-log replace it with the new full version (withbackpointers set to the newly created delta) and erase the old now superseded delta Asyou can see ndash there is much more work to do both on disk and in memory

The engine could write all intermediate versions to disk but there is no reason to do soThese versions are visible only to the modifying transaction and would not be used unlessa rollback was required

Figure 5

The second UPDATE creates a new delta backversion for transaction 1erases from disk the delta version created by the first UPDATE and copiesthe version from UPDATE1 into the Undo log

This all makes hard work for the memory manager and the CPU as you can see from thegrowth of the ldquomax memrdquordquodelta memrdquo parameters values in the test that uses the defaulttransaction parameters

When NO SAVEPOINT is enabled we avoid the expense of maintaining the Undo log Asa result we see execution time readswrites and memory usage as low for subsequentupdates as for the first

The third UPDATE

The third and all subsequent UPDATEs are similar to the second UPDATE with one excep-tion ndash memory usage does not grow any further

TestBed

21

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Testing the NO SAVEPOINT

Figure 6

The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2version and its own version is written to disk as the latest one

Figure 7

If we have an explicit SAVEPOINT each new record version associated with it will have acorresponding backversion in the Undo log of that savepoint

Original design of IB implement second UPDATE another way but sometime after IB6 Bor-land changed original behavior and we see what we see nowBut this theme is for anotherarticle)

Why is the delta of memory usage zero The reason is that beyond the second UPDATEno record version is created From here on the update just replaces record data on diskwith the newest one and shifts the superseded version into the Undo log

A more interesting question is why we see an increase in disk reads and writes during thetest We would have expected that the third and following UPDATEs would do essentiallyequal numbers of read and writes to write the newest versions and move the previous onesto the undo log However we are actually seeing a growing count of writes We have noanswer for it but we would be pleased to know

The following figure (figure 6) helps to illustrate the situation in the Undo log during thesequential updates When NO SAVEPOINT is enabled the only pieces we need to performare replacing the version on disk and updating the original backversion It is fast as the firstUPDATE

Explicit SAVEPOINTWhen an UPDATE statement is going to be performed within its own explicit or implicit

BEGIN END savepoint framework the engine has to store a backversion for each associ-ated record version in the Undo log

For example if we used an explicit savepoint eg SAVEPOINT Savepoint1 upon perform-ing UPDATE2 we would have the situation illustrated in figure 7

In this case the memory consumption would be expected to increase each time an UPDATEoccurs within the explicit savepoints scope

SummaryThe new transaction option NO SAVEPOINT can solve the problem of excessive resourceusage growth that can occur with sequential bulk updates It should be beneficial whenapplied appropriately Because the option can create its own problems by inhibiting theadvance of the OIT it should be used with caution of course The developer will need to takeextra care about database housekeeping particularly with respect to timely sweeping

Development area

22

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

Object-OrientedDevelopment in RDBMSPart 1Thanks and apologies This article is mostly a compilationof methods that are already well-known though many times it turnedout that I on my own have reinvent-ed a well-known and quite goodwheel I have endeavored to pro-vide readers with links to publica-tions I know of that are discussingthe problem However if I missedsomeonersquos work in the bibliogra-phy and thus violated copyrightplease drop me a message atvladcontekru I apologizebeforehand for possible inconven-ience and promise to add any nec-essary information to the article

The sources in the bibliography arelisted in the order of their appear-ance in my mind

The described database structureshave been simplified in order toillustrate the problems under con-sideration as much clearly as possi-ble leaving out more unimportantelements I have tested the viability

of all the methods taken from thearticles and of course I have testedall my own methods

Mixing of object-oriented program-ming and RDBMS use is always acompromise I have endeavored torecommend several approaches inwhich this compromise is minimisedfor both components I have alsotried to describe both advantagesand disadvantages of such a com-promise

I should make it clear that the objectapproach to database design asdescribed is not appropriate forevery task It is still true for the OOPas a whole too no matter whatOOP apologists may say I wouldrecommend using it for such tasks asdocument storage and processingaccounting etc

And the last but not least I am verythankful to Dmitry KuzmenkoAlexander Nevsky and other peo-ple who helped me in writing thisarticle

The problem state-ment

What is the problem

Present-day relational databaseswere developed in times when thesun shone brighter the computerswere slower mathematics was infavour and OOP had yet to see thelight of day Due to that fact mostRDBMSsrsquo have the following char-acteristics in common

1Everyone got used to them andfelt comfortable with them

2They are quite fast (if you usethem according to certain knownstandards)

3They use SQL which is an easycomprehensible and time-proveddata manipulation method

4They are based upon a strongmathematical theory

5They are convenient for applica-

tion development - if you developyour applications just like 20-30years ago

As you see almost all these charac-teristics sound good except proba-bly the last one Today you canhardly find a software product (inalmost any area) consisting of morethan few thousand of lines which iswritten without OOP technologiesOOP languages have been used fora long time for building visualforms ie in UI development It isalso quite usual to apply OOP atthe business logic level if youimplement it either on a middle-tieror on a client But things fall apartwhen the deal comes closer to thedata storage issueshellip During the lastten years there were severalattempts to develop an object-ori-ented database system and as faras I know all those attempts wererather far from being successfulThe characteristics of an OODBMSare the antithesis of those for anRDBMS They are unusual and

Author Vladimir Kotlyarevskyvladcontekru

Development area

23

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

slow there are no standards fordata access and no underlyingmathematical theory Perhaps theOOP developer feels more comfort-able with them although I am notsurehellip

As a result everyone continuesusing RDBMS combining object-oriented business logic and domainobjects with relational access to thedatabase where these objects arestored

What do we need

The thing we need is simple ndash todevelop a set of standard methodsthat will help us to simplify theprocess of tailoring the OO-layer ofbusiness logic and a relational stor-age together In other words ourtask is to find out how to storeobjects in a relational databaseand how to implement linksbetween the objects At the sametime we want to keep all the advan-tages provided by the relationaldatabase design and accessspeed flexibility and the power ofrelation processing

RDBMS as an objectstorage First letrsquos develop a database struc-ture that would be suitable foraccomplishing the specified task

The OID

All objects are unique and theymust be easily identifiable That iswhy all the objects stored in thedatabase should have unique ID-keys from a single set (similar toobject pointers in run-time) Theseidentifiers are used to link to anobject to load an object into a run-time environment etc In the [1]article these identifiers are calledOIDs (ie Object IDs) in [2] ndash UINs(Unique Identification Number) orhyperkey Let us call them OIDsthough ldquohyperkeyrdquo is also quite abeautiful word isnrsquot it

First of all I would like to make acouple of points concerning keyuniqueness Database developerswho are used to the classicalapproach to database designwould probably be quite surprisedat the idea that sometimes it makessense to make a table key uniquenot only within a single table (interms of OOP ndash not only within acertain class) but also within thewhole database (all classes) How-ever such strict uniqueness offersimportant advantages which willbecome obvious quite soon More-over it often makes sense to pro-vide complete uniqueness in a Uni-verse which provides considerablebenefits in distributed databasesand replication development At thesame time strict uniqueness of a keywithin the database does not have

any disadvantages Even in thepure relational model it does notmatter whether the surrogate key isunique within a single table or thewhole database

OIDs should never have any realworld meaning In other words thekey should be completely surro-gate I will not list here all the prosand cons of surrogate keys in com-parison with natural ones thosewho are interested can refer to the[4] article The simplest explanationis that everything dealing with thereal world may change (includingthe vehicle engine number networkcard number name passport num-ber social security card numberand even sex

Nobody can change their date ofbirth ndash at least not their de factodate of birth But birth dates are notunique anyway)

Remember the maxim everythingthat can go bad will go bad (con-sequently everything that cannotgo badhellip hum But letrsquos not talkabout such gloomy things )Changes to some OIDs wouldimmediately lead to changes in allidentifiers and links and thus asMr Scott Ambler wrote [1] couldresult in a ldquohuge maintenancenightmarerdquo As for the surrogatekey there is no need to change it atleast in terms of dependency on thechanging world

And what is more nobody requiresrun-time pointers to contain someadditional information about anobject except for a memoryaddress However there are somepeople who vigorously reject usageof surrogates The most brilliantargument against surrogates Irsquoveever heard is that they conflict withthe relational theoryraquo This state-ment is quite arguable since surro-gate keys in some sense are muchcloser to that theory than naturalones

Those who are interested in morestrong evidence supporting the useof OIDs with the characteristicsdescribed above (pure surrogateunique at least within the data-base) should refer to [1] [2] and[4] articles

The simplest method of OID imple-mentation in a relational databaseis a field of ldquointegerrdquo type and afunction for generating unique val-ues of this type In larger or distrib-uted databases it probably makessense to use ldquoint64rdquo or a combina-tion of several integers

ClassId

All objects stored in a databaseshould have a persistent analogueof RTTI which must be immediatelyavailable through the object identi-fier Then if we know the OID of anobject keeping in mind that it is

Development area

24

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

unique within the database we canimmediately figure out what typethe object is This is the first advan-tage of OID uniqueness Such anobjective may be accomplished bya ClassId object attribute whichrefers to the known types list whichbasically is a table (let us call itldquoCLASSESrdquo) This table may includeany kind of information ndash from justa simple type name to detailed typemetadata necessary for the appli-cation domain

Name Description cre-ation_date change_dateowner

Imagine a file system where userhas to remember the handles or theinodes of files instead of their file-names It is frequently convenientthough not always necessary tohave an object naming method thatis independent of the object typeFor example each object may havea short name and a long name It isalso sometimes convenient to haveother object attributes for differentpurposes akin to file attributessuch as ldquocreation_daterdquoldquochange_daterdquo and ldquoownerrdquo (theldquoownerrdquo attribute can be a stringcontaining the ownerrsquos name aswell as a link to an object of theldquouserrdquo type) The ldquoDeletedrdquo attrib-ute is also necessary as an indicatorof the unavailability of an objectThe physical deletion of records in adatabase full of direct and indirect

links is often a very complicatedtask to put it mildly

Thus each object has to havemandatory attributes (ldquoOIDrdquo andldquoClassIdrdquo) and desirable attributes(ldquoNamerdquo ldquoDescriptionrdquo ldquocre-ation_daterdquo ldquochange_daterdquo andldquoownerrdquo) Of course when devel-oping an application system youcan always add extra attributesspecific to your particular require-ments This issue will be considereda little later

The OBJECTS Table

Reading this far leads us to the con-clusion that the simplest solution isto store the standard attributes of allobjects in a single table You couldsupport a set of these fixed attrib-utes separately for each type dupli-cating 5 or 6 fields in each tablebut why It is not a duplication ofinformation but it is still a duplica-tion of entities Besides a singletable would allow us to have a cen-tral ldquowell-knownrdquo entry point forsearching of any object and that isone of the most important advan-tages of the described structure

So let us design this table

Description varchar(128)Deleted smallintCreation_date timestamp default CURRENT_TIMESTAMPChange_date timestamp default CURRENT_TIMESTAMP Owner TOID)

The ClassId attribute (the object type identifier) refers to the OBJECTS tablethat is the type description is also an object of a certain type which has forexample well-known ClassId = ndash100 (You need to add a description of theknown OID) Then the list of all persistent object types that are known to thesystem is sampled by a query select OID Name Description fromOBJECTS where ClassId = -100)

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, this enables us to find out easily what that object is, using a standard method.
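As a sketch, that standard method could be a single self-join on OBJECTS (:some_oid stands for the link value in hand; the alias names are ours):

select o.OID, o.Name, c.Name as ClassName
from OBJECTS o
join OBJECTS c on c.OID = o.ClassId  /* ClassId points back into OBJECTS */
where o.OID = :some_oid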

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types – simple lookup dictionaries, for example – which do not have any attributes other than those which already are in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus you may consider that you have implemented half of these dictionaries – that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = :LookupId


if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = :LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to

"OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
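A possible sketch of such a view (the name orders_view is the article's own suggestion; the column list is ours):

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from Objects o join Orders ord on o.OID = ord.OID;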

If an order has a "lines" section – and a real-world order should definitely have such a section – we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to order object - 1:M relation */
  item_id TOID,   /* reference to ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost)
);

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2. (See [5].) All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata – an attribute set (names and types) for each object type.

All attributes for a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid
  and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id
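To make the mapping concrete, here is how the "Order" document from Method 1 might be described and stored under this scheme; all OIDs and attribute ids below are invented for illustration:

/* metadata: the "Order" type (its description-object has OID = -200) has two attributes */
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type) values (-200, 1, 'customer', 0);
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type) values (-200, 2, 'sum_total', 1);

/* data: an order with OID = 1001; all values are stored as strings */
insert into attributes (OID, attribute_id, value) values (1001, 1, '501');    /* the customer's OID */
insert into attributes (OID, attribute_id, value) values (1001, 2, '149.90'); /* the order amount */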


In the context of this method you can also emulate the relational standard. Instead of selecting object attributes as several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure – a significant benefit, since with this approach the number of tables does not increase, no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database – in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are required for every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
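For completeness, the storage for Method 3 can be as simple as a single table related 1:1 to OBJECTS (a sketch; the table and column names are ours):

create table blob_objects (
  OID TOID primary key,
  data blob  /* dfm, XML, or any other serialized representation */
);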

It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or if it is likely to be extended in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued…

FastReport 3 – new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base-type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; support for most popular DB engines.

Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).

Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr, www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to keep changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
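A hypothetical sketch of what such a logging trigger could look like. The article names only the RPL$USERS table; RPL$LOG, CUSTOMERS and all column names below are invented for illustration and are not CopyCat's actual schema:

create trigger T_CUSTOMERS_RPL for CUSTOMERS
after update as
begin
  /* one log line per sub-node, skipping the node the change came from */
  insert into RPL$LOG (table_name, pk_value, target_node)
  select 'CUSTOMERS', new.ID, u.node_name
  from RPL$USERS u
  where u.node_name <> USER;
end

Because the replicator logs in under the parent's node name, the same WHERE clause also keeps a change received from the parent from being logged back to the parent.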

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an


existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.
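For a generator-based key, the synchronization method might be nothing more than a clause like the following (the generator name is an invented example):

select gen_id(G_CUSTOMER_ID, 1) from RDB$DATABASE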

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put into this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
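The conflicts table might look roughly like this (a sketch: the article only names the CHOSEN_USER field, the rest is our guess):

create table RPL$CONFLICTS (
  table_name  varchar(32),
  pk_value    varchar(64),
  user1       varchar(32),   /* the two nodes involved */
  user2       varchar(32),
  chosen_user varchar(32)    /* set by the user to resolve the conflict */
);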

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) can be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication": a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an


unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
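As an illustration of the idea (the procedure and the STOCK column names are assumptions, not CopyCat specifics): if both nodes call such a procedure, and the calls themselves are replicated, node A's +1 and node B's +2 are both applied everywhere, and every copy of STOCK correctly ends up at 48.

create procedure ADD_TO_STOCK (p_product integer, p_delta integer)
as
begin
  /* record the change, not the resulting value */
  update STOCK set amount = amount + :p_delta
  where product = :p_product;
end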

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++Builder components

2. As a standalone Win32 replication tool

We will now present each of these products.

1. CopyCat Delphi / C++Builder Component Set

CopyCat can be obtained as a set of Delphi / C++Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator"


columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the metadata.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master/slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers on the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, or reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least in identifying the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

Figure 1: Summary of database statistics

The summary shown in Figure 1 provides general information about


your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the details of a problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the figure above? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. (Note also that InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages when Forced Writes is Off.)

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest transaction (the "oldest interesting" transaction, see below) and the oldest snapshot transaction at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold causes a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative – which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction (gstat output does not show it as "interesting"), because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active – the oldest currently active transaction. (Strictly speaking, it is the oldest transaction that was active when the oldest currently active transaction started, because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be treated as the currently oldest active transaction.)

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated as the Next transaction divided by the number of days passed since the creation of the database to the point where the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications produce less database load than usual (users are at lunch, or it's a quiet time in the business day).
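For reference, statistics can be taken at a moment of your choosing with the command-line tool (the database name is a placeholder; -h, -r and -i are gstat's standard switches):

gstat -h mydb.gdb   (header page only: transaction counters, sweep interval, Forced Writes)
gstat -r mydb.gdb   (per-table record and version statistics)
gstat -i mydb.gdb   (index statistics)

The same data can be obtained without shell access through the Services API, which is what IBAnalyst does.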


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst (Figure 2: Table statistics).

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, which tables are fragmented, with fragmentation caused by update/delete or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. Together, both indications tell us that these really are bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red, because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the longest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched for in a particular query: searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and includes them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with lots of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run a sweep manually (gfix -sweep) to decrease the TIP size."

Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. A large number of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are lots of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with a version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback

Comments to the "Temporary tables" article

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its author I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change concerns temp tables; I suggest you even create another "myth" box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature – or, where it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up the tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, because they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation, or regarding data whose structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (available only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com


Miscellaneous

Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you with all you need to know about Firebird database development: installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now

To receive future issue notifications, send email to subscribe@ibdeveloper.com


Page 20: The InterBase and Firebird Developer Magazine, Issue 2, 2005

TestBed

21

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Testing the NO SAVEPOINT

Figure 6

The third UPDATE overwrites the UPDATE1 version in the Undo log with the UPDATE2version and its own version is written to disk as the latest one

Figure 7

If we have an explicit SAVEPOINT each new record version associated with it will have acorresponding backversion in the Undo log of that savepoint

Original design of IB implement second UPDATE another way but sometime after IB6 Bor-land changed original behavior and we see what we see nowBut this theme is for anotherarticle)

Why is the delta of memory usage zero The reason is that beyond the second UPDATEno record version is created From here on the update just replaces record data on diskwith the newest one and shifts the superseded version into the Undo log

A more interesting question is why we see an increase in disk reads and writes during thetest We would have expected that the third and following UPDATEs would do essentiallyequal numbers of read and writes to write the newest versions and move the previous onesto the undo log However we are actually seeing a growing count of writes We have noanswer for it but we would be pleased to know

The following figure (figure 6) helps to illustrate the situation in the Undo log during thesequential updates When NO SAVEPOINT is enabled the only pieces we need to performare replacing the version on disk and updating the original backversion It is fast as the firstUPDATE

Explicit SAVEPOINTWhen an UPDATE statement is going to be performed within its own explicit or implicit

BEGIN END savepoint framework the engine has to store a backversion for each associ-ated record version in the Undo log

For example if we used an explicit savepoint eg SAVEPOINT Savepoint1 upon perform-ing UPDATE2 we would have the situation illustrated in figure 7

In this case the memory consumption would be expected to increase each time an UPDATEoccurs within the explicit savepoints scope

SummaryThe new transaction option NO SAVEPOINT can solve the problem of excessive resourceusage growth that can occur with sequential bulk updates It should be beneficial whenapplied appropriately Because the option can create its own problems by inhibiting theadvance of the OIT it should be used with caution of course The developer will need to takeextra care about database housekeeping particularly with respect to timely sweeping

Development area

22

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

Object-OrientedDevelopment in RDBMSPart 1Thanks and apologies This article is mostly a compilationof methods that are already well-known though many times it turnedout that I on my own have reinvent-ed a well-known and quite goodwheel I have endeavored to pro-vide readers with links to publica-tions I know of that are discussingthe problem However if I missedsomeonersquos work in the bibliogra-phy and thus violated copyrightplease drop me a message atvladcontekru I apologizebeforehand for possible inconven-ience and promise to add any nec-essary information to the article

The sources in the bibliography arelisted in the order of their appear-ance in my mind

The described database structureshave been simplified in order toillustrate the problems under con-sideration as much clearly as possi-ble leaving out more unimportantelements I have tested the viability

of all the methods taken from thearticles and of course I have testedall my own methods

Mixing of object-oriented program-ming and RDBMS use is always acompromise I have endeavored torecommend several approaches inwhich this compromise is minimisedfor both components I have alsotried to describe both advantagesand disadvantages of such a com-promise

I should make it clear that the objectapproach to database design asdescribed is not appropriate forevery task It is still true for the OOPas a whole too no matter whatOOP apologists may say I wouldrecommend using it for such tasks asdocument storage and processingaccounting etc

And the last but not least I am verythankful to Dmitry KuzmenkoAlexander Nevsky and other peo-ple who helped me in writing thisarticle

The problem state-ment

What is the problem

Present-day relational databaseswere developed in times when thesun shone brighter the computerswere slower mathematics was infavour and OOP had yet to see thelight of day Due to that fact mostRDBMSsrsquo have the following char-acteristics in common

1Everyone got used to them andfelt comfortable with them

2They are quite fast (if you usethem according to certain knownstandards)

3They use SQL which is an easycomprehensible and time-proveddata manipulation method

4They are based upon a strongmathematical theory

5They are convenient for applica-

tion development - if you developyour applications just like 20-30years ago

As you see almost all these charac-teristics sound good except proba-bly the last one Today you canhardly find a software product (inalmost any area) consisting of morethan few thousand of lines which iswritten without OOP technologiesOOP languages have been used fora long time for building visualforms ie in UI development It isalso quite usual to apply OOP atthe business logic level if youimplement it either on a middle-tieror on a client But things fall apartwhen the deal comes closer to thedata storage issueshellip During the lastten years there were severalattempts to develop an object-ori-ented database system and as faras I know all those attempts wererather far from being successfulThe characteristics of an OODBMSare the antithesis of those for anRDBMS They are unusual and

Author Vladimir Kotlyarevskyvladcontekru

Development area

23

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

slow there are no standards fordata access and no underlyingmathematical theory Perhaps theOOP developer feels more comfort-able with them although I am notsurehellip

As a result everyone continuesusing RDBMS combining object-oriented business logic and domainobjects with relational access to thedatabase where these objects arestored

What do we need

The thing we need is simple ndash todevelop a set of standard methodsthat will help us to simplify theprocess of tailoring the OO-layer ofbusiness logic and a relational stor-age together In other words ourtask is to find out how to storeobjects in a relational databaseand how to implement linksbetween the objects At the sametime we want to keep all the advan-tages provided by the relationaldatabase design and accessspeed flexibility and the power ofrelation processing

RDBMS as an objectstorage First letrsquos develop a database struc-ture that would be suitable foraccomplishing the specified task

The OID

All objects are unique and theymust be easily identifiable That iswhy all the objects stored in thedatabase should have unique ID-keys from a single set (similar toobject pointers in run-time) Theseidentifiers are used to link to anobject to load an object into a run-time environment etc In the [1]article these identifiers are calledOIDs (ie Object IDs) in [2] ndash UINs(Unique Identification Number) orhyperkey Let us call them OIDsthough ldquohyperkeyrdquo is also quite abeautiful word isnrsquot it

First of all I would like to make acouple of points concerning keyuniqueness Database developerswho are used to the classicalapproach to database designwould probably be quite surprisedat the idea that sometimes it makessense to make a table key uniquenot only within a single table (interms of OOP ndash not only within acertain class) but also within thewhole database (all classes) How-ever such strict uniqueness offersimportant advantages which willbecome obvious quite soon More-over it often makes sense to pro-vide complete uniqueness in a Uni-verse which provides considerablebenefits in distributed databasesand replication development At thesame time strict uniqueness of a keywithin the database does not have

any disadvantages Even in thepure relational model it does notmatter whether the surrogate key isunique within a single table or thewhole database

OIDs should never have any realworld meaning In other words thekey should be completely surro-gate I will not list here all the prosand cons of surrogate keys in com-parison with natural ones thosewho are interested can refer to the[4] article The simplest explanationis that everything dealing with thereal world may change (includingthe vehicle engine number networkcard number name passport num-ber social security card numberand even sex

Nobody can change their date ofbirth ndash at least not their de factodate of birth But birth dates are notunique anyway)

Remember the maxim everythingthat can go bad will go bad (con-sequently everything that cannotgo badhellip hum But letrsquos not talkabout such gloomy things )Changes to some OIDs wouldimmediately lead to changes in allidentifiers and links and thus asMr Scott Ambler wrote [1] couldresult in a ldquohuge maintenancenightmarerdquo As for the surrogatekey there is no need to change it atleast in terms of dependency on thechanging world

And what is more nobody requiresrun-time pointers to contain someadditional information about anobject except for a memoryaddress However there are somepeople who vigorously reject usageof surrogates The most brilliantargument against surrogates Irsquoveever heard is that they conflict withthe relational theoryraquo This state-ment is quite arguable since surro-gate keys in some sense are muchcloser to that theory than naturalones

Those who are interested in morestrong evidence supporting the useof OIDs with the characteristicsdescribed above (pure surrogateunique at least within the data-base) should refer to [1] [2] and[4] articles

The simplest method of OID imple-mentation in a relational databaseis a field of ldquointegerrdquo type and afunction for generating unique val-ues of this type In larger or distrib-uted databases it probably makessense to use ldquoint64rdquo or a combina-tion of several integers

ClassId

All objects stored in a databaseshould have a persistent analogueof RTTI which must be immediatelyavailable through the object identi-fier Then if we know the OID of anobject keeping in mind that it is

Development area

24

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

unique within the database we canimmediately figure out what typethe object is This is the first advan-tage of OID uniqueness Such anobjective may be accomplished bya ClassId object attribute whichrefers to the known types list whichbasically is a table (let us call itldquoCLASSESrdquo) This table may includeany kind of information ndash from justa simple type name to detailed typemetadata necessary for the appli-cation domain

Name Description cre-ation_date change_dateowner

Imagine a file system where userhas to remember the handles or theinodes of files instead of their file-names It is frequently convenientthough not always necessary tohave an object naming method thatis independent of the object typeFor example each object may havea short name and a long name It isalso sometimes convenient to haveother object attributes for differentpurposes akin to file attributessuch as ldquocreation_daterdquoldquochange_daterdquo and ldquoownerrdquo (theldquoownerrdquo attribute can be a stringcontaining the ownerrsquos name aswell as a link to an object of theldquouserrdquo type) The ldquoDeletedrdquo attrib-ute is also necessary as an indicatorof the unavailability of an objectThe physical deletion of records in adatabase full of direct and indirect

links is often a very complicatedtask to put it mildly

Thus each object has to havemandatory attributes (ldquoOIDrdquo andldquoClassIdrdquo) and desirable attributes(ldquoNamerdquo ldquoDescriptionrdquo ldquocre-ation_daterdquo ldquochange_daterdquo andldquoownerrdquo) Of course when devel-oping an application system youcan always add extra attributesspecific to your particular require-ments This issue will be considereda little later

The OBJECTS Table

Reading this far leads us to the conclusion that the simplest solution is to store the standard attributes of all objects in a single table. You could support a set of these fixed attributes separately for each type, duplicating 5 or 6 fields in each table, but why? It is not a duplication of information, but it is still a duplication of entities. Besides, a single table would allow us to have a central "well-known" entry point for searching of any object, and that is one of the most important advantages of the described structure.

So let us design this table:

create domain TOID as integer not null;

create table OBJECTS(
  OID TOID primary key,
  ClassId TOID,
  Name varchar(32),
  Description varchar(128),
  Deleted smallint,
  Creation_date timestamp default CURRENT_TIMESTAMP,
  Change_date timestamp default CURRENT_TIMESTAMP,
  Owner TOID
);

The ClassId attribute (the object type identifier) refers to the OBJECTS table; that is, the type description is also an object of a certain type, which has, for example, the well-known ClassId = -100 (you need to add a description of the known OID). Then the list of all persistent object types that are known to the system is sampled by the query:

select OID, Name, Description from OBJECTS where ClassId = -100

Each object stored in our database will have a single record in the OBJECTS table referring to it, and hence a corresponding set of attributes. Wherever we see a link to a certain unidentified object, the fact that this is true of all objects enables us to find out easily what that object is, using a standard method.

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.

There are also types, simple lookup dictionaries for example, which do not have any attributes other than those which already are in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId


if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.

If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it with the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID,  /* reference to order object - 1:M relation */
  item_id TOID,    /* reference to ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost)
);
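To illustrate how such lines could be read back together with the names of the ordered items, a query along these lines might be used (a sketch under the schema above; :order_oid is a hypothetical parameter name):

select ol.id, o.Name as item_name, ol.amount, ol.cost
from order_lines ol
  join OBJECTS o on o.OID = ol.item_id  /* resolve item names centrally */
where ol.object_id = :order_oid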

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID,  /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID,  /* here is a link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes type metadata - an attribute set (their names and types) for each object type.

All attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate a relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple structure for the database - a significant benefit, since with this approach the number of tables would not increase no matter how many different types were stored in a database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further with the growth of the attribute count of a type. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are applicable to every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied - a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary - just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
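As a minimal sketch of this method (the table and column names below are illustrative, not from the article), the storage can be reduced to a single BLOB alongside the standard OBJECTS attributes:

create table object_blobs (
  OID TOID primary key,    /* 1:1 with OBJECTS */
  data blob sub_type 0     /* serialized object image: dfm, XML, etc. */
);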

It was not difficult to come to the following conclusions (well, I know - everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each method. If your application involves massive relational data processing on certain object attributes, like searching or grouping, or is likely to have it added in the future, it would be better to use method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued...

FastReport 3 - new generation of the reporting tools

• Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot-matrix reports support; supports most popular DB-engines
• Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties)
• Access to DB fields, styles, text flow, URLs, Anchors

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve
jonathan@microtec.fr
www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
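CopyCat's actual log structure isn't shown in this article; purely as an illustration of the idea, a hand-rolled version of such logging might look like this (all names here - RPL$LOG, GEN_RPL_LOG, CUSTOMERS and its id column - are hypothetical):

create generator GEN_RPL_LOG;

create table RPL$LOG (
  id integer not null primary key,
  table_name varchar(32) not null,
  pk_value varchar(64) not null,
  sub_node varchar(32) not null,  /* the sub-node that should fetch this line */
  change_date timestamp default CURRENT_TIMESTAMP
);

/* example trigger for one replicated table */
create trigger CUSTOMERS_RPL for CUSTOMERS
active after update position 0
as
begin
  insert into RPL$LOG (id, table_name, pk_value, sub_node)
  values (gen_id(GEN_RPL_LOG, 1), 'CUSTOMERS', new.id, 'NODE1');
end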

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, either before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent - which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and will therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
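Continuing the hypothetical sketch from above, the "everyone except the originator" rule inside a replication trigger could be expressed like this (RPL$USERS is the sub-node list mentioned later in this article; its login column and the other names are again illustrative):

/* inside the replication trigger: one log line per sub-node,
   skipping the connection that made the change */
insert into RPL$LOG (id, table_name, pk_value, sub_node)
select gen_id(GEN_RPL_LOG, 1), 'CUSTOMERS', new.id, u.login
from RPL$USERS u
where u.login <> USER;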

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
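For instance, a generator-based synchronization clause might be as simple as the following (GEN_CUSTOMER_ID is an assumed generator name, not one documented by CopyCat):

/* produce a server-side unique key value */
select gen_id(GEN_CUSTOMER_ID, 1) from RDB$DATABASE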

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell if another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product, and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
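A sketch of such an updating procedure for the STOCK example (the table and column names are assumptions): replicating the calls to it, together with their arguments, merges each node's delta instead of overwriting an absolute value.

create procedure ADD_TO_STOCK (product_id integer, delta integer)
as
begin
  /* apply a relative change, which replays safely on every node */
  update STOCK set stock_value = stock_value + :delta
  where product_id = :product_id;
end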

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user name of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer
• Independent server & Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master/slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community - that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications - transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat - the tool that gives database statistics information. If you have ever looked in gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database, and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

Figure 1: Summary of database statistics

The summary shown in Figure 1 provides general information about your database. The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database, with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Write parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple - using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off.¹

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
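For reference, the sweep gap mentioned here is the difference between the oldest snapshot and the oldest (interesting) transaction. The sweep interval itself is changed with gfix; a couple of illustrative invocations (mydb.gdb is a placeholder database name, and the usual user/password switches are omitted):

gfix -h 20000 mydb.gdb   (set the sweep interval to 20000)
gfix -h 0 mydb.gdb       (disable automatic sweeping)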

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot - the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ - the oldest currently active transaction.

• The Next transaction - the transaction number that will be assigned to a new transaction.

• Active transactions - IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transactions count. The statistics do not tell if there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day - this is calculated from Next transaction, divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

¹ InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
² "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as interesting.
³ Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves "Oldest active" forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications put less load on the database than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or by bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications can tell that this is really bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 - i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query - searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell color and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote from the table/indices part of the report:

"Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   Rec/Vers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments to the "Temporary tables" article

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary, because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn

volkerrehn@bigpond.com

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor


Miscellaneous


Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird - the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now

To receive future issue notifications, send email to

subscribe@ibdeveloper.com

Magazine CD

Donations

Page 21: The InterBase and Firebird Developer Magazine, Issue 2, 2005

Development area

22

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

Object-OrientedDevelopment in RDBMSPart 1Thanks and apologies This article is mostly a compilationof methods that are already well-known though many times it turnedout that I on my own have reinvent-ed a well-known and quite goodwheel I have endeavored to pro-vide readers with links to publica-tions I know of that are discussingthe problem However if I missedsomeonersquos work in the bibliogra-phy and thus violated copyrightplease drop me a message atvladcontekru I apologizebeforehand for possible inconven-ience and promise to add any nec-essary information to the article

The sources in the bibliography arelisted in the order of their appear-ance in my mind

The described database structureshave been simplified in order toillustrate the problems under con-sideration as much clearly as possi-ble leaving out more unimportantelements I have tested the viability

of all the methods taken from thearticles and of course I have testedall my own methods

Mixing of object-oriented program-ming and RDBMS use is always acompromise I have endeavored torecommend several approaches inwhich this compromise is minimisedfor both components I have alsotried to describe both advantagesand disadvantages of such a com-promise

I should make it clear that the objectapproach to database design asdescribed is not appropriate forevery task It is still true for the OOPas a whole too no matter whatOOP apologists may say I wouldrecommend using it for such tasks asdocument storage and processingaccounting etc

And the last but not least I am verythankful to Dmitry KuzmenkoAlexander Nevsky and other peo-ple who helped me in writing thisarticle

The problem state-ment

What is the problem

Present-day relational databaseswere developed in times when thesun shone brighter the computerswere slower mathematics was infavour and OOP had yet to see thelight of day Due to that fact mostRDBMSsrsquo have the following char-acteristics in common

1Everyone got used to them andfelt comfortable with them

2They are quite fast (if you usethem according to certain knownstandards)

3They use SQL which is an easycomprehensible and time-proveddata manipulation method

4They are based upon a strongmathematical theory

5They are convenient for applica-

tion development - if you developyour applications just like 20-30years ago

As you see almost all these charac-teristics sound good except proba-bly the last one Today you canhardly find a software product (inalmost any area) consisting of morethan few thousand of lines which iswritten without OOP technologiesOOP languages have been used fora long time for building visualforms ie in UI development It isalso quite usual to apply OOP atthe business logic level if youimplement it either on a middle-tieror on a client But things fall apartwhen the deal comes closer to thedata storage issueshellip During the lastten years there were severalattempts to develop an object-ori-ented database system and as faras I know all those attempts wererather far from being successfulThe characteristics of an OODBMSare the antithesis of those for anRDBMS They are unusual and

Author Vladimir Kotlyarevskyvladcontekru

Development area

23

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

slow there are no standards fordata access and no underlyingmathematical theory Perhaps theOOP developer feels more comfort-able with them although I am notsurehellip

As a result everyone continuesusing RDBMS combining object-oriented business logic and domainobjects with relational access to thedatabase where these objects arestored

What do we need

The thing we need is simple ndash todevelop a set of standard methodsthat will help us to simplify theprocess of tailoring the OO-layer ofbusiness logic and a relational stor-age together In other words ourtask is to find out how to storeobjects in a relational databaseand how to implement linksbetween the objects At the sametime we want to keep all the advan-tages provided by the relationaldatabase design and accessspeed flexibility and the power ofrelation processing

RDBMS as an objectstorage First letrsquos develop a database struc-ture that would be suitable foraccomplishing the specified task

The OID

All objects are unique and theymust be easily identifiable That iswhy all the objects stored in thedatabase should have unique ID-keys from a single set (similar toobject pointers in run-time) Theseidentifiers are used to link to anobject to load an object into a run-time environment etc In the [1]article these identifiers are calledOIDs (ie Object IDs) in [2] ndash UINs(Unique Identification Number) orhyperkey Let us call them OIDsthough ldquohyperkeyrdquo is also quite abeautiful word isnrsquot it

First of all I would like to make acouple of points concerning keyuniqueness Database developerswho are used to the classicalapproach to database designwould probably be quite surprisedat the idea that sometimes it makessense to make a table key uniquenot only within a single table (interms of OOP ndash not only within acertain class) but also within thewhole database (all classes) How-ever such strict uniqueness offersimportant advantages which willbecome obvious quite soon More-over it often makes sense to pro-vide complete uniqueness in a Uni-verse which provides considerablebenefits in distributed databasesand replication development At thesame time strict uniqueness of a keywithin the database does not have

any disadvantages Even in thepure relational model it does notmatter whether the surrogate key isunique within a single table or thewhole database

OIDs should never have any realworld meaning In other words thekey should be completely surro-gate I will not list here all the prosand cons of surrogate keys in com-parison with natural ones thosewho are interested can refer to the[4] article The simplest explanationis that everything dealing with thereal world may change (includingthe vehicle engine number networkcard number name passport num-ber social security card numberand even sex

Nobody can change their date ofbirth ndash at least not their de factodate of birth But birth dates are notunique anyway)

Remember the maxim everythingthat can go bad will go bad (con-sequently everything that cannotgo badhellip hum But letrsquos not talkabout such gloomy things )Changes to some OIDs wouldimmediately lead to changes in allidentifiers and links and thus asMr Scott Ambler wrote [1] couldresult in a ldquohuge maintenancenightmarerdquo As for the surrogatekey there is no need to change it atleast in terms of dependency on thechanging world

And what is more nobody requiresrun-time pointers to contain someadditional information about anobject except for a memoryaddress However there are somepeople who vigorously reject usageof surrogates The most brilliantargument against surrogates Irsquoveever heard is that they conflict withthe relational theoryraquo This state-ment is quite arguable since surro-gate keys in some sense are muchcloser to that theory than naturalones

Those who are interested in morestrong evidence supporting the useof OIDs with the characteristicsdescribed above (pure surrogateunique at least within the data-base) should refer to [1] [2] and[4] articles

The simplest method of OID imple-mentation in a relational databaseis a field of ldquointegerrdquo type and afunction for generating unique val-ues of this type In larger or distrib-uted databases it probably makessense to use ldquoint64rdquo or a combina-tion of several integers

ClassId

All objects stored in a databaseshould have a persistent analogueof RTTI which must be immediatelyavailable through the object identi-fier Then if we know the OID of anobject keeping in mind that it is

Development area

24

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

unique within the database we canimmediately figure out what typethe object is This is the first advan-tage of OID uniqueness Such anobjective may be accomplished bya ClassId object attribute whichrefers to the known types list whichbasically is a table (let us call itldquoCLASSESrdquo) This table may includeany kind of information ndash from justa simple type name to detailed typemetadata necessary for the appli-cation domain

Name Description cre-ation_date change_dateowner

Imagine a file system where userhas to remember the handles or theinodes of files instead of their file-names It is frequently convenientthough not always necessary tohave an object naming method thatis independent of the object typeFor example each object may havea short name and a long name It isalso sometimes convenient to haveother object attributes for differentpurposes akin to file attributessuch as ldquocreation_daterdquoldquochange_daterdquo and ldquoownerrdquo (theldquoownerrdquo attribute can be a stringcontaining the ownerrsquos name aswell as a link to an object of theldquouserrdquo type) The ldquoDeletedrdquo attrib-ute is also necessary as an indicatorof the unavailability of an objectThe physical deletion of records in adatabase full of direct and indirect

links is often a very complicatedtask to put it mildly

Thus each object has to havemandatory attributes (ldquoOIDrdquo andldquoClassIdrdquo) and desirable attributes(ldquoNamerdquo ldquoDescriptionrdquo ldquocre-ation_daterdquo ldquochange_daterdquo andldquoownerrdquo) Of course when devel-oping an application system youcan always add extra attributesspecific to your particular require-ments This issue will be considereda little later

The OBJECTS Table

Reading this far leads us to the con-clusion that the simplest solution isto store the standard attributes of allobjects in a single table You couldsupport a set of these fixed attrib-utes separately for each type dupli-cating 5 or 6 fields in each tablebut why It is not a duplication ofinformation but it is still a duplica-tion of entities Besides a singletable would allow us to have a cen-tral ldquowell-knownrdquo entry point forsearching of any object and that isone of the most important advan-tages of the described structure

So let us design this table

Description varchar(128)Deleted smallintCreation_date timestamp default CURRENT_TIMESTAMPChange_date timestamp default CURRENT_TIMESTAMP Owner TOID)

The ClassId attribute (the object type identifier) refers to the OBJECTS tablethat is the type description is also an object of a certain type which has forexample well-known ClassId = ndash100 (You need to add a description of theknown OID) Then the list of all persistent object types that are known to thesystem is sampled by a query select OID Name Description fromOBJECTS where ClassId = -100)

Each object stored in our database will have a single record in the OBJECTStable referring to it and hence a corresponding set of attributes Whereverwe see a link to a certain unidentified object this fact about all objectsenables us to find out easily what that object is - using a standard method

Quite often only basic information is needed. For instance, we need just a name to display in a lookup combobox. In this case we do not need anything but the OBJECTS table and a standard method of obtaining the name of an object via the link to it. This is the second advantage.
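That standard method is nothing more than a single-row lookup; a sketch, with the parameter name chosen freely:

select Name from OBJECTS where OID = :object_ref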

There are also types – simple lookup dictionaries, for example – which do not have any attributes other than those which are already in OBJECTS. Usually it is a short name (code) and a full long name, which can easily be stored in the "Name" and "Description" fields of the OBJECTS table. Do you remember how many simple dictionaries are in your accounting system? It is likely that not less than a half. Thus you may consider that you have implemented half of these dictionaries - that is the third advantage. Why should different dictionaries (entities) be stored in different tables (relations) when they have the same attributes? It does not matter that you were told to do so when you were a student.


A simple plain dictionary can be retrieved from the OBJECTS table by the following query:

select OID, Name, Description from OBJECTS where ClassId = LookupId


This works if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = LookupName)

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
    OID       TOID primary key,
    customer  TOID,
    sum_total NUMERIC(15,2)
);

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
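A minimal sketch of such a view, assuming the OBJECTS and Orders definitions above:

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from OBJECTS o, Orders ord
where o.OID = ord.OID;

Client code can then select from orders_view exactly as it would from a plain relational orders table.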

If an order has a "lines" section, and a real-world order should definitely have such a section, we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
    id        integer not null primary key,
    object_id TOID,           /* reference to the order object – 1:M relation */
    item_id   TOID,           /* reference to the ordered item */
    amount    numeric(15,4),
    cost      numeric(15,4),
    sum       numeric(15,4) computed by (amount * cost)
);

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2. (See [5].) All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
    OID          TOID,        /* link to the master-object of this attribute */
    attribute_id integer not null,
    value        varchar(256),
    constraint attributes_pk primary key (OID, attribute_id)
);

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
    OID            TOID,      /* link to the description-object of the type */
    attribute_id   integer not null,
    attribute_name varchar(32),
    attribute_type integer,
    constraint class_attributes_pk primary key (OID, attribute_id)
);

which describes the type metadata – an attribute set (their names and types) for each object type.
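For example, if the "Order" type is the description-object with OID 1001 (an arbitrary value, as before), its two non-standard attributes could be described like this; the attribute_type codes are purely hypothetical:

insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (1001, 1, 'customer', 0);   /* 0 = link to another object (assumed code) */

insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (1001, 2, 'sum_total', 1);  /* 1 = numeric value (assumed code) */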

All attributes of a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease in implementing object inheritance; and a very simple database structure – a significant benefit, since with this approach the number of tables would not increase no matter how many different types were stored in a database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database – in stored procedures, for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are used for every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast; no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
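A sketch of such a storage table, assuming the TOID domain from above and leaving the serialization format unspecified:

create table blob_objects (
    OID  TOID primary key,    /* still related 1:1 to the OBJECTS table */
    data blob                 /* the serialized object: dfm stream, XML, ... */
);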

It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each one. If your application involves massive relational data processing, like searching or grouping, or is likely to be extended in the future to process certain object attributes that way, it would be better to use Method 1, since it is the closest to the standard relational model and you will retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued…

FastReport 3 - new generation of the reporting tools

• Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; supports most popular DB-engines.

• Full WYSIWYG; text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).

• Access to DB fields, styles, text flow, URLs, Anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr

www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a punctual synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
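As an illustration only (CopyCat generates its own metadata, and its actual schema is not shown in this article), such a log table might look roughly like this; all the names here are assumptions:

create table RPL$LOG (
    id         integer not null primary key,
    table_name varchar(32),
    pk_value   varchar(64),
    sub_node   varchar(32)  /* the sub-node that still has to fetch this change */
);

create generator RPL$LOG_GEN;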

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously or after.

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/FireBird.)

When a data change occurs in a replicated table, one line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent, and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
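Putting the two mechanisms together, a logging trigger could emit one log line per sub-node while skipping the connected user. A sketch for a hypothetical replicated table CUSTOMERS with an ID primary key, reusing the assumed RPL$LOG from above (RPL$USERS is the real sub-node table mentioned in Part 2, though its column name here is an assumption):

create trigger CUSTOMERS_RPL_LOG for CUSTOMERS
after update as
declare variable node varchar(32);
begin
  /* one log line per sub-node, except the connection that made the change */
  for select node_name from RPL$USERS
      where node_name <> current_user
      into :node
  do
    insert into RPL$LOG (id, table_name, pk_value, sub_node)
    values (gen_id(RPL$LOG_GEN, 1), 'CUSTOMERS', old.ID, :node);
end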

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as up to three other fields for which unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
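For instance, a generator-based synchronization method could be as plain as the following; the generator name is an arbitrary choice:

create generator GEN_GLOBAL_PK;

/* the SQL clause configured as the synchronization method: */
select gen_id(GEN_GLOBAL_PK, 1) from RDB$DATABASE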

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
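Resolution could therefore be a one-statement affair along these lines; the table and column names other than CHOSEN_USER are assumptions, since the article only names that one field:

update RPL$CONFLICTS
set CHOSEN_USER = 'HEAD_OFFICE'   /* the node whose version should win */
where conflict_id = 42;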

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may not have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider for example a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy to use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master / slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150



Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server, and reconfigured hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameters usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked in gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hints, comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database.

Figure 1: Summary of database statistics

[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.
[2] "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".
[3] Really, it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves "Oldest active" forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.


The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Write parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, but OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from "Next transaction" divided by the number of days passed since the creation of the database up to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also correct that during normal work there can be times when the database is in a perfect state, for example when applications make less database load than usual (users are at lunch, or it's a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables, with fragmentation caused by update/delete or by blobs, and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or by bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together can tell that this is really bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of the four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower. Searching by a value that has fewer duplicates will be faster, but only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you "bad" indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, than garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions." Here is the list of tables with version/record ratio greater than 3:

Table        Records   Versions   RecVers size
CLIENTS_PR     3388      10944      92
DICT_PRICE       30       1992      45
DOCS              9       2225      64
N_PART        13835      72594      83
REGISTR_NC      241       4085      56
SKL_NC         1640       7736     170
STAT_QUICK    17649      85062     110
UO_LOCK         283       8490     144
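For any of the listed tables, then, the forced garbage collection the report suggests is a plain full scan, for example:

select count(*) from STAT_QUICK

with the caveats quoted above about running time and concurrent interested transactions.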

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments to the "Temporary tables" article

One thing I'd like to ask you to change is re temp tables. I suggest you even create another "myth box" for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, Foreign Keys, Triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system a RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn

volkerrehn@bigpond.com

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor


Miscellaneous


Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you with all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now!

To receive future issue notifications, send email to subscribe@ibdeveloper.com

Magazine CD

Donations

The simplest method of OID imple-mentation in a relational databaseis a field of ldquointegerrdquo type and afunction for generating unique val-ues of this type In larger or distrib-uted databases it probably makessense to use ldquoint64rdquo or a combina-tion of several integers

ClassId

All objects stored in a databaseshould have a persistent analogueof RTTI which must be immediatelyavailable through the object identi-fier Then if we know the OID of anobject keeping in mind that it is

Development area

24

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

unique within the database we canimmediately figure out what typethe object is This is the first advan-tage of OID uniqueness Such anobjective may be accomplished bya ClassId object attribute whichrefers to the known types list whichbasically is a table (let us call itldquoCLASSESrdquo) This table may includeany kind of information ndash from justa simple type name to detailed typemetadata necessary for the appli-cation domain

Name Description cre-ation_date change_dateowner

Imagine a file system where userhas to remember the handles or theinodes of files instead of their file-names It is frequently convenientthough not always necessary tohave an object naming method thatis independent of the object typeFor example each object may havea short name and a long name It isalso sometimes convenient to haveother object attributes for differentpurposes akin to file attributessuch as ldquocreation_daterdquoldquochange_daterdquo and ldquoownerrdquo (theldquoownerrdquo attribute can be a stringcontaining the ownerrsquos name aswell as a link to an object of theldquouserrdquo type) The ldquoDeletedrdquo attrib-ute is also necessary as an indicatorof the unavailability of an objectThe physical deletion of records in adatabase full of direct and indirect

links is often a very complicatedtask to put it mildly

Thus each object has to havemandatory attributes (ldquoOIDrdquo andldquoClassIdrdquo) and desirable attributes(ldquoNamerdquo ldquoDescriptionrdquo ldquocre-ation_daterdquo ldquochange_daterdquo andldquoownerrdquo) Of course when devel-oping an application system youcan always add extra attributesspecific to your particular require-ments This issue will be considereda little later

The OBJECTS Table

Reading this far leads us to the con-clusion that the simplest solution isto store the standard attributes of allobjects in a single table You couldsupport a set of these fixed attrib-utes separately for each type dupli-cating 5 or 6 fields in each tablebut why It is not a duplication ofinformation but it is still a duplica-tion of entities Besides a singletable would allow us to have a cen-tral ldquowell-knownrdquo entry point forsearching of any object and that isone of the most important advan-tages of the described structure

So let us design this table

Description varchar(128)Deleted smallintCreation_date timestamp default CURRENT_TIMESTAMPChange_date timestamp default CURRENT_TIMESTAMP Owner TOID)

The ClassId attribute (the object type identifier) refers to the OBJECTS tablethat is the type description is also an object of a certain type which has forexample well-known ClassId = ndash100 (You need to add a description of theknown OID) Then the list of all persistent object types that are known to thesystem is sampled by a query select OID Name Description fromOBJECTS where ClassId = -100)

Each object stored in our database will have a single record in the OBJECTStable referring to it and hence a corresponding set of attributes Whereverwe see a link to a certain unidentified object this fact about all objectsenables us to find out easily what that object is - using a standard method

Quite often only basic information is needed For instance we need just aname to display in a lookup combobox In this case we do not need any-thing but the OBJECTS table and a standard method of obtaining the nameof an object via the link to it This is the second advantage

There are also types simple lookup dictionaries for example which do nothave any attributes other than those which already are in OBJECTS Usual-ly it is a short name (code) and a full long name that can easily be stored inthe ldquoNamerdquo and ldquoDescriptionrdquo fields of the OBJECTS table Do you remem-ber how many simple dictionaries are in your accounting system It is like-ly that not less than a half Thus you may consider that you have implement-ed half of these dictionaries - that is the third advantage Why should dif-ferent dictionaries (entities) be stored in different tables (relations) whenthey have the same attributes It does not matter that you were told to do sowhen you were a student

create domain TOID as integer not nullcreate table OBJECTS(

OID TOID primary keyClassId TOIDName varchar(32)

A simple plain dictionary can be retrieved from the OBJECTStable by the following query

select OID Name Description from OBJECTS where ClassId = LookupId

Development area

25

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

if you know ClassId for this dictionaryelement If you only know the typename the query becomes a bit morecomplexselect OID Name Description

from OBJECTS where ClassId = (select OID from OBJECTS where ClassId = -100 and Name = LookupName)

Later I will demonstrate how to add alittle more intelligence to such simpledictionaries

Storing of more complexobjects

It is clear that some objects in realdatabases are more complex thanthose which can be stored in theOBJECTS table The method for stor-ing them depends on the applicationdomain and the objectrsquos internalstructure Letrsquos look at three well-known methods of object-relationalmapping

Method 1 The objects are stored justas in a standard relational databasewith the type attributes mapped totable attributes For example docu-ment objects of the ldquoOrderrdquo typewith such attributes as order num-ber comments customer andorder amount are stored in thetable Orders

create table Orders (OID TOID primary key customer TOID sum_total NUMERIC(152))

which relates one-to-one to the OBJECTS table by theOID field The order number and ldquocommentsrdquo attrib-utes are stored in the ldquoNamerdquo and ldquoDescriptionrdquo fieldsof the OBJECTS table ldquoOrdersrdquo also refers to

ldquoOBJECTSrdquo via the ldquocustomerrdquo field since a customer isalso an object being for example a part of the Part-ners dictionary You can retrieve all attributes ofldquoOrderrdquo type with the following query

select oOID oName as Number oDescription as Comment ordcustomer ordsum_totalfrom Objects o Orders ordwhere oOID = ordOID and ordOID = id

As you see everything is simple and usual You couldalso create a view ldquoorders_viewrdquo and make everythinglook as it always did

If an order has a ldquolinesrdquo section and a real-world ordershould definitely have such a section we can create sep-arate table for it call it eg ldquoorder_linesrdquo and relate itwith the ldquoOrdersrdquo table by a relation 1M

create table order_lines (id integer not null primary keyobject_id TOID reference to order object ndash 1M relation item_id TOID reference to ordered item amount numeric(154)cost numeric(154)sum numeric(154) computed by (amountcost))

One very important advantage of this storage method is that it allows you towork with object sets as you would with normal relational tables (which theyactually are) All the advantages of the relational approach are present

Nevertheless there are two main disadvantages implementation of the system ofobject-relational mapping for this method is less than simple and there are some diffi-culties in the organization of type inheritance This method is described in detail in [1]and [3] These articles also describe the implementation of type inheritance methods ina database

Method 2 (See [5]) All object attributes of any type are stored in the form of arecord set in a single table A simple example would look like this

create table attributes (OID TOID link to the master-object of this attribute attribute_id integer not null value varchar(256) constraint attributes_pk primary key (OID attribute_id))

connected 1M with OBJECTS by OID There is also a table

create table class_attributes (OID TOID here is a link to a description-object of the type attribute_id integer not nullattribute_name varchar(32)attribute_type integerconstraint class_attributes_pk primary key (OIDattribute_id))

which describes the type metadata – an attribute set (their names and types) for each object type.
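To make this concrete, here is a hypothetical registration of the "Order" type's two extra attributes, matching the attribute_id values 1 and 2 used in the queries below. The OID 1001 for the type's description-object and the attribute_type codes are invented for the sketch:

insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (1001, 1, 'customer', 1);
insert into class_attributes (OID, attribute_id, attribute_name, attribute_type)
values (1001, 2, 'sum_total', 2);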

All the attributes of a particular object whose OID is known are retrieved by the query:

select attribute_id, value from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate the relational standard: instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
  left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
  left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes; ease of extending and changing a type; ease in implementing object inheritance; and a very simple database structure – a significant benefit, since with this approach the number of tables does not increase no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are needed for every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast; no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured objects (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.
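As a sketch, assuming a hypothetical table name and a text BLOB (sub_type 0 would suit binary formats), method 3 needs nothing more than:

create table object_blobs (
  OID TOID primary key,  /* 1:1 with OBJECTS */
  data blob sub_type 1   /* serialized object: XML, dfm stream, etc. */
)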

It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each. If your application involves massive relational data processing, like searching or grouping, or is likely to involve it in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed with run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to the speed and ease of implementation.

To be continued…

FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; support for most popular DB engines. Full WYSIWYG, text rotation 0..360 degrees, memo object supports simple html-tags (font color, b, i, u, sub, sup), improved stretching (StretchMode, ShiftMode properties). Access to DB fields, styles, text flow, URLs, anchors.

http://www.fast-report.com


Replicating and synchronizing InterBase/Firebird databases using CopyCat

Author: Jonathan Neve, jonathan@microtec.fr
www.microtec.fr

PART 1: BASICS OF DATABASE REPLICATION

A replicator is a tool for keeping several databases synchronized (either wholly or in part) on a continuous basis. Such a tool can have many applications: it can allow for off-line local data editing, with a one-off synchronization upon reconnection to the main database; it can also be used over a slow connection as an alternative to a direct connection to the central database; another use would be to make an automatic off-site incremental backup by using simple one-way replication.

Creating a replicator can be quite tricky. Let's examine some of the key design issues involved in database replication, and explain how these issues are implemented in Microtec CopyCat, a set of Delphi / C++Builder components for performing replication between InterBase and Firebird databases.

Data logging

Before anything can be replicated, all changes to each database must of course be logged. CopyCat creates a log table and triggers for each table that is to be replicated. These triggers insert into the log table all the information concerning the record that was changed (table name, primary key value(s), etc.).
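The article does not give the exact schema, but such a log table might look something like this sketch (all names here are hypothetical, not CopyCat's actual metadata):

create table rpl_log (
  id integer not null primary key,
  table_name varchar(31) not null,   /* table in which the change occurred */
  pk_values varchar(255) not null,   /* primary key value(s) of the changed record */
  change_type char(1) not null,      /* 'I' insert, 'U' update, 'D' delete */
  sub_node varchar(31) not null      /* sub-node that should fetch this line */
)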

Multi-node replication

Replicating to and from several nodes adds another degree of complexity. Every change that is made to one database must be applied to all the others. Furthermore, when one database applies this change, it must indicate to the originating database that the change has been applied, without in any way hindering the other databases from replicating the same change, whether before, simultaneously or after.

In CopyCat these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes having no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to avoid changes that have been replicated to one database from replicating back to the original database. Since all the changes to the database are logged, the changes made by the replicator are also logged, and will therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent; and any change made to the parent node will be logged to other sub-nodes, but not to the originating node itself.
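A sketch of that trigger logic, reusing the hypothetical rpl_log table above (the rpl_users table, its node_name column, and the gen_rpl_log generator are likewise invented stand-ins for CopyCat's real metadata). The key point is that the current user – the node performing the change – is excluded from the logging:

create trigger orders_rpl_log for Orders after update as
begin
  insert into rpl_log (id, table_name, pk_values, change_type, sub_node)
  select gen_id(gen_rpl_log, 1), 'ORDERS',
         cast(new.OID as varchar(255)), 'U', u.node_name
  from rpl_users u
  where u.node_name <> USER;  /* every sub-node except the originator */
end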

Primary key synchronization

One problem with replication is that, since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution. CopyCat allows you to define, for each primary key field (as well as for up to three other fields whose unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
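For an integer key fed by a generator, such a synchronization clause could be as simple as the following (the generator name is assumed):

select gen_id(gen_orders_oid, 1) from RDB$DATABASE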

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
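Resolving a conflict could then look like this sketch – only the CHOSEN_USER column is named in the article; the table and the other column names are invented:

update rpl_conflicts
set CHOSEN_USER = 'NODE_A'   /* the node whose version of the record should win */
where table_name = 'ORDERS'
  and pk_values = '1234';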

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it is entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table, containing one line per product and a field holding the current stock value. Suppose that, the current stock value for a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication" – that is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
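A sketch of such a procedure (all names invented): each node calls it with a delta, and since the call itself is what gets replicated, the deltas from different nodes merge correctly:

create procedure change_stock (a_product integer, a_delta numeric(15,4)) as
begin
  update STOCK
  set current_value = current_value + :a_delta  /* apply the increment locally */
  where product_id = :a_product;
end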

PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. The CopyCat Delphi / C++ Builder component set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes in the RPL$USERS table, as sketched below.
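For instance (the column name here is a guess – check the DDL generated by the configuration tool for the real one):

insert into RPL$USERS (user_name) values ('BRANCH_1');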

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer
• Independent server administrator tool (CTAdmin)
• Configuration wizard for setting up links to master/slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase/Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase/Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, and one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux/Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150



Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you will know that the statistical output looks like just a bunch of numbers and nothing else. OK, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least in identifying the cause of performance problems.

It is a long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things and performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance), you should at the very least take a peek at the

database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available while browsing the statistics, and it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database.

[Figure 1: Summary of database statistics]

Footnotes:

¹ InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.

² The "oldest transaction" is the same "oldest interesting transaction" mentioned everywhere. Gstat output does not show this transaction as "interesting".

³ Really it is the oldest transaction that was active when the oldest transaction currently active started, because only a new transaction start moves the "oldest active" forward. In production systems with regular transactions it can be considered the currently oldest active transaction.

Development area

32

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x by default had this parameter ON. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.¹

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest² and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative – which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database (the raw gstat header lines they come from are sketched after this list):

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction belong to transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active³ – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to a new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from "next transaction" divided by the number of days passed since the creation of the database to the point where the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
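For reference, the corresponding lines of raw gstat -h output look something like this (a sketch – the numbers are invented):

gstat -h mydb.gdb

    Oldest transaction      201
    Oldest active           210
    Oldest snapshot         210
    Next transaction        1460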

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications put less load on the database than usual (users are at lunch, or it is a quiet time in the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly, not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

[Figure 2: Table statistics]

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently the versions are caused by bulk deletes. Together, both indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications update these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken via the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red, because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports | View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work, or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   RecVers size
CLIENTS_PR   3388      10944      92
DICT_PRICE   30        1992       45
DOCS         9         2225       64
N_PART       13835     72594      83
REGISTR_NC   241       4085       56
SKL_NC       1640      7736       170
STAT_QUICK   17649     85062      110
UO_LOCK      283       8490       144

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and in identifying possible problems with a database in terms of performance, maintenance, and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and it will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback


Comments on the "Temporary tables" article

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them particularly impressed me, and with the permission of its author I would like to publish it here.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com



Miscellaneous


Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now!

To receive future issue notifications, send email to subscribe@ibdeveloper.com

Magazine CD

Donations


OOD and RDBMS Part 1

if you know ClassId for this dictionaryelement If you only know the typename the query becomes a bit morecomplexselect OID Name Description

from OBJECTS where ClassId = (select OID from OBJECTS where ClassId = -100 and Name = LookupName)

Later I will demonstrate how to add alittle more intelligence to such simpledictionaries

Storing of more complexobjects

It is clear that some objects in realdatabases are more complex thanthose which can be stored in theOBJECTS table The method for stor-ing them depends on the applicationdomain and the objectrsquos internalstructure Letrsquos look at three well-known methods of object-relationalmapping

Method 1 The objects are stored justas in a standard relational databasewith the type attributes mapped totable attributes For example docu-ment objects of the ldquoOrderrdquo typewith such attributes as order num-ber comments customer andorder amount are stored in thetable Orders

create table Orders (OID TOID primary key customer TOID sum_total NUMERIC(152))

which relates one-to-one to the OBJECTS table by theOID field The order number and ldquocommentsrdquo attrib-utes are stored in the ldquoNamerdquo and ldquoDescriptionrdquo fieldsof the OBJECTS table ldquoOrdersrdquo also refers to

ldquoOBJECTSrdquo via the ldquocustomerrdquo field since a customer isalso an object being for example a part of the Part-ners dictionary You can retrieve all attributes ofldquoOrderrdquo type with the following query

select oOID oName as Number oDescription as Comment ordcustomer ordsum_totalfrom Objects o Orders ordwhere oOID = ordOID and ordOID = id

As you see everything is simple and usual You couldalso create a view ldquoorders_viewrdquo and make everythinglook as it always did

If an order has a ldquolinesrdquo section and a real-world ordershould definitely have such a section we can create sep-arate table for it call it eg ldquoorder_linesrdquo and relate itwith the ldquoOrdersrdquo table by a relation 1M

create table order_lines (id integer not null primary keyobject_id TOID reference to order object ndash 1M relation item_id TOID reference to ordered item amount numeric(154)cost numeric(154)sum numeric(154) computed by (amountcost))

One very important advantage of this storage method is that it allows you towork with object sets as you would with normal relational tables (which theyactually are) All the advantages of the relational approach are present

Nevertheless there are two main disadvantages implementation of the system ofobject-relational mapping for this method is less than simple and there are some diffi-culties in the organization of type inheritance This method is described in detail in [1]and [3] These articles also describe the implementation of type inheritance methods ina database

Method 2 (See [5]) All object attributes of any type are stored in the form of arecord set in a single table A simple example would look like this

create table attributes (OID TOID link to the master-object of this attribute attribute_id integer not null value varchar(256) constraint attributes_pk primary key (OID attribute_id))

connected 1M with OBJECTS by OID There is also a table

create table class_attributes (OID TOID here is a link to a description-object of the type attribute_id integer not nullattribute_name varchar(32)attribute_type integerconstraint class_attributes_pk primary key (OIDattribute_id))

which describes type metadata ndash an attribute set (their names and types) for eachobject type

All attributes for particular object where the OID is known are retrieved by the query

select attribute_id value from attributeswhere OID = oid

or with names of attributes

select aattribute_id caattribute_name avaluefrom attributes a class_attributes ca objects owhere aOID = oid and aOID = oOID and oClassId = caOID and aattribute_id = caattribute_id

Development area

26

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

OOD and RDBMS Part 1

In the context of this method youcan also emulate a relational stan-dard Instead of selecting objectattributes in several records (oneattribute per a record) you can getall attributes in a single record byjoining or by using subqueries

select oOID oName as Number oDescription as Commenta1value as customera2value as sum_totalfrom OBJECTS o left join attributes a1 on a1OID = oOID and a1attribute_id = 1left join attributes a2 on a2OID = oOID and a2attribute_id = 2where oOID = id

Clearly the more attributes objecthas the slower the loading processwill be since each attribute requiresan additional join in the query

Returning to the example with theldquoOrderrdquo document we see that theorder number and commentsattributes are still stored in theOBJECTS table but customer andorder amount are stored in twoseparate records of the ldquoattributesrdquotable This approach is described byAnatoliy Tentser in article [5] Itsadvantages are rather important astandardized method of retrievingand storing object attributes easeof extending and changing a typeease in implementing object inheri-tance very simple structure for the

database a significant benefit ben-efit since with this approach thenumber of tables would notincrease no matter how many dif-ferent types were stored in a data-base

The main disadvatage is that thismethod is so different from the stan-dard relational model that manystandard techniques used on rela-tional databases cannot be appliedThe speed of object retrieval is sig-nificantly lower than the previousmethod 1 and it decreases alsowith the growth of the attributecount of a type It is not very suit-able for working with objects usingSQL from inside the database instored procedures for exampleThere is also a certain level of dataredundancy because three fields(ldquoOIDrdquo ldquoattribute_idrdquo andldquovaluerdquo) are applicable to everyattribute instead of just the one fielddescribed in Method 1

Method 3 Everything is stored inBLOB and one of the persistent for-mats is applied ndash a custom formator for example dfm (VCL stream-ing) from the Borland VCL or XMLor anything you like There is noth-ing to comment on here The advan-tages are obvious object retrievallogic is simple and fast no extradatabase structures are necessaryndash just a single BLOB field you canstore any custom objects includingabsolutely unstructured objects(such as MS Word documentsHTML pages etc) The disadvan-tage is also obvious there is nothingrelational about this approach andyou would have to perform all dataprocessing outside of the databaseusing the database only for objectstorage

Editorrsquos note Even more impor-tant it is impossible to query thedatabase for objects with certainattributes as the attribute data isstored in the BLOB

It was not difficult to come to the fol-lowing conclusions (well I know ndasheveryone already knows them )All the three methods described areundoubtedly useful and have a rightto live Moreover it sometimesmakes sense to use all three of themin a single application But whenchoosing among these three meth-ods take into account the listedadvantages and disadvantages ofeach method If your applicationinvolves massive relational dataprocessing like searching or group-ing or it is likely to be added in thefuture using a certain object attrib-

utes it would be better to to use method 1 since it is the closest tothe standard relational model and you will retain all the power ofSQL If on the other hand data processing is not particularly com-plex and data sets are not too large andor you need a simpledatabase structure then it makes sense to use method 2 If a data-base is used only as an object storage and all operations are per-formed with run-time instances of objects without using native data-base tools like SQL (except work with attributes stored in theOBJECTS table) the third method would probably be the bestchoice due to the speed and ease of implementation

To be continuedhellip

FastReport 3 - new generation of the reporting tools

Visual report designer with rulers guides andzooming wizards for base type reports exportfilters to html tiff bmp jpg xls pdf Dot matrix

reports support support most popular DB-engines

Full WYSIWYG text rotation 0360 degreesmemo object supports simple html-tags

(font color b i u sub sup) improved stretching(StretchMode ShiftMode properties)

Access to DB fields styles text flow URLs Anchors

httpwwwfast-reportcom

Development area

27

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

Replicating and synchronizingInterBaseFirebirddatabases using CopyCat

AuthorJonathan Neve

jonathanmicrotecfr

wwwmicrotecfr

PART 1 BASICS OF DATABASEREPLICATIONA replicator is a tool for keepingseveral databases synchronized(either wholly or in part) on a con-tinuous basis Such a tool can havemany applications it can allow foroff-line local data editing with apunctual synchronization uponreconnection to main database itcan also be used over a slow con-nection as an alternative to a directconnection to the central databaseanother use would be to make anautomatic off-site incrementalbackup by using simple one-wayreplication

Creating a replicator can be quitetricky Lets examine some of thekey design issues involved in data-base replication and explain howthese issues are implemented inMicrotec CopyCat a set of Delphi C++Builder components for per-forming replication between Inter-base and FireBird databases

Data logging

Before anything can be replicatedall changes to each database mustof course be logged CopyCat cre-ates a log table and triggers foreach table that is to be replicatedThese triggers insert into the logtable all the information concerningthe record that was changed (tablename primary key value(s) etc)

Multi-node replication

Replicating to and from severalnodes adds another degree of com-plexity Every change that is madeto one database must be applied toall the others Furthermore whenone database applies this change itmust indicate to the originatingdatabase that the change has beenapplied without in any way hinder-ing the other databases from repli-cating the same change eitherbefore simultaneously or after

In CopyCat these problems aresolved using a simple and flexiblesystem Each replication node canbe have one parent node and sev-eral sub-nodes towards which it

replicates its changes Each nodeslist of sub-nodes is stored in a tablein the nodes database (Incidental-ly the parent node is configured inthe replicator software itself ratherthan in the database and thereforeno software is needed on nodeshaving no parent ndash which allowsthese servers to run Linux or anyother OS supported byInterbaseFireBird)

When a data change occurs in areplicated table one line is gener-ated per sub-node Thus each sub-node fetches only the log lines thatconcern it

Two-way replication

One obvious difficulty involved in two-way replication is how toavoid changes that have been replicated to one database fromreplicating back to the original database Since all the changes tothe database are logged the changes made by the replicator arealso logged and will therefore bounce back and forth between thesource and the target databases How can this problem be avoid-ed

The solution CopyCat uses is related to the sub-node managementsystem described above Each sub-node is assigned a name whichis used when the sub-node logs in to the database When a sub-node replicates its own changes to its parent the replication triggerslog the change for all the nodes sub-nodes except the current userThus only sub-nodes other than the originator receive the change

Conversely CopyCat logs in to the nodes local database using thenode name of its parent as user name Thus any change made tothe local database during replication will be logged for all sub-nodes other than the nodes parent and any change made to theparent node will be logged to other sub-nodes but not to the origi-nating node itself

Primary key synchronization

One problem with replication is that since data is edited off-linethere is no centralized way to ensure that the value of a fieldremains unique One common answer to this problem is to useGUID values This is a good solution if youre implementing a newdatabase (except that GUID fields are rather large and thereforenot very well suited for a primary key field) but if you have an exist-

Development area

28

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

ing database that needs replicationit would be very difficult to replaceall primary or unique key fields byGUID values

Since GUID fields are in many cases not feasible, CopyCat implements another solution: it allows you to define, for each primary key field (as well as for up to three other fields whose uniqueness is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.
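
For example, a generator-based synchronization method could be expressed as an ordinary SQL clause like the following (the generator name is, of course, illustrative):

select gen_id(GEN_ORDERS_ID, 1) from RDB$DATABASE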

When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to solve the conflict, the user simply has to put in this field the name of the node which has the correct version of the record, and automatically, upon the next replication, the record will be replicated and the conflict resolved.
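
Resolving a conflict thus comes down to a single update. The table and column names other than CHOSEN_USER are assumptions here, made only to illustrate the idea:

update RPL$CONFLICTS
set CHOSEN_USER = 'NODE_A' /* the node whose version should win */
where conflict_id = 1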

This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider for example a "STOCK" table containing one line per product, and a field holding the current stock value. Suppose that, for a certain product, the current stock value being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered: instead of having one record hold the current stock value of a product, there could be one line per change. This would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication": a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like the one in the example above), one solution is to write a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.
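
As a sketch, such a stored procedure might look like the following, assuming a STOCK table with product_id and amount columns (both names are illustrative); CopyCat would then log and replicate the calls rather than the resulting row values:

set term ^ ;
create procedure ADD_STOCK (product_id integer, delta numeric(15,4))
as
begin
  /* each node applies its own delta; replicating the calls lets
     concurrent changes merge instead of overwriting each other */
  update STOCK
  set amount = amount + :delta
  where product_id = :product_id;
end ^
set term ; ^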

PART 2
GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. CopyCat Delphi / C++ Builder Component Set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide for getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat.

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables), and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes (in the RPL$USERS table).
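
Assuming RPL$USERS has a user_name column (the actual layout may differ), registering a sub-node could be as simple as:

insert into RPL$USERS (user_name) values ('NODE_B')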

Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger, the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs, but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy-to-use installer

• Independent server Administrator tool (CTAdmin)

• Configuration wizard for setting up links to master / slave databases

• Robust replication engine based on Microtec CopyCat

• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections

• Simple & intuitive control panel

• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output, to help in tuning the database or at least to identify the cause of performance problems.

Long story short, the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database, and so on. To find out what is happening in a database (both the positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available during the browsing of statistics; it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database.

[Figure 1: Summary of database statistics]


The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the details of a problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF, and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when it is ON, changed data is written immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in the case of a power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and the oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative – which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next new transaction.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the "oldest active" gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from "Next transaction" divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

Footnotes:

[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.

[2] The "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere; gstat output does not show this transaction as "interesting".

[3] Really it is the oldest transaction that was active when the oldest transaction currently active started, because only a new transaction's start moves the "Oldest active" forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

As you have already learned, if there are any warnings they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time of the business day).


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment, with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

[Figure 2: Table statistics]

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. Together, these indications tell us that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.

• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower, while searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you "bad" indices on Foreign Keys every time you view statistics.

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices, and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big – 94 kilobytes or 23 pages. A read_committed transaction uses the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run a sweep manually (gfix -sweep) to decrease the TIP size."

Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. Large amounts of record versions usually slow down performance. If there are a lot of record versions in a table, then either garbage collection does not work, or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

Table        Records   Versions   Rec/Vers size
CLIENTS_PR      3388      10944              92
DICT_PRICE        30       1992              45
DOCS               9       2225              64
N_PART         13835      72594              83
REGISTR_NC       241       4085              56
SKL_NC          1640       7736             170
STAT_QUICK     17649      85062             110
UO_LOCK          283       8490             144
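
For instance, forcing garbage collection on one of the tables listed above takes a single full scan, run in its own transaction (the table name is taken from the report):

select count(*) from STAT_QUICK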

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance, and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand, graphical manner, and it will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback

Comments to the "Temporary tables" article

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me, and, with the permission of its respective owner, I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another "myth" box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature – or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation, or regarding data whose structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (only available through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com

Miscellaneous


Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird — the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now!

To receive notifications about future issues, send email to subscribe@ibdeveloper.com

Magazine CD

Donations


OOD and RDBMS Part 1

if you know the ClassId for this dictionary element. If you only know the type name, the query becomes a bit more complex:

select OID, Name, Description
from OBJECTS
where ClassId = (select OID from OBJECTS
                 where ClassId = -100 and Name = 'LookupName')

Later I will demonstrate how to add a little more intelligence to such simple dictionaries.

Storing more complex objects

It is clear that some objects in real databases are more complex than those which can be stored in the OBJECTS table. The method for storing them depends on the application domain and the object's internal structure. Let's look at three well-known methods of object-relational mapping.

Method 1. The objects are stored just as in a standard relational database, with the type attributes mapped to table attributes. For example, document objects of the "Order" type, with such attributes as order number, comments, customer and order amount, are stored in the table Orders:

create table Orders (
  OID TOID primary key,
  customer TOID,
  sum_total NUMERIC(15,2))

which relates one-to-one to the OBJECTS table by the OID field. The order number and "comments" attributes are stored in the "Name" and "Description" fields of the OBJECTS table. "Orders" also refers to "OBJECTS" via the "customer" field, since a customer is also an object, being, for example, a part of the Partners dictionary. You can retrieve all attributes of the "Order" type with the following query:

select o.OID, o.Name as Number, o.Description as Comment,
       ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID and ord.OID = :id

As you see, everything is simple and usual. You could also create a view "orders_view" and make everything look as it always did.
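
A possible sketch of such a view, flattening the two tables back into the usual relational shape:

create view orders_view (OID, Number, Comment, customer, sum_total) as
select o.OID, o.Name, o.Description, ord.customer, ord.sum_total
from Objects o, Orders ord
where o.OID = ord.OID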

If an order has a "lines" section – and a real-world order should definitely have such a section – we can create a separate table for it, call it e.g. "order_lines", and relate it to the "Orders" table by a 1:M relation:

create table order_lines (
  id integer not null primary key,
  object_id TOID, /* reference to the order object - 1:M relation */
  item_id TOID,   /* reference to the ordered item */
  amount numeric(15,4),
  cost numeric(15,4),
  sum numeric(15,4) computed by (amount*cost))

One very important advantage of this storage method is that it allows you to work with object sets as you would with normal relational tables (which they actually are). All the advantages of the relational approach are present.

Nevertheless, there are two main disadvantages: implementation of the system of object-relational mapping for this method is less than simple, and there are some difficulties in the organization of type inheritance. This method is described in detail in [1] and [3]. These articles also describe the implementation of type inheritance methods in a database.

Method 2 (see [5]). All object attributes of any type are stored in the form of a record set in a single table. A simple example would look like this:

create table attributes (
  OID TOID, /* link to the master-object of this attribute */
  attribute_id integer not null,
  value varchar(256),
  constraint attributes_pk primary key (OID, attribute_id))

connected 1:M with OBJECTS by OID. There is also a table

create table class_attributes (
  OID TOID, /* link to a description-object of the type */
  attribute_id integer not null,
  attribute_name varchar(32),
  attribute_type integer,
  constraint class_attributes_pk primary key (OID, attribute_id))

which describes type metadata – an attribute set (names and types) for each object type.

All attributes for a particular object, where the OID is known, are retrieved by the query:

select attribute_id, value
from attributes
where OID = :oid

or, with the names of the attributes:

select a.attribute_id, ca.attribute_name, a.value
from attributes a, class_attributes ca, objects o
where a.OID = :oid
  and a.OID = o.OID
  and o.ClassId = ca.OID
  and a.attribute_id = ca.attribute_id


In the context of this method you can also emulate the relational standard. Instead of selecting object attributes in several records (one attribute per record), you can get all attributes in a single record, by joining or by using subqueries:

select o.OID, o.Name as Number, o.Description as Comment,
       a1.value as customer,
       a2.value as sum_total
from OBJECTS o
left join attributes a1 on a1.OID = o.OID and a1.attribute_id = 1
left join attributes a2 on a2.OID = o.OID and a2.attribute_id = 2
where o.OID = :id

Clearly, the more attributes an object has, the slower the loading process will be, since each attribute requires an additional join in the query.

Returning to the example with the "Order" document, we see that the order number and comments attributes are still stored in the OBJECTS table, but customer and order amount are stored in two separate records of the "attributes" table. This approach is described by Anatoliy Tentser in article [5]. Its advantages are rather important: a standardized method of retrieving and storing object attributes, ease of extending and changing a type, ease in implementing object inheritance, and a very simple database structure – a significant benefit, since with this approach the number of tables does not increase no matter how many different types are stored in the database.

The main disadvantage is that this method is so different from the standard relational model that many standard techniques used on relational databases cannot be applied. The speed of object retrieval is significantly lower than with Method 1, and it decreases further as the attribute count of a type grows. It is not very suitable for working with objects using SQL from inside the database, in stored procedures for example. There is also a certain level of data redundancy, because three fields ("OID", "attribute_id" and "value") are used for every attribute, instead of just the one field described in Method 1.

Method 3. Everything is stored in a BLOB, and one of the persistent formats is applied – a custom format, or, for example, dfm (VCL streaming) from the Borland VCL, or XML, or anything you like. There is nothing to comment on here. The advantages are obvious: object retrieval logic is simple and fast, no extra database structures are necessary – just a single BLOB field; you can store any custom objects, including absolutely unstructured ones (such as MS Word documents, HTML pages, etc.). The disadvantage is also obvious: there is nothing relational about this approach, and you would have to perform all data processing outside of the database, using the database only for object storage.

Editor's note: Even more important, it is impossible to query the database for objects with certain attributes, as the attribute data is stored in the BLOB.

It was not difficult to come to the following conclusions (well, I know – everyone already knows them). All three methods described are undoubtedly useful and have a right to live. Moreover, it sometimes makes sense to use all three of them in a single application. But when choosing among these three methods, take into account the listed advantages and disadvantages of each one. If your application involves massive relational data processing, like searching or grouping, or is likely to acquire it in the future using certain object attributes, it would be better to use Method 1, since it is the closest to the standard relational model and you retain all the power of SQL. If, on the other hand, data processing is not particularly complex and data sets are not too large, and/or you need a simple database structure, then it makes sense to use Method 2. If a database is used only as an object storage, and all operations are performed on run-time instances of objects without using native database tools like SQL (except work with attributes stored in the OBJECTS table), the third method would probably be the best choice, due to its speed and ease of implementation.

To be continued…

FastReport 3 - new generation of the reporting tools

Visual report designer with rulers, guides and zooming; wizards for base type reports; export filters to html, tiff, bmp, jpg, xls, pdf; dot matrix reports support; supports most popular DB-engines.

Full WYSIWYG, text rotation 0..360 degrees; memo object supports simple html-tags (font color, b, i, u, sub, sup); improved stretching (StretchMode, ShiftMode properties).

Access to DB fields, styles, text flow, URLs, anchors.

http://www.fast-report.com

Development area

27

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

Replicating and synchronizingInterBaseFirebirddatabases using CopyCat

AuthorJonathan Neve

jonathanmicrotecfr

wwwmicrotecfr

PART 1 BASICS OF DATABASEREPLICATIONA replicator is a tool for keepingseveral databases synchronized(either wholly or in part) on a con-tinuous basis Such a tool can havemany applications it can allow foroff-line local data editing with apunctual synchronization uponreconnection to main database itcan also be used over a slow con-nection as an alternative to a directconnection to the central databaseanother use would be to make anautomatic off-site incrementalbackup by using simple one-wayreplication

Creating a replicator can be quitetricky Lets examine some of thekey design issues involved in data-base replication and explain howthese issues are implemented inMicrotec CopyCat a set of Delphi C++Builder components for per-forming replication between Inter-base and FireBird databases

Data logging

Before anything can be replicatedall changes to each database mustof course be logged CopyCat cre-ates a log table and triggers foreach table that is to be replicatedThese triggers insert into the logtable all the information concerningthe record that was changed (tablename primary key value(s) etc)

Multi-node replication

Replicating to and from severalnodes adds another degree of com-plexity Every change that is madeto one database must be applied toall the others Furthermore whenone database applies this change itmust indicate to the originatingdatabase that the change has beenapplied without in any way hinder-ing the other databases from repli-cating the same change eitherbefore simultaneously or after

In CopyCat these problems aresolved using a simple and flexiblesystem Each replication node canbe have one parent node and sev-eral sub-nodes towards which it

replicates its changes Each nodeslist of sub-nodes is stored in a tablein the nodes database (Incidental-ly the parent node is configured inthe replicator software itself ratherthan in the database and thereforeno software is needed on nodeshaving no parent ndash which allowsthese servers to run Linux or anyother OS supported byInterbaseFireBird)

When a data change occurs in areplicated table one line is gener-ated per sub-node Thus each sub-node fetches only the log lines thatconcern it

Two-way replication

One obvious difficulty involved in two-way replication is how toavoid changes that have been replicated to one database fromreplicating back to the original database Since all the changes tothe database are logged the changes made by the replicator arealso logged and will therefore bounce back and forth between thesource and the target databases How can this problem be avoid-ed

The solution CopyCat uses is related to the sub-node managementsystem described above Each sub-node is assigned a name whichis used when the sub-node logs in to the database When a sub-node replicates its own changes to its parent the replication triggerslog the change for all the nodes sub-nodes except the current userThus only sub-nodes other than the originator receive the change

Conversely CopyCat logs in to the nodes local database using thenode name of its parent as user name Thus any change made tothe local database during replication will be logged for all sub-nodes other than the nodes parent and any change made to theparent node will be logged to other sub-nodes but not to the origi-nating node itself

Primary key synchronization

One problem with replication is that since data is edited off-linethere is no centralized way to ensure that the value of a fieldremains unique One common answer to this problem is to useGUID values This is a good solution if youre implementing a newdatabase (except that GUID fields are rather large and thereforenot very well suited for a primary key field) but if you have an exist-

Development area

28

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

ing database that needs replicationit would be very difficult to replaceall primary or unique key fields byGUID values

Since GUID fields are in manycases not feasible CopyCat imple-ments another solution CopyCatallows you to define for each pri-mary key field (as well as up tothree other fields for which unicity isto be maintained) a synchroniza-tion method In most cases this willbe either a generator or a storedprocedure call though it could beany valid SQL clause Upon repli-cation this SQL statement is calledon the server side in order to calcu-late a unique key value and theresulting value is then applied to thelocal database Only after the keyvalues (if any) have been changedlocally is the record replicated tothe server

When replicating from the parentnode to the local node howeverthis behaviour does not take placethe primary key values on the serv-er are considered to be unique

Conflict management

Suppose a replication node and itsparent both modify the same recordduring the same time period Whenthe replicator connects to its parentto replicate its changes it has noway of telling which of the twonodes has the most up-to-date ver-

sion of the record this is a conflict

CopyCat automatically detects con-flicts logs them to a dedicatedtable and disables replication ofthat record in either direction untilthe conflict is resolved The conflictstable holds the user names of bothnodes involved in the conflict aswell as a field calledldquoCHOSEN_USERrdquo In order tosolve the conflict the user simplyhas to put in this field the name ofthe node which has the correct ver-sion of the record and automatical-ly upon the next replication therecord will be replicated and theconflict resolved

This system was carefully designedto function correctly even in some ofthe complex scenarios that are pos-sible with CopyCat For instancethe conflict may in reality bebetween two nodes that are notdirectly connected to each othersince CopyCat nodes only evercommunicate directly with their par-ent there is no way to tell if anothernode may not have a conflictingupdate for a certain record Fur-thermore its entirely possible thattwo nodes (having the same parent)should simultaneously attempt toreplicate the same record to theirparent By using a snapshot-typetransaction and careful ordering ofthe replication process these issuesare handled transparently

Difficult database structures

There are certain database archi-tectures that are difficult to repli-cate Consider for example aldquoSTOCKrdquo table containing one lineper product and a field holding thecurrent stock value Suppose thatfor a certain product the currentstock value being 45 node A adds1 item to stock setting the stockvalue to 46 Simultaneously nodeB adds 2 items to stock thereby set-ting the current stock value to 47How can such a table then be repli-cated Neither A nor B have thecorrect value for the field since nei-ther take into consideration thechanges from the other node

Most replicators would require suchan architecture to be alteredInstead of having one record holdthe current stock value of productthere could be one line per changeThis would solve the problem How-ever restructuring large databases(and the end-user applications thatusually go with them) could be arather major task CopyCat wasspecifically designed to avoid theseproblems altogether rather thanrequire the database structure to bechanged

To solve this kind of problem Copy-Cat introduces stored procedureldquoreplicationrdquo That is a mechanismfor logging stored procedure callsand replicating them to othernodes When dealing with an

Development area

29

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

unreplicatable table (like in the example above) one solution is to make astored procedure which can be called for updating the table and usingstored procedure replication in order to replicate each of these calls Thuscontinuing the example above instead of replicating the values of theSTOCK table the nodes would replicate only their changes to these valuesthereby correctly synchronizing and merging the changes to the STOCKtable

PART 2GETTING STARTED WITH COPYCATCopycat is available in two distinct forms

1 As a set of Delphi C++ Builder components

2 As a standalone Win32 replication tool

We will now present each one of these products

1 CopyCat Delphi C++ Builder Component Set

CopyCat can be obtained as a set of Delphi C++ Builder componentsenabling you to use replication features painlessly and in many different sit-uations and sparing you the tedious task of writing and testing custom data-base synchronization solutions

Using these components replication or synchronization facilities can beseamlessly built into existing solutions As an example we have used theCopyCat components in an application used by our development team ontheir laptops to keep track of programming tasks When the developersneed to go on-site to visit a customer the application runs in local mode onthe laptop When they return and connect up to the company network anychanges they have made on-site are automatically synchronized with themain Interbase database running on a Linux server

Many more applications are possible since the CopyCat components arevery flexible and allow for synchronization of even a single table

Below is a concise guide for getting started with the CopyCat component suite

Download and install the evaluation components fromhttpwwwmicrotecfrcopycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DACthat you plan to use (currently IBX and FIBPlus are supported) These arecomponents for interfacing between the CopyCat components and theunderlying data-access components

2) Open and run the ldquoConfigurationrdquo example project (requires the IBXprovider)

3) On the ldquoGeneralrdquo tab fill in the connection parameters and press ldquoCon-nect to Databaserdquo

4) On the ldquoUsersrdquo tab provide the list of sub-nodes for the current data-base

5) On the ldquoTablesrdquo tab for each table that you want to replicate set a pri-ority (relative to the other tables) and double-click on the ldquoPKn generatorrdquo

Development area

30

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

columns to (optionally) fill the primary key synchronization method Oncethese settings have been made set the ldquoCreatedrdquo field to Y so as to gen-erate the meta-data

6) On the ldquoProceduresrdquo tab set ldquoCreatedrdquo to Y for each procedure thatyou want to replicate after having set a priority

7) Apply all the generated SQL to all databases that should be replicated

8) For each database to be replicated set the list of sub-nodes (in theRPL$USERS table)

Replicate

1 In Delphi open the ldquoReplicatorrdquo example project

2 Drop a provider component on the form and hook it up to the TCcRepli-cators DBProvider property

3 Setup the LocalDB and RemoteDB properties of the TCcReplicator withthe connection parameters for the local and remote databases

4 Fill in the user name of the local andremote nodes as well as the SYSDBAuser name and password (needed forprimary key synchronization)

5 Compile and run the example

6 Press the ldquoReplicate nowrdquo button

2 CopyTiger the CopyCat Win32standalone replication tool

For those who have replication or synchronization needs but do not havethe time or resources to develop their own integrated system Microtec pro-poses a standalone database replication tool based on the CopyCat tech-nology called COPYTIGER

Features include

bull Easy to use installer

bull Independent server Administrator tool (CTAdmin)

bull Configuration wizard for setting up links to master slave databases

bull Robust replication engine based on Microtec CopyCat

bull Fault-tolerant connection mechanism allowing for automatic resumption of lost database connections

bull Simple amp intuitive control panel

bullAutomatic email notification on certain events (conflicts PK violations etc)

Visit the CopyTiger homepage to download a time-limited trial version

httpwwwmicrotecfrcopycatct

SUMMARY

In todays connected world database replication and synchronization are topics of great interest amongall industry professionals With the advent of Microtec CopyCat the Interbase Firebird community isobtaining a two-fold benefit

1 By encapsulating all the functionality of a replicator into Delphi components CopyCat makes it easierthan ever to integrate replication and synchronization facilities into custom applications

2 By providing a standalone tool for the replication of Interbase Firebird databases Microtec is respond-ing to another great need in the community ndash that of having a powerful and easy-to-use replication tooland one that can be connected to an existing database without disrupting its current structure

CopyCat is being actively developed by Microtec and many new features are being worked on such assupport for replicating between heterogeneous database types (PostgreSQL Oracle MSSQL MySQLNexusDB ) as well as a Linux Kylix version for the components and the standalone tool

You can find more information about CopyCat athttpwwwmicrotecfrcopycat or by contacting us at copycatmicrotecfr

As as promotional operation Microtec is offering a 20 discount to IBDeveloper readers for the pur-chase of any CopyCat product for the first 100 licenses sold

Click on the BUY NOW icon on our web site and enter the followingShareIt coupon code

ibd-852-150

Thankfully prior to working withInterBase I was interested in differ-ent data structures how they arestored and what algorithms theyuse This helped me to interpret theoutput of gstat At that time I decid-ed to write a tool that could analyzegstat output to help in tuning thedatabase or at least to identify thecause of performance problems

Long story but the outcome wasthat IBAnalyst was created In spiteof my experience it still allows me tofind very interesting things or per-formance issues in different data-bases

Real systems have runtime perform-ance that fluctuates like a wave Theamplitude of such lsquowavesrsquo can below or high so you can see howperformance differs from day today (or hour by hour) Actual per-formance depends on many factorsincluding the design of the applica-tion server configuration transac-tion concurrency version garbagein the database and so on To findout what is happening in a data-base (both positive and negativeaspects of performance) you shouldat the very least take a peek at the

Development area

31

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

Understanding YourDatabase with IBAnalystI have been working with InterBasesince 1994 Back then most data-bases were small and did notrequire any tuning Of course therewere occasions when I had tochange ibconfig on a server andreconfigured hardware or the OSbut that was almost all I could do totune performance

Four years ago our companybegan to provide technical supportand training of InterBase usersWorking with many productiondatabases also taught me a lot ofdifferent things However most ofwhat I learned concerned applica-tions ndash transaction parametersusage optimizing queries andresult sets

Of course I had known for quitesome time about gstat ndash the tool thatgives database statistics informa-tion If you have ever looked ingstat output or read opguidepdfabout it you would know that thestatistical output looks like just abunch of numbers and nothing elseOk you can discover fragmentationinformation for a particular table orindex but what other useful infor-mation can be obtained

database statistics from time to time

Lets take a look at the capabilitiesof IBAnalyst IBAnalyst can takestatistics from gstat or the ServicesAPI and compile them into a reportgiving you complete informationabout the database its tables and

indices It has in-place warningswhich are available during thebrowsing of statistics it alsoincludes hint comments and recom-mendation reports

The summary shown in Figure 1provides general information about

Figure 1 Summary of database statistics

Author Dmitri Kouzmenkokdvib-aidcom

1- InterBase 75 and Firebird 15 have special features that can periodi-cally flush unsaved pages if Forced Writes is Off2 - Oldest transaction is the same Oldest interesting transaction mentionedeverywhere Gstat output does not show this transaction as interesting3 - Really it is the oldest transaction that was active when the oldest trans-action currently active started This is because only new transaction startmoves Oldest active forward In the production systems with regulartransactions it can be considered as currently oldest active transaction

The warnings and comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: all figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next, we can see that the Forced Writes parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: it is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Writes set On or Off.[1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest[2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
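Both settings (Forced Writes and the sweep interval) can be changed with the standard gfix utility; a sketch, again with placeholder database name and credentials:

gfix -write sync -user SYSDBA -password masterkey mydb.gdb
gfix -housekeeping 0 -user SYSDBA -password masterkey mydb.gdb
gfix -sweep -user SYSDBA -password masterkey mydb.gdb

The first command turns Forced Writes on, the second sets the sweep interval to 0 (disabling automatic sweeping), and the third runs a manual sweep, which is the usual companion to a disabled sweep interval.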

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database:

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers belong to committed transactions, and no record versions are held for such transactions. Transaction numbers higher than the oldest transaction belong to transactions that can be in any state. It is also called the "oldest interesting" transaction, because it freezes when a transaction ends in rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot is the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active[3] is the oldest currently active transaction.

• The Next transaction is the number that will be assigned to the next new transaction.

• Active transactions: IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between the oldest active and the next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day: this is calculated as Next transaction divided by the number of days between the creation of the database and the point where the statistics were retrieved. For example, if Next transaction is 1,500,000 and the database was created 100 days ago, the database averages 15,000 transactions per day. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during certain work and housekeeping operations can be meaningless.

Do not gather statistics if you have:

• just restored your database;
• performed a backup (gbak -b db.gdb) without the -g switch;
• recently performed a manual sweep (gfix -sweep).

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications put less load on the database than usual (users are at lunch, or it is a quiet time in the business day).
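About the second item in the list above: the backup variant that does not touch version garbage (and therefore leaves the statistics meaningful) is the one with the -g switch, which tells gbak to skip garbage collection; a sketch:

gbak -b -g -user SYSDBA -password masterkey mydb.gdb mydb.gbk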

How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not creating sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications with only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.
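One practical way to catch such moments is to collect timestamped statistics on a schedule and compare them over time; a sketch of a Unix cron command, with paths and credentials as placeholders:

gstat -h -user SYSDBA -password masterkey /data/mydb.gdb > /var/log/ibstats/mydb-`date +%Y%m%d-%H%M`.txt

Run hourly, this produces a series of header-page snapshots from which you can see when the gap between the oldest and next transactions starts to grow.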

Table information

Let's take a look at another sample output from IBAnalyst, shown in Figure 2.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, which tables are fragmented (with fragmentation caused by updates/deletes or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently the versions were caused by bulk deletes. Together, the two indications tell us that these really were bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still "in use". The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the long chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether your applications update these tables so frequently by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells us that it was created manually. If the statistics were taken via the Services API with metadata info, you can also see which columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column (34999 keys minus 34995 duplicates leaves just 4 distinct values). Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056, i.e. almost all keys store one of the four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched for in a particular query: searching by a value that has a larger MaxDup will be slower, while searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.
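To check the diversity of an indexed column yourself, a plain aggregate query is enough; ADDR and ADDRESS_TYPE below are hypothetical stand-ins for the table and column behind ADDR_ADDRESS_IDX6:

SELECT COUNT(*) AS total_rows, COUNT(DISTINCT ADDRESS_TYPE) AS unique_values
FROM ADDR;

A result such as 34999 total rows against 4 unique values reproduces exactly the situation flagged in the statistics above.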

That is why IBAnalyst draws your attention to such indices, marking them red and yellow and including them in the Recommendations report. Unfortunately, most of the "bad" indices are created automatically to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of the primary key in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.
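As an illustration of that advice (a sketch only: COUNTRY and COUNTRY_ID are hypothetical names), triggers can make a lookup table effectively read-only, so that its primary key values are never deleted or re-keyed:

CREATE EXCEPTION E_LOOKUP_RO 'Lookup table rows may not be deleted or re-keyed';

SET TERM ^ ;

CREATE TRIGGER TBD_COUNTRY FOR COUNTRY
BEFORE DELETE AS
BEGIN
  /* forbid deletes entirely */
  EXCEPTION E_LOOKUP_RO;
END ^

CREATE TRIGGER TBU_COUNTRY FOR COUNTRY
BEFORE UPDATE AS
BEGIN
  /* allow updates, but never to the primary key */
  IF (NEW.COUNTRY_ID <> OLD.COUNTRY_ID) THEN
    EXCEPTION E_LOOKUP_RO;
END ^

SET TERM ; ^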

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created, based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article:

"Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report:

"Versioned tables count: 8. A large amount of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with a version/record ratio greater than 3:"

Table        Records  Versions  RecVers size
CLIENTS_PR      3388     10944            92
DICT_PRICE        30      1992            45
DOCS               9      2225            64
N_PART         13835     72594            83
REGISTR_NC       241      4085            56
SKL_NC          1640      7736           170
STAT_QUICK     17649     85062           110
UO_LOCK          283      8490           144
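For instance, taking one of the tables from the list above, garbage collection can be provoked with a full scan; run it in its own short transaction, preferably at a quiet time:

SELECT COUNT(*) FROM STAT_QUICK;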

Summary

IBAnalyst is an invaluable tool that assists the user in performing detailed analysis of Firebird or InterBase database statistics and in identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and it will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback

Comments on the "Temporary tables" article

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them particularly impressed me, and with the permission of its author I would like to publish it here.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change concerns temp tables; I suggest you even create another "myth box" for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TTs) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMSs are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking from experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage - much better than asking developers to supply their own query plans.

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it: no user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation or regarding data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only available through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com


Miscellaneous

This is the first official book on Firebird, the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book provides all you need to know about Firebird database development: installation, multi-platform configuration, SQL language, interfaces and maintenance.




Page 26: The InterBase and Firebird Developer Magazine, Issue 2, 2005

Development area

27

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

Replicating and synchronizingInterBaseFirebirddatabases using CopyCat

AuthorJonathan Neve

jonathanmicrotecfr

wwwmicrotecfr

PART 1 BASICS OF DATABASEREPLICATIONA replicator is a tool for keepingseveral databases synchronized(either wholly or in part) on a con-tinuous basis Such a tool can havemany applications it can allow foroff-line local data editing with apunctual synchronization uponreconnection to main database itcan also be used over a slow con-nection as an alternative to a directconnection to the central databaseanother use would be to make anautomatic off-site incrementalbackup by using simple one-wayreplication

Creating a replicator can be quitetricky Lets examine some of thekey design issues involved in data-base replication and explain howthese issues are implemented inMicrotec CopyCat a set of Delphi C++Builder components for per-forming replication between Inter-base and FireBird databases

Data logging

Before anything can be replicatedall changes to each database mustof course be logged CopyCat cre-ates a log table and triggers foreach table that is to be replicatedThese triggers insert into the logtable all the information concerningthe record that was changed (tablename primary key value(s) etc)

Multi-node replication

Replicating to and from severalnodes adds another degree of com-plexity Every change that is madeto one database must be applied toall the others Furthermore whenone database applies this change itmust indicate to the originatingdatabase that the change has beenapplied without in any way hinder-ing the other databases from repli-cating the same change eitherbefore simultaneously or after

In CopyCat, these problems are solved using a simple and flexible system. Each replication node can have one parent node, and several sub-nodes towards which it replicates its changes. Each node's list of sub-nodes is stored in a table in the node's database. (Incidentally, the parent node is configured in the replicator software itself rather than in the database, and therefore no software is needed on nodes that have no parent – which allows these servers to run Linux or any other OS supported by InterBase/Firebird.)

When a data change occurs in a replicated table, one log line is generated per sub-node. Thus, each sub-node fetches only the log lines that concern it.

Two-way replication

One obvious difficulty involved in two-way replication is how to prevent changes that have been replicated to one database from replicating back to the original database. Since all changes to the database are logged, the changes made by the replicator are also logged, and would therefore bounce back and forth between the source and the target databases. How can this problem be avoided?

The solution CopyCat uses is related to the sub-node management system described above. Each sub-node is assigned a name, which is used when the sub-node logs in to the database. When a sub-node replicates its own changes to its parent, the replication triggers log the change for all the node's sub-nodes except the current user. Thus, only sub-nodes other than the originator receive the change.

Conversely, CopyCat logs in to the node's local database using the node name of its parent as the user name. Thus, any change made to the local database during replication will be logged for all sub-nodes other than the node's parent; and any change made to the parent node will be logged to the other sub-nodes, but not to the originating node itself.

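To make that concrete, the earlier sketch could be extended so that each change is logged once per sub-node, skipping the connection that made it. USER is the built-in InterBase/Firebird context variable; every other name is again invented for the example.

ALTER TABLE REPL_LOG ADD NODE_NAME VARCHAR(31);

/* The node's own list of sub-nodes. */
CREATE TABLE REPL_NODES (
  NODE_NAME VARCHAR(31) NOT NULL PRIMARY KEY
);

DROP TRIGGER CUSTOMER_LOG_AU;  /* replace the single-line version above */

SET TERM ^ ;

CREATE TRIGGER CUSTOMER_LOG_AU FOR CUSTOMER
AFTER UPDATE AS
BEGIN
  /* One log row per sub-node; the connected user (the originating
     node) is skipped, so the change never bounces back to it. */
  INSERT INTO REPL_LOG (LOG_ID, NODE_NAME, TABLE_NAME, PK_VALUE, OPERATION)
  SELECT GEN_ID(GEN_REPL_LOG, 1), N.NODE_NAME, 'CUSTOMER', OLD.CUSTOMER_ID, 'U'
  FROM REPL_NODES N
  WHERE N.NODE_NAME <> USER;
END ^

SET TERM ; ^
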
Primary key synchronization

One problem with replication is that since data is edited off-line, there is no centralized way to ensure that the value of a field remains unique. One common answer to this problem is to use GUID values. This is a good solution if you're implementing a new database (except that GUID fields are rather large, and therefore not very well suited for a primary key field), but if you have an existing database that needs replication, it would be very difficult to replace all primary or unique key fields by GUID values.

Since GUID fields are in many cases not feasible, CopyCat implements another solution: it allows you to define, for each primary key field (as well as for up to three other fields whose unicity is to be maintained), a synchronization method. In most cases this will be either a generator or a stored procedure call, though it could be any valid SQL clause. Upon replication, this SQL statement is called on the server side in order to calculate a unique key value, and the resulting value is then applied to the local database. Only after the key values (if any) have been changed locally is the record replicated to the server.

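For instance, a generator-based synchronization method might boil down to a statement like the following, run on the parent server at replication time (the generator name is, of course, illustrative):

CREATE GENERATOR GEN_CUSTOMER_ID;

/* Executed on the parent server; returns a key value that is unique
   across all nodes. The value is written into the local record
   before the record itself is replicated. */
SELECT GEN_ID(GEN_CUSTOMER_ID, 1) AS NEW_KEY
FROM RDB$DATABASE;
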
When replicating from the parent node to the local node, however, this behaviour does not take place: the primary key values on the server are considered to be unique.

Conflict management

Suppose a replication node and its parent both modify the same record during the same time period. When the replicator connects to its parent to replicate its changes, it has no way of telling which of the two nodes has the most up-to-date version of the record: this is a conflict.

CopyCat automatically detects conflicts, logs them to a dedicated table, and disables replication of that record in either direction until the conflict is resolved. The conflicts table holds the user names of both nodes involved in the conflict, as well as a field called "CHOSEN_USER". In order to resolve the conflict, the user simply has to put into this field the name of the node that has the correct version of the record; automatically, upon the next replication, the record will be replicated and the conflict resolved.

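Resolving a conflict is then a single update of the conflicts table, along these lines. Only CHOSEN_USER is named by the article; the table and the other column names here are assumptions made for the illustration.

/* Declare node 'NODE_A' the winner for one conflicting record;
   on the next replication the record flows again in both directions. */
UPDATE RPL$CONFLICTS
SET CHOSEN_USER = 'NODE_A'
WHERE TABLE_NAME = 'CUSTOMER'
  AND PK_VALUE = '1024';
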
This system was carefully designed to function correctly even in some of the complex scenarios that are possible with CopyCat. For instance, the conflict may in reality be between two nodes that are not directly connected to each other: since CopyCat nodes only ever communicate directly with their parent, there is no way to tell whether another node may have a conflicting update for a certain record. Furthermore, it's entirely possible that two nodes (having the same parent) should simultaneously attempt to replicate the same record to their parent. By using a snapshot-type transaction and careful ordering of the replication process, these issues are handled transparently.

Difficult database structures

There are certain database architectures that are difficult to replicate. Consider, for example, a "STOCK" table containing one line per product, with a field holding the current stock value. Suppose that, the current stock value of a certain product being 45, node A adds 1 item to stock, setting the stock value to 46. Simultaneously, node B adds 2 items to stock, thereby setting the current stock value to 47. How can such a table then be replicated? Neither A nor B has the correct value for the field, since neither takes into consideration the changes from the other node.

Most replicators would require such an architecture to be altered. Instead of having one record hold the current stock value of a product, there could be one line per change; this would solve the problem. However, restructuring large databases (and the end-user applications that usually go with them) could be a rather major task. CopyCat was specifically designed to avoid these problems altogether, rather than require the database structure to be changed.

To solve this kind of problem, CopyCat introduces stored procedure "replication". That is, a mechanism for logging stored procedure calls and replicating them to other nodes. When dealing with an unreplicatable table (like in the example above), one solution is to make a stored procedure which can be called for updating the table, and to use stored procedure replication in order to replicate each of these calls. Thus, continuing the example above, instead of replicating the values of the STOCK table, the nodes would replicate only their changes to these values, thereby correctly synchronizing and merging the changes to the STOCK table.

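A sketch of the idea in SQL: route all stock movements through a procedure that applies a delta, and replicate the calls rather than the resulting values (table and procedure names are invented for the example).

SET TERM ^ ;

CREATE PROCEDURE ADD_STOCK (PRODUCT_ID INTEGER, DELTA INTEGER)
AS
BEGIN
  /* Apply a relative change instead of writing an absolute value. */
  UPDATE STOCK
  SET CURRENT_VALUE = CURRENT_VALUE + :DELTA
  WHERE PRODUCT_ID = :PRODUCT_ID;
END ^

SET TERM ; ^

/* Node A executes ADD_STOCK(1, 1) and node B executes ADD_STOCK(1, 2).
   Replaying both calls on every node turns the initial 45 into 48
   everywhere - the correctly merged result. */
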
PART 2: GETTING STARTED WITH COPYCAT

CopyCat is available in two distinct forms:

1. As a set of Delphi / C++ Builder components

2. As a standalone Win32 replication tool

We will now present each one of these products.

1. The CopyCat Delphi / C++ Builder component set

CopyCat can be obtained as a set of Delphi / C++ Builder components, enabling you to use replication features painlessly and in many different situations, and sparing you the tedious task of writing and testing custom database synchronization solutions.

Using these components, replication or synchronization facilities can be seamlessly built into existing solutions. As an example, we have used the CopyCat components in an application used by our development team on their laptops to keep track of programming tasks. When the developers need to go on-site to visit a customer, the application runs in local mode on the laptop. When they return and connect up to the company network, any changes they have made on-site are automatically synchronized with the main InterBase database running on a Linux server.

Many more applications are possible, since the CopyCat components are very flexible and allow for synchronization of even a single table.

Below is a concise guide to getting started with the CopyCat component suite.

Download and install the evaluation components from http://www.microtec.fr/copycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DAC that you plan to use (currently IBX and FIBPlus are supported). These are components for interfacing between the CopyCat components and the underlying data-access components.

2) Open and run the "Configuration" example project (requires the IBX provider).

3) On the "General" tab, fill in the connection parameters and press "Connect to Database".

4) On the "Users" tab, provide the list of sub-nodes for the current database.

5) On the "Tables" tab, for each table that you want to replicate, set a priority (relative to the other tables) and double-click on the "PKn generator" columns to (optionally) fill in the primary key synchronization method. Once these settings have been made, set the "Created" field to 'Y' so as to generate the meta-data.

6) On the "Procedures" tab, set "Created" to 'Y' for each procedure that you want to replicate, after having set a priority.

7) Apply all the generated SQL to all databases that should be replicated.

8) For each database to be replicated, set the list of sub-nodes in the RPL$USERS table, as shown in the sketch below.

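Step 8 amounts to a few inserts. The RPL$USERS table name comes from the text above, but its column layout here is assumed for the sake of the example:

/* Register two sub-nodes of this database (column name assumed). */
INSERT INTO RPL$USERS (NODE_NAME) VALUES ('LAPTOP_JOHN');
INSERT INTO RPL$USERS (NODE_NAME) VALUES ('LAPTOP_MARY');
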
Replicate

1. In Delphi, open the "Replicator" example project.

2. Drop a provider component on the form and hook it up to the TCcReplicator's DBProvider property.

3. Set up the LocalDB and RemoteDB properties of the TCcReplicator with the connection parameters for the local and remote databases.

4. Fill in the user names of the local and remote nodes, as well as the SYSDBA user name and password (needed for primary key synchronization).

5. Compile and run the example.

6. Press the "Replicate now" button.

2. CopyTiger: the CopyCat Win32 standalone replication tool

For those who have replication or synchronization needs but do not have the time or resources to develop their own integrated system, Microtec proposes a standalone database replication tool based on the CopyCat technology, called COPYTIGER.

Features include:

• Easy to use installer
• Independent server Administrator tool (CTAdmin)
• Configuration wizard for setting up links to master/slave databases
• Robust replication engine based on Microtec CopyCat
• Fault-tolerant connection mechanism, allowing for automatic resumption of lost database connections
• Simple & intuitive control panel
• Automatic email notification on certain events (conflicts, PK violations, etc.)

Visit the CopyTiger homepage to download a time-limited trial version:

http://www.microtec.fr/copycat/ct

SUMMARY

In today's connected world, database replication and synchronization are topics of great interest among all industry professionals. With the advent of Microtec CopyCat, the InterBase / Firebird community is obtaining a two-fold benefit:

1. By encapsulating all the functionality of a replicator into Delphi components, CopyCat makes it easier than ever to integrate replication and synchronization facilities into custom applications.

2. By providing a standalone tool for the replication of InterBase / Firebird databases, Microtec is responding to another great need in the community – that of having a powerful and easy-to-use replication tool, one that can be connected to an existing database without disrupting its current structure.

CopyCat is being actively developed by Microtec, and many new features are being worked on, such as support for replicating between heterogeneous database types (PostgreSQL, Oracle, MSSQL, MySQL, NexusDB, ...), as well as a Linux / Kylix version of the components and the standalone tool.

You can find more information about CopyCat at http://www.microtec.fr/copycat, or by contacting us at copycat@microtec.fr.

As a promotional operation, Microtec is offering a 20% discount to IBDeveloper readers for the purchase of any CopyCat product, for the first 100 licenses sold.

Click on the BUY NOW icon on our web site and enter the following ShareIt coupon code:

ibd-852-150

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then, most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training of InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications – transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat – the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Okay, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story short, the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things or performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such 'waves' can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database, and so on. To find out what is happening in a database (both the positive and negative aspects of performance) you should, at the very least, take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings, which are available while browsing the statistics, and it also includes hint comments and recommendation reports.

The summary shown in Figure 1 provides general information about your database.

Figure 1: Summary of database statistics

The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: all figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the details of a problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4K, there is no warning displayed, as this page size is okay.

Next we can see that the Forced Write parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write cache method: when ON, it writes changed data immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple – using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: it is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off. [1]

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest [2] and the oldest snapshot transaction at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss effect, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the "oldest interesting transaction", because it freezes when a transaction is ended with rollback and the server can not undo its changes at that moment.

• The Oldest snapshot – the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. Indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active [3] – the oldest currently active transaction.

• The Next transaction – the transaction number that will be assigned to the next transaction started.

• Active transactions – IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transaction count. The statistics do not tell whether there are any other active transactions between "oldest active" and "next transaction", but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction has been active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day – this is calculated from "Next transaction", divided by the number of days passed from the creation of the database to the point where the statistics were retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.

As you have already learned, if there are any warnings, they are shown as colored lines, with clear descriptive hints on how to fix or prevent the problem.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database
• Performed a backup (gbak -b db.gdb) without the -g switch
• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications place less load on the database than usual (users are at lunch, or it's a quiet time in the business day).

How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

Figure 2: Table statistics

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables (with fragmentation caused by update/delete or by blobs), and so on. You can see which tables are being updated frequently, and what the table size is in megabytes. Most of these warnings are customizable.

In this database example there are several activities. First of all, the yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: it shows that only one version per record is stored, and consequently this is caused by bulk deletes. Together, both indications tell that these really are bulk deletes, and that the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server can not decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have a versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.8.3 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995, and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056 – i.e. almost all keys store one of the four unique values. As a result, this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.

• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times, in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index value that has a larger MaxDup will be slower, while searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by using triggers to prevent deletes or updates of primary keys in lookup tables (a sketch follows below). But if it is not possible to implement such changes, IBAnalyst will show you the bad indices on Foreign Keys every time you view statistics.

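Such a guard trigger might look like this; the lookup table, column and exception names are invented for the illustration (deletes could be blocked the same way with a BEFORE DELETE trigger):

CREATE EXCEPTION E_PK_IMMUTABLE 'Primary key of a lookup table must not change';

SET TERM ^ ;

CREATE TRIGGER COUNTRY_PK_GUARD FOR COUNTRY
BEFORE UPDATE AS
BEGIN
  /* Refuse key changes in the lookup table, so the low-selectivity
     foreign-key index never has to be maintained for them. */
  IF (OLD.COUNTRY_ID <> NEW.COUNTRY_ID) THEN
    EXCEPTION E_PK_IMMUTABLE;
END ^

SET TERM ; ^
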
Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports / View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transactions use the global TIP, but snapshot transactions make their own copies of the TIP in memory. A big TIP size can slow down performance. Try to run a sweep manually (gfix -sweep) to decrease the TIP size."

Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. A large amount of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with a version/record ratio greater than 3:"

Table       Records  Versions  RecVers size
CLIENTS_PR     3388     10944            92
DICT_PRICE       30      1992            45
DOCS              9      2225            64
N_PART        13835     72594            83
REGISTR_NC      241      4085            56
SKL_NC         1640      7736           170
STAT_QUICK    17649     85062           110
UO_LOCK         283      8490           144

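The "select count(*)" trick mentioned in the report is literally a full scan; for example, on one of the tables listed above:

/* Reading every row visits every record version; versions no longer
   needed by any active transaction can then be garbage-collected. */
SELECT COUNT(*) FROM N_PART;
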
Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and in identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Notes:

[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.

[2] The "Oldest transaction" is the same "Oldest interesting transaction" mentioned everywhere; gstat output does not show this transaction as interesting.

[3] Really, it is the oldest transaction that was active when the oldest transaction currently active started, because only the start of a new transaction moves the "Oldest active" forward. In production systems with regular transactions it can be considered to be the currently oldest active transaction.

Readers feedback


Comments to "Temporary tables"

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them particularly impressed me, and with the permission of its respective owner I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic, but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly, those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up the tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, whether regarding user/session data isolation or regarding data whose structure is unknown in advance but which needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (available only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn

volkerrehn@bigpond.com


Readers feedback

Miscellaneous

36

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Invitations SubscriptionDonations Sponsorships

This is the first officialbook on Firebird mdash thefree independentopen source relationaldatabase server thatemerged in 2000

Based on the actualFirebird Project thisbook will provide youall you need to knowabout Firebird data-base development likeinstallation multi-plat-form configurationSQL language inter-faces and mainte-nance

Subscribe now

To receive future issuesnotifications send email to

subscribeibdevelopercom

MagazineCD

Donations

Page 27: The InterBase and Firebird Developer Magazine, Issue 2, 2005

Development area

28

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

ing database that needs replicationit would be very difficult to replaceall primary or unique key fields byGUID values

Since GUID fields are in manycases not feasible CopyCat imple-ments another solution CopyCatallows you to define for each pri-mary key field (as well as up tothree other fields for which unicity isto be maintained) a synchroniza-tion method In most cases this willbe either a generator or a storedprocedure call though it could beany valid SQL clause Upon repli-cation this SQL statement is calledon the server side in order to calcu-late a unique key value and theresulting value is then applied to thelocal database Only after the keyvalues (if any) have been changedlocally is the record replicated tothe server

When replicating from the parentnode to the local node howeverthis behaviour does not take placethe primary key values on the serv-er are considered to be unique

Conflict management

Suppose a replication node and itsparent both modify the same recordduring the same time period Whenthe replicator connects to its parentto replicate its changes it has noway of telling which of the twonodes has the most up-to-date ver-

sion of the record this is a conflict

CopyCat automatically detects con-flicts logs them to a dedicatedtable and disables replication ofthat record in either direction untilthe conflict is resolved The conflictstable holds the user names of bothnodes involved in the conflict aswell as a field calledldquoCHOSEN_USERrdquo In order tosolve the conflict the user simplyhas to put in this field the name ofthe node which has the correct ver-sion of the record and automatical-ly upon the next replication therecord will be replicated and theconflict resolved

This system was carefully designedto function correctly even in some ofthe complex scenarios that are pos-sible with CopyCat For instancethe conflict may in reality bebetween two nodes that are notdirectly connected to each othersince CopyCat nodes only evercommunicate directly with their par-ent there is no way to tell if anothernode may not have a conflictingupdate for a certain record Fur-thermore its entirely possible thattwo nodes (having the same parent)should simultaneously attempt toreplicate the same record to theirparent By using a snapshot-typetransaction and careful ordering ofthe replication process these issuesare handled transparently

Difficult database structures

There are certain database archi-tectures that are difficult to repli-cate Consider for example aldquoSTOCKrdquo table containing one lineper product and a field holding thecurrent stock value Suppose thatfor a certain product the currentstock value being 45 node A adds1 item to stock setting the stockvalue to 46 Simultaneously nodeB adds 2 items to stock thereby set-ting the current stock value to 47How can such a table then be repli-cated Neither A nor B have thecorrect value for the field since nei-ther take into consideration thechanges from the other node

Most replicators would require suchan architecture to be alteredInstead of having one record holdthe current stock value of productthere could be one line per changeThis would solve the problem How-ever restructuring large databases(and the end-user applications thatusually go with them) could be arather major task CopyCat wasspecifically designed to avoid theseproblems altogether rather thanrequire the database structure to bechanged

To solve this kind of problem Copy-Cat introduces stored procedureldquoreplicationrdquo That is a mechanismfor logging stored procedure callsand replicating them to othernodes When dealing with an

Development area

29

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

unreplicatable table (like in the example above) one solution is to make astored procedure which can be called for updating the table and usingstored procedure replication in order to replicate each of these calls Thuscontinuing the example above instead of replicating the values of theSTOCK table the nodes would replicate only their changes to these valuesthereby correctly synchronizing and merging the changes to the STOCKtable

PART 2GETTING STARTED WITH COPYCATCopycat is available in two distinct forms

1 As a set of Delphi C++ Builder components

2 As a standalone Win32 replication tool

We will now present each one of these products

1 CopyCat Delphi C++ Builder Component Set

CopyCat can be obtained as a set of Delphi C++ Builder componentsenabling you to use replication features painlessly and in many different sit-uations and sparing you the tedious task of writing and testing custom data-base synchronization solutions

Using these components replication or synchronization facilities can beseamlessly built into existing solutions As an example we have used theCopyCat components in an application used by our development team ontheir laptops to keep track of programming tasks When the developersneed to go on-site to visit a customer the application runs in local mode onthe laptop When they return and connect up to the company network anychanges they have made on-site are automatically synchronized with themain Interbase database running on a Linux server

Many more applications are possible since the CopyCat components arevery flexible and allow for synchronization of even a single table

Below is a concise guide for getting started with the CopyCat component suite

Download and install the evaluation components fromhttpwwwmicrotecfrcopycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DACthat you plan to use (currently IBX and FIBPlus are supported) These arecomponents for interfacing between the CopyCat components and theunderlying data-access components

2) Open and run the ldquoConfigurationrdquo example project (requires the IBXprovider)

3) On the ldquoGeneralrdquo tab fill in the connection parameters and press ldquoCon-nect to Databaserdquo

4) On the ldquoUsersrdquo tab provide the list of sub-nodes for the current data-base

5) On the ldquoTablesrdquo tab for each table that you want to replicate set a pri-ority (relative to the other tables) and double-click on the ldquoPKn generatorrdquo

Development area

30

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

columns to (optionally) fill the primary key synchronization method Oncethese settings have been made set the ldquoCreatedrdquo field to Y so as to gen-erate the meta-data

6) On the ldquoProceduresrdquo tab set ldquoCreatedrdquo to Y for each procedure thatyou want to replicate after having set a priority

7) Apply all the generated SQL to all databases that should be replicated

8) For each database to be replicated set the list of sub-nodes (in theRPL$USERS table)

Replicate

1 In Delphi open the ldquoReplicatorrdquo example project

2 Drop a provider component on the form and hook it up to the TCcRepli-cators DBProvider property

3 Setup the LocalDB and RemoteDB properties of the TCcReplicator withthe connection parameters for the local and remote databases

4 Fill in the user name of the local andremote nodes as well as the SYSDBAuser name and password (needed forprimary key synchronization)

5 Compile and run the example

6 Press the ldquoReplicate nowrdquo button

2 CopyTiger the CopyCat Win32standalone replication tool

For those who have replication or synchronization needs but do not havethe time or resources to develop their own integrated system Microtec pro-poses a standalone database replication tool based on the CopyCat tech-nology called COPYTIGER

Features include

bull Easy to use installer

bull Independent server Administrator tool (CTAdmin)

bull Configuration wizard for setting up links to master slave databases

bull Robust replication engine based on Microtec CopyCat

bull Fault-tolerant connection mechanism allowing for automatic resumption of lost database connections

bull Simple amp intuitive control panel

bullAutomatic email notification on certain events (conflicts PK violations etc)

Visit the CopyTiger homepage to download a time-limited trial version

httpwwwmicrotecfrcopycatct

SUMMARY

In todays connected world database replication and synchronization are topics of great interest amongall industry professionals With the advent of Microtec CopyCat the Interbase Firebird community isobtaining a two-fold benefit

1 By encapsulating all the functionality of a replicator into Delphi components CopyCat makes it easierthan ever to integrate replication and synchronization facilities into custom applications

2 By providing a standalone tool for the replication of Interbase Firebird databases Microtec is respond-ing to another great need in the community ndash that of having a powerful and easy-to-use replication tooland one that can be connected to an existing database without disrupting its current structure

CopyCat is being actively developed by Microtec and many new features are being worked on such assupport for replicating between heterogeneous database types (PostgreSQL Oracle MSSQL MySQLNexusDB ) as well as a Linux Kylix version for the components and the standalone tool

You can find more information about CopyCat athttpwwwmicrotecfrcopycat or by contacting us at copycatmicrotecfr

As as promotional operation Microtec is offering a 20 discount to IBDeveloper readers for the pur-chase of any CopyCat product for the first 100 licenses sold

Click on the BUY NOW icon on our web site and enter the followingShareIt coupon code

ibd-852-150

Thankfully prior to working withInterBase I was interested in differ-ent data structures how they arestored and what algorithms theyuse This helped me to interpret theoutput of gstat At that time I decid-ed to write a tool that could analyzegstat output to help in tuning thedatabase or at least to identify thecause of performance problems

Long story but the outcome wasthat IBAnalyst was created In spiteof my experience it still allows me tofind very interesting things or per-formance issues in different data-bases

Real systems have runtime perform-ance that fluctuates like a wave Theamplitude of such lsquowavesrsquo can below or high so you can see howperformance differs from day today (or hour by hour) Actual per-formance depends on many factorsincluding the design of the applica-tion server configuration transac-tion concurrency version garbagein the database and so on To findout what is happening in a data-base (both positive and negativeaspects of performance) you shouldat the very least take a peek at the

Development area

31

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

Understanding YourDatabase with IBAnalystI have been working with InterBasesince 1994 Back then most data-bases were small and did notrequire any tuning Of course therewere occasions when I had tochange ibconfig on a server andreconfigured hardware or the OSbut that was almost all I could do totune performance

Four years ago our companybegan to provide technical supportand training of InterBase usersWorking with many productiondatabases also taught me a lot ofdifferent things However most ofwhat I learned concerned applica-tions ndash transaction parametersusage optimizing queries andresult sets

Of course I had known for quitesome time about gstat ndash the tool thatgives database statistics informa-tion If you have ever looked ingstat output or read opguidepdfabout it you would know that thestatistical output looks like just abunch of numbers and nothing elseOk you can discover fragmentationinformation for a particular table orindex but what other useful infor-mation can be obtained

database statistics from time to time

Lets take a look at the capabilitiesof IBAnalyst IBAnalyst can takestatistics from gstat or the ServicesAPI and compile them into a reportgiving you complete informationabout the database its tables and

indices It has in-place warningswhich are available during thebrowsing of statistics it alsoincludes hint comments and recom-mendation reports

The summary shown in Figure 1provides general information about

Figure 1 Summary of database statistics

Author Dmitri Kouzmenkokdvib-aidcom

1- InterBase 75 and Firebird 15 have special features that can periodi-cally flush unsaved pages if Forced Writes is Off2 - Oldest transaction is the same Oldest interesting transaction mentionedeverywhere Gstat output does not show this transaction as interesting3 - Really it is the oldest transaction that was active when the oldest trans-action currently active started This is because only new transaction startmoves Oldest active forward In the production systems with regulartransactions it can be considered as currently oldest active transaction

Development area

32

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

your database The warnings orcomments shown are based oncarefully gathered knowledgeobtained from a large number ofreal-world production databases

Note All figures in this article con-tain gstat statistics which were takenfrom a real-world production data-base (with the permission of its own-ers)

As I said before raw database sta-tistics look cryptic and are hard tointerpret IBAnalyst highlights anypotential problems clearly in yellowor red and the detail of the problemcan be read simply by placing thecursor over the relevant entry andreading the hint that is displayed

What can we discover from theabove figure This is a dialect 3database with a page size of 4096bytes Six to eight years ago devel-opers used a default page size of1024 bytes but in more recenttimes such a small page size couldlead to many performance prob-lems Since this database has apage size of 4k there is no warningdisplayed as this page size is okay

Next we can see that the ForcedWrite parameter is set to OFF andmarked red InterBase 4x and 5xby default had this parameter ONForced Writes itself is a write cachemethod when ON it writeschanged data immediately to diskbut OFF means that writes will be

stored for unknown time by theoperating system in its file cacheInterBase 6 creates databases withForced Writes OFF

Why is this marked in red on theIBAnalyst report The answer is sim-ple ndash using asynchronous writes cancause database corruption in casesof power OS or server failure

Tip It is interesting that modernHDD interfaces (ATA SATA SCSI)do not show any major difference inperformance with Forced Write setOn or Off1

Next on the report is the mysterioussweep interval If positive it setsthe size of the gap between the old-est2 and oldest snapshot transactionat which the engine is alerted to theneed to start an automatic garbagecollection On some systems hittingthis threshold will cause a suddenperformance loss effect and as aresult it is sometimes recommendedthat the sweep interval be set to 0(disabling automatic sweepingentirely) Here the sweep interval ismarked yellow because the valueof the sweep gap is negativewhich it can be in InterBase 60Firebird and Yaffil statistics but not inInterBase 7x When the value of thesweep gap is greater than thesweep interval (if the sweep intervalis not 0) the report entry for thesweep interval will be marked redwith an appropriate hint

We will examine the next 8 rows asa group as they all display aspectsof the transaction state of the data-base

bull The Oldest transaction is theoldest non-committed transactionAny lower transaction numbers arefor committed transactions and norecord versions are available forsuch transactions Transaction num-bers higher than the oldest transac-tion are for transactions that can bein any state This is also called theoldest interesting transactionbecause it freezes when a transac-tion is ended with rollback andserver can not undo its changes atthat moment

bull The Oldest snapshot ndash the old-est active (ie not yet committed)transaction that existed at the startof the transaction that is currentlythe Oldest Active transactionIndicates lowest snapshot transac-tion number that is interested inrecord versions

bull The Oldest active3 ndash the oldestcurrently active transaction

bull The Next transaction ndash thetransaction number that will beassigned to new transaction

bull Active transactions ndash IBAnalystwill give a warning if the oldestactive transaction number is30 lower than the daily transac-tions count The statistics do not tell

if there any other active transactions between oldest active andnext transaction but there can be such transactions Usually ifthe oldest active gets stuck there are two possible causes a) thatsome transaction is active for a long time or b) the applicationdesign allows transactions to run for a long time Both causes pre-vent garbage collection and consume server resources

bull Transactions per day ndash this is calculated from Next transactiondivided by the number of days passed since the creation of thedatabase to the point where the statistics are retrieved This can becorrect only for production databases or for databases that areperiodically restored from backup causing transaction numberingto be reset

As you have already learned if there any warnings they are shownas colored lines with clear descriptive hints on how to fix or preventthe problem

It should be noted that database statistics are not always useful Sta-tistics that are gathered during work and housekeeping operationscan be meaningless

Do not gather statistics if you

bull Just restored your database

bull Performed backup (gbak ndashb dbgdb) without the ndashg switch

bull Recently performed a manual sweep (gfix ndashsweep)

Statistics you get on such occasions will be practically useless It isalso correct that during normal work there can be times where data-base is in perfect state for example when applications make lessdatabase load than usual (users are at lunch or its a quiet time in thebusiness day)

Development area

33

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

the main reason for performance degradation For some tablesthere can be a lot of versions that are still in use The server cannot decide whether they really are in use because active transac-tions potentially need any of these versions Accordingly the serv-er does not consider these versions as garbage and it takes longerand longer to construct a correct record from the big chain of ver-sions whenever a transaction happens to read it In Figure 2 youcan see two tables that have the versions count three times higherthan the record count Using this information you can also checkwhether the fact that your applications are updating these tables sofrequently is by design or because of coding mistake or an appli-cation design flaw

The Index viewIIndices are used by database engine to enforce primary key for-eign key and unique constraints They also speed up the retrieval ofdata Unique indices are the best for retrieving data but the level ofbenefit from non-unique indices depends on the diversity of theindexed data

For example look at ADDR_ADDRESS_IDX6 First of all the index

How to seize the momentwhen there is somethingwrong with the databaseYour applications can be designedso well that they will always workwith transactions and data correct-ly not making sweep gaps notaccumulating a lot of active trans-actions not keeping long runningsnapshots and so on Usually it doesnot happen (sorry colleagues)

The most common reason is thatdevelopers test their applicationsrunning only two or three simultane-ous users When the application isthen used in a production environ-ment with fifteen or more simultane-ous users the database can behaveunpredictably Of course multi-usermode can work okay because most

multi-user conflicts can be testedwith two or three concurrently run-ning applications However withlarger numbers of users garbagecollection problems can arise Suchpotential problems can be caught ifyou gather database statistics at thecorrect moments

Table informationLets take look at another sampleoutput from IBAnalyst

The IBAnalyst Table statistics view isalso very useful It can show whichtables have a lot of record versionswhere a large number ofupdatesdeletes were made frag-mented tables with fragmentationcaused by updatedelete or byblobs and so on You can see whichtables are being updated frequent-Figure 2 Table statistics

ly and what the table size is inmegabytes Most of these warningsare customizable

In this database example there areseveral activities First of all yellowcolor in the VerLen column warnsthat space taken by record versionsis larger than that occupied by therecords themselves This can resultfrom updating a lot of fields in arecord or by bulk deletes See therows in which MaxVers column ismarked in blue This shows that onlyone version per record is stored andconsequently this is caused by bulkdeletes So both indications can tellthat this is really bulk deletes andnumber in Versions column is closeto number of deleted records

Long-living active transactions pre-vent garbage collection and this is name itself tells that it was created manually If statistics

were taken by the Services API with metadata info youcan see what columns are indexed (in IBAnalyst 183and greater) For the index under examination you cansee that it has 34999 keys TotalDup is 34995 andMaxDup is 25056 Both duplicate columns are markedin red This is because there are only 4 unique key val-ues amongst all the keys in this index as can be seenfrom the Uniques column Furthermore the greatestduplicate chain (key pointing to records with the samecolumn value) is 25056 ndash ie almost all keys store oneof four unique values As a result this index could

bull Reduce the speed of the restore process Okay thir-ty-five thousand keys is not a big deal for modern data-bases and hardware but the impact should be notedanyway

Development area

34

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

bull Slow down garbage collection Indices with a low count of uniquevalues can impede garbage collection by up to ten times in compar-ison with a completely unique index This problem has been solvedin InterBase 7175 and Firebird 20

bull Produce unnecessary page reads when the optimizer reads theindex It depends on the value being searched in a particular query- searching by an index that has a larger value for MaxDup will beslower Searching by value that has fewer duplicate values will befaster but only you know what data is stored in that indexed column

That is why IBAnalyst draws your attention to such indices markingthem red and yellow and including them in the Recommendationsreport Unfortunately most of the ldquobad indices are automaticallycreated to enforce foreign-key constraints In some cases this prob-lem can be solved by preventing using triggers deletes or updatesof primary key in lookup tables But if it is not possible to implementsuch changes IBAnalyst will show you bad indices on ForeignKeys every time you view statistics

Reports

There is no need to look through the entire report each time spotting cell color and reading hints for new warningsMore direct and detailed information can be had by using the Recommendations feature of IBAnalyst Just load thestatistics and go to the ReportsView Recommendations menu This report provides a step-by-step analysis includ-ing more detailed descriptive warnings about forced writes sweep interval database activity transaction statedatabase page size sweeping transaction inventory pages fragmented tables tables with lot of record versionsmassive deletesupdates deep indices optimizer-unfriendly indices useless indices and even empty tables All ofthis information and the accompanying suggestions are dynamically created based on the statistics being loaded

As an example of the report output letrsquos have a look at a report generated for the database statistics you sawearlier in this article Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pagesRead_committed transaction uses global TIP but snapshot transactions make own copies of TIP in memory Big TIPsize can slowdown performance Try to run sweep manually (gfix -sweep) to decrease TIP size

Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. A large amount of record versions usually slows down performance. If there are a lot of record versions in a table, then either garbage collection does not work or the records are not being read by any select statement. You can try select count(*) on those tables to enforce garbage collection, but this can take a long time (if there are a lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions." Here is the list of tables with a version/record ratio greater than 3:

   Table        Records   Versions   RecVers size
   CLIENTS_PR   3388      10944      92
   DICT_PRICE   30        1992       45
   DOCS         9         2225       64
   N_PART       13835     72594      83
   REGISTR_NC   241       4085       56
   SKL_NC       1640      7736       170
   STAT_QUICK   17649     85062      110
   UO_LOCK      283       8490       144
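The ratios are easy to verify from the table: STAT_QUICK, for instance, carries 85062 versions for 17649 records, i.e. almost five versions per record. The garbage-collecting read suggested by the report is just an ordinary full scan; a minimal sketch against one of the tables listed above:

   SELECT COUNT(*) FROM STAT_QUICK;
   /* reading every record forces the server to walk each version chain
      and collect whatever garbage is no longer needed by any active
      transaction */

As the report warns, the count can run for a long time, and it collects nothing while an older transaction still needs those versions.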

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and it will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback

Comments to "Temporary tables"

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me and, with the permission of its respective owner, I'd like to publish it (a short syntax illustration of global temporary tables follows the letter).

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, it requires a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage. Much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it. No user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially, they slow down the whole system and introduce dependencies). Of course they do. To call a flat-file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either re user/session data isolation or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn
volkerrehn@bigpond.com
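Editor's note: for illustration only, this is the SQL-standard global temporary table syntax the letter refers to. It did not exist in InterBase/Firebird when this issue was published (Firebird later adopted it in version 2.1), and the table and column names here are invented:

   CREATE GLOBAL TEMPORARY TABLE SESSION_WORKSPACE
   (
     ITEM_ID INTEGER,
     NOTE    VARCHAR(80)
   )
   ON COMMIT DELETE ROWS;
   /* each connection sees only its own rows; DELETE ROWS empties the
      table at commit, while PRESERVE ROWS keeps rows until disconnect */

The definition is shared, the contents are per-session, so no extra administration is needed.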

Miscellaneous

Invitations, Subscription, Donations, Sponsorships

This is the first official book on Firebird – the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book provides all you need to know about Firebird database development: installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now

To receive notifications about future issues, send email to

subscribe@ibdeveloper.com

Magazine CD

Donations

Page 28: The InterBase and Firebird Developer Magazine, Issue 2, 2005

Development area

29

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

unreplicatable table (like in the example above) one solution is to make astored procedure which can be called for updating the table and usingstored procedure replication in order to replicate each of these calls Thuscontinuing the example above instead of replicating the values of theSTOCK table the nodes would replicate only their changes to these valuesthereby correctly synchronizing and merging the changes to the STOCKtable

PART 2GETTING STARTED WITH COPYCATCopycat is available in two distinct forms

1 As a set of Delphi C++ Builder components

2 As a standalone Win32 replication tool

We will now present each one of these products

1 CopyCat Delphi C++ Builder Component Set

CopyCat can be obtained as a set of Delphi C++ Builder componentsenabling you to use replication features painlessly and in many different sit-uations and sparing you the tedious task of writing and testing custom data-base synchronization solutions

Using these components replication or synchronization facilities can beseamlessly built into existing solutions As an example we have used theCopyCat components in an application used by our development team ontheir laptops to keep track of programming tasks When the developersneed to go on-site to visit a customer the application runs in local mode onthe laptop When they return and connect up to the company network anychanges they have made on-site are automatically synchronized with themain Interbase database running on a Linux server

Many more applications are possible since the CopyCat components arevery flexible and allow for synchronization of even a single table

Below is a concise guide for getting started with the CopyCat component suite

Download and install the evaluation components fromhttpwwwmicrotecfrcopycat

Prepare databases

1) Open Delphi and compile the data provider package(s) for the DACthat you plan to use (currently IBX and FIBPlus are supported) These arecomponents for interfacing between the CopyCat components and theunderlying data-access components

2) Open and run the ldquoConfigurationrdquo example project (requires the IBXprovider)

3) On the ldquoGeneralrdquo tab fill in the connection parameters and press ldquoCon-nect to Databaserdquo

4) On the ldquoUsersrdquo tab provide the list of sub-nodes for the current data-base

5) On the ldquoTablesrdquo tab for each table that you want to replicate set a pri-ority (relative to the other tables) and double-click on the ldquoPKn generatorrdquo

Development area

30

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

columns to (optionally) fill the primary key synchronization method Oncethese settings have been made set the ldquoCreatedrdquo field to Y so as to gen-erate the meta-data

6) On the ldquoProceduresrdquo tab set ldquoCreatedrdquo to Y for each procedure thatyou want to replicate after having set a priority

7) Apply all the generated SQL to all databases that should be replicated

8) For each database to be replicated set the list of sub-nodes (in theRPL$USERS table)

Replicate

1 In Delphi open the ldquoReplicatorrdquo example project

2 Drop a provider component on the form and hook it up to the TCcRepli-cators DBProvider property

3 Setup the LocalDB and RemoteDB properties of the TCcReplicator withthe connection parameters for the local and remote databases

4 Fill in the user name of the local andremote nodes as well as the SYSDBAuser name and password (needed forprimary key synchronization)

5 Compile and run the example

6 Press the ldquoReplicate nowrdquo button

2 CopyTiger the CopyCat Win32standalone replication tool

For those who have replication or synchronization needs but do not havethe time or resources to develop their own integrated system Microtec pro-poses a standalone database replication tool based on the CopyCat tech-nology called COPYTIGER

Features include

bull Easy to use installer

bull Independent server Administrator tool (CTAdmin)

bull Configuration wizard for setting up links to master slave databases

bull Robust replication engine based on Microtec CopyCat

bull Fault-tolerant connection mechanism allowing for automatic resumption of lost database connections

bull Simple amp intuitive control panel

bullAutomatic email notification on certain events (conflicts PK violations etc)

Visit the CopyTiger homepage to download a time-limited trial version

httpwwwmicrotecfrcopycatct

SUMMARY

In todays connected world database replication and synchronization are topics of great interest amongall industry professionals With the advent of Microtec CopyCat the Interbase Firebird community isobtaining a two-fold benefit

1 By encapsulating all the functionality of a replicator into Delphi components CopyCat makes it easierthan ever to integrate replication and synchronization facilities into custom applications

2 By providing a standalone tool for the replication of Interbase Firebird databases Microtec is respond-ing to another great need in the community ndash that of having a powerful and easy-to-use replication tooland one that can be connected to an existing database without disrupting its current structure

CopyCat is being actively developed by Microtec and many new features are being worked on such assupport for replicating between heterogeneous database types (PostgreSQL Oracle MSSQL MySQLNexusDB ) as well as a Linux Kylix version for the components and the standalone tool

You can find more information about CopyCat athttpwwwmicrotecfrcopycat or by contacting us at copycatmicrotecfr

As as promotional operation Microtec is offering a 20 discount to IBDeveloper readers for the pur-chase of any CopyCat product for the first 100 licenses sold

Click on the BUY NOW icon on our web site and enter the followingShareIt coupon code

ibd-852-150

Thankfully prior to working withInterBase I was interested in differ-ent data structures how they arestored and what algorithms theyuse This helped me to interpret theoutput of gstat At that time I decid-ed to write a tool that could analyzegstat output to help in tuning thedatabase or at least to identify thecause of performance problems

Long story but the outcome wasthat IBAnalyst was created In spiteof my experience it still allows me tofind very interesting things or per-formance issues in different data-bases

Real systems have runtime perform-ance that fluctuates like a wave Theamplitude of such lsquowavesrsquo can below or high so you can see howperformance differs from day today (or hour by hour) Actual per-formance depends on many factorsincluding the design of the applica-tion server configuration transac-tion concurrency version garbagein the database and so on To findout what is happening in a data-base (both positive and negativeaspects of performance) you shouldat the very least take a peek at the

Development area

31

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

Understanding YourDatabase with IBAnalystI have been working with InterBasesince 1994 Back then most data-bases were small and did notrequire any tuning Of course therewere occasions when I had tochange ibconfig on a server andreconfigured hardware or the OSbut that was almost all I could do totune performance

Four years ago our companybegan to provide technical supportand training of InterBase usersWorking with many productiondatabases also taught me a lot ofdifferent things However most ofwhat I learned concerned applica-tions ndash transaction parametersusage optimizing queries andresult sets

Of course I had known for quitesome time about gstat ndash the tool thatgives database statistics informa-tion If you have ever looked ingstat output or read opguidepdfabout it you would know that thestatistical output looks like just abunch of numbers and nothing elseOk you can discover fragmentationinformation for a particular table orindex but what other useful infor-mation can be obtained

database statistics from time to time

Lets take a look at the capabilitiesof IBAnalyst IBAnalyst can takestatistics from gstat or the ServicesAPI and compile them into a reportgiving you complete informationabout the database its tables and

indices It has in-place warningswhich are available during thebrowsing of statistics it alsoincludes hint comments and recom-mendation reports

The summary shown in Figure 1provides general information about

Figure 1 Summary of database statistics

Author Dmitri Kouzmenkokdvib-aidcom

1- InterBase 75 and Firebird 15 have special features that can periodi-cally flush unsaved pages if Forced Writes is Off2 - Oldest transaction is the same Oldest interesting transaction mentionedeverywhere Gstat output does not show this transaction as interesting3 - Really it is the oldest transaction that was active when the oldest trans-action currently active started This is because only new transaction startmoves Oldest active forward In the production systems with regulartransactions it can be considered as currently oldest active transaction

Development area

32

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

your database The warnings orcomments shown are based oncarefully gathered knowledgeobtained from a large number ofreal-world production databases

Note All figures in this article con-tain gstat statistics which were takenfrom a real-world production data-base (with the permission of its own-ers)

As I said before raw database sta-tistics look cryptic and are hard tointerpret IBAnalyst highlights anypotential problems clearly in yellowor red and the detail of the problemcan be read simply by placing thecursor over the relevant entry andreading the hint that is displayed

What can we discover from theabove figure This is a dialect 3database with a page size of 4096bytes Six to eight years ago devel-opers used a default page size of1024 bytes but in more recenttimes such a small page size couldlead to many performance prob-lems Since this database has apage size of 4k there is no warningdisplayed as this page size is okay

Next we can see that the ForcedWrite parameter is set to OFF andmarked red InterBase 4x and 5xby default had this parameter ONForced Writes itself is a write cachemethod when ON it writeschanged data immediately to diskbut OFF means that writes will be

stored for unknown time by theoperating system in its file cacheInterBase 6 creates databases withForced Writes OFF

Why is this marked in red on theIBAnalyst report The answer is sim-ple ndash using asynchronous writes cancause database corruption in casesof power OS or server failure

Tip It is interesting that modernHDD interfaces (ATA SATA SCSI)do not show any major difference inperformance with Forced Write setOn or Off1

Next on the report is the mysterioussweep interval If positive it setsthe size of the gap between the old-est2 and oldest snapshot transactionat which the engine is alerted to theneed to start an automatic garbagecollection On some systems hittingthis threshold will cause a suddenperformance loss effect and as aresult it is sometimes recommendedthat the sweep interval be set to 0(disabling automatic sweepingentirely) Here the sweep interval ismarked yellow because the valueof the sweep gap is negativewhich it can be in InterBase 60Firebird and Yaffil statistics but not inInterBase 7x When the value of thesweep gap is greater than thesweep interval (if the sweep intervalis not 0) the report entry for thesweep interval will be marked redwith an appropriate hint

We will examine the next 8 rows asa group as they all display aspectsof the transaction state of the data-base

bull The Oldest transaction is theoldest non-committed transactionAny lower transaction numbers arefor committed transactions and norecord versions are available forsuch transactions Transaction num-bers higher than the oldest transac-tion are for transactions that can bein any state This is also called theoldest interesting transactionbecause it freezes when a transac-tion is ended with rollback andserver can not undo its changes atthat moment

bull The Oldest snapshot ndash the old-est active (ie not yet committed)transaction that existed at the startof the transaction that is currentlythe Oldest Active transactionIndicates lowest snapshot transac-tion number that is interested inrecord versions

bull The Oldest active3 ndash the oldestcurrently active transaction

bull The Next transaction ndash thetransaction number that will beassigned to new transaction

bull Active transactions ndash IBAnalystwill give a warning if the oldestactive transaction number is30 lower than the daily transac-tions count The statistics do not tell

if there any other active transactions between oldest active andnext transaction but there can be such transactions Usually ifthe oldest active gets stuck there are two possible causes a) thatsome transaction is active for a long time or b) the applicationdesign allows transactions to run for a long time Both causes pre-vent garbage collection and consume server resources

bull Transactions per day ndash this is calculated from Next transactiondivided by the number of days passed since the creation of thedatabase to the point where the statistics are retrieved This can becorrect only for production databases or for databases that areperiodically restored from backup causing transaction numberingto be reset

As you have already learned if there any warnings they are shownas colored lines with clear descriptive hints on how to fix or preventthe problem

It should be noted that database statistics are not always useful Sta-tistics that are gathered during work and housekeeping operationscan be meaningless

Do not gather statistics if you

bull Just restored your database

bull Performed backup (gbak ndashb dbgdb) without the ndashg switch

bull Recently performed a manual sweep (gfix ndashsweep)

Statistics you get on such occasions will be practically useless It isalso correct that during normal work there can be times where data-base is in perfect state for example when applications make lessdatabase load than usual (users are at lunch or its a quiet time in thebusiness day)

Development area

33

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

the main reason for performance degradation For some tablesthere can be a lot of versions that are still in use The server cannot decide whether they really are in use because active transac-tions potentially need any of these versions Accordingly the serv-er does not consider these versions as garbage and it takes longerand longer to construct a correct record from the big chain of ver-sions whenever a transaction happens to read it In Figure 2 youcan see two tables that have the versions count three times higherthan the record count Using this information you can also checkwhether the fact that your applications are updating these tables sofrequently is by design or because of coding mistake or an appli-cation design flaw

The Index viewIIndices are used by database engine to enforce primary key for-eign key and unique constraints They also speed up the retrieval ofdata Unique indices are the best for retrieving data but the level ofbenefit from non-unique indices depends on the diversity of theindexed data

For example look at ADDR_ADDRESS_IDX6 First of all the index

How to seize the momentwhen there is somethingwrong with the databaseYour applications can be designedso well that they will always workwith transactions and data correct-ly not making sweep gaps notaccumulating a lot of active trans-actions not keeping long runningsnapshots and so on Usually it doesnot happen (sorry colleagues)

The most common reason is thatdevelopers test their applicationsrunning only two or three simultane-ous users When the application isthen used in a production environ-ment with fifteen or more simultane-ous users the database can behaveunpredictably Of course multi-usermode can work okay because most

multi-user conflicts can be testedwith two or three concurrently run-ning applications However withlarger numbers of users garbagecollection problems can arise Suchpotential problems can be caught ifyou gather database statistics at thecorrect moments

Table informationLets take look at another sampleoutput from IBAnalyst

The IBAnalyst Table statistics view isalso very useful It can show whichtables have a lot of record versionswhere a large number ofupdatesdeletes were made frag-mented tables with fragmentationcaused by updatedelete or byblobs and so on You can see whichtables are being updated frequent-Figure 2 Table statistics

ly and what the table size is inmegabytes Most of these warningsare customizable

In this database example there areseveral activities First of all yellowcolor in the VerLen column warnsthat space taken by record versionsis larger than that occupied by therecords themselves This can resultfrom updating a lot of fields in arecord or by bulk deletes See therows in which MaxVers column ismarked in blue This shows that onlyone version per record is stored andconsequently this is caused by bulkdeletes So both indications can tellthat this is really bulk deletes andnumber in Versions column is closeto number of deleted records

Long-living active transactions pre-vent garbage collection and this is name itself tells that it was created manually If statistics

were taken by the Services API with metadata info youcan see what columns are indexed (in IBAnalyst 183and greater) For the index under examination you cansee that it has 34999 keys TotalDup is 34995 andMaxDup is 25056 Both duplicate columns are markedin red This is because there are only 4 unique key val-ues amongst all the keys in this index as can be seenfrom the Uniques column Furthermore the greatestduplicate chain (key pointing to records with the samecolumn value) is 25056 ndash ie almost all keys store oneof four unique values As a result this index could

bull Reduce the speed of the restore process Okay thir-ty-five thousand keys is not a big deal for modern data-bases and hardware but the impact should be notedanyway

Development area

34

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

bull Slow down garbage collection Indices with a low count of uniquevalues can impede garbage collection by up to ten times in compar-ison with a completely unique index This problem has been solvedin InterBase 7175 and Firebird 20

bull Produce unnecessary page reads when the optimizer reads theindex It depends on the value being searched in a particular query- searching by an index that has a larger value for MaxDup will beslower Searching by value that has fewer duplicate values will befaster but only you know what data is stored in that indexed column

That is why IBAnalyst draws your attention to such indices markingthem red and yellow and including them in the Recommendationsreport Unfortunately most of the ldquobad indices are automaticallycreated to enforce foreign-key constraints In some cases this prob-lem can be solved by preventing using triggers deletes or updatesof primary key in lookup tables But if it is not possible to implementsuch changes IBAnalyst will show you bad indices on ForeignKeys every time you view statistics

Reports

There is no need to look through the entire report each time spotting cell color and reading hints for new warningsMore direct and detailed information can be had by using the Recommendations feature of IBAnalyst Just load thestatistics and go to the ReportsView Recommendations menu This report provides a step-by-step analysis includ-ing more detailed descriptive warnings about forced writes sweep interval database activity transaction statedatabase page size sweeping transaction inventory pages fragmented tables tables with lot of record versionsmassive deletesupdates deep indices optimizer-unfriendly indices useless indices and even empty tables All ofthis information and the accompanying suggestions are dynamically created based on the statistics being loaded

As an example of the report output letrsquos have a look at a report generated for the database statistics you sawearlier in this article Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pagesRead_committed transaction uses global TIP but snapshot transactions make own copies of TIP in memory Big TIPsize can slowdown performance Try to run sweep manually (gfix -sweep) to decrease TIP size

Here is another quote from tableindices part of report Versioned tables count 8 Large amount of record versionsusually slowdown performance If there are a lot of record versions in table than garbage collection does not workor records are not being read by any select statement You can try select count() on that tables to enforce garbagecollection but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccess-

Table Records Versions RecVers sizeCLIENTS_PR 3388 10944 92DICT_PRICE 30 1992 45DOCS 9 2225 64N_PART 13835 72594 83REGISTR_NC 241 4085 56SKL_NC 1640 7736 170STAT_QUICK 17649 85062 110UO_LOCK 283 8490 144

ful if there is at least one transaction interested in these versions Hereis the list of tables with versionrecord ratio greater than 3

SummaryIBAnalyst is an invaluable tool that assists a user in performingdetailed analysis of Firebird or InterBase database statistics andidentifying possible problems with a database in terms of perform-ance maintenance and how an application interacts with the data-base It takes cryptic database statistics and displays them in aneasy-to-understand graphical manner and will automatically makesensible suggestions about improving database performance andeasing database maintenance

Readers feedback

35

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Comments to ldquoTemporary tablesrdquo

One thing Id like to ask you tochange is re temp tables I suggestyou even create another myth boxfor it It is the sentence

Surely most often temporarytables were necessary to thosedevelopers who had been workingwith MS SQL before they started touse InterBaseFirebird

This myth does not want to dieTemporary tables (TT) are not ameans for underskilled DB kidswho cannot write any complex SQLstatement They are the means ofdealing with data that is structural-ly dynamic but still needs to beprocessed like data with a fixedstructure So all OLAP systemsbased on RDBMS are heavilydependent on this feature - or if it isnot present it requires a whole lotof unnecessary and complicatedworkarounds Im talking out ofexperience

Then there are situations where theoptimizer simply loses the plot

because of the complexity of astatement If developers have a fall-back method to reduce complexityin those cases thats an advantageMuch better than asking developersto supply their own query plans

Also serious RDBMS like Informixhad them at least 15 years agowhen MSs database expertise didnot go further than MS Access Cer-tainly those MS database develop-ers who need TTs to be able to dotheir job would not have managedto deal with an Informix server ifcomplexity was their main problem

The two preceding paragraphswere about local temp tables Thereare also global temporary tables(GTTs) A point in favour of GTTs isthat one can give users their ownworkspace within a database with-out having to setup some clumsyadministration for it No user idscattered around in tables wherethey dont belong no explicitcleanup no demanding role man-agement Just setup tables as temp

and from then on it is transparent tousersapplications that the datainside is usersession-specific Webapplications would be a goodexample

The argument reminds me a bit ofMySQL reasoning when it comes tofeatures their RDBMS doesdidnot have Transactions ForeignKeys Triggers etc were all badand unnecessary because they didnot have them (officially they slowdown the whole system and intro-duce dependencies) Of coursethey do To call a flat file system aRDBMS is obviously good for mar-keting Now they are putting in allthose essential database featureswhich were declared crap by themnot long ago And you can seealready that their marketing nowtells us how important those fea-tures are I bet we wont see MySQLbenchmarks for a while -)

We should not make a similar mis-take Temporary tables are impor-tant if the nature of a system is

dynamic either re usersessiondata isolation or re data where thestructure is unknown in advancebut needs to be processed like DBdata with a fixed structure ThatFirebird does not have them isplainly a lack of an important fea-ture in the same category as crossDB operations (only through qliwhich means unusable) Both fea-tures could make Firebird muchmore suitable as the basis for OLAPsystems an area where Firebird islacking considerably

Well to be fair Firebird developersare working on both topics

Volker Rehn

volkerrehnbigpondcom

We received a lot of feedback emails for article ldquoWorking with temporary tables in InterBase 75rdquo which was published in issue 1The one of them is impressed me and with permission of its respective owner Irsquod like to publish it

Alexey Kovyazin Chief Editor

Readers feedback

Miscellaneous

36

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Invitations SubscriptionDonations Sponsorships

This is the first officialbook on Firebird mdash thefree independentopen source relationaldatabase server thatemerged in 2000

Based on the actualFirebird Project thisbook will provide youall you need to knowabout Firebird data-base development likeinstallation multi-plat-form configurationSQL language inter-faces and mainte-nance

Subscribe now

To receive future issuesnotifications send email to

subscribeibdevelopercom

MagazineCD

Donations

Page 29: The InterBase and Firebird Developer Magazine, Issue 2, 2005

Development area

30

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Replicating and synchronizing

columns to (optionally) fill the primary key synchronization method Oncethese settings have been made set the ldquoCreatedrdquo field to Y so as to gen-erate the meta-data

6) On the ldquoProceduresrdquo tab set ldquoCreatedrdquo to Y for each procedure thatyou want to replicate after having set a priority

7) Apply all the generated SQL to all databases that should be replicated

8) For each database to be replicated set the list of sub-nodes (in theRPL$USERS table)

Replicate

1 In Delphi open the ldquoReplicatorrdquo example project

2 Drop a provider component on the form and hook it up to the TCcRepli-cators DBProvider property

3 Setup the LocalDB and RemoteDB properties of the TCcReplicator withthe connection parameters for the local and remote databases

4 Fill in the user name of the local andremote nodes as well as the SYSDBAuser name and password (needed forprimary key synchronization)

5 Compile and run the example

6 Press the ldquoReplicate nowrdquo button

2 CopyTiger the CopyCat Win32standalone replication tool

For those who have replication or synchronization needs but do not havethe time or resources to develop their own integrated system Microtec pro-poses a standalone database replication tool based on the CopyCat tech-nology called COPYTIGER

Features include

bull Easy to use installer

bull Independent server Administrator tool (CTAdmin)

bull Configuration wizard for setting up links to master slave databases

bull Robust replication engine based on Microtec CopyCat

bull Fault-tolerant connection mechanism allowing for automatic resumption of lost database connections

bull Simple amp intuitive control panel

bullAutomatic email notification on certain events (conflicts PK violations etc)

Visit the CopyTiger homepage to download a time-limited trial version

httpwwwmicrotecfrcopycatct

SUMMARY

In todays connected world database replication and synchronization are topics of great interest amongall industry professionals With the advent of Microtec CopyCat the Interbase Firebird community isobtaining a two-fold benefit

1 By encapsulating all the functionality of a replicator into Delphi components CopyCat makes it easierthan ever to integrate replication and synchronization facilities into custom applications

2 By providing a standalone tool for the replication of Interbase Firebird databases Microtec is respond-ing to another great need in the community ndash that of having a powerful and easy-to-use replication tooland one that can be connected to an existing database without disrupting its current structure

CopyCat is being actively developed by Microtec and many new features are being worked on such assupport for replicating between heterogeneous database types (PostgreSQL Oracle MSSQL MySQLNexusDB ) as well as a Linux Kylix version for the components and the standalone tool

You can find more information about CopyCat athttpwwwmicrotecfrcopycat or by contacting us at copycatmicrotecfr

As as promotional operation Microtec is offering a 20 discount to IBDeveloper readers for the pur-chase of any CopyCat product for the first 100 licenses sold

Click on the BUY NOW icon on our web site and enter the followingShareIt coupon code

ibd-852-150

Thankfully prior to working withInterBase I was interested in differ-ent data structures how they arestored and what algorithms theyuse This helped me to interpret theoutput of gstat At that time I decid-ed to write a tool that could analyzegstat output to help in tuning thedatabase or at least to identify thecause of performance problems

Long story but the outcome wasthat IBAnalyst was created In spiteof my experience it still allows me tofind very interesting things or per-formance issues in different data-bases

Real systems have runtime perform-ance that fluctuates like a wave Theamplitude of such lsquowavesrsquo can below or high so you can see howperformance differs from day today (or hour by hour) Actual per-formance depends on many factorsincluding the design of the applica-tion server configuration transac-tion concurrency version garbagein the database and so on To findout what is happening in a data-base (both positive and negativeaspects of performance) you shouldat the very least take a peek at the

Development area

31

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

Understanding YourDatabase with IBAnalystI have been working with InterBasesince 1994 Back then most data-bases were small and did notrequire any tuning Of course therewere occasions when I had tochange ibconfig on a server andreconfigured hardware or the OSbut that was almost all I could do totune performance

Four years ago our companybegan to provide technical supportand training of InterBase usersWorking with many productiondatabases also taught me a lot ofdifferent things However most ofwhat I learned concerned applica-tions ndash transaction parametersusage optimizing queries andresult sets

Of course I had known for quitesome time about gstat ndash the tool thatgives database statistics informa-tion If you have ever looked ingstat output or read opguidepdfabout it you would know that thestatistical output looks like just abunch of numbers and nothing elseOk you can discover fragmentationinformation for a particular table orindex but what other useful infor-mation can be obtained

database statistics from time to time

Lets take a look at the capabilitiesof IBAnalyst IBAnalyst can takestatistics from gstat or the ServicesAPI and compile them into a reportgiving you complete informationabout the database its tables and

indices It has in-place warningswhich are available during thebrowsing of statistics it alsoincludes hint comments and recom-mendation reports

The summary shown in Figure 1provides general information about

Figure 1 Summary of database statistics

Author Dmitri Kouzmenkokdvib-aidcom

1- InterBase 75 and Firebird 15 have special features that can periodi-cally flush unsaved pages if Forced Writes is Off2 - Oldest transaction is the same Oldest interesting transaction mentionedeverywhere Gstat output does not show this transaction as interesting3 - Really it is the oldest transaction that was active when the oldest trans-action currently active started This is because only new transaction startmoves Oldest active forward In the production systems with regulartransactions it can be considered as currently oldest active transaction

Development area

32

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

your database The warnings orcomments shown are based oncarefully gathered knowledgeobtained from a large number ofreal-world production databases

Note All figures in this article con-tain gstat statistics which were takenfrom a real-world production data-base (with the permission of its own-ers)

As I said before raw database sta-tistics look cryptic and are hard tointerpret IBAnalyst highlights anypotential problems clearly in yellowor red and the detail of the problemcan be read simply by placing thecursor over the relevant entry andreading the hint that is displayed

What can we discover from theabove figure This is a dialect 3database with a page size of 4096bytes Six to eight years ago devel-opers used a default page size of1024 bytes but in more recenttimes such a small page size couldlead to many performance prob-lems Since this database has apage size of 4k there is no warningdisplayed as this page size is okay

Next we can see that the ForcedWrite parameter is set to OFF andmarked red InterBase 4x and 5xby default had this parameter ONForced Writes itself is a write cachemethod when ON it writeschanged data immediately to diskbut OFF means that writes will be

stored for unknown time by theoperating system in its file cacheInterBase 6 creates databases withForced Writes OFF

Why is this marked in red on theIBAnalyst report The answer is sim-ple ndash using asynchronous writes cancause database corruption in casesof power OS or server failure

Tip It is interesting that modernHDD interfaces (ATA SATA SCSI)do not show any major difference inperformance with Forced Write setOn or Off1

Next on the report is the mysterioussweep interval If positive it setsthe size of the gap between the old-est2 and oldest snapshot transactionat which the engine is alerted to theneed to start an automatic garbagecollection On some systems hittingthis threshold will cause a suddenperformance loss effect and as aresult it is sometimes recommendedthat the sweep interval be set to 0(disabling automatic sweepingentirely) Here the sweep interval ismarked yellow because the valueof the sweep gap is negativewhich it can be in InterBase 60Firebird and Yaffil statistics but not inInterBase 7x When the value of thesweep gap is greater than thesweep interval (if the sweep intervalis not 0) the report entry for thesweep interval will be marked redwith an appropriate hint

We will examine the next 8 rows asa group as they all display aspectsof the transaction state of the data-base

bull The Oldest transaction is theoldest non-committed transactionAny lower transaction numbers arefor committed transactions and norecord versions are available forsuch transactions Transaction num-bers higher than the oldest transac-tion are for transactions that can bein any state This is also called theoldest interesting transactionbecause it freezes when a transac-tion is ended with rollback andserver can not undo its changes atthat moment

bull The Oldest snapshot ndash the old-est active (ie not yet committed)transaction that existed at the startof the transaction that is currentlythe Oldest Active transactionIndicates lowest snapshot transac-tion number that is interested inrecord versions

bull The Oldest active3 ndash the oldestcurrently active transaction

bull The Next transaction ndash thetransaction number that will beassigned to new transaction

bull Active transactions ndash IBAnalystwill give a warning if the oldestactive transaction number is30 lower than the daily transac-tions count The statistics do not tell

if there any other active transactions between oldest active andnext transaction but there can be such transactions Usually ifthe oldest active gets stuck there are two possible causes a) thatsome transaction is active for a long time or b) the applicationdesign allows transactions to run for a long time Both causes pre-vent garbage collection and consume server resources

bull Transactions per day ndash this is calculated from Next transactiondivided by the number of days passed since the creation of thedatabase to the point where the statistics are retrieved This can becorrect only for production databases or for databases that areperiodically restored from backup causing transaction numberingto be reset

As you have already learned if there any warnings they are shownas colored lines with clear descriptive hints on how to fix or preventthe problem

It should be noted that database statistics are not always useful Sta-tistics that are gathered during work and housekeeping operationscan be meaningless

Do not gather statistics if you

bull Just restored your database

bull Performed backup (gbak ndashb dbgdb) without the ndashg switch

bull Recently performed a manual sweep (gfix ndashsweep)

Statistics you get on such occasions will be practically useless It isalso correct that during normal work there can be times where data-base is in perfect state for example when applications make lessdatabase load than usual (users are at lunch or its a quiet time in thebusiness day)

Development area

33

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

the main reason for performance degradation For some tablesthere can be a lot of versions that are still in use The server cannot decide whether they really are in use because active transac-tions potentially need any of these versions Accordingly the serv-er does not consider these versions as garbage and it takes longerand longer to construct a correct record from the big chain of ver-sions whenever a transaction happens to read it In Figure 2 youcan see two tables that have the versions count three times higherthan the record count Using this information you can also checkwhether the fact that your applications are updating these tables sofrequently is by design or because of coding mistake or an appli-cation design flaw

The Index viewIIndices are used by database engine to enforce primary key for-eign key and unique constraints They also speed up the retrieval ofdata Unique indices are the best for retrieving data but the level ofbenefit from non-unique indices depends on the diversity of theindexed data

For example look at ADDR_ADDRESS_IDX6 First of all the index

How to seize the momentwhen there is somethingwrong with the databaseYour applications can be designedso well that they will always workwith transactions and data correct-ly not making sweep gaps notaccumulating a lot of active trans-actions not keeping long runningsnapshots and so on Usually it doesnot happen (sorry colleagues)

The most common reason is thatdevelopers test their applicationsrunning only two or three simultane-ous users When the application isthen used in a production environ-ment with fifteen or more simultane-ous users the database can behaveunpredictably Of course multi-usermode can work okay because most

multi-user conflicts can be testedwith two or three concurrently run-ning applications However withlarger numbers of users garbagecollection problems can arise Suchpotential problems can be caught ifyou gather database statistics at thecorrect moments

Table informationLets take look at another sampleoutput from IBAnalyst

The IBAnalyst Table statistics view isalso very useful It can show whichtables have a lot of record versionswhere a large number ofupdatesdeletes were made frag-mented tables with fragmentationcaused by updatedelete or byblobs and so on You can see whichtables are being updated frequent-Figure 2 Table statistics

ly and what the table size is inmegabytes Most of these warningsare customizable

In this database example there areseveral activities First of all yellowcolor in the VerLen column warnsthat space taken by record versionsis larger than that occupied by therecords themselves This can resultfrom updating a lot of fields in arecord or by bulk deletes See therows in which MaxVers column ismarked in blue This shows that onlyone version per record is stored andconsequently this is caused by bulkdeletes So both indications can tellthat this is really bulk deletes andnumber in Versions column is closeto number of deleted records

Long-living active transactions pre-vent garbage collection and this is name itself tells that it was created manually If statistics

were taken by the Services API with metadata info youcan see what columns are indexed (in IBAnalyst 183and greater) For the index under examination you cansee that it has 34999 keys TotalDup is 34995 andMaxDup is 25056 Both duplicate columns are markedin red This is because there are only 4 unique key val-ues amongst all the keys in this index as can be seenfrom the Uniques column Furthermore the greatestduplicate chain (key pointing to records with the samecolumn value) is 25056 ndash ie almost all keys store oneof four unique values As a result this index could

bull Reduce the speed of the restore process Okay thir-ty-five thousand keys is not a big deal for modern data-bases and hardware but the impact should be notedanyway

Development area

34

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

bull Slow down garbage collection Indices with a low count of uniquevalues can impede garbage collection by up to ten times in compar-ison with a completely unique index This problem has been solvedin InterBase 7175 and Firebird 20

bull Produce unnecessary page reads when the optimizer reads theindex It depends on the value being searched in a particular query- searching by an index that has a larger value for MaxDup will beslower Searching by value that has fewer duplicate values will befaster but only you know what data is stored in that indexed column

That is why IBAnalyst draws your attention to such indices markingthem red and yellow and including them in the Recommendationsreport Unfortunately most of the ldquobad indices are automaticallycreated to enforce foreign-key constraints In some cases this prob-lem can be solved by preventing using triggers deletes or updatesof primary key in lookup tables But if it is not possible to implementsuch changes IBAnalyst will show you bad indices on ForeignKeys every time you view statistics

Reports

There is no need to look through the entire report each time spotting cell color and reading hints for new warningsMore direct and detailed information can be had by using the Recommendations feature of IBAnalyst Just load thestatistics and go to the ReportsView Recommendations menu This report provides a step-by-step analysis includ-ing more detailed descriptive warnings about forced writes sweep interval database activity transaction statedatabase page size sweeping transaction inventory pages fragmented tables tables with lot of record versionsmassive deletesupdates deep indices optimizer-unfriendly indices useless indices and even empty tables All ofthis information and the accompanying suggestions are dynamically created based on the statistics being loaded

As an example of the report output letrsquos have a look at a report generated for the database statistics you sawearlier in this article Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pagesRead_committed transaction uses global TIP but snapshot transactions make own copies of TIP in memory Big TIPsize can slowdown performance Try to run sweep manually (gfix -sweep) to decrease TIP size

Here is another quote from tableindices part of report Versioned tables count 8 Large amount of record versionsusually slowdown performance If there are a lot of record versions in table than garbage collection does not workor records are not being read by any select statement You can try select count() on that tables to enforce garbagecollection but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccess-

Table Records Versions RecVers sizeCLIENTS_PR 3388 10944 92DICT_PRICE 30 1992 45DOCS 9 2225 64N_PART 13835 72594 83REGISTR_NC 241 4085 56SKL_NC 1640 7736 170STAT_QUICK 17649 85062 110UO_LOCK 283 8490 144

ful if there is at least one transaction interested in these versions Hereis the list of tables with versionrecord ratio greater than 3

SummaryIBAnalyst is an invaluable tool that assists a user in performingdetailed analysis of Firebird or InterBase database statistics andidentifying possible problems with a database in terms of perform-ance maintenance and how an application interacts with the data-base It takes cryptic database statistics and displays them in aneasy-to-understand graphical manner and will automatically makesensible suggestions about improving database performance andeasing database maintenance

Readers feedback

35

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Comments to ldquoTemporary tablesrdquo

One thing Id like to ask you tochange is re temp tables I suggestyou even create another myth boxfor it It is the sentence

Surely most often temporarytables were necessary to thosedevelopers who had been workingwith MS SQL before they started touse InterBaseFirebird

This myth does not want to dieTemporary tables (TT) are not ameans for underskilled DB kidswho cannot write any complex SQLstatement They are the means ofdealing with data that is structural-ly dynamic but still needs to beprocessed like data with a fixedstructure So all OLAP systemsbased on RDBMS are heavilydependent on this feature - or if it isnot present it requires a whole lotof unnecessary and complicatedworkarounds Im talking out ofexperience

Then there are situations where theoptimizer simply loses the plot

because of the complexity of astatement If developers have a fall-back method to reduce complexityin those cases thats an advantageMuch better than asking developersto supply their own query plans

Also serious RDBMS like Informixhad them at least 15 years agowhen MSs database expertise didnot go further than MS Access Cer-tainly those MS database develop-ers who need TTs to be able to dotheir job would not have managedto deal with an Informix server ifcomplexity was their main problem

The two preceding paragraphswere about local temp tables Thereare also global temporary tables(GTTs) A point in favour of GTTs isthat one can give users their ownworkspace within a database with-out having to setup some clumsyadministration for it No user idscattered around in tables wherethey dont belong no explicitcleanup no demanding role man-agement Just setup tables as temp

and from then on it is transparent tousersapplications that the datainside is usersession-specific Webapplications would be a goodexample

The argument reminds me a bit ofMySQL reasoning when it comes tofeatures their RDBMS doesdidnot have Transactions ForeignKeys Triggers etc were all badand unnecessary because they didnot have them (officially they slowdown the whole system and intro-duce dependencies) Of coursethey do To call a flat file system aRDBMS is obviously good for mar-keting Now they are putting in allthose essential database featureswhich were declared crap by themnot long ago And you can seealready that their marketing nowtells us how important those fea-tures are I bet we wont see MySQLbenchmarks for a while -)

We should not make a similar mis-take Temporary tables are impor-tant if the nature of a system is

dynamic either re usersessiondata isolation or re data where thestructure is unknown in advancebut needs to be processed like DBdata with a fixed structure ThatFirebird does not have them isplainly a lack of an important fea-ture in the same category as crossDB operations (only through qliwhich means unusable) Both fea-tures could make Firebird muchmore suitable as the basis for OLAPsystems an area where Firebird islacking considerably

Well to be fair Firebird developersare working on both topics

Volker Rehn

volkerrehnbigpondcom

We received a lot of feedback emails for article ldquoWorking with temporary tables in InterBase 75rdquo which was published in issue 1The one of them is impressed me and with permission of its respective owner Irsquod like to publish it

Alexey Kovyazin Chief Editor

Readers feedback

Miscellaneous

36

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Invitations SubscriptionDonations Sponsorships

This is the first officialbook on Firebird mdash thefree independentopen source relationaldatabase server thatemerged in 2000

Based on the actualFirebird Project thisbook will provide youall you need to knowabout Firebird data-base development likeinstallation multi-plat-form configurationSQL language inter-faces and mainte-nance

Subscribe now

To receive future issuesnotifications send email to

subscribeibdevelopercom

MagazineCD

Donations

Page 30: The InterBase and Firebird Developer Magazine, Issue 2, 2005

Development area

Using IBAnalyst

Understanding Your Database with IBAnalyst

Author: Dmitri Kouzmenko, kdv@ib-aid.com

I have been working with InterBase since 1994. Back then most databases were small and did not require any tuning. Of course, there were occasions when I had to change ibconfig on a server and reconfigure hardware or the OS, but that was almost all I could do to tune performance.

Four years ago our company began to provide technical support and training for InterBase users. Working with many production databases also taught me a lot of different things. However, most of what I learned concerned applications: transaction parameter usage, optimizing queries and result sets.

Of course, I had known for quite some time about gstat, the tool that gives database statistics information. If you have ever looked at gstat output, or read opguide.pdf about it, you would know that the statistical output looks like just a bunch of numbers and nothing else. Ok, you can discover fragmentation information for a particular table or index, but what other useful information can be obtained?

Thankfully, prior to working with InterBase I was interested in different data structures, how they are stored and what algorithms they use. This helped me to interpret the output of gstat. At that time I decided to write a tool that could analyze gstat output to help in tuning the database, or at least to identify the cause of performance problems.

Long story, but the outcome was that IBAnalyst was created. In spite of my experience, it still allows me to find very interesting things and performance issues in different databases.

Real systems have runtime performance that fluctuates like a wave. The amplitude of such "waves" can be low or high, so you can see how performance differs from day to day (or hour by hour). Actual performance depends on many factors, including the design of the application, server configuration, transaction concurrency, version garbage in the database and so on. To find out what is happening in a database (both positive and negative aspects of performance) you should at the very least take a peek at the database statistics from time to time.

Let's take a look at the capabilities of IBAnalyst. IBAnalyst can take statistics from gstat or the Services API and compile them into a report, giving you complete information about the database, its tables and indices. It has in-place warnings which are available while browsing the statistics; it also includes hint comments and recommendation reports.
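If you want to look at the raw statistics yourself, the usual source is the gstat command-line tool shipped with InterBase and Firebird; IBAnalyst can load gstat's saved output or fetch the same data through the Services API. A minimal sketch - the database path and output file name are illustrative, not taken from the article:

    gstat -h C:\data\mydb.gdb
    (header page only: page size, Forced Writes, transaction counters)

    gstat -a -r C:\data\mydb.gdb > mydb_stat.txt
    (full table and index statistics, redirected to a file for later analysis)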

The summary shown in Figure 1 provides general information about your database.

Figure 1: Summary of database statistics


The warnings or comments shown are based on carefully gathered knowledge obtained from a large number of real-world production databases.

Note: All figures in this article contain gstat statistics which were taken from a real-world production database (with the permission of its owners).

As I said before, raw database statistics look cryptic and are hard to interpret. IBAnalyst highlights any potential problems clearly in yellow or red, and the detail of the problem can be read simply by placing the cursor over the relevant entry and reading the hint that is displayed.

What can we discover from the above figure? This is a dialect 3 database with a page size of 4096 bytes. Six to eight years ago developers used a default page size of 1024 bytes, but in more recent times such a small page size could lead to many performance problems. Since this database has a page size of 4k, there is no warning displayed, as this page size is okay.
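The page size is fixed when a database is created and can only be changed afterwards through a backup/restore cycle (gbak -c -p 4096 ...). A sketch, with an illustrative file name:

    /* page size is chosen at creation time */
    CREATE DATABASE 'C:\data\mydb.gdb' PAGE_SIZE 4096;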

Next, we can see that the Forced Write parameter is set to OFF and marked red. InterBase 4.x and 5.x had this parameter ON by default. Forced Writes itself is a write-cache method: when ON, changed data is written immediately to disk, while OFF means that writes will be stored for an unknown time by the operating system in its file cache. InterBase 6 creates databases with Forced Writes OFF.

Why is this marked in red on the IBAnalyst report? The answer is simple: using asynchronous writes can cause database corruption in cases of power, OS or server failure.

Tip: It is interesting that modern HDD interfaces (ATA, SATA, SCSI) do not show any major difference in performance with Forced Write set On or Off.[1]
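The Forced Writes mode can be switched without a backup/restore cycle using gfix. A sketch, again with an illustrative database name:

    gfix -write sync C:\data\mydb.gdb
    (turn Forced Writes ON: safer in case of failures)

    gfix -write async C:\data\mydb.gdb
    (turn Forced Writes OFF, accepting the risk described above)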

Next on the report is the mysterious sweep interval. If positive, it sets the size of the gap between the oldest[2] and oldest snapshot transactions at which the engine is alerted to the need to start an automatic garbage collection. On some systems, hitting this threshold will cause a sudden performance loss, and as a result it is sometimes recommended that the sweep interval be set to 0 (disabling automatic sweeping entirely). Here the sweep interval is marked yellow, because the value of the sweep gap is negative, which it can be in InterBase 6.0, Firebird and Yaffil statistics, but not in InterBase 7.x. When the value of the sweep gap is greater than the sweep interval (if the sweep interval is not 0), the report entry for the sweep interval will be marked red, with an appropriate hint.
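The sweep interval itself is also managed with gfix; a sketch under the same assumptions:

    gfix -housekeeping 0 C:\data\mydb.gdb
    (disable automatic sweeping entirely)

    gfix -housekeeping 20000 C:\data\mydb.gdb
    (restore the default threshold)

    gfix -sweep C:\data\mydb.gdb
    (run a sweep manually, e.g. scheduled at night)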

We will examine the next 8 rows as a group, as they all display aspects of the transaction state of the database.

• The Oldest transaction is the oldest non-committed transaction. Any lower transaction numbers are for committed transactions, and no record versions are available for such transactions. Transaction numbers higher than the oldest transaction are for transactions that can be in any state. This is also called the oldest interesting transaction, because it freezes when a transaction is ended with rollback and the server cannot undo its changes at that moment.

• The Oldest snapshot is the oldest active (i.e. not yet committed) transaction that existed at the start of the transaction that is currently the Oldest Active transaction. It indicates the lowest snapshot transaction number that is interested in record versions.

• The Oldest active[3] is the oldest currently active transaction.

• The Next transaction is the transaction number that will be assigned to the next new transaction.

• Active transactions: IBAnalyst will give a warning if the oldest active transaction number is 30% lower than the daily transactions count. The statistics do not tell if there are any other active transactions between oldest active and next transaction, but there can be such transactions. Usually, if the oldest active gets stuck, there are two possible causes: a) some transaction is active for a long time, or b) the application design allows transactions to run for a long time. Both causes prevent garbage collection and consume server resources.

• Transactions per day: this is calculated as Next transaction divided by the number of days passed since the creation of the database to the point where the statistics are retrieved. This can be correct only for production databases, or for databases that are periodically restored from backup, causing transaction numbering to be reset.
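As an illustration of that calculation, with invented numbers: if Next transaction is 1 200 000 and the statistics were taken 60 days after the database was created (or last restored), the report shows

    1 200 000 / 60 = 20 000 transactions per day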

As you have already learned, if there are any warnings, they are shown as colored lines with clear descriptive hints on how to fix or prevent the problem.

Footnotes:

[1] InterBase 7.5 and Firebird 1.5 have special features that can periodically flush unsaved pages if Forced Writes is Off.

[2] Oldest transaction is the same Oldest interesting transaction mentioned everywhere. Gstat output does not show this transaction as "interesting".

[3] Really it is the oldest transaction that was active when the oldest transaction currently active started. This is because only a new transaction start moves the Oldest active forward. In production systems with regular transactions it can be considered as the currently oldest active transaction.

It should be noted that database statistics are not always useful. Statistics that are gathered during work and housekeeping operations can be meaningless.

Do not gather statistics if you:

• Just restored your database

• Performed a backup (gbak -b db.gdb) without the -g switch (see the example below)

• Recently performed a manual sweep (gfix -sweep)

Statistics you get on such occasions will be practically useless. It is also true that during normal work there can be times when the database is in a perfect state, for example when applications are making less database load than usual (users are at lunch, or it is a quiet time in the business day).
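To illustrate the second point in the list above: the -g switch tells gbak not to perform garbage collection while backing up, so the version chains that the statistics are meant to reveal stay in place. File names here are illustrative:

    gbak -b -g C:\data\mydb.gdb C:\backup\mydb.gbk
    (backup without garbage collection: statistics stay meaningful)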


How to seize the moment when there is something wrong with the database

Your applications can be designed so well that they will always work with transactions and data correctly: not making sweep gaps, not accumulating a lot of active transactions, not keeping long-running snapshots, and so on. Usually it does not happen (sorry, colleagues).

The most common reason is that developers test their applications running only two or three simultaneous users. When the application is then used in a production environment with fifteen or more simultaneous users, the database can behave unpredictably. Of course, multi-user mode can work okay, because most multi-user conflicts can be tested with two or three concurrently running applications. However, with larger numbers of users, garbage collection problems can arise. Such potential problems can be caught if you gather database statistics at the correct moments.

Table information

Let's take a look at another sample output from IBAnalyst.

The IBAnalyst Table statistics view is also very useful. It can show which tables have a lot of record versions, where a large number of updates/deletes were made, fragmented tables with fragmentation caused by update/delete or by blobs, and so on. You can see which tables are being updated frequently and what the table size is in megabytes. Most of these warnings are customizable.

Figure 2: Table statistics

In this database example there are several activities. First of all, yellow color in the VerLen column warns that the space taken by record versions is larger than that occupied by the records themselves. This can result from updating a lot of fields in a record, or from bulk deletes. See the rows in which the MaxVers column is marked in blue: this shows that only one version per record is stored, and consequently this is caused by bulk deletes. So both indications together tell that this really is bulk deletes, and the number in the Versions column is close to the number of deleted records.

Long-living active transactions prevent garbage collection, and this is the main reason for performance degradation. For some tables there can be a lot of versions that are still in use. The server cannot decide whether they really are in use, because active transactions potentially need any of these versions. Accordingly, the server does not consider these versions as garbage, and it takes longer and longer to construct a correct record from the big chain of versions whenever a transaction happens to read it. In Figure 2 you can see two tables that have the versions count three times higher than the record count. Using this information you can also check whether the fact that your applications are updating these tables so frequently is by design, or because of a coding mistake or an application design flaw.

The Index view

Indices are used by the database engine to enforce primary key, foreign key and unique constraints. They also speed up the retrieval of data. Unique indices are the best for retrieving data, but the level of benefit from non-unique indices depends on the diversity of the indexed data.

For example, look at ADDR_ADDRESS_IDX6. First of all, the index name itself tells that it was created manually. If statistics were taken by the Services API with metadata info, you can see what columns are indexed (in IBAnalyst 1.83 and greater). For the index under examination you can see that it has 34999 keys, TotalDup is 34995 and MaxDup is 25056. Both duplicate columns are marked in red. This is because there are only 4 unique key values amongst all the keys in this index, as can be seen from the Uniques column. Furthermore, the greatest duplicate chain (keys pointing to records with the same column value) is 25056, i.e. almost all keys store one of four unique values. As a result this index could:

• Reduce the speed of the restore process. Okay, thirty-five thousand keys is not a big deal for modern databases and hardware, but the impact should be noted anyway.


• Slow down garbage collection. Indices with a low count of unique values can impede garbage collection by up to ten times in comparison with a completely unique index. This problem has been solved in InterBase 7.1/7.5 and Firebird 2.0.

• Produce unnecessary page reads when the optimizer reads the index. It depends on the value being searched in a particular query: searching by an index that has a larger value for MaxDup will be slower; searching by a value that has fewer duplicates will be faster. But only you know what data is stored in that indexed column.

That is why IBAnalyst draws your attention to such indices, marking them red and yellow, and including them in the Recommendations report. Unfortunately, most of the "bad" indices are automatically created to enforce foreign-key constraints. In some cases this problem can be solved by preventing, using triggers, deletes or updates of primary keys in lookup tables. But if it is not possible to implement such changes, IBAnalyst will show you bad indices on Foreign Keys every time you view statistics.
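To see where such "bad" indices come from, here is a hedged SQL sketch; the table and column names are invented for illustration (ADDR_ADDRESS_IDX6 itself was created manually, but the mechanism is the same for any low-diversity column):

    CREATE TABLE ADDR_TYPE (
        ID INTEGER NOT NULL PRIMARY KEY  /* a lookup table with only 4 rows */
    );

    CREATE TABLE ADDRESS (
        ID      INTEGER NOT NULL PRIMARY KEY,
        TYPE_ID INTEGER REFERENCES ADDR_TYPE (ID)
        /* the foreign-key constraint automatically builds an index on TYPE_ID;
           with ~35000 rows and only 4 distinct values it gets a huge MaxDup,
           exactly the pattern IBAnalyst flags in red */
    );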

Reports

There is no need to look through the entire report each time, spotting cell colors and reading hints for new warnings. More direct and detailed information can be had by using the Recommendations feature of IBAnalyst. Just load the statistics and go to the Reports/View Recommendations menu. This report provides a step-by-step analysis, including more detailed descriptive warnings about forced writes, sweep interval, database activity, transaction state, database page size, sweeping, transaction inventory pages, fragmented tables, tables with a lot of record versions, massive deletes/updates, deep indices, optimizer-unfriendly indices, useless indices and even empty tables. All of this information and the accompanying suggestions are dynamically created based on the statistics being loaded.

As an example of the report output, let's have a look at a report generated for the database statistics you saw earlier in this article: "Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pages. Read_committed transaction uses global TIP, but snapshot transactions make own copies of TIP in memory. Big TIP size can slowdown performance. Try to run sweep manually (gfix -sweep) to decrease TIP size."

Here is another quote, from the table/indices part of the report: "Versioned tables count: 8. Large amount of record versions usually slowdown performance. If there are a lot of record versions in table, then garbage collection does not work or records are not being read by any select statement. You can try select count(*) on that tables to enforce garbage collection, but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccessful if there is at least one transaction interested in these versions. Here is the list of tables with version/record ratio greater than 3:"

    Table        Records   Versions   RecVers size
    CLIENTS_PR      3388      10944        92
    DICT_PRICE        30       1992        45
    DOCS               9       2225        64
    N_PART         13835      72594        83
    REGISTR_NC       241       4085        56
    SKL_NC          1640       7736       170
    STAT_QUICK     17649      85062       110
    UO_LOCK          283       8490       144
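The "select count" advice from the report is plain SQL; a sketch for one of the tables listed above:

    /* a full scan visits every record version; garbage that is no longer
       needed by any transaction is collected along the way */
    SELECT COUNT(*) FROM STAT_QUICK;

As the report warns, this can run for a long time and achieves nothing while at least one older transaction is still interested in the versions.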

Summary

IBAnalyst is an invaluable tool that assists a user in performing detailed analysis of Firebird or InterBase database statistics and identifying possible problems with a database in terms of performance, maintenance and how an application interacts with the database. It takes cryptic database statistics and displays them in an easy-to-understand graphical manner, and will automatically make sensible suggestions about improving database performance and easing database maintenance.

Readers feedback

Comments to "Temporary tables"

We received a lot of feedback emails for the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them particularly impressed me, and with the permission of its author I'd like to publish it.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change is re temp tables; I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on RDBMS are heavily dependent on this feature - or, if it is not present, they require a whole lot of unnecessary and complicated workarounds. I'm talking out of experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage - much better than asking developers to supply their own query plans.

Also, serious RDBMS like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server, if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it: no user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, Foreign Keys, Triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which were declared crap by them not long ago. And you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic: either re user/session data isolation, or re data where the structure is unknown in advance but needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly a lack of an important feature, in the same category as cross-DB operations (only through qli, which means: unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, Firebird developers are working on both topics.

Volker Rehn

volkerrehn@bigpond.com


Miscellaneous

Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird - the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development, like installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now!

To receive future issue notifications, send email to subscribe@ibdeveloper.com

Page 31: The InterBase and Firebird Developer Magazine, Issue 2, 2005

1- InterBase 75 and Firebird 15 have special features that can periodi-cally flush unsaved pages if Forced Writes is Off2 - Oldest transaction is the same Oldest interesting transaction mentionedeverywhere Gstat output does not show this transaction as interesting3 - Really it is the oldest transaction that was active when the oldest trans-action currently active started This is because only new transaction startmoves Oldest active forward In the production systems with regulartransactions it can be considered as currently oldest active transaction

Development area

32

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

your database The warnings orcomments shown are based oncarefully gathered knowledgeobtained from a large number ofreal-world production databases

Note All figures in this article con-tain gstat statistics which were takenfrom a real-world production data-base (with the permission of its own-ers)

As I said before raw database sta-tistics look cryptic and are hard tointerpret IBAnalyst highlights anypotential problems clearly in yellowor red and the detail of the problemcan be read simply by placing thecursor over the relevant entry andreading the hint that is displayed

What can we discover from theabove figure This is a dialect 3database with a page size of 4096bytes Six to eight years ago devel-opers used a default page size of1024 bytes but in more recenttimes such a small page size couldlead to many performance prob-lems Since this database has apage size of 4k there is no warningdisplayed as this page size is okay

Next we can see that the ForcedWrite parameter is set to OFF andmarked red InterBase 4x and 5xby default had this parameter ONForced Writes itself is a write cachemethod when ON it writeschanged data immediately to diskbut OFF means that writes will be

stored for unknown time by theoperating system in its file cacheInterBase 6 creates databases withForced Writes OFF

Why is this marked in red on theIBAnalyst report The answer is sim-ple ndash using asynchronous writes cancause database corruption in casesof power OS or server failure

Tip It is interesting that modernHDD interfaces (ATA SATA SCSI)do not show any major difference inperformance with Forced Write setOn or Off1

Next on the report is the mysterioussweep interval If positive it setsthe size of the gap between the old-est2 and oldest snapshot transactionat which the engine is alerted to theneed to start an automatic garbagecollection On some systems hittingthis threshold will cause a suddenperformance loss effect and as aresult it is sometimes recommendedthat the sweep interval be set to 0(disabling automatic sweepingentirely) Here the sweep interval ismarked yellow because the valueof the sweep gap is negativewhich it can be in InterBase 60Firebird and Yaffil statistics but not inInterBase 7x When the value of thesweep gap is greater than thesweep interval (if the sweep intervalis not 0) the report entry for thesweep interval will be marked redwith an appropriate hint

We will examine the next 8 rows asa group as they all display aspectsof the transaction state of the data-base

bull The Oldest transaction is theoldest non-committed transactionAny lower transaction numbers arefor committed transactions and norecord versions are available forsuch transactions Transaction num-bers higher than the oldest transac-tion are for transactions that can bein any state This is also called theoldest interesting transactionbecause it freezes when a transac-tion is ended with rollback andserver can not undo its changes atthat moment

bull The Oldest snapshot ndash the old-est active (ie not yet committed)transaction that existed at the startof the transaction that is currentlythe Oldest Active transactionIndicates lowest snapshot transac-tion number that is interested inrecord versions

bull The Oldest active3 ndash the oldestcurrently active transaction

bull The Next transaction ndash thetransaction number that will beassigned to new transaction

bull Active transactions ndash IBAnalystwill give a warning if the oldestactive transaction number is30 lower than the daily transac-tions count The statistics do not tell

if there any other active transactions between oldest active andnext transaction but there can be such transactions Usually ifthe oldest active gets stuck there are two possible causes a) thatsome transaction is active for a long time or b) the applicationdesign allows transactions to run for a long time Both causes pre-vent garbage collection and consume server resources

bull Transactions per day ndash this is calculated from Next transactiondivided by the number of days passed since the creation of thedatabase to the point where the statistics are retrieved This can becorrect only for production databases or for databases that areperiodically restored from backup causing transaction numberingto be reset

As you have already learned if there any warnings they are shownas colored lines with clear descriptive hints on how to fix or preventthe problem

It should be noted that database statistics are not always useful Sta-tistics that are gathered during work and housekeeping operationscan be meaningless

Do not gather statistics if you

bull Just restored your database

bull Performed backup (gbak ndashb dbgdb) without the ndashg switch

bull Recently performed a manual sweep (gfix ndashsweep)

Statistics you get on such occasions will be practically useless It isalso correct that during normal work there can be times where data-base is in perfect state for example when applications make lessdatabase load than usual (users are at lunch or its a quiet time in thebusiness day)

Development area

33

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

the main reason for performance degradation For some tablesthere can be a lot of versions that are still in use The server cannot decide whether they really are in use because active transac-tions potentially need any of these versions Accordingly the serv-er does not consider these versions as garbage and it takes longerand longer to construct a correct record from the big chain of ver-sions whenever a transaction happens to read it In Figure 2 youcan see two tables that have the versions count three times higherthan the record count Using this information you can also checkwhether the fact that your applications are updating these tables sofrequently is by design or because of coding mistake or an appli-cation design flaw

The Index viewIIndices are used by database engine to enforce primary key for-eign key and unique constraints They also speed up the retrieval ofdata Unique indices are the best for retrieving data but the level ofbenefit from non-unique indices depends on the diversity of theindexed data

For example look at ADDR_ADDRESS_IDX6 First of all the index

How to seize the momentwhen there is somethingwrong with the databaseYour applications can be designedso well that they will always workwith transactions and data correct-ly not making sweep gaps notaccumulating a lot of active trans-actions not keeping long runningsnapshots and so on Usually it doesnot happen (sorry colleagues)

The most common reason is thatdevelopers test their applicationsrunning only two or three simultane-ous users When the application isthen used in a production environ-ment with fifteen or more simultane-ous users the database can behaveunpredictably Of course multi-usermode can work okay because most

multi-user conflicts can be testedwith two or three concurrently run-ning applications However withlarger numbers of users garbagecollection problems can arise Suchpotential problems can be caught ifyou gather database statistics at thecorrect moments

Table informationLets take look at another sampleoutput from IBAnalyst

The IBAnalyst Table statistics view isalso very useful It can show whichtables have a lot of record versionswhere a large number ofupdatesdeletes were made frag-mented tables with fragmentationcaused by updatedelete or byblobs and so on You can see whichtables are being updated frequent-Figure 2 Table statistics

ly and what the table size is inmegabytes Most of these warningsare customizable

In this database example there areseveral activities First of all yellowcolor in the VerLen column warnsthat space taken by record versionsis larger than that occupied by therecords themselves This can resultfrom updating a lot of fields in arecord or by bulk deletes See therows in which MaxVers column ismarked in blue This shows that onlyone version per record is stored andconsequently this is caused by bulkdeletes So both indications can tellthat this is really bulk deletes andnumber in Versions column is closeto number of deleted records

Long-living active transactions pre-vent garbage collection and this is name itself tells that it was created manually If statistics

were taken by the Services API with metadata info youcan see what columns are indexed (in IBAnalyst 183and greater) For the index under examination you cansee that it has 34999 keys TotalDup is 34995 andMaxDup is 25056 Both duplicate columns are markedin red This is because there are only 4 unique key val-ues amongst all the keys in this index as can be seenfrom the Uniques column Furthermore the greatestduplicate chain (key pointing to records with the samecolumn value) is 25056 ndash ie almost all keys store oneof four unique values As a result this index could

bull Reduce the speed of the restore process Okay thir-ty-five thousand keys is not a big deal for modern data-bases and hardware but the impact should be notedanyway

Development area

34

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

bull Slow down garbage collection Indices with a low count of uniquevalues can impede garbage collection by up to ten times in compar-ison with a completely unique index This problem has been solvedin InterBase 7175 and Firebird 20

bull Produce unnecessary page reads when the optimizer reads theindex It depends on the value being searched in a particular query- searching by an index that has a larger value for MaxDup will beslower Searching by value that has fewer duplicate values will befaster but only you know what data is stored in that indexed column

That is why IBAnalyst draws your attention to such indices markingthem red and yellow and including them in the Recommendationsreport Unfortunately most of the ldquobad indices are automaticallycreated to enforce foreign-key constraints In some cases this prob-lem can be solved by preventing using triggers deletes or updatesof primary key in lookup tables But if it is not possible to implementsuch changes IBAnalyst will show you bad indices on ForeignKeys every time you view statistics

Reports

There is no need to look through the entire report each time spotting cell color and reading hints for new warningsMore direct and detailed information can be had by using the Recommendations feature of IBAnalyst Just load thestatistics and go to the ReportsView Recommendations menu This report provides a step-by-step analysis includ-ing more detailed descriptive warnings about forced writes sweep interval database activity transaction statedatabase page size sweeping transaction inventory pages fragmented tables tables with lot of record versionsmassive deletesupdates deep indices optimizer-unfriendly indices useless indices and even empty tables All ofthis information and the accompanying suggestions are dynamically created based on the statistics being loaded

As an example of the report output letrsquos have a look at a report generated for the database statistics you sawearlier in this article Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pagesRead_committed transaction uses global TIP but snapshot transactions make own copies of TIP in memory Big TIPsize can slowdown performance Try to run sweep manually (gfix -sweep) to decrease TIP size

Here is another quote from tableindices part of report Versioned tables count 8 Large amount of record versionsusually slowdown performance If there are a lot of record versions in table than garbage collection does not workor records are not being read by any select statement You can try select count() on that tables to enforce garbagecollection but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccess-

Table Records Versions RecVers sizeCLIENTS_PR 3388 10944 92DICT_PRICE 30 1992 45DOCS 9 2225 64N_PART 13835 72594 83REGISTR_NC 241 4085 56SKL_NC 1640 7736 170STAT_QUICK 17649 85062 110UO_LOCK 283 8490 144

ful if there is at least one transaction interested in these versions Hereis the list of tables with versionrecord ratio greater than 3

SummaryIBAnalyst is an invaluable tool that assists a user in performingdetailed analysis of Firebird or InterBase database statistics andidentifying possible problems with a database in terms of perform-ance maintenance and how an application interacts with the data-base It takes cryptic database statistics and displays them in aneasy-to-understand graphical manner and will automatically makesensible suggestions about improving database performance andeasing database maintenance

Readers feedback

35

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Comments to ldquoTemporary tablesrdquo

One thing Id like to ask you tochange is re temp tables I suggestyou even create another myth boxfor it It is the sentence

Surely most often temporarytables were necessary to thosedevelopers who had been workingwith MS SQL before they started touse InterBaseFirebird

This myth does not want to dieTemporary tables (TT) are not ameans for underskilled DB kidswho cannot write any complex SQLstatement They are the means ofdealing with data that is structural-ly dynamic but still needs to beprocessed like data with a fixedstructure So all OLAP systemsbased on RDBMS are heavilydependent on this feature - or if it isnot present it requires a whole lotof unnecessary and complicatedworkarounds Im talking out ofexperience

Then there are situations where theoptimizer simply loses the plot

because of the complexity of astatement If developers have a fall-back method to reduce complexityin those cases thats an advantageMuch better than asking developersto supply their own query plans

Also serious RDBMS like Informixhad them at least 15 years agowhen MSs database expertise didnot go further than MS Access Cer-tainly those MS database develop-ers who need TTs to be able to dotheir job would not have managedto deal with an Informix server ifcomplexity was their main problem

The two preceding paragraphswere about local temp tables Thereare also global temporary tables(GTTs) A point in favour of GTTs isthat one can give users their ownworkspace within a database with-out having to setup some clumsyadministration for it No user idscattered around in tables wherethey dont belong no explicitcleanup no demanding role man-agement Just setup tables as temp

and from then on it is transparent tousersapplications that the datainside is usersession-specific Webapplications would be a goodexample

The argument reminds me a bit ofMySQL reasoning when it comes tofeatures their RDBMS doesdidnot have Transactions ForeignKeys Triggers etc were all badand unnecessary because they didnot have them (officially they slowdown the whole system and intro-duce dependencies) Of coursethey do To call a flat file system aRDBMS is obviously good for mar-keting Now they are putting in allthose essential database featureswhich were declared crap by themnot long ago And you can seealready that their marketing nowtells us how important those fea-tures are I bet we wont see MySQLbenchmarks for a while -)

We should not make a similar mis-take Temporary tables are impor-tant if the nature of a system is

dynamic either re usersessiondata isolation or re data where thestructure is unknown in advancebut needs to be processed like DBdata with a fixed structure ThatFirebird does not have them isplainly a lack of an important fea-ture in the same category as crossDB operations (only through qliwhich means unusable) Both fea-tures could make Firebird muchmore suitable as the basis for OLAPsystems an area where Firebird islacking considerably

Well to be fair Firebird developersare working on both topics

Volker Rehn

volkerrehnbigpondcom

We received a lot of feedback emails for article ldquoWorking with temporary tables in InterBase 75rdquo which was published in issue 1The one of them is impressed me and with permission of its respective owner Irsquod like to publish it

Alexey Kovyazin Chief Editor

Readers feedback

Miscellaneous

36

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Invitations SubscriptionDonations Sponsorships

This is the first officialbook on Firebird mdash thefree independentopen source relationaldatabase server thatemerged in 2000

Based on the actualFirebird Project thisbook will provide youall you need to knowabout Firebird data-base development likeinstallation multi-plat-form configurationSQL language inter-faces and mainte-nance

Subscribe now

To receive future issuesnotifications send email to

subscribeibdevelopercom

MagazineCD

Donations

Page 32: The InterBase and Firebird Developer Magazine, Issue 2, 2005

Development area

33

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

the main reason for performance degradation For some tablesthere can be a lot of versions that are still in use The server cannot decide whether they really are in use because active transac-tions potentially need any of these versions Accordingly the serv-er does not consider these versions as garbage and it takes longerand longer to construct a correct record from the big chain of ver-sions whenever a transaction happens to read it In Figure 2 youcan see two tables that have the versions count three times higherthan the record count Using this information you can also checkwhether the fact that your applications are updating these tables sofrequently is by design or because of coding mistake or an appli-cation design flaw

The Index viewIIndices are used by database engine to enforce primary key for-eign key and unique constraints They also speed up the retrieval ofdata Unique indices are the best for retrieving data but the level ofbenefit from non-unique indices depends on the diversity of theindexed data

For example look at ADDR_ADDRESS_IDX6 First of all the index

How to seize the momentwhen there is somethingwrong with the databaseYour applications can be designedso well that they will always workwith transactions and data correct-ly not making sweep gaps notaccumulating a lot of active trans-actions not keeping long runningsnapshots and so on Usually it doesnot happen (sorry colleagues)

The most common reason is thatdevelopers test their applicationsrunning only two or three simultane-ous users When the application isthen used in a production environ-ment with fifteen or more simultane-ous users the database can behaveunpredictably Of course multi-usermode can work okay because most

multi-user conflicts can be testedwith two or three concurrently run-ning applications However withlarger numbers of users garbagecollection problems can arise Suchpotential problems can be caught ifyou gather database statistics at thecorrect moments

Table informationLets take look at another sampleoutput from IBAnalyst

The IBAnalyst Table statistics view isalso very useful It can show whichtables have a lot of record versionswhere a large number ofupdatesdeletes were made frag-mented tables with fragmentationcaused by updatedelete or byblobs and so on You can see whichtables are being updated frequent-Figure 2 Table statistics

ly and what the table size is inmegabytes Most of these warningsare customizable

In this database example there areseveral activities First of all yellowcolor in the VerLen column warnsthat space taken by record versionsis larger than that occupied by therecords themselves This can resultfrom updating a lot of fields in arecord or by bulk deletes See therows in which MaxVers column ismarked in blue This shows that onlyone version per record is stored andconsequently this is caused by bulkdeletes So both indications can tellthat this is really bulk deletes andnumber in Versions column is closeto number of deleted records

Long-living active transactions pre-vent garbage collection and this is name itself tells that it was created manually If statistics

were taken by the Services API with metadata info youcan see what columns are indexed (in IBAnalyst 183and greater) For the index under examination you cansee that it has 34999 keys TotalDup is 34995 andMaxDup is 25056 Both duplicate columns are markedin red This is because there are only 4 unique key val-ues amongst all the keys in this index as can be seenfrom the Uniques column Furthermore the greatestduplicate chain (key pointing to records with the samecolumn value) is 25056 ndash ie almost all keys store oneof four unique values As a result this index could

bull Reduce the speed of the restore process Okay thir-ty-five thousand keys is not a big deal for modern data-bases and hardware but the impact should be notedanyway

Development area

34

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

bull Slow down garbage collection Indices with a low count of uniquevalues can impede garbage collection by up to ten times in compar-ison with a completely unique index This problem has been solvedin InterBase 7175 and Firebird 20

bull Produce unnecessary page reads when the optimizer reads theindex It depends on the value being searched in a particular query- searching by an index that has a larger value for MaxDup will beslower Searching by value that has fewer duplicate values will befaster but only you know what data is stored in that indexed column

That is why IBAnalyst draws your attention to such indices markingthem red and yellow and including them in the Recommendationsreport Unfortunately most of the ldquobad indices are automaticallycreated to enforce foreign-key constraints In some cases this prob-lem can be solved by preventing using triggers deletes or updatesof primary key in lookup tables But if it is not possible to implementsuch changes IBAnalyst will show you bad indices on ForeignKeys every time you view statistics

Reports

There is no need to look through the entire report each time spotting cell color and reading hints for new warningsMore direct and detailed information can be had by using the Recommendations feature of IBAnalyst Just load thestatistics and go to the ReportsView Recommendations menu This report provides a step-by-step analysis includ-ing more detailed descriptive warnings about forced writes sweep interval database activity transaction statedatabase page size sweeping transaction inventory pages fragmented tables tables with lot of record versionsmassive deletesupdates deep indices optimizer-unfriendly indices useless indices and even empty tables All ofthis information and the accompanying suggestions are dynamically created based on the statistics being loaded

As an example of the report output letrsquos have a look at a report generated for the database statistics you sawearlier in this article Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pagesRead_committed transaction uses global TIP but snapshot transactions make own copies of TIP in memory Big TIPsize can slowdown performance Try to run sweep manually (gfix -sweep) to decrease TIP size

Here is another quote from tableindices part of report Versioned tables count 8 Large amount of record versionsusually slowdown performance If there are a lot of record versions in table than garbage collection does not workor records are not being read by any select statement You can try select count() on that tables to enforce garbagecollection but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccess-

Table Records Versions RecVers sizeCLIENTS_PR 3388 10944 92DICT_PRICE 30 1992 45DOCS 9 2225 64N_PART 13835 72594 83REGISTR_NC 241 4085 56SKL_NC 1640 7736 170STAT_QUICK 17649 85062 110UO_LOCK 283 8490 144

ful if there is at least one transaction interested in these versions Hereis the list of tables with versionrecord ratio greater than 3

SummaryIBAnalyst is an invaluable tool that assists a user in performingdetailed analysis of Firebird or InterBase database statistics andidentifying possible problems with a database in terms of perform-ance maintenance and how an application interacts with the data-base It takes cryptic database statistics and displays them in aneasy-to-understand graphical manner and will automatically makesensible suggestions about improving database performance andeasing database maintenance

Readers feedback

35

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Comments to ldquoTemporary tablesrdquo

One thing Id like to ask you tochange is re temp tables I suggestyou even create another myth boxfor it It is the sentence

Surely most often temporarytables were necessary to thosedevelopers who had been workingwith MS SQL before they started touse InterBaseFirebird

This myth does not want to dieTemporary tables (TT) are not ameans for underskilled DB kidswho cannot write any complex SQLstatement They are the means ofdealing with data that is structural-ly dynamic but still needs to beprocessed like data with a fixedstructure So all OLAP systemsbased on RDBMS are heavilydependent on this feature - or if it isnot present it requires a whole lotof unnecessary and complicatedworkarounds Im talking out ofexperience

Then there are situations where theoptimizer simply loses the plot

because of the complexity of astatement If developers have a fall-back method to reduce complexityin those cases thats an advantageMuch better than asking developersto supply their own query plans

Also serious RDBMS like Informixhad them at least 15 years agowhen MSs database expertise didnot go further than MS Access Cer-tainly those MS database develop-ers who need TTs to be able to dotheir job would not have managedto deal with an Informix server ifcomplexity was their main problem

The two preceding paragraphswere about local temp tables Thereare also global temporary tables(GTTs) A point in favour of GTTs isthat one can give users their ownworkspace within a database with-out having to setup some clumsyadministration for it No user idscattered around in tables wherethey dont belong no explicitcleanup no demanding role man-agement Just setup tables as temp

and from then on it is transparent tousersapplications that the datainside is usersession-specific Webapplications would be a goodexample

The argument reminds me a bit ofMySQL reasoning when it comes tofeatures their RDBMS doesdidnot have Transactions ForeignKeys Triggers etc were all badand unnecessary because they didnot have them (officially they slowdown the whole system and intro-duce dependencies) Of coursethey do To call a flat file system aRDBMS is obviously good for mar-keting Now they are putting in allthose essential database featureswhich were declared crap by themnot long ago And you can seealready that their marketing nowtells us how important those fea-tures are I bet we wont see MySQLbenchmarks for a while -)

We should not make a similar mis-take Temporary tables are impor-tant if the nature of a system is

dynamic either re usersessiondata isolation or re data where thestructure is unknown in advancebut needs to be processed like DBdata with a fixed structure ThatFirebird does not have them isplainly a lack of an important fea-ture in the same category as crossDB operations (only through qliwhich means unusable) Both fea-tures could make Firebird muchmore suitable as the basis for OLAPsystems an area where Firebird islacking considerably

Well to be fair Firebird developersare working on both topics

Volker Rehn

volkerrehnbigpondcom

We received a lot of feedback emails for article ldquoWorking with temporary tables in InterBase 75rdquo which was published in issue 1The one of them is impressed me and with permission of its respective owner Irsquod like to publish it

Alexey Kovyazin Chief Editor

Readers feedback

Miscellaneous

36

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Invitations SubscriptionDonations Sponsorships

This is the first officialbook on Firebird mdash thefree independentopen source relationaldatabase server thatemerged in 2000

Based on the actualFirebird Project thisbook will provide youall you need to knowabout Firebird data-base development likeinstallation multi-plat-form configurationSQL language inter-faces and mainte-nance

Subscribe now

To receive future issuesnotifications send email to

subscribeibdevelopercom

MagazineCD

Donations

Page 33: The InterBase and Firebird Developer Magazine, Issue 2, 2005

Development area

34

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Using IBAnalyst

bull Slow down garbage collection Indices with a low count of uniquevalues can impede garbage collection by up to ten times in compar-ison with a completely unique index This problem has been solvedin InterBase 7175 and Firebird 20

bull Produce unnecessary page reads when the optimizer reads theindex It depends on the value being searched in a particular query- searching by an index that has a larger value for MaxDup will beslower Searching by value that has fewer duplicate values will befaster but only you know what data is stored in that indexed column

That is why IBAnalyst draws your attention to such indices markingthem red and yellow and including them in the Recommendationsreport Unfortunately most of the ldquobad indices are automaticallycreated to enforce foreign-key constraints In some cases this prob-lem can be solved by preventing using triggers deletes or updatesof primary key in lookup tables But if it is not possible to implementsuch changes IBAnalyst will show you bad indices on ForeignKeys every time you view statistics

Reports

There is no need to look through the entire report each time spotting cell color and reading hints for new warningsMore direct and detailed information can be had by using the Recommendations feature of IBAnalyst Just load thestatistics and go to the ReportsView Recommendations menu This report provides a step-by-step analysis includ-ing more detailed descriptive warnings about forced writes sweep interval database activity transaction statedatabase page size sweeping transaction inventory pages fragmented tables tables with lot of record versionsmassive deletesupdates deep indices optimizer-unfriendly indices useless indices and even empty tables All ofthis information and the accompanying suggestions are dynamically created based on the statistics being loaded

As an example of the report output letrsquos have a look at a report generated for the database statistics you sawearlier in this article Overall size of transaction inventory pages (TIP) is big - 94 kilobytes or 23 pagesRead_committed transaction uses global TIP but snapshot transactions make own copies of TIP in memory Big TIPsize can slowdown performance Try to run sweep manually (gfix -sweep) to decrease TIP size

Here is another quote from tableindices part of report Versioned tables count 8 Large amount of record versionsusually slowdown performance If there are a lot of record versions in table than garbage collection does not workor records are not being read by any select statement You can try select count() on that tables to enforce garbagecollection but this can take long time (if there are lot of versions and non-unique indices exist) and can be unsuccess-

Table Records Versions RecVers sizeCLIENTS_PR 3388 10944 92DICT_PRICE 30 1992 45DOCS 9 2225 64N_PART 13835 72594 83REGISTR_NC 241 4085 56SKL_NC 1640 7736 170STAT_QUICK 17649 85062 110UO_LOCK 283 8490 144

ful if there is at least one transaction interested in these versions Hereis the list of tables with versionrecord ratio greater than 3

SummaryIBAnalyst is an invaluable tool that assists a user in performingdetailed analysis of Firebird or InterBase database statistics andidentifying possible problems with a database in terms of perform-ance maintenance and how an application interacts with the data-base It takes cryptic database statistics and displays them in aneasy-to-understand graphical manner and will automatically makesensible suggestions about improving database performance andeasing database maintenance

Readers feedback

35

The InterBase and Firebird Developer Magazine 2005 ISSUE 2

wwwibdevelopercomcopy2005 wwwibdevelopercom All right reserved

Comments to ldquoTemporary tablesrdquo

One thing Id like to ask you tochange is re temp tables I suggestyou even create another myth boxfor it It is the sentence

Surely most often temporarytables were necessary to thosedevelopers who had been workingwith MS SQL before they started touse InterBaseFirebird

Readers feedback


Comments to the "Temporary tables" article

We received a lot of feedback emails about the article "Working with temporary tables in InterBase 7.5", which was published in issue 1. One of them impressed me particularly, and with the permission of its author I'd like to publish it here.

Alexey Kovyazin, Chief Editor

One thing I'd like to ask you to change concerns temp tables. I suggest you even create another myth box for it. It is the sentence:

"Surely, most often temporary tables were necessary to those developers who had been working with MS SQL before they started to use InterBase/Firebird."

This myth does not want to die. Temporary tables (TT) are not a means for underskilled DB kids who cannot write any complex SQL statement. They are the means of dealing with data that is structurally dynamic but still needs to be processed like data with a fixed structure. So all OLAP systems based on an RDBMS are heavily dependent on this feature; where it is not present, a whole lot of unnecessary and complicated workarounds is required. I'm talking from experience.

Then there are situations where the optimizer simply loses the plot because of the complexity of a statement. If developers have a fallback method to reduce complexity in those cases, that's an advantage, and much better than asking developers to supply their own query plans.
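As a purely illustrative sketch (the table and column names are hypothetical, and the dialect is MS SQL, where local temp tables are most familiar), such a fallback materializes an intermediate result first, so the final statement stays simple:

-- Materialize the heavy aggregate into a session-local temp table
-- (the # prefix makes the table local to this connection).
SELECT CUST_ID, SUM(AMOUNT) AS TOTAL
INTO #SALES_AGG
FROM SALES
GROUP BY CUST_ID;

-- The remaining join is simple enough for the optimizer
-- to pick a sensible plan on its own.
SELECT C.NAME, T.TOTAL
FROM CUSTOMERS C
JOIN #SALES_AGG T ON T.CUST_ID = C.CUST_ID
WHERE T.TOTAL > 10000;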

Also, serious RDBMSs like Informix had them at least 15 years ago, when MS's database expertise did not go further than MS Access. Certainly those MS database developers who need TTs to be able to do their job would not have managed to deal with an Informix server if complexity was their main problem.

The two preceding paragraphs were about local temp tables. There are also global temporary tables (GTTs). A point in favour of GTTs is that one can give users their own workspace within a database without having to set up some clumsy administration for it: no user ids scattered around in tables where they don't belong, no explicit cleanup, no demanding role management. Just set up tables as temp, and from then on it is transparent to users/applications that the data inside is user/session-specific. Web applications would be a good example.
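For instance, with the SQL-standard syntax implemented in InterBase 7.5 (the table below is invented for illustration), the entire "administration" is a single DDL statement:

/* One shared table definition; each session automatically sees
   only its own rows, which here survive commits. */
CREATE GLOBAL TEMPORARY TABLE SESSION_CART (
    ITEM_ID INTEGER NOT NULL,
    QTY     INTEGER NOT NULL
) ON COMMIT PRESERVE ROWS;

/* Every session and application runs identical statements;
   no user-id column and no explicit cleanup are needed,
   because the server isolates the data per session. */
INSERT INTO SESSION_CART (ITEM_ID, QTY) VALUES (1, 3);
SELECT ITEM_ID, QTY FROM SESSION_CART;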

The argument reminds me a bit of MySQL reasoning when it comes to features their RDBMS does/did not have. Transactions, foreign keys, triggers etc. were all bad and unnecessary because they did not have them (officially: they slow down the whole system and introduce dependencies). Of course they do. To call a flat file system an RDBMS is obviously good for marketing. Now they are putting in all those essential database features which they declared crap not long ago, and you can see already that their marketing now tells us how important those features are. I bet we won't see MySQL benchmarks for a while. :-)

We should not make a similar mistake. Temporary tables are important if the nature of a system is dynamic, either regarding user/session data isolation, or regarding data whose structure is unknown in advance but which needs to be processed like DB data with a fixed structure. That Firebird does not have them is plainly the lack of an important feature, in the same category as cross-DB operations (available only through qli, which means unusable). Both features could make Firebird much more suitable as the basis for OLAP systems, an area where Firebird is lacking considerably.

Well, to be fair, the Firebird developers are working on both topics.

Volker Rehn

volkerrehn@bigpond.com



Miscellaneous


Invitations, Subscriptions, Donations, Sponsorships

This is the first official book on Firebird, the free, independent, open source relational database server that emerged in 2000.

Based on the actual Firebird Project, this book will provide you all you need to know about Firebird database development: installation, multi-platform configuration, SQL language, interfaces and maintenance.

Subscribe now!

To receive future issue notifications, send email to subscribe@ibdeveloper.com
